From: Joshua Hahn
To: Minchan Kim, Sergey Senozhatsky
Cc: Johannes Weiner, Yosry Ahmed, Nhat Pham, Harry Yoo, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 11/11] mm/zsmalloc: Handle charge migration in zpdesc substitution
Date: Wed, 11 Mar 2026 12:51:48 -0700
Message-ID: <20260311195153.4013476-12-joshua.hahnjy@gmail.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260311195153.4013476-1-joshua.hahnjy@gmail.com>
References: <20260311195153.4013476-1-joshua.hahnjy@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

In zsmalloc, there are two types of migrations: migrations of single
compressed objects from one zspage to another, and substitutions of
zpdescs within zspages. In both of these migrations, the memcg
association of the compressed objects does not change. However, the
physical location of the compressed objects may change, which alters
their lruvec association.

In this patch, handle the substitution of zpdescs in zspages, which may
change the node of all objects present (wholly or partially). Take
special care to address the partial compressed object at the beginning
of the swapped-out zpdesc. "Ownership" of a spanning object is
associated with the zpdesc it begins on.
Thus, when handling the first compressed object, we must iterate through
the (up to 4) zpdescs present in the zspage to find the previous zpdesc,
then retrieve the object's zspage-wide index. For the same reason,
pool->uncompressed_stat, which can only be accounted at PAGE_SIZE
granularity for the node statistics, is accounted only for objects
beginning in the zpdesc. Likewise for the spanning object at the end of
the replaced zpdesc, account only the amount that lives on the zpdesc.

Note that these operations cannot call the existing
zs_{charge,uncharge}_objcg functions we introduced, since we are holding
the class spin lock and obj_cgroup_charge can sleep.

Signed-off-by: Joshua Hahn
---
 mm/zsmalloc.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 92 insertions(+)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index f3508ff8b3ab..a4c90447d28e 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1932,6 +1932,94 @@ static bool zs_page_isolate(struct page *page, isolate_mode_t mode)
 	return page_zpdesc(page)->zspage;
 }
 
+#ifdef CONFIG_MEMCG
+static void zs_migrate_lruvec(struct zs_pool *pool, struct obj_cgroup *objcg,
+			      int old_nid, int new_nid, int charge,
+			      int obj_size)
+{
+	struct mem_cgroup *memcg;
+	struct lruvec *old_lruvec, *new_lruvec;
+	int partial;
+
+	if (old_nid == new_nid || !objcg)
+		return;
+
+	/* Proportional (partial) uncompressed share for this portion */
+	partial = (PAGE_SIZE * charge) / obj_size;
+
+	rcu_read_lock();
+	memcg = obj_cgroup_memcg(objcg);
+	old_lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(old_nid));
+	new_lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(new_nid));
+
+	mod_memcg_lruvec_state(old_lruvec, pool->compressed_stat, -charge);
+	mod_memcg_lruvec_state(new_lruvec, pool->compressed_stat, charge);
+
+	mod_memcg_lruvec_state(old_lruvec, pool->uncompressed_stat, -partial);
+	mod_memcg_lruvec_state(new_lruvec, pool->uncompressed_stat, partial);
+	rcu_read_unlock();
+}
+
+/*
+ * Transfer per-lruvec and node-level stats when a zspage replaces a zpdesc
+ * with one from a different NUMA node. Must be called while old_zpdesc is
+ * still linked to the zspage. memcg-level charges are unchanged.
+ */
+static void zs_page_migrate_lruvec(struct zs_pool *pool, struct zspage *zspage,
+				   struct zpdesc *old_zpdesc,
+				   struct zpdesc *new_zpdesc,
+				   struct size_class *class)
+{
+	int size = class->size;
+	int old_nid = page_to_nid(zpdesc_page(old_zpdesc));
+	int new_nid = page_to_nid(zpdesc_page(new_zpdesc));
+	unsigned int off, first_obj_offset, page_offset = 0;
+	unsigned int idx;
+	struct zpdesc *cursor = zspage->first_zpdesc;
+
+	if (old_nid == new_nid)
+		return;
+
+	while (cursor != old_zpdesc) {
+		cursor = get_next_zpdesc(cursor);
+		page_offset += PAGE_SIZE;
+	}
+
+	first_obj_offset = get_first_obj_offset(old_zpdesc);
+	idx = (page_offset + first_obj_offset) / size;
+
+	/* Boundary object spanning from the previous zpdesc */
+	if (idx > 0 && zspage->objcgs[idx - 1])
+		zs_migrate_lruvec(pool, zspage->objcgs[idx - 1],
+				  old_nid, new_nid, first_obj_offset, size);
+
+	for (off = first_obj_offset;
+	     off < PAGE_SIZE && idx < class->objs_per_zspage;
+	     idx++, off += size) {
+		struct obj_cgroup *objcg = zspage->objcgs[idx];
+		int bytes_on_page = min_t(int, size, PAGE_SIZE - off);
+
+		if (!objcg)
+			continue;
+
+		zs_migrate_lruvec(pool, objcg, old_nid, new_nid,
+				  bytes_on_page, size);
+
+		dec_node_page_state(zpdesc_page(old_zpdesc),
+				    pool->uncompressed_stat);
+		inc_node_page_state(zpdesc_page(new_zpdesc),
+				    pool->uncompressed_stat);
+	}
+}
+#else
+static void zs_page_migrate_lruvec(struct zs_pool *pool, struct zspage *zspage,
+				   struct zpdesc *old_zpdesc,
+				   struct zpdesc *new_zpdesc,
+				   struct size_class *class)
+{
+}
+#endif
+
 static int zs_page_migrate(struct page *newpage, struct page *page,
 			   enum migrate_mode mode)
 {
@@ -2004,6 +2092,10 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
 	}
 	kunmap_local(s_addr);
 
+	/* Transfer lruvec/node stats while old zpdesc is still linked */
+	if (pool->memcg_aware)
+		zs_page_migrate_lruvec(pool, zspage, zpdesc, newzpdesc, class);
+
 	replace_sub_page(class, zspage, newzpdesc, zpdesc);
 	/*
 	 * Since we complete the data copy and set up new zspage structure,
-- 
2.52.0