From nobody Sat May 30 13:23:32 2026 Received: from mail-pl1-f177.google.com (mail-pl1-f177.google.com [209.85.214.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DAA7C2E22B5 for ; Wed, 27 May 2026 12:01:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.177 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779883300; cv=none; b=DViNdTJLCVb0lAmBwDbZDvBSEbLbg6JNir4SkyA9sOeC9JYJEf1czJkcT8csTaUQp+D/ROIjgPC2sS744jefiGVOCVOdsWQi/UTG2LG2oPxLUhaZhAcS0HPXMsUE5ph/fcvFlt7nRLzHFWsqI141bdA57R8yOU+ib39ZmWMQktA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779883300; c=relaxed/simple; bh=WnoFY+zGtT1r+C3lHmExFEvcZlJxTD5AUXF8lDS0ZXc=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=e1Vd9jW3RR+skVDxEiG5mm7D7rdD0CzLCzFC38xcwQqlp66zwEU6ce9Q4G2jtIVEywCymsXJGqe0mrswZSehgaQzapB9SRAO1SPP7kUVsjVk/nAMqv0FfUDS+U7vGVDOWsFKe3QWqKAPHohr/Ht2c1rGOHY00rQY4RjYghXIBac= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=LEwVox8x; arc=none smtp.client-ip=209.85.214.177 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="LEwVox8x" Received: by mail-pl1-f177.google.com with SMTP id d9443c01a7336-2ba856db1c0so85148175ad.3 for ; Wed, 27 May 2026 05:01:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1779883298; x=1780488098; darn=vger.kernel.org; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=uHNxVkzgOEtrCskHNiISj95cLTtaoMXSb01E6jPrlAE=; b=LEwVox8xsUgMznfruzwt9xUQC9cBkOsTfyRUqLqmEbDu1YXp/5y82ESFiooMWssG3k G7REr4+ygFbGEMUtIqRdisH0zfawRx6Ks1AXpY0X9/l/zRFcuOiEzqImtgSF6IKwnhgE AUKqhjkrusLF4QQjz1yNmGjY3V0wr5liRFlAxVLQfDUgJVG6xVDD2RkRO5iBGTJLO2Ox j2rjBGngT9aImcKNwSzsdbLo1GXvSDX/NV8gcVue3l+qUct34J0G8MCSdqtTaQ1ScZoF R6d/jof9ALsL4A2kVkwzYq4mIfM4zC9PMwu5PDKj4KthCFcBNm01+8IVAThqul8lHRnX 2E5Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779883298; x=1780488098; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=uHNxVkzgOEtrCskHNiISj95cLTtaoMXSb01E6jPrlAE=; b=YLOoF+LBPovbsXJDf+g4rW0oOSrNkIRegMLrqxRjVPwPmh38Ej1jUhmyXkPrtUB+4t 64oUqsWJo2fLIHbVFJnCkxo87gPyqEBIYnFvKXM85DAMiCuD6WiemRyhH/RepC//4NP6 baxRNaCJK0ie0zfqYuAwjZacIvR8olREbpwRiU7ko05rnkuehAoRt1h56jcY+qq7aCZ4 MUcRXMoU9fqhT1nHxpPXeLnglu0OZnf1zG5Jhr2aAESKOc94chPuIVZfMYf7MohO1w6E MXHPN+l3hGpokk5uM2cDT3i/kdjV0AbzwZQjOIbu/0KU6y3J6SfIQaMZ9+5SxiUJhrOp dq0g== X-Forwarded-Encrypted: i=1; AFNElJ9CSYvccBMwlz65RlwH21hW1z2R9NREp/b8VAhpbU07J73i4FY57IKwVNOeAYpjnry5SQ32ruDURH21/N8=@vger.kernel.org X-Gm-Message-State: AOJu0Yx1rdMRgefxGrrJG3i5D+wohiFr98m2xA5WUZAsJ8efz0ch39nN PmFxewK3GqGsw0HL997Zydy6UcQvyNnKDXo8KurI9ldZInMcYL+YZGpp X-Gm-Gg: Acq92OGjPCOMWg6VpFLCCe/a95Ij8+7L717S4IzrpD6CEgbAhWowqMg8f8CIOQobXCw lMnGgQ9gIY6S5nCTym8Fxk9uh4yJlj9qaKlm8rCfaIFp2VAhIGVjRqWuESysDDUi6kidjz0uTYa LhIUQFFq+6EdC9/+aXzv8raIQS8ZTjfo8Vvaux/57fbEofh+iMqCFmLQYI9SElEHZHlTNPOsoTA zPY2zCmbFOQlTm4NSi0Nc9n5pCRrAwnDDuxCC3wxtCArU4V0IxF+FS9bjyWEj5JZBS37Rq0wR18 VtoER9fRv+a1EAtmH9ATlJkEtXY09X0gufTwUUMZaovCxWBGf8J2WUl3xKyY6zwLraqfO/0B+Sn fn9nGdTrOooL8SHD1DKyeQEUYac/G7utFc8WJgqHIM1jcZozCFmAjJnemyHB1yIxYqV5prFW9pA 
b77dXNGj22E78LKxsAzHuVCRLrb9jhBIJKwPJaidFZTUp2UiLk X-Received: by 2002:a17:903:4405:b0:2b9:6458:1a2c with SMTP id d9443c01a7336-2beb06733bemr252958595ad.13.1779883298133; Wed, 27 May 2026 05:01:38 -0700 (PDT) Received: from ubuntu22.mioffice.cn ([43.224.245.232]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-2beb56b5886sm136250465ad.20.2026.05.27.05.01.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 May 2026 05:01:37 -0700 (PDT) From: Wenchao Hao X-Google-Original-From: Wenchao Hao To: Andrew Morton , Nhat Pham , Yosry Ahmed , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Minchan Kim , Sergey Senozhatsky Cc: Wenchao Hao Subject: [RFC PATCH v2 1/3] mm/zsmalloc: encode class index in obj value for lockless class lookup Date: Wed, 27 May 2026 19:59:28 +0800 Message-Id: <20260527115930.3138213-2-haowenchao@xiaomi.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260527115930.3138213-1-haowenchao@xiaomi.com> References: <20260527115930.3138213-1-haowenchao@xiaomi.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Encode the size_class index (class_idx) into the obj value so that zs_free() can determine the correct size_class without dereferencing the handle->obj->PFN->zpdesc->zspage->class chain under pool->lock. class_idx is invariant across page migration (only PFN is rewritten), so a lockless read of obj always yields a valid class_idx. The space below the PFN field in obj is over-provisioned on 64-bit systems, with more bits than obj_idx needs. 
Split that space into class_idx and obj_idx subfields: |<-- _PFN_BITS -->|<-- ZS_OBJ_CLASS_BITS -->|<-- ZS_OBJ_IDX_BITS -->| +-----------------+-------------------------+-----------------------+ | PFN | class_idx | obj_idx | +-----------------+-------------------------+-----------------------+ MSB ^ LSB | +-- ZS_OBJ_PFN_SHIFT The macro layout changes as follows: Before After Meaning ---------------- ------------------ ---------------------------- OBJ_INDEX_BITS ZS_OBJ_IDX_BITS width of obj_idx subfield OBJ_INDEX_MASK ZS_OBJ_IDX_MASK mask of obj_idx subfield (n/a) ZS_OBJ_CLASS_BITS width of class_idx subfield (n/a) ZS_OBJ_CLASS_MASK mask of class_idx subfield (n/a) ZS_OBJ_PFN_SHIFT bit offset of PFN in obj On 32-bit systems there is no spare room for class_idx, so the encoding is disabled (ZS_OBJ_CLASS_BITS =3D 0) and the obj layout remains [PFN | obj_idx]. Signed-off-by: Wenchao Hao Reviewed-by: Nhat Pham --- mm/zsmalloc.c | 80 ++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 66 insertions(+), 14 deletions(-) diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 63128ddb7959..6b0014b43408 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -67,8 +67,8 @@ #define MAX_POSSIBLE_PHYSMEM_BITS MAX_PHYSMEM_BITS #else /* - * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just - * be PAGE_SHIFT + * If this definition of MAX_PHYSMEM_BITS is used, ZS_OBJ_PFN_SHIFT will + * just be PAGE_SHIFT */ #define MAX_POSSIBLE_PHYSMEM_BITS BITS_PER_LONG #endif @@ -88,8 +88,27 @@ #define OBJ_TAG_BITS 1 #define OBJ_TAG_MASK OBJ_ALLOCATED_TAG =20 -#define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS) -#define OBJ_INDEX_MASK ((_AC(1, UL) << OBJ_INDEX_BITS) - 1) +/* + * obj is encoded as [PFN | class_idx | obj_idx] within an unsigned long: + * + * |<-- _PFN_BITS -->|<-- ZS_OBJ_CLASS_BITS -->|<-- ZS_OBJ_IDX_BITS -->| + * +-----------------+-------------------------+-----------------------+ + * | PFN | class_idx | obj_idx | + * 
+-----------------+-------------------------+-----------------------+ + * MSB ^ LSB + * | + * +-- ZS_OBJ_PFN_SHIFT + * + * Encoding class_idx into obj lets zs_free() locate the size_class + * without holding pool->lock; class_idx is invariant across page + * migration (only PFN changes), so a lockless read of the obj value + * always yields a valid class_idx. + * + * On 32-bit systems there is no spare room for class_idx, so + * ZS_OBJ_CLASS_BITS is 0 and the layout collapses to the original + * [PFN | obj_idx] without any ifdef in callers. + */ +#define ZS_OBJ_PFN_SHIFT (BITS_PER_LONG - _PFN_BITS) =20 #define HUGE_BITS 1 #define FULLNESS_BITS 4 @@ -98,9 +117,29 @@ =20 #define ZS_MAX_PAGES_PER_ZSPAGE (_AC(CONFIG_ZSMALLOC_CHAIN_SIZE, UL)) =20 +/* + * Reuse the width that struct zspage already reserves for its + * class field (zspage->class:CLASS_BITS + 1) for the class_idx + * field encoded in obj. On 32-bit there is no spare room, so set + * it to 0; the encoded class_idx then folds to a constant 0 and + * the layout collapses back to [PFN | obj_idx]. 
+ */ +#if BITS_PER_LONG >=3D 64 +#define ZS_OBJ_CLASS_BITS (CLASS_BITS + 1) +#else +#define ZS_OBJ_CLASS_BITS 0 +#endif +#define ZS_OBJ_CLASS_MASK ((_AC(1, UL) << ZS_OBJ_CLASS_BITS) - 1) + +#define ZS_OBJ_IDX_BITS (ZS_OBJ_PFN_SHIFT - ZS_OBJ_CLASS_BITS) +#define ZS_OBJ_IDX_MASK ((_AC(1, UL) << ZS_OBJ_IDX_BITS) - 1) + +static_assert(ZS_OBJ_IDX_BITS > 0, + "zsmalloc: PFN + class_idx leave no room for obj_idx"); + /* ZS_MIN_ALLOC_SIZE must be multiple of ZS_ALIGN */ #define ZS_MIN_ALLOC_SIZE \ - MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS)) + MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> ZS_OBJ_IDX_BITS)) /* each chunk includes extra space to keep handle */ #define ZS_MAX_ALLOC_SIZE PAGE_SIZE =20 @@ -721,26 +760,38 @@ static struct zpdesc *get_next_zpdesc(struct zpdesc *= zpdesc) static void obj_to_location(unsigned long obj, struct zpdesc **zpdesc, unsigned int *obj_idx) { - *zpdesc =3D pfn_zpdesc(obj >> OBJ_INDEX_BITS); - *obj_idx =3D (obj & OBJ_INDEX_MASK); + *zpdesc =3D pfn_zpdesc(obj >> ZS_OBJ_PFN_SHIFT); + *obj_idx =3D (obj & ZS_OBJ_IDX_MASK); } =20 static void obj_to_zpdesc(unsigned long obj, struct zpdesc **zpdesc) { - *zpdesc =3D pfn_zpdesc(obj >> OBJ_INDEX_BITS); + *zpdesc =3D pfn_zpdesc(obj >> ZS_OBJ_PFN_SHIFT); +} + +/* + * On 32-bit systems ZS_OBJ_CLASS_BITS is 0 and ZS_OBJ_CLASS_MASK is 0, + * so this collapses to a constant 0. No ifdef needed at the call site. 
+ */ +static unsigned int obj_to_class_idx(unsigned long obj) +{ + return (obj >> ZS_OBJ_IDX_BITS) & ZS_OBJ_CLASS_MASK; } =20 /** - * location_to_obj - get obj value encoded from (, ) + * location_to_obj - encode (, , ) into obj va= lue * @zpdesc: zpdesc object resides in zspage * @obj_idx: object index + * @class_idx: size class index; ignored on 32-bit (ZS_OBJ_CLASS_BITS =3D= =3D 0) */ -static unsigned long location_to_obj(struct zpdesc *zpdesc, unsigned int o= bj_idx) +static unsigned long location_to_obj(struct zpdesc *zpdesc, unsigned int o= bj_idx, + unsigned int class_idx) { unsigned long obj; =20 - obj =3D zpdesc_pfn(zpdesc) << OBJ_INDEX_BITS; - obj |=3D obj_idx & OBJ_INDEX_MASK; + obj =3D zpdesc_pfn(zpdesc) << ZS_OBJ_PFN_SHIFT; + obj |=3D (unsigned long)(class_idx & ZS_OBJ_CLASS_MASK) << ZS_OBJ_IDX_BIT= S; + obj |=3D obj_idx & ZS_OBJ_IDX_MASK; =20 return obj; } @@ -1276,7 +1327,7 @@ static unsigned long obj_malloc(struct zs_pool *pool, kunmap_local(vaddr); mod_zspage_inuse(zspage, 1); =20 - obj =3D location_to_obj(m_zpdesc, obj); + obj =3D location_to_obj(m_zpdesc, obj, zspage->class); record_obj(handle, obj); =20 return obj; @@ -1762,7 +1813,8 @@ static int zs_page_migrate(struct page *newpage, stru= ct page *page, =20 old_obj =3D handle_to_obj(handle); obj_to_location(old_obj, &dummy, &obj_idx); - new_obj =3D (unsigned long)location_to_obj(newzpdesc, obj_idx); + new_obj =3D location_to_obj(newzpdesc, obj_idx, + obj_to_class_idx(old_obj)); record_obj(handle, new_obj); } } --=20 2.34.1 From nobody Sat May 30 13:23:32 2026 Received: from mail-pl1-f174.google.com (mail-pl1-f174.google.com [209.85.214.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6256F29B77E for ; Wed, 27 May 2026 12:01:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; 
s=arc-20240116; t=1779883305; cv=none; b=lM+zvsIdX68GIOete6/mfwHLiLQVIuLzq2CcZBY7XZLEfICnB1X8slX5/6uQkPLvgVngqP564d/k9IbZ67P1uyJuXhQwrj1311tqbfkSvaz3E2QZhwZ1Zt1FyQrOMlTDGFJzTbZIt/e6DD00BlhWHtnhecLwh9C27NGt9W3SCM0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779883305; c=relaxed/simple; bh=Xxr95N7FoI88mjkApknerWpKhMCUee+03JJlZDpcOYk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=f3Pl8y3bnwPy/zp3SaqLEaHy/P7iGddqTQ1XWIA1/6SDVc+msuCDoJwrNIwwLinwygrrOGjd/jV+SVI5b0+DQn+DxyYuVkEWwIfvUJ/tYymUh72aQYfAcD+qDU9RM01AkepO1DwsoYuqSQhTM9wMZVlefGDSdVcE6/WAcW0R1rc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=qLqEzAsR; arc=none smtp.client-ip=209.85.214.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="qLqEzAsR" Received: by mail-pl1-f174.google.com with SMTP id d9443c01a7336-2b458ca2296so86998425ad.0 for ; Wed, 27 May 2026 05:01:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1779883302; x=1780488102; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=8QbCz8rR6YzSa7i52By5W+WoT1arBQd3V3OYvXGtxm4=; b=qLqEzAsRPQYMqJjHi86/hkjULC01KzRqdks3iTYNAYAtf3RlDcToZfIMzQ2dcyFhCB KC6xoy3BfpVQE54ZP88zN8u2ZOFlSh6bASc6cHNJ3p+wXuh3CvUn/3J5Rq0kPc13zHMZ rBCl25SRzdvAxgw4IpilrSNoHqekS7icvPFbt4feDSRA8SpRFWvglRQm6upZ5AEXkDlm 3ft6ozLd5bDwPpDssbA7osh10v2bOR71zGhEWcPHYWV0/QOuszkrrih1ING+mTVNJNB+ 
LxLtXey/GvAeXt4f/S00e1A/GGzGK54MhewnxJ5t6wI7tEA/KvSOhQXo9ItHDCJoEPqT DsZA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779883302; x=1780488102; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=8QbCz8rR6YzSa7i52By5W+WoT1arBQd3V3OYvXGtxm4=; b=swxEhXA556IkL6dzzeviKHuyvIfDZ5BeHrhtVeSCvcKrL4XQeBHeP841U1vFnnezjd 2V2sYgzcaMqpiHQfK/3V7jQCM2tY0jeEsw369yYLyvHye/CpihBZdjKHKllzScSSuiN1 46CYAttwIIPXU9UhM3tbv3DTyp4bTdW8lJ6I89wAWlMFDLf0rq3VSFa4lAkuIG3SsfCF 9VAaXEz7qJzf7jbbI2dxddBRTer1yfBKko/iOqY8Ke/UMOWznCl2ycviv0fWYKf2LepO opY/DISqcBSjPYgFmwoubltl+U5HFa3Ttotqpa1bjXZ0xmCX5xNd2xEpu25p0K6H6nga Y9Cw== X-Forwarded-Encrypted: i=1; AFNElJ+ew5AMoyqatRUkHCfJTQK1BYWbGAQhimpV5n2Dj1iD48R3+FjF3lypZHP1dlPGd+LKzySYQdQdfPu1EIE=@vger.kernel.org X-Gm-Message-State: AOJu0YyQt5EH6iwnHZphMGjZcR/xnVc3xvyjuYPQfQ62zEeXvg7v9vop kgtCzp7xmzQPNuOUmET+FilRB35B2Iy38+CgsWJyGzXc1h1lvRD3iYKk X-Gm-Gg: Acq92OEbWMF5RUhmo5fFEtV2uL8V+wtXDIBK7J1RI3/lEOkcYderOoR6HhJQqqpReQH ggLIwUsp28D137vp0nQJxd63Xm6iD6uWVRNoucw/b5Or1bcZo6xgakNBuHZC7ZPhmwP1V2e9iDG +x4/I/askrjnHWapt6ZO9QpZU+hddUCC7tcw8k57/eJyLR0dc8o+V9Y1vvwOVAPSHBHKtvi2CWo RtHvF4CkzeSlEykqftLprKTzx32HfrECsQrKbGZl26x/7UrruQuwU6OPjjgSOLM4w4KkRpY52vd wAKMxdGW3+tb/tb6nVKxQCKCpxUmA3oyT1E7gD9/0/MCvzrJKxlc7yy8zBWZpuyta0nfAniNYfI s+CK/G3+oCzYecuXR6Rcfi4WopGxW6E/ebcWr43inQfn8PB4c9yuu9kuXD9N9ByGuuOK+nLF2AD ppOW9jrI9wVDuQCkMqYqRbbjSOZfdFNzlzCXNmmo90bkca0xcX X-Received: by 2002:a17:903:19e8:b0:2bd:b6f4:4500 with SMTP id d9443c01a7336-2beb038c123mr252408395ad.11.1779883301141; Wed, 27 May 2026 05:01:41 -0700 (PDT) Received: from ubuntu22.mioffice.cn ([43.224.245.232]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-2beb56b5886sm136250465ad.20.2026.05.27.05.01.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 May 2026 05:01:40 -0700 (PDT) From: Wenchao Hao X-Google-Original-From: Wenchao 
Hao To: Andrew Morton , Nhat Pham , Yosry Ahmed , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Minchan Kim , Sergey Senozhatsky Cc: Wenchao Hao Subject: [RFC PATCH v2 2/3] mm/zsmalloc: drop pool->lock from zs_free on 64-bit systems Date: Wed, 27 May 2026 19:59:29 +0800 Message-Id: <20260527115930.3138213-3-haowenchao@xiaomi.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260527115930.3138213-1-haowenchao@xiaomi.com> References: <20260527115930.3138213-1-haowenchao@xiaomi.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" With class_idx encoded in obj, zs_free() can locate the size_class without holding pool->lock on 64-bit systems. Page migration also takes class->lock and only rewrites the PFN field of obj, so: 1. read obj locklessly, 2. lock the size_class derived from obj's class_idx, 3. re-read obj under class->lock to get a stable PFN. This eliminates the rwlock read-side cacheline bouncing between zs_free() and migration/compaction on multi-core systems. On 32-bit systems pool->lock is preserved. Signed-off-by: Wenchao Hao Reviewed-by: Nhat Pham --- mm/zsmalloc.c | 67 ++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 55 insertions(+), 12 deletions(-) diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 6b0014b43408..88d10f814da9 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -21,6 +21,10 @@ * pool->lock * class->lock * zspage->lock + * + * On 64-bit systems zs_free() does not take pool->lock; it locates + * the size_class using class_idx encoded in obj and serializes against + * page migration via class->lock. */ =20 #include @@ -1432,10 +1436,59 @@ static void obj_free(int class_size, unsigned long = obj) mod_zspage_inuse(zspage, -1); } =20 +/* + * Resolve @handle to its zspage / size_class and acquire class->lock. 
+ * + * 64-bit: class_idx is encoded in obj and is invariant under page + * migration, so the handle can be read locklessly to pick the + * size_class. Once class->lock is held migration is blocked and the + * handle is re-read to obtain a stable PFN. + * + * 32-bit: the unlocked load of *(unsigned long *)handle is not + * single-copy atomic and class_idx is not encoded (ZS_OBJ_CLASS_BITS + * =3D=3D 0), so fall back to pool->lock for the lookup. + */ +#if BITS_PER_LONG >=3D 64 +static inline void obj_handle_class_lock(struct zs_pool *pool, unsigned lo= ng handle, + unsigned long *objp, struct zspage **zspagep, + struct size_class **classp) + __acquires(&(*classp)->lock) +{ + struct zpdesc *f_zpdesc; + unsigned long obj; + + obj =3D handle_to_obj(handle); + *classp =3D pool->size_class[obj_to_class_idx(obj)]; + spin_lock(&(*classp)->lock); + /* Re-read under class->lock: PFN is now stable vs migration. */ + obj =3D handle_to_obj(handle); + obj_to_zpdesc(obj, &f_zpdesc); + *zspagep =3D get_zspage(f_zpdesc); + *objp =3D obj; +} +#else +static inline void obj_handle_class_lock(struct zs_pool *pool, unsigned lo= ng handle, + unsigned long *objp, struct zspage **zspagep, + struct size_class **classp) + __acquires(&(*classp)->lock) +{ + struct zpdesc *f_zpdesc; + unsigned long obj; + + read_lock(&pool->lock); + obj =3D handle_to_obj(handle); + obj_to_zpdesc(obj, &f_zpdesc); + *zspagep =3D get_zspage(f_zpdesc); + *classp =3D zspage_class(pool, *zspagep); + spin_lock(&(*classp)->lock); + read_unlock(&pool->lock); + *objp =3D obj; +} +#endif + void zs_free(struct zs_pool *pool, unsigned long handle) { struct zspage *zspage; - struct zpdesc *f_zpdesc; unsigned long obj; struct size_class *class; int fullness; @@ -1443,17 +1496,7 @@ void zs_free(struct zs_pool *pool, unsigned long han= dle) if (IS_ERR_OR_NULL((void *)handle)) return; =20 - /* - * The pool->lock protects the race with zpage's migration - * so it's safe to get the page from handle. 
- */ - read_lock(&pool->lock); - obj =3D handle_to_obj(handle); - obj_to_zpdesc(obj, &f_zpdesc); - zspage =3D get_zspage(f_zpdesc); - class =3D zspage_class(pool, zspage); - spin_lock(&class->lock); - read_unlock(&pool->lock); + obj_handle_class_lock(pool, handle, &obj, &zspage, &class); =20 class_stat_sub(class, ZS_OBJS_INUSE, 1); obj_free(class->size, obj); --=20 2.34.1 From nobody Sat May 30 13:23:32 2026 Received: from mail-pl1-f174.google.com (mail-pl1-f174.google.com [209.85.214.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7701F3CC30F for ; Wed, 27 May 2026 12:01:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779883307; cv=none; b=fOC4Ujhf/4QM0sliiS4rwU9eyLu7EdBw2sAucJSbAi0PDQSQiP+lJxXejm/4c85SunIReWmy6yMJQCUbDOF79sh/6946NfJIJslV94OmP3tXuIFsB+fG3LG9OO2GCh+dFUncJA4TEa3K1IrhvzMxVtARTEH/xAIRfYdUTttXfYE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779883307; c=relaxed/simple; bh=mqCjFoA3Bl5H+Kl1p6vF/4bkM7qtfYlpBLnONaydEDU=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=cQ8RjvWmhtBPDLdhBBOobqfGqeQFAvwB60BeO3ugovcvWvGkFw/uwQqxhe19xSgUrjr8ipTbduaPMOJtZeF4v/0TzXi/ZUddaJwV1w9/CRyNIl0G/lyba7bGzAivZG2wqLcD7Oh1ysRLJy24oaYw0056l4emYA05o5mjC1olZBg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=hkqOkRFs; arc=none smtp.client-ip=209.85.214.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit 
key) header.d=gmail.com header.i=@gmail.com header.b="hkqOkRFs" Received: by mail-pl1-f174.google.com with SMTP id d9443c01a7336-2ba6485d219so89507885ad.3 for ; Wed, 27 May 2026 05:01:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1779883304; x=1780488104; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=3mefnb7wtXl9fo2oJQ/q5HPl8VRiw71wTKLdzGgpyQU=; b=hkqOkRFsOg3GJrO/VF/1vjXQmhvmSrh5nMUEwj08AG37y2vHIto0mOvq+qHi9BCO3E J7WaqoC5fYuvZRRtjCsVZuwD85Gb4KTuypOngLRfZ+kMbjJQryzSLRTUuPOPJMsPHy/A PFPGP9mAdAsWbExCdw9GysPEyI9OFSmokajeKChOrbFUjiE/SoxxN+A5satscYRnchLH oQRc2sIUjblPHrXVptaT5yeczFuabJU0/WbI7ffk5lw4I3apowHUajAbjR0f4PDOxlAQ /cYYiVd10tJ/Dpgd3KH8CtoSFuDAUv9jy7yebl0OMixcw5WA4+BIVxs9Tu993Jd+/BQB dR0w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779883304; x=1780488104; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=3mefnb7wtXl9fo2oJQ/q5HPl8VRiw71wTKLdzGgpyQU=; b=E5UxUYneJU7AI8VJZPE5tWwYIK/BSLcBtJM9N23ovRSr06aIdRa15Rm46AnuRuIA6n krirAbnNcWUNK1F5u1sEGbfDSEcL5mfEWuQaKEkIZmKuyy1pwrYWlEcSFeMznAwdrfMP iPsU58SttYm+PvEk9wlRGIzHsqwStFDYBWoCoUJc3KtLrn9iVzcoYKj72cqexcZ83Zpl b6sPXNeGEPDPozksl9YxWrWEhZM1Kun4sUFF2Sn8fRRS/s+5qu7A/MA3nmUPeiKe7l1Q eymOJ2TfQ5Dq/wrZubuJAPuoZ1344ULqDnrwnfIcxDS1NvKGEsodpU0eT7V+xHG0Xh3F uX8g== X-Forwarded-Encrypted: i=1; AFNElJ+W+p3xwmyIGwr/bkO4NjMwB8JXq7YZB48YJ6JFu0/i7Teb6wazzTKxoOTd8ssHB7j1DY5kMCYVRmMdhJ0=@vger.kernel.org X-Gm-Message-State: AOJu0YxGTdWVrNUOt2egB9Uy2Ht1WnlXy8pZH4qcLcwU6m6TBuONzMcE 9agCm4L57kJ3CmAtSxoGeiPigIgkEwsreLJWMFqzy5lM827yqROvlUC7WPr1nw== X-Gm-Gg: Acq92OF/JFLwIM/1O7goPu8I3WM9tv65d+nPATYAMWBOgi13JlSkvHS2Q4Q9pL2uPaB MDGJnle1fJWgJW5UVSQYtm2EqEKl1Yg2w3lU1nqGoSaAN7MrA9VQmqXJgThyQh39kE8PxAjtRrO 
Vz+nOYs4+g/ZjrmjE3tk390pC7x9gXxdnuvx27cnON9BRSeg2Rh5ktAM7TNWOJ+RuBSN51X6w0J E7M7gfj3ZK/kvgT224ER7ccm5Gd4bM/4CQ6ljowgBf/8wdqJpiHBngS7EgysWYqMtB/hZp7djxY KfXMpl4RQlST/av4dNvmiqvA9tRYgTe6XjF4QA0zDOlfY15jYfyaoOYQSktGcI4Mx8rWTXQ2I9h ft0S0WGwhkY+yTUIhNnNMzp/KDIbecq8we6Eakd5AmsT+43L4iKQFt2ixL8woT5lsb63flBBldC dT0fKXKoDJoGRnFESG+jpbpWLBmnrUhmmEDCB9e/U463wvOUAk X-Received: by 2002:a17:903:b85:b0:2ba:bfc:7699 with SMTP id d9443c01a7336-2beb05908eamr256392215ad.17.1779883304577; Wed, 27 May 2026 05:01:44 -0700 (PDT) Received: from ubuntu22.mioffice.cn ([43.224.245.232]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-2beb56b5886sm136250465ad.20.2026.05.27.05.01.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 May 2026 05:01:44 -0700 (PDT) From: Wenchao Hao X-Google-Original-From: Wenchao Hao To: Andrew Morton , Nhat Pham , Yosry Ahmed , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Minchan Kim , Sergey Senozhatsky Cc: Xueyuan Chen , Wenchao Hao Subject: [RFC PATCH v2 3/3] mm/zsmalloc: drop class lock before freeing zspage Date: Wed, 27 May 2026 19:59:30 +0800 Message-Id: <20260527115930.3138213-4-haowenchao@xiaomi.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260527115930.3138213-1-haowenchao@xiaomi.com> References: <20260527115930.3138213-1-haowenchao@xiaomi.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Xueyuan Chen Currently in zs_free(), the class->lock is held until the zspage is completely freed and the counters are updated. However, freeing pages back to the buddy allocator requires acquiring the zone lock. Under heavy memory pressure, zone lock contention can be severe. When this happens, the CPU holding the class->lock will stall waiting for the zone lock, thereby blocking all other CPUs attempting to acquire the same class->lock. 
This patch shrinks the critical section of the class->lock to reduce lock contention. By moving the actual page freeing process outside the class->lock, we can improve the concurrency performance of zs_free(). Testing on the RADXA O6 platform shows that with 12 CPUs concurrently performing zs_free() operations, the execution time is reduced by 20%. Signed-off-by: Xueyuan Chen Signed-off-by: Wenchao Hao Reviewed-by: Joshua Hahn Reviewed-by: Nhat Pham --- mm/zsmalloc.c | 28 ++++++++++++++++++++++------ 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 88d10f814da9..5511c347d00b 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -856,13 +856,10 @@ static int trylock_zspage(struct zspage *zspage) return 0; } =20 -static void __free_zspage(struct zs_pool *pool, struct size_class *class, - struct zspage *zspage) +static inline void __free_zspage_lockless(struct zs_pool *pool, struct zsp= age *zspage) { struct zpdesc *zpdesc, *next; =20 - assert_spin_locked(&class->lock); - VM_BUG_ON(get_zspage_inuse(zspage)); VM_BUG_ON(zspage->fullness !=3D ZS_INUSE_RATIO_0); =20 @@ -878,7 +875,13 @@ static void __free_zspage(struct zs_pool *pool, struct= size_class *class, } while (zpdesc !=3D NULL); =20 cache_free_zspage(zspage); +} =20 +static void __free_zspage(struct zs_pool *pool, struct size_class *class, + struct zspage *zspage) +{ + assert_spin_locked(&class->lock); + __free_zspage_lockless(pool, zspage); class_stat_sub(class, ZS_OBJS_ALLOCATED, class->objs_per_zspage); atomic_long_sub(class->pages_per_zspage, &pool->pages_allocated); } @@ -1492,6 +1495,7 @@ void zs_free(struct zs_pool *pool, unsigned long hand= le) unsigned long obj; struct size_class *class; int fullness; + struct zspage *zspage_to_free =3D NULL; =20 if (IS_ERR_OR_NULL((void *)handle)) return; @@ -1502,10 +1506,22 @@ void zs_free(struct zs_pool *pool, unsigned long ha= ndle) obj_free(class->size, obj); =20 fullness =3D fix_fullness_group(class, zspage); - if 
(fullness =3D=3D ZS_INUSE_RATIO_0) - free_zspage(pool, class, zspage); + if (fullness =3D=3D ZS_INUSE_RATIO_0) { + if (trylock_zspage(zspage)) { + remove_zspage(class, zspage); + class_stat_sub(class, ZS_OBJS_ALLOCATED, + class->objs_per_zspage); + zspage_to_free =3D zspage; + } else + kick_deferred_free(pool); + } =20 spin_unlock(&class->lock); + + if (likely(zspage_to_free)) { + __free_zspage_lockless(pool, zspage_to_free); + atomic_long_sub(class->pages_per_zspage, &pool->pages_allocated); + } cache_free_handle(handle); } EXPORT_SYMBOL_GPL(zs_free); --=20 2.34.1
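
Note on the obj layout used by this series: the [PFN | class_idx | obj_idx] encoding from patch 1/3 can be tried out in userspace with the minimal sketch below. It is not the kernel code. All sketch_* names and SKETCH_* macros are hypothetical, and the field widths (40 PFN bits, 9 class bits) are assumed values picked for illustration, whereas the patch derives them from _PFN_BITS and CLASS_BITS. The sketch also ignores the handle tag bit. What it is meant to show is the property the series relies on: rewriting only the PFN field, as page migration does, leaves class_idx and obj_idx unchanged.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed widths, for illustration only. In the patch these come from
 * _PFN_BITS (MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT) and CLASS_BITS + 1.
 */
#define SKETCH_BITS_PER_LONG 64
#define SKETCH_PFN_BITS      40
#define SKETCH_CLASS_BITS    9

/* Mirrors ZS_OBJ_PFN_SHIFT, ZS_OBJ_IDX_BITS and the two masks. */
#define SKETCH_PFN_SHIFT  (SKETCH_BITS_PER_LONG - SKETCH_PFN_BITS)
#define SKETCH_IDX_BITS   (SKETCH_PFN_SHIFT - SKETCH_CLASS_BITS)
#define SKETCH_CLASS_MASK ((UINT64_C(1) << SKETCH_CLASS_BITS) - 1)
#define SKETCH_IDX_MASK   ((UINT64_C(1) << SKETCH_IDX_BITS) - 1)

/* Same sanity check as the static_assert added by the patch. */
static_assert(SKETCH_IDX_BITS > 0, "PFN + class_idx leave no room for obj_idx");

/* Pack the three fields, PFN in the most significant bits. */
static uint64_t sketch_encode(uint64_t pfn, unsigned int class_idx,
			      unsigned int obj_idx)
{
	uint64_t obj;

	obj = pfn << SKETCH_PFN_SHIFT;
	obj |= ((uint64_t)class_idx & SKETCH_CLASS_MASK) << SKETCH_IDX_BITS;
	obj |= (uint64_t)obj_idx & SKETCH_IDX_MASK;

	return obj;
}

static uint64_t sketch_pfn(uint64_t obj)
{
	return obj >> SKETCH_PFN_SHIFT;
}

static unsigned int sketch_class_idx(uint64_t obj)
{
	return (unsigned int)((obj >> SKETCH_IDX_BITS) & SKETCH_CLASS_MASK);
}

static unsigned int sketch_obj_idx(uint64_t obj)
{
	return (unsigned int)(obj & SKETCH_IDX_MASK);
}

/*
 * Model of what the migration path does to an obj value: substitute the
 * new PFN and carry class_idx and obj_idx over from the old value.
 */
static uint64_t sketch_migrate(uint64_t old_obj, uint64_t new_pfn)
{
	return sketch_encode(new_pfn, sketch_class_idx(old_obj),
			     sketch_obj_idx(old_obj));
}
```

Encoding a value with sketch_encode(), passing it through sketch_migrate() with a different PFN, and decoding it again returns the original class_idx and obj_idx together with the new PFN. With SKETCH_CLASS_BITS set to 0 the class mask becomes 0 and sketch_class_idx() always returns 0, which corresponds to the 32-bit fallback described in the patch.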