From: Sergey Senozhatsky <senozhatsky@chromium.org>
To: Andrew Morton, Minchan Kim, Johannes Weiner, Yosry Ahmed, Nhat Pham
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Sergey Senozhatsky
Subject: [RFC PATCH 5/6] zsmalloc: introduce handle mapping API
Date: Mon, 27 Jan 2025 16:59:30 +0900
Message-ID: <20250127080254.1302026-6-senozhatsky@chromium.org>
In-Reply-To: <20250127080254.1302026-1-senozhatsky@chromium.org>
References: <20250127080254.1302026-1-senozhatsky@chromium.org>

Introduce a new API to map/unmap a zsmalloc handle/object. The key
difference is that this API does not impose atomicity restrictions on
its users, unlike zs_map_object(), which returns with page-faults and
preemption disabled. The handle mapping API does not need a per-CPU
vm-area because its users are required to provide an aux buffer for
objects that span several physical pages.

Keep zs_map_object()/zs_unmap_object() for the time being, as there
are still users of them, but the old API will eventually be removed.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 include/linux/zsmalloc.h |  29 ++++++++
 mm/zsmalloc.c            | 148 ++++++++++++++++++++++++++++-----------
 2 files changed, 138 insertions(+), 39 deletions(-)

diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index a48cd0ffe57d..72d84537dd38 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -58,4 +58,33 @@ unsigned long zs_compact(struct zs_pool *pool);
 unsigned int zs_lookup_class_index(struct zs_pool *pool, unsigned int size);
 
 void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats);
+
+struct zs_handle_mapping {
+	unsigned long handle;
+	/* Points to start of the object data either within local_copy or
+	 * within local_mapping. This is what callers should use to access
+	 * or modify handle data.
+	 */
+	void *handle_mem;
+
+	enum zs_mapmode mode;
+	union {
+		/*
+		 * Handle object data copied, because it spans across several
+		 * (non-contiguous) physical pages. This pointer should be
+		 * set by the zs_map_handle() caller beforehand and should
+		 * never be accessed directly.
+		 */
+		void *local_copy;
+		/*
+		 * Handle object mapped directly. Should never be used
+		 * directly.
+		 */
+		void *local_mapping;
+	};
+};
+
+int zs_map_handle(struct zs_pool *pool, struct zs_handle_mapping *map);
+void zs_unmap_handle(struct zs_pool *pool, struct zs_handle_mapping *map);
+
 #endif
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index a5c1f9852072..281bba4a3277 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1132,18 +1132,14 @@ static inline void __zs_cpu_down(struct mapping_area *area)
 	area->vm_buf = NULL;
 }
 
-static void *__zs_map_object(struct mapping_area *area,
-			struct zpdesc *zpdescs[2], int off, int size)
+static void zs_obj_copyin(void *buf, struct zpdesc *zpdesc, int off, int size)
 {
+	struct zpdesc *zpdescs[2];
 	size_t sizes[2];
-	char *buf = area->vm_buf;
-
-	/* disable page faults to match kmap_local_page() return conditions */
-	pagefault_disable();
 
-	/* no read fastpath */
-	if (area->vm_mm == ZS_MM_WO)
-		goto out;
+	zpdescs[0] = zpdesc;
+	zpdescs[1] = get_next_zpdesc(zpdesc);
+	BUG_ON(!zpdescs[1]);
 
 	sizes[0] = PAGE_SIZE - off;
 	sizes[1] = size - sizes[0];
@@ -1151,21 +1147,17 @@ static void *__zs_map_object(struct mapping_area *area,
 	/* copy object to per-cpu buffer */
 	memcpy_from_page(buf, zpdesc_page(zpdescs[0]), off, sizes[0]);
 	memcpy_from_page(buf + sizes[0], zpdesc_page(zpdescs[1]), 0, sizes[1]);
-out:
-	return area->vm_buf;
 }
 
-static void __zs_unmap_object(struct mapping_area *area,
-			struct zpdesc *zpdescs[2], int off, int size)
+static void zs_obj_copyout(void *buf, struct zpdesc *zpdesc, int off, int size)
 {
+	struct zpdesc *zpdescs[2];
 	size_t sizes[2];
-	char *buf;
 
-	/* no write fastpath */
-	if (area->vm_mm == ZS_MM_RO)
-		goto out;
+	zpdescs[0] = zpdesc;
+	zpdescs[1] = get_next_zpdesc(zpdesc);
+	BUG_ON(!zpdescs[1]);
 
-	buf = area->vm_buf;
 	buf = buf + ZS_HANDLE_SIZE;
 	size -= ZS_HANDLE_SIZE;
 	off += ZS_HANDLE_SIZE;
@@ -1176,10 +1168,6 @@ static void __zs_unmap_object(struct mapping_area *area,
 	/* copy per-cpu buffer to object */
 	memcpy_to_page(zpdesc_page(zpdescs[0]), off, buf, sizes[0]);
 	memcpy_to_page(zpdesc_page(zpdescs[1]), 0, buf + sizes[0], sizes[1]);
-
-out:
-	/* enable page faults to match kunmap_local() return conditions */
-	pagefault_enable();
 }
 
 static int zs_cpu_prepare(unsigned int cpu)
@@ -1260,6 +1248,8 @@ EXPORT_SYMBOL_GPL(zs_get_total_pages);
  * against nested mappings.
  *
  * This function returns with preemption and page faults disabled.
+ *
+ * NOTE: this function is deprecated and will be removed.
  */
 void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 			enum zs_mapmode mm)
@@ -1268,10 +1258,8 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 	struct zpdesc *zpdesc;
 	unsigned long obj, off;
 	unsigned int obj_idx;
-
 	struct size_class *class;
 	struct mapping_area *area;
-	struct zpdesc *zpdescs[2];
 	void *ret;
 
 	/*
@@ -1309,12 +1297,14 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 		goto out;
 	}
 
-	/* this object spans two pages */
-	zpdescs[0] = zpdesc;
-	zpdescs[1] = get_next_zpdesc(zpdesc);
-	BUG_ON(!zpdescs[1]);
+	ret = area->vm_buf;
+	/* disable page faults to match kmap_local_page() return conditions */
+	pagefault_disable();
+	if (mm != ZS_MM_WO) {
+		/* this object spans two pages */
+		zs_obj_copyin(area->vm_buf, zpdesc, off, class->size);
+	}
 
-	ret = __zs_map_object(area, zpdescs, off, class->size);
 out:
 	if (likely(!ZsHugePage(zspage)))
 		ret += ZS_HANDLE_SIZE;
@@ -1323,13 +1313,13 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 }
 EXPORT_SYMBOL_GPL(zs_map_object);
 
+/* NOTE: this function is deprecated and will be removed. */
 void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 {
 	struct zspage *zspage;
 	struct zpdesc *zpdesc;
 	unsigned long obj, off;
 	unsigned int obj_idx;
-
 	struct size_class *class;
 	struct mapping_area *area;
 
@@ -1340,23 +1330,103 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 	off = offset_in_page(class->size * obj_idx);
 
 	area = this_cpu_ptr(&zs_map_area);
-	if (off + class->size <= PAGE_SIZE)
+	if (off + class->size <= PAGE_SIZE) {
 		kunmap_local(area->vm_addr);
-	else {
-		struct zpdesc *zpdescs[2];
+		goto out;
+	}
 
-		zpdescs[0] = zpdesc;
-		zpdescs[1] = get_next_zpdesc(zpdesc);
-		BUG_ON(!zpdescs[1]);
+	if (area->vm_mm != ZS_MM_RO)
+		zs_obj_copyout(area->vm_buf, zpdesc, off, class->size);
+	/* enable page faults to match kunmap_local() return conditions */
+	pagefault_enable();
 
-		__zs_unmap_object(area, zpdescs, off, class->size);
-	}
+out:
 	local_unlock(&zs_map_area.lock);
-
 	zspage_read_unlock(zspage);
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);
 
+void zs_unmap_handle(struct zs_pool *pool, struct zs_handle_mapping *map)
+{
+	struct zspage *zspage;
+	struct zpdesc *zpdesc;
+	unsigned long obj, off;
+	unsigned int obj_idx;
+	struct size_class *class;
+
+	obj = handle_to_obj(map->handle);
+	obj_to_location(obj, &zpdesc, &obj_idx);
+	zspage = get_zspage(zpdesc);
+	class = zspage_class(pool, zspage);
+	off = offset_in_page(class->size * obj_idx);
+
+	if (off + class->size <= PAGE_SIZE) {
+		kunmap_local(map->local_mapping);
+		goto out;
+	}
+
+	if (map->mode != ZS_MM_RO)
+		zs_obj_copyout(map->local_copy, zpdesc, off, class->size);
+
+out:
+	zspage_read_unlock(zspage);
+}
+EXPORT_SYMBOL_GPL(zs_unmap_handle);
+
+int zs_map_handle(struct zs_pool *pool, struct zs_handle_mapping *map)
+{
+	struct zspage *zspage;
+	struct zpdesc *zpdesc;
+	unsigned long obj, off;
+	unsigned int obj_idx;
+	struct size_class *class;
+
+	WARN_ON(in_interrupt());
+
+	/* It guarantees it can get zspage from handle safely */
+	pool_read_lock(pool);
+	obj = handle_to_obj(map->handle);
+	obj_to_location(obj, &zpdesc, &obj_idx);
+	zspage = get_zspage(zpdesc);
+
+	/*
+	 * migration cannot move any zpages in this zspage. Here, class->lock
+	 * is too heavy since callers would take some time until they calls
+	 * zs_unmap_object API so delegate the locking from class to zspage
+	 * which is smaller granularity.
+	 */
+	zspage_read_lock(zspage);
+	pool_read_unlock(pool);
+
+	class = zspage_class(pool, zspage);
+	off = offset_in_page(class->size * obj_idx);
+
+	if (off + class->size <= PAGE_SIZE) {
+		/* this object is contained entirely within a page */
+		map->local_mapping = kmap_local_zpdesc(zpdesc);
+		map->handle_mem = map->local_mapping + off;
+		goto out;
+	}
+
+	if (WARN_ON_ONCE(!map->local_copy)) {
+		zspage_read_unlock(zspage);
+		return -EINVAL;
+	}
+
+	map->handle_mem = map->local_copy;
+	if (map->mode != ZS_MM_WO) {
+		/* this object spans two pages */
+		zs_obj_copyin(map->local_copy, zpdesc, off, class->size);
+	}
+
+out:
+	if (likely(!ZsHugePage(zspage)))
+		map->handle_mem += ZS_HANDLE_SIZE;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(zs_map_handle);
+
 /**
  * zs_huge_class_size() - Returns the size (in bytes) of the first huge
  *			  zsmalloc &size_class.
-- 
2.48.1.262.g85cc9f2d1e-goog