From nobody Sun May 24 21:39:23 2026 Received: from va-1-111.ptr.blmpb.com (va-1-111.ptr.blmpb.com [209.127.230.111]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 69402314B8C for ; Thu, 21 May 2026 04:03:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.111 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779336189; cv=none; b=Z00KzoIpuZb13lIlnabPT3eQVZ6j+jSNND7T+yxAXy+M/+916zVxIpTXtxWxqCAs2fmQ5tICD2USq3fkw9OcF768+KfRrnSy8UmUNGaujVeusj0+3TvahDCqjx1145rxHSb1aLMM2Wv7+OFEURkvK4RIoaLOSlQv74+MggVlaJU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779336189; c=relaxed/simple; bh=GWyeI+m+rFWI8ArYKyE1ilMxvNLhqzxY8+cdQAg4Wjg=; h=Subject:Content-Type:Message-Id:References:Date:In-Reply-To:To:Cc: From:Mime-Version; b=Rx8Pz0bq6tdrfaKgN1IHKAdi5ZjEKyESqmd5nQ0ylnfvQBpigxydDRgHwXw/nG36i3D7dHTyeh+NnQyPYpgVoPKHve++SrsVWnvvbdKBBKUOlyzotU6PT/EbkT+IjPUKW5cVx1LOfp2jLXuIS6KXdx52CfhpTNclmefcly4qgCA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=nN5kh5pf; arc=none smtp.client-ip=209.127.230.111 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="nN5kh5pf" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1779336182; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; 
bh=r8qhTx7yx3vMMJAHefyPo3ETAaMxzAsg7BKGCu+RwJg=; b=nN5kh5pfL9Zv5Rs68RqL8v/tzSR9cbPc9Ahep90oEPq2WAuLw3HRCUIOczDU7zuo2HXn4Q gW5eEyG+XzrZ+S/AwwjMzJayY4LHpZ9ueORqRbF94qX6jtKsOjTF9YUSOFtKzMGkk9Ikvh KCLrJ9h6nXSV60UxS90tqJmH6ARyqmniszXZe8fm9vsV5hu6004uy1jrj49i4BGs5GQsbp fpjvPTKEzYVeoNDeQtDxDBfLQhEkhHKdXZNd+hVOrqC9E+fmt9coGOOBWHpoC3cPbxxKTj frlC7hknG0Yrxy8HxyfKl4cKCfHKTu3q8ZVzSojPLIO5Km3KSHBM7BYvXHtr5w== Subject: [PATCH v2 1/7] mm: factor zone-device page init helpers out of __init_zone_device_page X-Original-From: Li Zhe Message-Id: <20260521040124.10608-2-lizhe.67@bytedance.com> References: <20260521040124.10608-1-lizhe.67@bytedance.com> X-Mailer: git-send-email 2.45.2 Date: Thu, 21 May 2026 12:01:18 +0800 In-Reply-To: <20260521040124.10608-1-lizhe.67@bytedance.com> Content-Transfer-Encoding: quoted-printable X-Lms-Return-Path: To: , , , , , , , , , Cc: , , , , , From: "Li Zhe" Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" memmap_init_zone_device() currently mixes refcount policy, core ZONE_DEVICE page setup, and pageblock metadata handling in a single helper. Factor the refcount policy into zone_device_page_init_refcount(), move the common page initialization into __zone_device_page_init(), split pageblock handling into zone_device_page_init_pageblock(), and wrap the existing slow path in zone_device_page_init_slow(). This keeps the slow-path behaviour unchanged and gives later patches reusable helper boundaries. No functional change intended. 
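For illustration only, the helper split described above can be modelled in
userspace roughly as follows. This is a sketch, not the kernel code: every
demo_* name, the enum, and the struct demo_page layout are hypothetical
stand-ins, and the common init is assumed to start the refcount at 1. Only the
refcount policy mirrors the switch statement in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the pgmap memory types named in the patch. */
enum demo_memory_type {
	DEMO_DEVICE_PRIVATE,
	DEMO_DEVICE_COHERENT,
	DEMO_DEVICE_FS_DAX,
	DEMO_DEVICE_GENERIC,
	DEMO_DEVICE_PCI_P2PDMA,
};

/* Hypothetical, heavily simplified page descriptor. */
struct demo_page {
	int refcount;
	enum demo_memory_type type;
	bool pageblock_done;
};

/* Refcount policy only: the initial refcount for a given memory type. */
static inline int demo_page_init_refcount(enum demo_memory_type type)
{
	switch (type) {
	case DEMO_DEVICE_FS_DAX:
	case DEMO_DEVICE_PRIVATE:
	case DEMO_DEVICE_COHERENT:
	case DEMO_DEVICE_PCI_P2PDMA:
		return 0;
	case DEMO_DEVICE_GENERIC:
	default:
		return 1;
	}
}

/* Common page setup only; assumed to leave the refcount at 1. */
static inline void demo_page_init_common(struct demo_page *page,
					 enum demo_memory_type type)
{
	page->refcount = 1;
	page->type = type;
	page->pageblock_done = false;
}

/* Pageblock metadata handling only. */
static inline void demo_page_init_pageblock(struct demo_page *page)
{
	page->pageblock_done = true;
}

/* The slow path is the composition of the three helpers above. */
static inline void demo_page_init_slow(struct demo_page *page,
				       enum demo_memory_type type)
{
	demo_page_init_common(page, type);
	if (!demo_page_init_refcount(type))
		page->refcount = 0;
	demo_page_init_pageblock(page);
}
```

Keeping the policy in its own function lets a caller query the initial
refcount without touching a page, which a later template-based path can reuse.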
Signed-off-by: Li Zhe
Reviewed-by: Mike Rapoport (Microsoft)
---
 mm/mm_init.c | 64 +++++++++++++++++++++++++++++++++-------------------
 1 file changed, 41 insertions(+), 23 deletions(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index f9f8e1af921c..4ba506df93bc 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -987,11 +987,36 @@ static void __init memmap_init(void)
 }
 
 #ifdef CONFIG_ZONE_DEVICE
-static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
+static inline int zone_device_page_init_refcount(
+		const struct dev_pagemap *pgmap)
+{
+	/*
+	 * ZONE_DEVICE pages other than MEMORY_TYPE_GENERIC are released
+	 * directly to the driver page allocator which will set the page count
+	 * to 1 when allocating the page.
+	 *
+	 * MEMORY_TYPE_GENERIC and MEMORY_TYPE_FS_DAX pages automatically have
+	 * their refcount reset to one whenever they are freed (ie. after
+	 * their refcount drops to 0).
+	 */
+	switch (pgmap->type) {
+	case MEMORY_DEVICE_FS_DAX:
+	case MEMORY_DEVICE_PRIVATE:
+	case MEMORY_DEVICE_COHERENT:
+	case MEMORY_DEVICE_PCI_P2PDMA:
+		return 0;
+	case MEMORY_DEVICE_GENERIC:
+		return 1;
+	default:
+		WARN_ONCE(1, "Unknown memory type!");
+		return 1;
+	}
+}
+
+static void __ref __zone_device_page_init(struct page *page, unsigned long pfn,
 					  unsigned long zone_idx, int nid,
 					  struct dev_pagemap *pgmap)
 {
-
 	__init_single_page(page, pfn, zone_idx, nid);
 
 	/*
@@ -1010,7 +1035,11 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 	 */
 	page_folio(page)->pgmap = pgmap;
 	page->zone_device_data = NULL;
+}
 
+static void __ref zone_device_page_init_pageblock(struct page *page,
+		unsigned long pfn)
+{
 	/*
 	 * Mark the block movable so that blocks are reserved for
 	 * movable at startup. This will force kernel allocations
@@ -1025,27 +1054,16 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 		init_pageblock_migratetype(page, MIGRATE_MOVABLE, false);
 		cond_resched();
 	}
+}
 
-	/*
-	 * ZONE_DEVICE pages other than MEMORY_TYPE_GENERIC are released
-	 * directly to the driver page allocator which will set the page count
-	 * to 1 when allocating the page.
-	 *
-	 * MEMORY_TYPE_GENERIC and MEMORY_TYPE_FS_DAX pages automatically have
-	 * their refcount reset to one whenever they are freed (ie. after
-	 * their refcount drops to 0).
-	 */
-	switch (pgmap->type) {
-	case MEMORY_DEVICE_FS_DAX:
-	case MEMORY_DEVICE_PRIVATE:
-	case MEMORY_DEVICE_COHERENT:
-	case MEMORY_DEVICE_PCI_P2PDMA:
+static void __ref zone_device_page_init_slow(struct page *page,
+		unsigned long pfn, unsigned long zone_idx, int nid,
+		struct dev_pagemap *pgmap)
+{
+	__zone_device_page_init(page, pfn, zone_idx, nid, pgmap);
+	if (!zone_device_page_init_refcount(pgmap))
 		set_page_count(page, 0);
-		break;
-
-	case MEMORY_DEVICE_GENERIC:
-		break;
-	}
+	zone_device_page_init_pageblock(page, pfn);
 }
 
 /*
@@ -1084,7 +1102,7 @@ static void __ref memmap_init_compound(struct page *head,
 	for (pfn = head_pfn + 1; pfn < end_pfn; pfn++) {
 		struct page *page = pfn_to_page(pfn);
 
-		__init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
+		zone_device_page_init_slow(page, pfn, zone_idx, nid, pgmap);
 		prep_compound_tail(page, head, order);
 		set_page_count(page, 0);
 	}
@@ -1120,7 +1138,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 	for (pfn = start_pfn; pfn < end_pfn; pfn += pfns_per_compound) {
 		struct page *page = pfn_to_page(pfn);
 
-		__init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
+		zone_device_page_init_slow(page, pfn, zone_idx, nid, pgmap);
 
 		if (pfns_per_compound == 1)
 			continue;
-- 
2.20.1

From nobody Sun May 24 21:39:23 2026 Received: from va-1-113.ptr.blmpb.com (va-1-113.ptr.blmpb.com [209.127.230.113]) (using TLSv1.2 with
cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 785172853F3 for ; Thu, 21 May 2026 04:03:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.113 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779336210; cv=none; b=UYRfweOW8BUSHdJxT/N9nR1Uze6Yiqyl7aQOwV5hEc820ISsxKxZ2jeEtEOEhXYTB8aJqqa8wCzZp/6ZY2zlWK/VFqFjnDQZm84DUrGoXXoIw271NygOH1qheLsBwRGGMOLwW3NnhQv8nTHLbMETM8jDcs4fb/tZQM1cdxc2q9c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779336210; c=relaxed/simple; bh=7rC09xI4HoYDikXsCEwPGZ6dlO7nH70xTrI0pEzY2uE=; h=Message-Id:Subject:References:Mime-Version:In-Reply-To:Cc:From: Date:Content-Type:To; b=OH2pb4hn64GzMluu5RL5CI+IrB2sQ3rlMsVcWDNn4MMi4IMc1uFnv1IgdDjkCrsB5m9dqnZbwH3BHALVLwG20znU+Yiyrvdab/WRKQzD8irTnM485fBdZREDRB6AWv7oCyUKDjFgvY8SP3xDkiQ6cBXe+G7i811rhVW9fO9jWcE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=QgheDEo5; arc=none smtp.client-ip=209.127.230.113 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="QgheDEo5" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1779336203; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; bh=Vp1wEakIyL2i5/j5G04i6m4y/xFgX3dqvqhZlULkuAQ=; b=QgheDEo5ToVzm/8TqFx7/h8iE0Gbm7QBfH+BFf9w9Fjr0o1MpD+Fi6VlIu2KIWugZ4iJ09 
rDYsYsVck3IwSZj/+Vxt6MlELouX9Dp0IPw8G7h9PZ7TPyYbPKmAdWWq/PHvvBTUcjv7qk MwV53R1J904aCEbu8dM8wkKQO02a0DCxBbjFsKX1qtZxbPH8aB0k/ilUgy+D7nW53md4LL r3GYT0sAIXUkgV0UqV3SjaBHPO5t66MTXcRFW5U+wEEpvNdgFFFTHbsy2eqPG3wJyQCmWk SB6UZVQcjiOYnWkH2Wt5edZUqJkrHRd3R4J4AQIqZcPz0apVN/kzQuieeYcA7w== Message-Id: <20260521040124.10608-3-lizhe.67@bytedance.com> Subject: [PATCH v2 2/7] mm: add a set_page_section_from_pfn() helper X-Mailer: git-send-email 2.45.2 References: <20260521040124.10608-1-lizhe.67@bytedance.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 In-Reply-To: <20260521040124.10608-1-lizhe.67@bytedance.com> X-Lms-Return-Path: Cc: , , , , , From: "Li Zhe" Date: Thu, 21 May 2026 12:01:19 +0800 To: , , , , , , , , , X-Original-From: Li Zhe Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Callers that want to update section bits from a PFN currently need to open-code: set_page_section(page, pfn_to_section_nr(pfn)); and guard that sequence with #ifdef SECTION_IN_PAGE_FLAGS. Add set_page_section_from_pfn() to wrap that update in one place. When section bits are stored in page flags, the helper derives the section number from the PFN and updates the page flags. Otherwise it degrades to a no-op. Convert set_page_links() to use the new helper so later ZONE_DEVICE fast-path patches can also update section bits without open-coding SECTION_IN_PAGE_FLAGS at each callsite. This keeps the PFN-to-section translation local to the configurations that actually store section bits in struct page flags, and avoids exposing that detail to generic callers. No functional change intended. 
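The "helper that degrades to a no-op" pattern described above can be sketched
in userspace as follows. This is only an illustration: the demo_* names, the
DEMO_* configuration macro, and the shift and mask values are assumptions made
for the example, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed demo configuration; set to 0 to model the no-op variant. */
#define DEMO_SECTION_IN_PAGE_FLAGS	1
#define DEMO_PFN_SECTION_SHIFT		15
#define DEMO_SECTIONS_PGSHIFT		48
#define DEMO_SECTIONS_MASK		((UINT64_C(1) << 10) - 1)

/* Hypothetical page descriptor that only carries a flags word. */
struct demo_page {
	uint64_t flags;
};

#if DEMO_SECTION_IN_PAGE_FLAGS
static inline void demo_set_page_section(struct demo_page *page,
					 uint64_t section)
{
	page->flags &= ~(DEMO_SECTIONS_MASK << DEMO_SECTIONS_PGSHIFT);
	page->flags |= (section & DEMO_SECTIONS_MASK) << DEMO_SECTIONS_PGSHIFT;
}

/* Derive the section number from the PFN and store it in the flags. */
static inline void demo_set_page_section_from_pfn(struct demo_page *page,
						  uint64_t pfn)
{
	demo_set_page_section(page, pfn >> DEMO_PFN_SECTION_SHIFT);
}
#else
/* Section bits are not stored in the flags: nothing to do. */
static inline void demo_set_page_section_from_pfn(struct demo_page *page,
						  uint64_t pfn)
{
	(void)page;
	(void)pfn;
}
#endif

static inline uint64_t demo_page_section(const struct demo_page *page)
{
	return (page->flags >> DEMO_SECTIONS_PGSHIFT) & DEMO_SECTIONS_MASK;
}

/* The caller needs no #ifdef of its own. */
static inline void demo_set_page_links(struct demo_page *page, uint64_t pfn)
{
	demo_set_page_section_from_pfn(page, pfn);
}
```

The point of the pattern is that the configuration check lives next to the
helper definition, so every callsite compiles unchanged in both configurations.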
Signed-off-by: Li Zhe
Reviewed-by: Mike Rapoport (Microsoft)
---
 include/linux/mm.h | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index af23453e9dbd..bf84e698385c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2507,11 +2507,26 @@ static inline void set_page_section(struct page *page, unsigned long section)
 	page->flags.f |= (section & SECTIONS_MASK) << SECTIONS_PGSHIFT;
 }
 
+static inline void set_page_section_from_pfn(struct page *page,
+		unsigned long pfn)
+{
+	set_page_section(page, pfn_to_section_nr(pfn));
+}
+
 static inline unsigned long memdesc_section(memdesc_flags_t mdf)
 {
 	return (mdf.f >> SECTIONS_PGSHIFT) & SECTIONS_MASK;
 }
 #else /* !SECTION_IN_PAGE_FLAGS */
+static inline void set_page_section(struct page *page, unsigned long section)
+{
+}
+
+static inline void set_page_section_from_pfn(struct page *page,
+		unsigned long pfn)
+{
+}
+
 static inline unsigned long memdesc_section(memdesc_flags_t mdf)
 {
 	return 0;
@@ -2734,9 +2749,7 @@ static inline void set_page_links(struct page *page, enum zone_type zone,
 {
 	set_page_zone(page, zone);
 	set_page_node(page, node);
-#ifdef SECTION_IN_PAGE_FLAGS
-	set_page_section(page, pfn_to_section_nr(pfn));
-#endif
+	set_page_section_from_pfn(page, pfn);
 }
 
 /**
-- 
2.20.1

From nobody Sun May 24 21:39:23 2026 Received: from va-1-113.ptr.blmpb.com (va-1-113.ptr.blmpb.com [209.127.230.113]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E2A102BDC13 for ; Thu, 21 May 2026 04:03:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.113 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779336231; cv=none; 
b=HoLldMMcTbsL9o/xngENj3xYn8LIB2cBLdIbaQgVYW05bfq1H3G47AxDhx5jA0eSOWaopYLBmGBEWAv+zmAD+ZMw+csih3hPyf392AWmllAa30qN1Ni47eHEuBJTIqxuXBGS3mU1ez+NZPrMzwm1b84YqSnzOj8J6MSGiRnFoA4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779336231; c=relaxed/simple; bh=TUt/dRvpfji4gx0lLQS6xlyPGQUGCXfNhUwSRyiuUKs=; h=From:Mime-Version:Subject:Date:In-Reply-To:To:Cc:References: Content-Type:Message-Id; b=JiQ8vIm1uVUcxLi8wi3UqrPDnkTyoCziR575SS/8xkJ33gRYDbwtNzDa3nPk4rVBRCPwm3Px7S/YFfsnnI3h9s7WhKo900J2XZvPllZMHLxlsZnAtdqiITKGlZX2/YmnypWzzR3j164Of4FD1gDPplj/hAT6ytb20LZ+cvBsWyU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=oP+cCJ8d; arc=none smtp.client-ip=209.127.230.113 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="oP+cCJ8d" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1779336223; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; bh=WzT6rgRltF1pu1V66u6WO6GZ1NJSxQBmJVnmonaqWWc=; b=oP+cCJ8dMsyfbEgYOTRLckiA3oy8+w1BUJHSGMskJ/XDSL48xyyYEZEQf3M4HukHjSZ+T1 H+2kwrHIeYpb/2RiUdGW5vsf7cuVF+/V+XjuG+OH3ZkqLtaNIiq3E6DpJn9/wPRpD9yvF4 sfhF6TBnKmHHyF1HT9naohF3VUdnBKc40YL5E/WV2919o7CYM7HD0Jrdg7U9BW0LqQAUZj 05oCLQ5Tp93LQoBDHFzOCCZP1vzTooGGnBsmbMqWwXK6H6/+5IB438uJDgjtdzKw6jOAtz JHtORZM6hhFoYnVTnUj8rS413VzCoO/a2ryL7LVaCoObJyFiVvxghZKHEmMZnA== Content-Transfer-Encoding: quoted-printable From: "Li Zhe" Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: 
List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 Subject: [PATCH v2 3/7] mm: add a template-based fast path for zone-device page init Date: Thu, 21 May 2026 12:01:20 +0800 In-Reply-To: <20260521040124.10608-1-lizhe.67@bytedance.com> X-Mailer: git-send-email 2.45.2 X-Original-From: Li Zhe To: , , , , , , , , , Cc: , , , , , References: <20260521040124.10608-1-lizhe.67@bytedance.com> Message-Id: <20260521040124.10608-4-lizhe.67@bytedance.com> X-Lms-Return-Path: Content-Type: text/plain; charset="utf-8"

memmap_init_zone_device() repeats nearly identical head-page
initialization for each PFN. Prepare one reusable ZONE_DEVICE head-page
template through the existing slow path, copy it into each destination
page, and then fix up the PFN-dependent fields after the copy.

The optimized path assigns _refcount through the copied template, so
keep it disabled when the page_ref_set tracepoint is enabled. Also fall
back to the slow path if struct page is not an integral number of u64
words.

This patch accelerates the pfns_per_compound == 1 case. Compound tails
are handled in the next patch.

Tested in a VM with a 100 GB fsdax namespace device configured with
map=dev on Intel Ice Lake server. This test exercises the nd_pmem rebind
path (pfns_per_compound == 1).

Test procedure:
Rebind the nd_pmem driver 30 times and collect the memmap initialization
time from the pr_debug() output of memmap_init_zone_device().

Base(v7.1-rc3):
First binding: 1486 ms
Average of subsequent rebinds: 273.52 ms

With patches 1-3 applied:
First binding: 1422 ms
Average of subsequent rebinds: 245.73 ms

This reduces the average rebind time from 273.52 ms to 245.73 ms, or
about 10%.
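The template-copy-then-fix-up idea described above can be sketched in
userspace along these lines. This is a simplified model under stated
assumptions: the demo_* names, the field layout, and the section shift and
mask are all hypothetical, and a union is used for the word-wise copy so the
example stays within standard C aliasing rules (the kernel code casts struct
page pointers to u64 pointers instead).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed demo values, not the kernel's. */
#define DEMO_PFN_SECTION_SHIFT	15
#define DEMO_SECTIONS_PGSHIFT	48
#define DEMO_SECTIONS_MASK	((UINT64_C(1) << 10) - 1)

/* Hypothetical page layout: a whole number of 64-bit words. */
struct demo_page_fields {
	uint64_t flags;
	uint64_t pgmap_id;
	uint64_t zone_device_data;
	int32_t refcount;
	int32_t mapcount;
};

union demo_page {
	struct demo_page_fields f;
	uint64_t words[sizeof(struct demo_page_fields) / sizeof(uint64_t)];
};

/* The word-wise copy is only valid if the size is a multiple of 8. */
_Static_assert(sizeof(struct demo_page_fields) % sizeof(uint64_t) == 0,
	       "demo page must be an integral number of u64 words");

/* Prepare the PFN-invariant state once. */
static void demo_template_init(union demo_page *tmpl, uint64_t pgmap_id,
			       int32_t refcount)
{
	tmpl->f.flags = 0;
	tmpl->f.pgmap_id = pgmap_id;
	tmpl->f.zone_device_data = 0;
	tmpl->f.refcount = refcount;
	tmpl->f.mapcount = -1;
}

/* Fix up the one field that still depends on the PFN after the copy. */
static void demo_page_init_finish(union demo_page *page, uint64_t pfn)
{
	uint64_t section = (pfn >> DEMO_PFN_SECTION_SHIFT) & DEMO_SECTIONS_MASK;

	page->f.flags &= ~(DEMO_SECTIONS_MASK << DEMO_SECTIONS_PGSHIFT);
	page->f.flags |= section << DEMO_SECTIONS_PGSHIFT;
}

/* Copy the template word by word, then apply the per-PFN fix-up. */
static void demo_page_init_from_template(union demo_page *page, uint64_t pfn,
					 const union demo_page *tmpl)
{
	size_t i;

	for (i = 0; i < sizeof(page->words) / sizeof(page->words[0]); i++)
		page->words[i] = tmpl->words[i];
	demo_page_init_finish(page, pfn);
}

static uint64_t demo_page_section(const union demo_page *page)
{
	return (page->f.flags >> DEMO_SECTIONS_PGSHIFT) & DEMO_SECTIONS_MASK;
}
```

The design trades many per-field stores for one straight copy plus a small
fix-up, which only pays off when most of the descriptor is identical across
pages.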
Signed-off-by: Li Zhe
---
 mm/mm_init.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 69 insertions(+), 1 deletion(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index 4ba506df93bc..2992711351a0 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1066,6 +1066,63 @@ static void __ref zone_device_page_init_slow(struct page *page,
 	zone_device_page_init_pageblock(page, pfn);
 }
 
+/*
+ * ZONE_DEVICE depends on MEMORY_HOTPLUG, and MEMORY_HOTPLUG is 64-bit
+ * only. That means CONFIG_ZONE_DEVICE cannot be enabled on 32-bit
+ * builds, so this fast-path code does not need a separate 32-bit
+ * fallback implementation.
+ */
+static inline bool zone_device_page_init_optimization_enabled(void)
+{
+	/*
+	 * The template fast path copies a preinitialized struct page as an
+	 * array of u64 words. Skip it when the page_ref_set tracepoint is
+	 * enabled, and fall back to the slow path if struct page is not an
+	 * integral number of u64 words.
+	 */
+	return !page_ref_tracepoint_active(page_ref_set) &&
+	       IS_ALIGNED(sizeof(struct page), sizeof(u64));
+}
+
+static inline void zone_device_template_page_init(struct page *template,
+						  unsigned long pfn,
+						  unsigned long zone_idx,
+						  int nid,
+						  struct dev_pagemap *pgmap)
+{
+	__zone_device_page_init(template, pfn, zone_idx, nid, pgmap);
+	if (!zone_device_page_init_refcount(pgmap))
+		set_page_count(template, 0);
+}
+
+/*
+ * The copied template already provides the PFN-invariant portion of a
+ * ZONE_DEVICE struct page. Fix up the fields that still depend on @pfn
+ * after the copy, namely the section bits and page->virtual when present.
+ */
+static inline void zone_device_page_init_finish(struct page *page,
+						unsigned long pfn)
+{
+	set_page_section_from_pfn(page, pfn);
+#ifdef WANT_PAGE_VIRTUAL
+	if (!is_highmem_idx(ZONE_DEVICE))
+		set_page_address(page, __va(pfn << PAGE_SHIFT));
+#endif
+}
+
+static void zone_device_page_init_from_template(struct page *page,
+		unsigned long pfn, const struct page *template)
+{
+	const u64 *src = (const u64 *)template;
+	u64 *dst = (u64 *)page;
+	unsigned int i;
+
+	for (i = 0; i < sizeof(struct page) / sizeof(u64); i++)
+		dst[i] = src[i];
+	zone_device_page_init_finish(page, pfn);
+	zone_device_page_init_pageblock(page, pfn);
+}
+
 /*
  * With compound page geometry and when struct pages are stored in ram most
  * tail pages are reused. Consequently, the amount of unique struct pages to
@@ -1114,6 +1171,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 				   unsigned long nr_pages,
 				   struct dev_pagemap *pgmap)
 {
+	bool use_template = zone_device_page_init_optimization_enabled();
 	unsigned long pfn, end_pfn = start_pfn + nr_pages;
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	struct vmem_altmap *altmap = pgmap_altmap(pgmap);
@@ -1121,6 +1179,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 	unsigned long zone_idx = zone_idx(zone);
 	unsigned long start = jiffies;
 	int nid = pgdat->node_id;
+	struct page template;
 
 	if (WARN_ON_ONCE(!pgmap || zone_idx != ZONE_DEVICE))
 		return;
@@ -1135,10 +1194,19 @@ void __ref memmap_init_zone_device(struct zone *zone,
 		nr_pages = end_pfn - start_pfn;
 	}
 
+	if (use_template)
+		zone_device_template_page_init(&template, start_pfn, zone_idx,
+					       nid, pgmap);
+
 	for (pfn = start_pfn; pfn < end_pfn; pfn += pfns_per_compound) {
 		struct page *page = pfn_to_page(pfn);
 
-		zone_device_page_init_slow(page, pfn, zone_idx, nid, pgmap);
+		if (use_template)
+			zone_device_page_init_from_template(page, pfn,
+							    &template);
+		else
+			zone_device_page_init_slow(page, pfn, zone_idx,
+						   nid, pgmap);
 
 		if
(pfns_per_compound =3D=3D 1) continue; --=20 2.20.1 From nobody Sun May 24 21:39:23 2026 Received: from va-1-113.ptr.blmpb.com (va-1-113.ptr.blmpb.com [209.127.230.113]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4669D384CF1 for ; Thu, 21 May 2026 04:04:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.113 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779336250; cv=none; b=sX3KVY45EoxvkH6SUxfWricTQZQNI30F+QhnlXsPTYRkKFbYd5yxyVZbpTEp59kKgrBwCob1JiycQIom6Aquupw83B2gFmCKBRriOdnCH531/VKf0ESzmbB3Xo1ius+Bx5uFwliWJLOvKPjXc5lhZheh8YeSZ4W5hwpJm9zfZZE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779336250; c=relaxed/simple; bh=RV/jS1HZjbCKHpmnihIY2xDk2fZb53H6/Fht2BOMwkg=; h=To:Cc:In-Reply-To:Content-Type:Date:Mime-Version:References:From: Subject:Message-Id; b=BTjYdMUDt5tM/E3DCNGRRv8qYphII+rsdugGCp+My2a/qTCBebDs93NnWZ/41PMicuSDg+IwtNbBg4b0gBN9pK3xYKOfSVEHqetFCtm5rjrV+Sn3c6vSmkhOaQdO3MD1H1D/KuTsQH6NsxqFSElW5U3j5qW5cNwSGtmq3KSXEgk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=lLkvFm22; arc=none smtp.client-ip=209.127.230.113 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="lLkvFm22" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1779336243; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: 
mime-version:in-reply-to:message-id; bh=AofTeM9rzRb3Y51PR5zzdSLZE+dFDRAZqaps3o2hj6g=; b=lLkvFm229mLzQjJtQzLIywMtrv1W3GLcKSZ2rNjfYNeB0jku+lLJtcVxDN1RiZ7dYRhlAG xxunFHHFU7Yu0iPrfd82omJ247Ti24r6mysFBzV3Far1OMJF48Ja/9goxB5AAJHNzJ4ysh 7kKrKa9uPGbkImacrl2FWg+wCT8qFCKDXYYCUWzWpae3hlo3Uij6D7+Hy4BRSSdr5YE/xo E3lLETxV2O9X4vQz/bMz49fJuiBS7OoF6sCUc2zQSq3l0FVviRuSErK94UX/X13/pYbeGD 5RoJvN3FrKQcB4wWdciqTUBY4ghlJ9mektmcGTce72gXQNGTCg2Bw28Ac+rP8Q== To: , , , , , , , , , Cc: , , , , , In-Reply-To: <20260521040124.10608-1-lizhe.67@bytedance.com> Date: Thu, 21 May 2026 12:01:21 +0800 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable References: <20260521040124.10608-1-lizhe.67@bytedance.com> From: "Li Zhe" X-Original-From: Li Zhe X-Lms-Return-Path: Subject: [PATCH v2 4/7] mm: extend the template fast path to zone-device compound tails Message-Id: <20260521040124.10608-5-lizhe.67@bytedance.com> X-Mailer: git-send-email 2.45.2 Content-Type: text/plain; charset="utf-8" The template fast path from the previous patch only accelerates head pages. Compound tails in memmap_init_compound() still go through the slow path one by one. Build separate head and tail templates and reuse one prepared tail template across the tail pages in a compound range. Head pages keep the zone_device_page_init_refcount() policy, while compound tails always start with a refcount of 0 after prep_compound_tail(). This extends the template-copy fast path to pfns_per_compound > 1 without changing the existing slow path. Tested in a VM with a 100 GB devdax namespace (align=3D2097152) on Intel Ice Lake server. This test exercises the dax_pmem rebind path and measures memmap initialization latency. Test procedure: Unbind and rebind the dax_pmem driver 30 times, collect memmap initialization time from the pr_debug() output of memmap_init_zone_device(). 
Base(v7.1-rc3):
First binding: 1515 ms
Average of subsequent rebinds: 313.45 ms

With patches 1-4 applied:
First binding: 1422 ms
Average of subsequent rebinds: 256.56 ms

This reduces the average rebind time from 313.45 ms to 256.56 ms, or
about 18.1%.

Signed-off-by: Li Zhe
Reviewed-by: Mike Rapoport (Microsoft)
---
 mm/mm_init.c | 51 +++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index 2992711351a0..17a84d4cda01 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1084,17 +1084,25 @@ static inline bool zone_device_page_init_optimization_enabled(void)
 	       IS_ALIGNED(sizeof(struct page), sizeof(u64));
 }
 
-static inline void zone_device_template_page_init(struct page *template,
-						  unsigned long pfn,
-						  unsigned long zone_idx,
-						  int nid,
-						  struct dev_pagemap *pgmap)
+static inline void zone_device_template_head_page_init(struct page *template,
+		unsigned long pfn, unsigned long zone_idx, int nid,
+		struct dev_pagemap *pgmap)
 {
 	__zone_device_page_init(template, pfn, zone_idx, nid, pgmap);
 	if (!zone_device_page_init_refcount(pgmap))
 		set_page_count(template, 0);
 }
 
+static inline void zone_device_template_tail_page_init(struct page *template,
+		unsigned long pfn, unsigned long zone_idx, int nid,
+		struct dev_pagemap *pgmap, const struct page *head,
+		unsigned int order)
+{
+	__zone_device_page_init(template, pfn, zone_idx, nid, pgmap);
+	prep_compound_tail(template, head, order);
+	set_page_count(template, 0);
+}
+
 /*
  * The copied template already provides the PFN-invariant portion of a
  * ZONE_DEVICE struct page. Fix up the fields that still depend on @pfn
@@ -1144,10 +1152,12 @@ static void __ref memmap_init_compound(struct page *head,
 					unsigned long head_pfn,
 					unsigned long zone_idx, int nid,
 					struct dev_pagemap *pgmap,
-					unsigned long nr_pages)
+					unsigned long nr_pages,
+					bool use_template)
 {
 	unsigned long pfn, end_pfn = head_pfn + nr_pages;
 	unsigned int order = pgmap->vmemmap_shift;
+	struct page template;
 
 	/*
 	 * We have to initialize the pages, including setting up page links.
@@ -1156,12 +1166,28 @@ static void __ref memmap_init_compound(struct page *head,
 	 * the pages in the same go.
 	 */
 	__SetPageHead(head);
+
+	/*
+	 * A tail template can be reused for all tail pages in the same compound page
+	 * because shared state for compound tails is pre-set by prep_compound_tail().
+	 * The per-page page->virtual and section in flags are fixed up after copying.
+	 */
+	if (use_template)
+		zone_device_template_tail_page_init(&template, head_pfn + 1,
+						    zone_idx, nid, pgmap,
+						    head, order);
+
 	for (pfn = head_pfn + 1; pfn < end_pfn; pfn++) {
 		struct page *page = pfn_to_page(pfn);
 
-		zone_device_page_init_slow(page, pfn, zone_idx, nid, pgmap);
-		prep_compound_tail(page, head, order);
-		set_page_count(page, 0);
+		if (use_template) {
+			zone_device_page_init_from_template(page, pfn,
+							    &template);
+		} else {
+			zone_device_page_init_slow(page, pfn, zone_idx, nid, pgmap);
+			prep_compound_tail(page, head, order);
+			set_page_count(page, 0);
+		}
 	}
 	prep_compound_head(head, order);
 }
@@ -1195,8 +1221,8 @@ void __ref memmap_init_zone_device(struct zone *zone,
 	}
 
 	if (use_template)
-		zone_device_template_page_init(&template, start_pfn, zone_idx,
-					       nid, pgmap);
+		zone_device_template_head_page_init(&template, start_pfn,
+						    zone_idx, nid, pgmap);
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn += pfns_per_compound) {
 		struct page *page = pfn_to_page(pfn);
@@ -1212,7 +1238,8 @@ void __ref memmap_init_zone_device(struct zone *zone,
 			continue;
 
 		memmap_init_compound(page, pfn, zone_idx, nid, pgmap,
-				     compound_nr_pages(altmap, pgmap));
+				     compound_nr_pages(altmap, pgmap),
+				     use_template);
 	}
 
 	pr_debug("%s initialised %lu pages in %ums\n", __func__,
-- 
2.20.1

From nobody Sun May 24 21:39:23 2026 Received: from va-2-115.ptr.blmpb.com (va-2-115.ptr.blmpb.com [209.127.231.115]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 07DC7368D72 for ; Thu, 21 May 2026 04:04:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.231.115 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779336279; cv=none; b=I2vrthcHX1bbnkOhJUuYLGpkbrdJQTqZXZdIrOrF6ai21QpB6WaIlv3XLL7fznFc6YMMI4uJ9VFrqDDJ9y/WfVNSLRwl3QGqp5AY+VeIkylpf1s49QmEw4wWmCsvvqHmlWYcfNwNYXt5vtAzJzRetvURgySoBaD+al/VAB83tlE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779336279; c=relaxed/simple; bh=zG3UBy3bN51MuM7g5fnOAGrmA/YDR+ftiY9OAsc6nXc=; h=Subject:Message-Id:References:From:Content-Type:Cc:To: Mime-Version:In-Reply-To:Date; b=o0RRS+TAjYwtjxO3j2WoCN0Y+Sslv84h/P4qvUsnfaoOkDzCYvmCslbhtDv7kBewiVjRVDtoZU98M5UdVa85PLTRmHMq9OMYNbndFN8LDeRN2h7cnk8acnQd5KXT6QspXtq1Gp2yGqLAj5m4gQ02WCyH+eypYu+u8IesMx+23hI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=EA4VpO6r; arc=none smtp.client-ip=209.127.231.115 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="EA4VpO6r" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; 
s=2212171451; d=bytedance.com; t=1779336266; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; bh=r9TVpMbLVKEQZRuZT9zKl5Z3AjyKm+f5ptKPy8P2UEQ=; b=EA4VpO6ra8HpDmB1WeSCujlUe50Ak9dOdMDG7jb8NQvtDGoNMzLP45SkTgr4nE9UMNWIv4 ddNKS2KrGi3t6ejQAr/YT346Y8HNUKwJjn1G6xQdwKaoIp5l3vukfZyo5YXTwSoQgKTUfh SRRur+i8ySagziSCUlKdbD4P3TMsDkGrpokbhq3zZIUm6IRapXjZalCtjmHvyn40069Eyi FlUeQpSrvUzmfNwvJq4Xq2my4rcq1CIZOeGFXmuwiiPSsrK+BD/VnOxsWiaQzyS6kqFlpn 8Ifc3Sx+lDTAfyiNoG1LI2XHVp1mMX0qGM1KGBoIrTWJaMUnkp7cVk0wdPT5rQ== Subject: [PATCH v2 5/7] string: introduce memcpy_streaming() helpers Message-Id: <20260521040124.10608-6-lizhe.67@bytedance.com> References: <20260521040124.10608-1-lizhe.67@bytedance.com> X-Mailer: git-send-email 2.45.2 X-Original-From: Li Zhe From: "Li Zhe" X-Lms-Return-Path: Cc: , , , , , Content-Transfer-Encoding: quoted-printable To: , , , , , , , , , Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 In-Reply-To: <20260521040124.10608-1-lizhe.67@bytedance.com> Date: Thu, 21 May 2026 12:01:22 +0800 Content-Type: text/plain; charset="utf-8" Introduce a generic memcpy_streaming() interface for write-once copy sites that can fall back to memcpy() when no architecture-specific optimization is available. Add memcpy_streaming_drain() alongside it so callers can separate the copy primitive from any required ordering point. On x86, wire the helper to memcpy_flushcache() and sfence so common code can request a streaming copy without embedding x86-specific movnti details. Callers are responsible for invoking memcpy_streaming_drain() before later normal stores that must be ordered after the streaming copy. 
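As a hedged illustration of the caller contract described above, the sketch
below models the generic fallback and one caller in userspace. The demo_*
names, DEMO_HAVE_ARCH_MEMCPY_STREAMING, and the record and slot types are
hypothetical; only the copy, drain, then normal-store ordering follows the
text. The example is single-threaded, so the final flag store stands in for
whatever publication mechanism a real caller would use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Generic fallback, used when no architecture override is provided.
 * An architecture that uses non-temporal stores would define
 * DEMO_HAVE_ARCH_MEMCPY_STREAMING and supply both functions itself.
 */
#ifndef DEMO_HAVE_ARCH_MEMCPY_STREAMING
static inline void demo_memcpy_streaming(void *dst, const void *src,
					 size_t cnt)
{
	memcpy(dst, src, cnt);
}

/* Regular cached stores need no extra ordering point. */
static inline void demo_memcpy_streaming_drain(void)
{
}
#endif

/* Hypothetical write-once payload and the slot it is published into. */
struct demo_record {
	unsigned long payload[4];
};

struct demo_slot {
	struct demo_record rec;
	int ready;
};

/*
 * Caller pattern: streaming copy first, then the drain, and only then
 * the normal store that must be ordered after the copy.
 */
static inline void demo_publish(struct demo_slot *slot,
				const struct demo_record *src)
{
	demo_memcpy_streaming(&slot->rec, src, sizeof(slot->rec));
	demo_memcpy_streaming_drain();
	slot->ready = 1;
}
```

Splitting the drain from the copy lets a caller issue many streaming copies in
a loop and pay for the ordering point once, after the loop.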
Signed-off-by: Li Zhe
---
 arch/x86/include/asm/string_64.h | 13 +++++++++++++
 include/linux/string.h | 18 ++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 4635616863f5..15504b844f1e 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -100,6 +100,19 @@ static __always_inline void memcpy_flushcache(void *dst, const void *src, size_t
 	}
 	__memcpy_flushcache(dst, src, cnt);
 }
+
+#define __HAVE_ARCH_MEMCPY_STREAMING 1
+static __always_inline void memcpy_streaming(void *dst, const void *src,
+					     size_t cnt)
+{
+	memcpy_flushcache(dst, src, cnt);
+}
+
+static __always_inline void memcpy_streaming_drain(void)
+{
+	asm volatile("sfence" : : : "memory");
+}
+
 #endif
 
 #endif /* __KERNEL__ */
diff --git a/include/linux/string.h b/include/linux/string.h
index b850bd91b3d8..ba029be9c187 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -281,6 +281,24 @@ static inline void memcpy_flushcache(void *dst, const void *src, size_t cnt)
 }
 #endif
 
+#ifndef __HAVE_ARCH_MEMCPY_STREAMING
+/*
+ * memcpy_streaming() is for write-once copy sites that may use
+ * non-temporal stores on some architectures. Callers must follow it
+ * with memcpy_streaming_drain() before later normal stores that need to
+ * be ordered after the streaming copy. Implementations that use regular
+ * cached stores can make the drain a no-op.
+ */
+static inline void memcpy_streaming(void *dst, const void *src, size_t cnt)
+{
+	memcpy(dst, src, cnt);
+}
+
+static inline void memcpy_streaming_drain(void)
+{
+}
+#endif
+
 void *memchr_inv(const void *s, int c, size_t n);
 char *strreplace(char *str, char old, char new);
 
-- 
2.20.1

From nobody Sun May 24 21:39:23 2026 Received: from va-1-113.ptr.blmpb.com (va-1-113.ptr.blmpb.com [209.127.230.113]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D73CB368D72 for ; Thu, 21 May 2026 04:04:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.113 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779336295; cv=none; b=c0iu7U+IRIjV+LtbBDA/azW7D79NJ2mHoHxMPDdoZTob9MlJfQKVnesy5tpREjHiBsHHYQDfy723zGAs6MTPXB4DzK6pLivcFi66r2WDwRE1GepnodWn1evEfqsR6cr0o0BmFQIYFkEpa3pVtOo6qB2bqhxDcyVhFcmDk4WA0DA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779336295; c=relaxed/simple; bh=AmeVe5Hp2ztXa2WuxT9zyFU1DfzjU9AbLYRTNtjd4MU=; h=Subject:Date:To:References:Content-Type:Cc:Mime-Version: In-Reply-To:From:Message-Id; b=soEscjW+WxldMrnEUOiB0cgzRvIFrGOUAlnTRbgkRuo2RTICXoD3X0W8xn7DPnjMe8hbYkRzfA/AdXAybWYDMK/FVhOIiesz8B2Q5osjHFFrQT9fl4jM/WuDJITt6F6MV/bP5pXlh1X7d3BlDKIHW/tGgUIb8+bDk6KmADDXgUQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=g97XaaPN; arc=none smtp.client-ip=209.127.230.113 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com
header.i=@bytedance.com header.b="g97XaaPN" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1779336287; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; bh=sicZWKG3rnoRbRKtoOQyKoCtsso6gsJrl+5CfarRJY4=; b=g97XaaPNiPJ3647z+V3QwvP0CO4Yc0HqfORoTCIKvFrimoqzaQkV1bvtnTRKOh/jY23kMa mMgrYAo4Cf6oswf4s7OQR832ytKyoXWXboJ36Ckxnl+KGmG1IpwejnG3Ay8RBH3ibJDH9a uhcHAxmCuRn1i42TfFCt4ASTlbTFjopmx1V81JSU32X9yv3UKClJT4EW+56+nZ3YbrtbCS OmXAlr0yWRj0Swco3HPuGWGnwq7Xtto0I2NT8drOzpF57VDe8rsb9b+76M6GzqdXyPvZic NDpdMTKmVZp0FAmdibr58dne/Q2Jb5Xej/1xsiD0LuIud1gXtLDXQid5HTzXMg== Subject: [PATCH v2 6/7] x86/string: extend memcpy_flushcache() fixed-size fastpaths Date: Thu, 21 May 2026 12:01:23 +0800 X-Original-From: Li Zhe To: , , , , , , , , , References: <20260521040124.10608-1-lizhe.67@bytedance.com> Content-Transfer-Encoding: quoted-printable Cc: , , , , , Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 X-Lms-Return-Path: In-Reply-To: <20260521040124.10608-1-lizhe.67@bytedance.com> From: "Li Zhe" Message-Id: <20260521040124.10608-7-lizhe.67@bytedance.com> X-Mailer: git-send-email 2.45.2 Content-Type: text/plain; charset="utf-8" Small constant-sized flushcache copies currently fall back to __memcpy_flushcache() unless they are exactly 4, 8, or 16 bytes. Factor the existing inline movnti sequences into small helpers and extend the fixed-size fastpath coverage to 24..96 bytes. This keeps common struct-page-sized copies on the inline path for the upcoming memcpy_streaming() user, while still falling back to __memcpy_flushcache() for uncommon sizes. 
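The fallthrough layout described above can be sketched in portable C. This is a reduced illustration under stated assumptions: it covers only multiples of 8 up to 48 bytes, plain `memcpy()` stands in for each `movnti` store, and the names `copy_8()`, `copy_16()`, `copy_small()` and `copy_any()` are hypothetical rather than the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for one 8-byte movnti store; a plain copy in this sketch. */
static inline void copy_8(void *dst, const void *src)
{
	memcpy(dst, src, 8);
}

static inline void copy_16(void *dst, const void *src)
{
	copy_8(dst, src);
	copy_8((char *)dst + 8, (const char *)src + 8);
}

/* Returns 1 if cnt was handled inline, 0 if the caller must fall back. */
static inline int copy_small(void *dst, const void *src, size_t cnt)
{
	char *d = dst;
	const char *s = src;

	switch (cnt) {
	/* Multiples of 16: peel 16-byte chunks from the tail downwards. */
	case 48:
		copy_16(d + 32, s + 32);
		/* fallthrough */
	case 32:
		copy_16(d + 16, s + 16);
		/* fallthrough */
	case 16:
		copy_16(d, s);
		return 1;

	/* Odd multiples of 8: same idea, finishing with one 8-byte chunk. */
	case 40:
		copy_16(d + 24, s + 24);
		/* fallthrough */
	case 24:
		copy_16(d + 8, s + 8);
		/* fallthrough */
	case 8:
		copy_8(d, s);
		return 1;
	}
	return 0;
}

static void copy_any(void *dst, const void *src, size_t cnt)
{
	if (!copy_small(dst, src, cnt))
		memcpy(dst, src, cnt);	/* stands in for __memcpy_flushcache() */
}
```

In the kernel version the switch is only reached when `cnt` is a compile-time constant, so the compiler can presumably reduce each call to the straight-line stores for that one size.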
Signed-off-by: Li Zhe
---
 arch/x86/include/asm/string_64.h | 87 +++++++++++++++++++++++++++-----
 1 file changed, 73 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 15504b844f1e..94dc92f287f3 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -82,22 +82,81 @@ int strcmp(const char *cs, const char *ct);
 #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
 #define __HAVE_ARCH_MEMCPY_FLUSHCACHE 1
 void __memcpy_flushcache(void *dst, const void *src, size_t cnt);
-static __always_inline void memcpy_flushcache(void *dst, const void *src, size_t cnt)
+
+static __always_inline void memcpy_flushcache_4(void *dst, const void *src)
+{
+	asm ("movntil %1, %0" : "=m"(*(u32 *)dst) : "r"(*(u32 *)src));
+}
+
+static __always_inline void memcpy_flushcache_8(void *dst, const void *src)
+{
+	asm ("movntiq %1, %0" : "=m"(*(u64 *)dst) : "r"(*(u64 *)src));
+}
+
+static __always_inline void memcpy_flushcache_16(void *dst, const void *src)
+{
+	memcpy_flushcache_8(dst, src);
+	memcpy_flushcache_8(dst + 8, src + 8);
+}
+
+/*
+ * Keep common fixed-size copies on the inline movnti path instead of
+ * dropping into the generic helper.
+ */
+static __always_inline int memcpy_flushcache_small(void *dst, const void *src,
+						   size_t cnt)
 {
-	if (__builtin_constant_p(cnt)) {
-		switch (cnt) {
-			case 4:
-				asm ("movntil %1, %0" : "=m"(*(u32 *)dst) : "r"(*(u32 *)src));
-				return;
-			case 8:
-				asm ("movntiq %1, %0" : "=m"(*(u64 *)dst) : "r"(*(u64 *)src));
-				return;
-			case 16:
-				asm ("movntiq %1, %0" : "=m"(*(u64 *)dst) : "r"(*(u64 *)src));
-				asm ("movntiq %1, %0" : "=m"(*(u64 *)(dst + 8)) : "r"(*(u64 *)(src + 8)));
-				return;
-		}
+	switch (cnt) {
+	case 96:
+		memcpy_flushcache_16(dst + 80, src + 80);
+		fallthrough;
+	case 80:
+		memcpy_flushcache_16(dst + 64, src + 64);
+		fallthrough;
+	case 64:
+		memcpy_flushcache_16(dst + 48, src + 48);
+		fallthrough;
+	case 48:
+		memcpy_flushcache_16(dst + 32, src + 32);
+		fallthrough;
+	case 32:
+		memcpy_flushcache_16(dst + 16, src + 16);
+		fallthrough;
+	case 16:
+		memcpy_flushcache_16(dst, src);
+		return 1;
+
+	case 88:
+		memcpy_flushcache_16(dst + 72, src + 72);
+		fallthrough;
+	case 72:
+		memcpy_flushcache_16(dst + 56, src + 56);
+		fallthrough;
+	case 56:
+		memcpy_flushcache_16(dst + 40, src + 40);
+		fallthrough;
+	case 40:
+		memcpy_flushcache_16(dst + 24, src + 24);
+		fallthrough;
+	case 24:
+		memcpy_flushcache_16(dst + 8, src + 8);
+		fallthrough;
+	case 8:
+		memcpy_flushcache_8(dst, src);
+		return 1;
+
+	case 4:
+		memcpy_flushcache_4(dst, src);
+		return 1;
 	}
+
+	return 0;
+}
+
+static __always_inline void memcpy_flushcache(void *dst, const void *src, size_t cnt)
+{
+	if (__builtin_constant_p(cnt) && memcpy_flushcache_small(dst, src, cnt))
+		return;
 	__memcpy_flushcache(dst, src, cnt);
 }
 
-- 
2.20.1

From nobody Sun May 24 21:39:23 2026
Received: from va-2-112.ptr.blmpb.com (va-2-112.ptr.blmpb.com [209.127.231.112]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7F076367299 for ; Thu, 21 May 2026 04:05:30 +0000 (UTC)
Authentication-Results:
smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.231.112 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779336332; cv=none; b=q0uVyqiODbjIU9IP9OxiASE6VspINf04lT15QyTvloSSbDGhceQ3jePX/e/MS+3kT7EY1yfN025mkJPb5TfSN6QBXWn+AeDIShG4E3DUSJH9zmFz4sj1WzJ/8LfYcl/1YAtqI5oAy1+ZUYVI2uVYmiMgEvkyeq8B0w1Q8zV2mcU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779336332; c=relaxed/simple; bh=pUB7L16CREF3X4+TChJSGa3hJsWqMaPT0mHLMKAQY2E=; h=Subject:References:In-Reply-To:To:Cc:Date:Message-Id:From: Mime-Version:Content-Type; b=N2k1wG6byhP6ZeUdj90Glczaa2aifJOgOPm31wZJg/nP+GFDLO1WZRBXhlunWsRjllkJoj0zKnatQobnm5DZjbDA6ZyHMBr0xVOyLkSTvU5LjJolHIH3iuAY1GRfIiekSLocgQyoENbFXPLIrHzM5591i45uwF+5hgZquW7ZV0U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=H9E+oWZM; arc=none smtp.client-ip=209.127.231.112 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="H9E+oWZM" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1779336325; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; bh=6yOkcnsUQzUvHuhs1BUNozB1nWiAB67h4IULUSq8t6Y=; b=H9E+oWZMTtJppFi69MCqfOZ6fwGQY695lXdiU3sEfw9V5MUkc0G9mcrlmYaTTLo7wEGqNF 3ND1r4t0yQBSJpYt+CdgnoWsn0TN8LoQpsRAjNUfhy/8p/nZJUZHbroMbTJtxk/qirrGqv AdFjo1JmOncufKuCK9a430Hat3W4LNYESlN4sbp8CV0zaVzRoMj17VjI3popdg4LJihLpw KUJfu+eAADYp0Cx8xF1Eu591NUtQFwyMO/VwxaAefS7GsxviTUE7uRlfuiJgYsfYJh+241 
eMg0QRTdxg8O++6K17wTh2xHDKX07zjmDhwND+2WjiFWOnH8kbMVqgpWBklYiw==
Subject: [PATCH v2 7/7] mm: use memcpy_streaming() in zone-device template copies
References: <20260521040124.10608-1-lizhe.67@bytedance.com>
In-Reply-To: <20260521040124.10608-1-lizhe.67@bytedance.com>
To: , , , , , , , , ,
X-Lms-Return-Path:
Cc: , , , , ,
Date: Thu, 21 May 2026 12:01:24 +0800
Message-Id: <20260521040124.10608-8-lizhe.67@bytedance.com>
Content-Transfer-Encoding: quoted-printable
X-Original-From: Li Zhe
From: "Li Zhe"
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
Mime-Version: 1.0
X-Mailer: git-send-email 2.45.2
Content-Type: text/plain; charset="utf-8"

The template fast path still leaves the actual copy sequence up to the
compiler. Use the streaming-copy helpers introduced in the previous
patches for the ZONE_DEVICE template-copy path so common mm code can
request a write-once copy primitive without embedding arch-specific
store layout in the generic layer.

ZONE_DEVICE memmap initialization is a write-once path: each struct page
is populated once and is not expected to be reused from cache
immediately afterwards. A regular cached copy can therefore incur
write-allocate traffic and pollute the cache without much benefit. Using
memcpy_streaming() lets this path use an architecture-optimized
streaming copy where available, while still degrading to memcpy() on
architectures that do not provide a specialized implementation.

Update the PFN-dependent section bits and page->virtual state in the
reusable template before each copy instead of patching the destination
page afterwards. This keeps the hot path as a single streaming copy for
the common case and avoids post-copy normal stores to cachelines that
were just written through the streaming path. Keep pageblock-aligned
PFNs on memcpy() so pageblock initialization can immediately read back
page metadata without introducing a read-after-streaming dependency.
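The update-then-copy flow described above can be sketched in userspace. This is only an illustration under stated assumptions: `struct mini_page`, the bit positions, the pageblock size and the two counters are hypothetical and do not reflect the real struct page layout, and both branches use plain `memcpy()` where the patch would use `memcpy_streaming()` on the common path.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical miniature of a page descriptor; not the real struct page. */
struct mini_page {
	uint64_t flags;	/* invariant low bits plus a PFN-derived section field */
	uint64_t pgmap;	/* invariant across the whole range */
};

#define PAGES_PER_SECTION_SHIFT	15	/* assumed: 2^15 pages per section */
#define SECTION_FIELD_SHIFT	32	/* assumed position of the section field */
#define PAGEBLOCK_PAGES		512	/* assumed pageblock size in pages */

/* Illustration only: which copy flavour each page received. */
static int streaming_copies, cached_copies;

/* Refresh the PFN-dependent field in the template, not in the destination. */
static void update_template(struct mini_page *tmpl, uint64_t pfn)
{
	tmpl->flags &= (UINT64_C(1) << SECTION_FIELD_SHIFT) - 1;
	tmpl->flags |= (pfn >> PAGES_PER_SECTION_SHIFT) << SECTION_FIELD_SHIFT;
}

static void init_from_template(struct mini_page *page, uint64_t pfn,
			       struct mini_page *tmpl)
{
	update_template(tmpl, pfn);
	if (pfn % PAGEBLOCK_PAGES == 0) {
		/* Read back right away, so keep ordinary cached stores. */
		memcpy(page, tmpl, sizeof(*page));
		cached_copies++;
	} else {
		/* memcpy_streaming() in the patch; plain memcpy() here. */
		memcpy(page, tmpl, sizeof(*page));
		streaming_copies++;
	}
}
```

Because the template already holds the final value of every field, the destination needs no normal store after the copy, which is the property the commit message relies on.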
When the streaming backend uses non-temporal stores, order them before
entering memmap_init_compound(), before prep_compound_head() updates the
overlapping compound metadata, and before returning from
memmap_init_zone_device(). Keep sanitized builds on the slow path so
KASAN/KMSAN retain their instrumented stores.

Tested in a VM with a 100 GB fsdax namespace device configured with
map=dev and a 100 GB devdax namespace (align=2097152) on an Intel Ice
Lake server.

Test procedure:
Rebind the nd_pmem and dax_pmem drivers 30 times and collect the memmap
initialization time from the pr_debug() output of
memmap_init_zone_device().

Base (v7.1-rc3):
  First binding for nd_pmem driver:   1486 ms
  Average of subsequent rebinds:      273.52 ms
  First binding for dax_pmem driver:  1515 ms
  Average of subsequent rebinds:      313.45 ms

With this series:
  First binding for nd_pmem driver:   1389 ms
  Average of subsequent rebinds:      111.08 ms
  First binding for dax_pmem driver:  1294 ms
  Average of subsequent rebinds:      110.24 ms

This reduces the average rebind time by about 59.4% for nd_pmem and
64.8% for dax_pmem.

Signed-off-by: Li Zhe
---
 mm/mm_init.c | 83 +++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 60 insertions(+), 23 deletions(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index 17a84d4cda01..08feb24795b8 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1075,13 +1075,15 @@ static void __ref zone_device_page_init_slow(struct page *page,
 static inline bool zone_device_page_init_optimization_enabled(void)
 {
 	/*
-	 * The template fast path copies a preinitialized struct page as an
-	 * array of u64 words. Skip it when the page_ref_set tracepoint is
-	 * enabled, and fall back to the slow path if struct page is not an
-	 * integral number of u64 words.
+	 * The template fast path copies a preinitialized struct page from
+	 * a reusable template. Keep sanitized builds on the slow path so
+	 * their instrumented stores remain intact, skip the fast path when
+	 * the page_ref_set tracepoint is enabled, and fall back if
+	 * struct page is not an integral number of u64 words.
 	 */
-	return !page_ref_tracepoint_active(page_ref_set) &&
-		IS_ALIGNED(sizeof(struct page), sizeof(u64));
+	return !IS_ENABLED(CONFIG_KASAN) && !IS_ENABLED(CONFIG_KMSAN) &&
+	       !page_ref_tracepoint_active(page_ref_set) &&
+	       IS_ALIGNED(sizeof(struct page), sizeof(u64));
 }
 
 static inline void zone_device_template_head_page_init(struct page *template,
@@ -1104,30 +1106,42 @@ static inline void zone_device_template_tail_page_init(struct page *template,
 }
 
 /*
- * The copied template already provides the PFN-invariant portion of a
- * ZONE_DEVICE struct page. Fix up the fields that still depend on @pfn
- * after the copy, namely the section bits and page->virtual when present.
+ * 'template' is a reusable page prototype rather than a strictly immutable
+ * object. Most ZONE_DEVICE fields stay constant across the pages covered by
+ * the current template, but section bits and page->virtual may still depend
+ * on the PFN. Refresh those PFN-dependent fields in the template before
+ * copying it into @page.
  */
-static inline void zone_device_page_init_finish(struct page *page,
-						unsigned long pfn)
+static inline void zone_device_page_update_template(struct page *template,
+						    unsigned long pfn)
 {
-	set_page_section_from_pfn(page, pfn);
+	set_page_section_from_pfn(template, pfn);
 #ifdef WANT_PAGE_VIRTUAL
 	if (!is_highmem_idx(ZONE_DEVICE))
-		set_page_address(page, __va(pfn << PAGE_SHIFT));
+		set_page_address(template, __va(pfn << PAGE_SHIFT));
 #endif
 }
 
 static void zone_device_page_init_from_template(struct page *page,
-		unsigned long pfn, const struct page *template)
+		unsigned long pfn, struct page *template)
 {
-	const u64 *src = (const u64 *)template;
-	u64 *dst = (u64 *)page;
-	unsigned int i;
+	/*
+	 * 'template' carries the invariant portion of a ZONE_DEVICE struct
+	 * page. Update the PFN-dependent fields in place before copying it
+	 * to the destination page.
+	 *
+	 * pageblock-aligned pages immediately feed
+	 * init_pageblock_migratetype(), which reads back page metadata via
+	 * helpers like page_zone(page). Avoid a read-after-streaming
+	 * dependency for these rare pages by using regular cached stores
+	 * instead of non-temporal ones.
+	 */
+	zone_device_page_update_template(template, pfn);
+	if (unlikely(pageblock_aligned(pfn)))
+		memcpy(page, template, sizeof(*page));
+	else
+		memcpy_streaming(page, template, sizeof(*page));
 
-	for (i = 0; i < sizeof(struct page) / sizeof(u64); i++)
-		dst[i] = src[i];
-	zone_device_page_init_finish(page, pfn);
 	zone_device_page_init_pageblock(page, pfn);
 }
 
@@ -1168,9 +1182,10 @@ static void __ref memmap_init_compound(struct page *head,
 	__SetPageHead(head);
 
 	/*
-	 * A tail template can be reused for all tail pages in the same compound page
-	 * because shared state for compound tails is pre-set by prep_compound_tail().
-	 * The per-page page->virtual and section in flags are fixed up after copying.
+	 * All tails of the same compound page share the state established by
+	 * prep_compound_tail(). Reuse one tail template for the whole range
+	 * and refresh only the PFN-dependent fields in that template before
+	 * each copy.
 	 */
 	if (use_template)
 		zone_device_template_tail_page_init(&template, head_pfn + 1,
@@ -1189,6 +1204,15 @@ static void __ref memmap_init_compound(struct page *head,
 			set_page_count(page, 0);
 		}
 	}
+
+	/*
+	 * prep_compound_head() updates compound metadata in struct folio fields
+	 * that alias the first tail-page descriptors. When the tail pages above
+	 * were populated with non-temporal stores, order those writes before the
+	 * overlapping metadata updates below.
+	 */
+	if (use_template)
+		memcpy_streaming_drain();
 	prep_compound_head(head, order);
 }
 
@@ -1237,10 +1261,23 @@ void __ref memmap_init_zone_device(struct zone *zone,
 		if (pfns_per_compound == 1)
 			continue;
 
+		/*
+		 * Compound-head setup immediately updates head->flags, so make
+		 * the streaming template copy visible before entering
+		 * memmap_init_compound().
+		 */
+		if (use_template)
+			memcpy_streaming_drain();
+
 		memmap_init_compound(page, pfn, zone_idx, nid, pgmap,
 				     compound_nr_pages(altmap, pgmap),
 				     use_template);
 	}
+	/*
+	 * Drain any remaining non-temporal stores before returning.
+	 */
+	if (use_template)
+		memcpy_streaming_drain();
 
 	pr_debug("%s initialised %lu pages in %ums\n", __func__,
 		 nr_pages, jiffies_to_msecs(jiffies - start));
-- 
2.20.1