From nobody Fri Jun 12 12:46:29 2026
From: "Li Zhe"
Date: Fri, 15 May 2026 16:20:42 +0800
Subject: [PATCH 1/4] mm: factor zone-device page init helpers out of __init_zone_device_page
Message-Id: <20260515082045.63029-2-lizhe.67@bytedance.com>
In-Reply-To: <20260515082045.63029-1-lizhe.67@bytedance.com>
References: <20260515082045.63029-1-lizhe.67@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

__init_zone_device_page() currently mixes three different jobs:
deciding the initial page refcount, initializing the generic
ZONE_DEVICE state, and setting up pageblock metadata.

Split the refcount policy into zone_device_page_init_refcount() and
move the generic page initialization into
generic_init_zone_device_page(). This keeps the slow path behavior
unchanged, but makes the individual pieces reusable by later fast-path
patches.

No functional change intended.
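
For illustration only, the split described above can be sketched in
userspace C. Everything named demo_* here is a hypothetical stand-in
invented for this sketch, not kernel code; the struct is reduced to the
two fields the policy touches, and the unknown-type case simply falls
through to 1 without the kernel's warning.

```c
/* Hypothetical stand-ins for the pgmap memory types (sketch only). */
enum demo_memory_type {
	DEMO_DEVICE_PRIVATE = 1,
	DEMO_DEVICE_COHERENT,
	DEMO_DEVICE_FS_DAX,
	DEMO_DEVICE_GENERIC,
	DEMO_DEVICE_PCI_P2PDMA,
};

/* Reduced page descriptor: only what the refcount policy needs. */
struct demo_page {
	int refcount;
	enum demo_memory_type type;
};

/* Policy only: which refcount should a freshly initialized page carry? */
static inline int demo_init_refcount(enum demo_memory_type type)
{
	switch (type) {
	case DEMO_DEVICE_FS_DAX:
	case DEMO_DEVICE_PRIVATE:
	case DEMO_DEVICE_COHERENT:
	case DEMO_DEVICE_PCI_P2PDMA:
		return 0;
	case DEMO_DEVICE_GENERIC:
	default:
		return 1;
	}
}

/* Generic state setup: applies the policy, knows nothing about pageblocks. */
static inline void demo_generic_init(struct demo_page *page,
				     enum demo_memory_type type)
{
	page->refcount = 1;	/* stands in for the count left by single-page init */
	page->type = type;
	if (!demo_init_refcount(type))
		page->refcount = 0;
}
```

Keeping the policy in its own function means a caller that builds pages
some other way can still ask for the same initial refcount without
repeating the switch.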
Signed-off-by: Li Zhe
---
 mm/mm_init.c | 62 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 38 insertions(+), 24 deletions(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index f9f8e1af921c..5244acb96dbb 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -987,11 +987,36 @@ static void __init memmap_init(void)
 }
 
 #ifdef CONFIG_ZONE_DEVICE
-static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
-					  unsigned long zone_idx, int nid,
-					  struct dev_pagemap *pgmap)
+static inline int zone_device_page_init_refcount(
+		const struct dev_pagemap *pgmap)
 {
+	/*
+	 * ZONE_DEVICE pages other than MEMORY_TYPE_GENERIC are released
+	 * directly to the driver page allocator which will set the page count
+	 * to 1 when allocating the page.
+	 *
+	 * MEMORY_TYPE_GENERIC and MEMORY_TYPE_FS_DAX pages automatically have
+	 * their refcount reset to one whenever they are freed (ie. after
+	 * their refcount drops to 0).
+	 */
+	switch (pgmap->type) {
+	case MEMORY_DEVICE_FS_DAX:
+	case MEMORY_DEVICE_PRIVATE:
+	case MEMORY_DEVICE_COHERENT:
+	case MEMORY_DEVICE_PCI_P2PDMA:
+		return 0;
+	case MEMORY_DEVICE_GENERIC:
+		return 1;
+	default:
+		WARN_ONCE(1, "Unknown memory type!");
+		return 1;
+	}
+}
 
+static void __ref generic_init_zone_device_page(struct page *page,
+		unsigned long pfn, unsigned long zone_idx, int nid,
+		struct dev_pagemap *pgmap)
+{
 	__init_single_page(page, pfn, zone_idx, nid);
 
 	/*
@@ -1011,6 +1036,16 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 	page_folio(page)->pgmap = pgmap;
 	page->zone_device_data = NULL;
 
+	if (!zone_device_page_init_refcount(pgmap))
+		set_page_count(page, 0);
+}
+
+static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
+					  unsigned long zone_idx, int nid,
+					  struct dev_pagemap *pgmap)
+{
+	generic_init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
+
 	/*
 	 * Mark the block movable so that blocks are reserved for
 	 * movable at startup. This will force kernel allocations
@@ -1025,27 +1060,6 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 		init_pageblock_migratetype(page, MIGRATE_MOVABLE, false);
 		cond_resched();
 	}
-
-	/*
-	 * ZONE_DEVICE pages other than MEMORY_TYPE_GENERIC are released
-	 * directly to the driver page allocator which will set the page count
-	 * to 1 when allocating the page.
-	 *
-	 * MEMORY_TYPE_GENERIC and MEMORY_TYPE_FS_DAX pages automatically have
-	 * their refcount reset to one whenever they are freed (ie. after
-	 * their refcount drops to 0).
-	 */
-	switch (pgmap->type) {
-	case MEMORY_DEVICE_FS_DAX:
-	case MEMORY_DEVICE_PRIVATE:
-	case MEMORY_DEVICE_COHERENT:
-	case MEMORY_DEVICE_PCI_P2PDMA:
-		set_page_count(page, 0);
-		break;
-
-	case MEMORY_DEVICE_GENERIC:
-		break;
-	}
 }
 
 /*
-- 
2.20.1

From nobody Fri Jun 12 12:46:29 2026
From: "Li Zhe"
Date: Fri, 15 May 2026 16:20:43 +0800
Subject: [PATCH 2/4] mm: add a template-based fast path for zone-device page init
Message-Id: <20260515082045.63029-3-lizhe.67@bytedance.com>
In-Reply-To: <20260515082045.63029-1-lizhe.67@bytedance.com>
References: <20260515082045.63029-1-lizhe.67@bytedance.com>

On 64-bit builds, memmap_init_zone_device() spends most of its time
repeating the same struct page initialization for every PFN.
Prepare a template page through the existing slow path once, then copy
that template into each destination page and fix up the PFN-dependent
state afterwards.

Keep the optimized path disabled when the page_ref_set tracepoint is
active, because the template-copy path bypasses set_page_count() and
would otherwise hide the corresponding trace event. Non-64-bit builds
continue to use the existing slow path.

Tested in a VM with a 100 GB fsdax namespace device configured with
map=dev on an Intel Ice Lake server. This test exercises the nd_pmem
rebind path (pfns_per_compound == 1).

Test procedure:
Rebind the nd_pmem driver 30 times and collect the memmap
initialization time from the pr_debug() output of
memmap_init_zone_device().

Base (v7.1-rc3):
  First binding: 1486 ms
  Average of subsequent rebinds: 273.52 ms

With this patch:
  First binding: 1421 ms
  Average of subsequent rebinds: 246.14 ms

This reduces the average rebind time from 273.52 ms to 246.14 ms, or
about 10%.

Signed-off-by: Li Zhe
---
 mm/mm_init.c | 103 +++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 96 insertions(+), 7 deletions(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index 5244acb96dbb..4c475c71a9d6 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1013,7 +1013,7 @@ static inline int zone_device_page_init_refcount(
 	}
 }
 
-static void __ref generic_init_zone_device_page(struct page *page,
+static void __ref generic_init_zone_device_page_slow(struct page *page,
 		unsigned long pfn, unsigned long zone_idx, int nid,
 		struct dev_pagemap *pgmap)
 {
@@ -1040,12 +1040,9 @@ static void __ref generic_init_zone_device_page(struct page *page,
 		set_page_count(page, 0);
 }
 
-static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
-					  unsigned long zone_idx, int nid,
-					  struct dev_pagemap *pgmap)
+static void __ref zone_device_page_init_pageblock(struct page *page,
+		unsigned long pfn)
 {
-	generic_init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
-
 	/*
 	 * Mark the block movable so that blocks are reserved for
 	 * movable at startup. This will force kernel allocations
@@ -1062,6 +1059,88 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 	}
 }
 
+static inline void __init_zone_device_page(struct page *page, unsigned long pfn,
+					   unsigned long zone_idx, int nid,
+					   struct dev_pagemap *pgmap)
+{
+	generic_init_zone_device_page_slow(page, pfn, zone_idx, nid, pgmap);
+	zone_device_page_init_pageblock(page, pfn);
+}
+
+#if BITS_PER_LONG == 64
+static inline bool zone_device_page_init_optimization_enabled(void)
+{
+	/*
+	 * We use template pages and assign page->_refcount via memory copy.
+	 * This means the optimized path bypasses set_page_count(), so the
+	 * page_ref_set tracepoint cannot observe this initialization.
+	 * Skip the optimized path when the tracepoint is enabled.
+	 */
+	return !page_ref_tracepoint_active(page_ref_set);
+}
+
+static inline void struct_page_layout_check(void)
+{
+	BUILD_BUG_ON(sizeof(struct page) & (sizeof(u64) - 1));
+}
+
+static inline void init_template_page(struct page *template,
+				      unsigned long pfn,
+				      unsigned long zone_idx,
+				      int nid,
+				      struct dev_pagemap *pgmap)
+{
+	generic_init_zone_device_page_slow(template, pfn, zone_idx, nid, pgmap);
+}
+
+/*
+ * Initialize parts that differ from the template
+ */
+static inline void generic_init_zone_device_page_finish(struct page *page,
+							unsigned long pfn)
+{
+#ifdef SECTION_IN_PAGE_FLAGS
+	set_page_section(page, pfn_to_section_nr(pfn));
+#endif
+#ifdef WANT_PAGE_VIRTUAL
+	if (!is_highmem_idx(ZONE_DEVICE))
+		set_page_address(page, __va(pfn << PAGE_SHIFT));
+#endif
+}
+
+static void init_zone_device_page_from_template(struct page *page,
+		unsigned long pfn, const struct page *template)
+{
+	const u64 *src = (const u64 *)template;
+	u64 *dst = (u64 *)page;
+	unsigned int i;
+
+	for (i = 0; i < sizeof(struct page) / sizeof(u64); i++)
+		dst[i] = src[i];
+	generic_init_zone_device_page_finish(page, pfn);
+	zone_device_page_init_pageblock(page, pfn);
+}
+#else
+static inline bool zone_device_page_init_optimization_enabled(void)
+{
+	return false;
+}
+static inline void init_template_page(struct page *template,
+				      unsigned long pfn,
+				      unsigned long zone_idx,
+				      int nid,
+				      struct dev_pagemap *pgmap)
+{
+}
+static inline void struct_page_layout_check(void)
+{
+}
+static void init_zone_device_page_from_template(struct page *page,
+		unsigned long pfn, const struct page *template)
+{
+}
+#endif
+
 /*
  * With compound page geometry and when struct pages are stored in ram most
  * tail pages are reused. Consequently, the amount of unique struct pages to
@@ -1110,6 +1189,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 				   unsigned long nr_pages,
 				   struct dev_pagemap *pgmap)
 {
+	bool use_template = zone_device_page_init_optimization_enabled();
 	unsigned long pfn, end_pfn = start_pfn + nr_pages;
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	struct vmem_altmap *altmap = pgmap_altmap(pgmap);
@@ -1117,6 +1197,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 	unsigned long zone_idx = zone_idx(zone);
 	unsigned long start = jiffies;
 	int nid = pgdat->node_id;
+	struct page template;
 
 	if (WARN_ON_ONCE(!pgmap || zone_idx != ZONE_DEVICE))
 		return;
@@ -1131,10 +1212,18 @@ void __ref memmap_init_zone_device(struct zone *zone,
 		nr_pages = end_pfn - start_pfn;
 	}
 
+	if (use_template) {
+		struct_page_layout_check();
+		init_template_page(&template, start_pfn, zone_idx, nid, pgmap);
+	}
+
 	for (pfn = start_pfn; pfn < end_pfn; pfn += pfns_per_compound) {
 		struct page *page = pfn_to_page(pfn);
 
-		__init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
+		if (use_template)
+			init_zone_device_page_from_template(page, pfn, &template);
+		else
+			__init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
 
 		if (pfns_per_compound == 1)
 			continue;
-- 
2.20.1

From nobody Fri Jun 12 12:46:29 2026
From: "Li Zhe"
Date: Fri, 15 May 2026 16:20:44 +0800
Subject: [PATCH 3/4] mm: extend the template fast path to zone-device compound tails
Message-Id: <20260515082045.63029-4-lizhe.67@bytedance.com>
In-Reply-To: <20260515082045.63029-1-lizhe.67@bytedance.com>
References: <20260515082045.63029-1-lizhe.67@bytedance.com>

The template-based fast path currently only accelerates head-page
initialization in memmap_init_zone_device(). Compound tails still go
through the slow path one by one in memmap_init_compound().

Add separate head and tail template builders and reuse a prepared tail
template for all tail pages in the same compound range. Move the
head-page refcount handling out of generic_init_zone_device_page_slow()
so the two template builders can set their different initial refcount
states explicitly: head pages follow zone_device_page_init_refcount(),
while compound tails always start with a refcount of 0.

This extends the template-copy fast path to pfns_per_compound > 1
without changing the existing slow path.

Tested in a VM with a 100 GB devdax namespace (align=2097152) on an
Intel Ice Lake server. This test exercises the dax_pmem rebind path and
measures memmap initialization latency.

Test procedure:
Unbind and rebind the dax_pmem driver, then collect the memmap
initialization time from the pr_debug() output of
memmap_init_zone_device().

Base (v7.1-rc3):
  First binding: 1515 ms
  Average of subsequent rebinds: 313.45 ms

With this patch:
  First binding: 1425 ms
  Average of subsequent rebinds: 255.47 ms

This reduces the average rebind time from 313.45 ms to 255.47 ms, or
about 18.5%.

Signed-off-by: Li Zhe
---
 mm/mm_init.c | 80 +++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 61 insertions(+), 19 deletions(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index 4c475c71a9d6..5a9e6ecfa894 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1035,9 +1035,6 @@ static void __ref generic_init_zone_device_page_slow(struct page *page,
 	 */
 	page_folio(page)->pgmap = pgmap;
 	page->zone_device_data = NULL;
-
-	if (!zone_device_page_init_refcount(pgmap))
-		set_page_count(page, 0);
 }
 
 static void __ref zone_device_page_init_pageblock(struct page *page,
@@ -1064,6 +1061,8 @@ static inline void __init_zone_device_page(struct page *page, unsigned long pfn,
 					   struct dev_pagemap *pgmap)
 {
 	generic_init_zone_device_page_slow(page, pfn, zone_idx, nid, pgmap);
+	if (!zone_device_page_init_refcount(pgmap))
+		set_page_count(page, 0);
 	zone_device_page_init_pageblock(page, pfn);
 }
 
@@ -1084,13 +1083,28 @@ static inline void struct_page_layout_check(void)
 	BUILD_BUG_ON(sizeof(struct page) & (sizeof(u64) - 1));
 }
 
-static inline void init_template_page(struct page *template,
-				      unsigned long pfn,
-				      unsigned long zone_idx,
-				      int nid,
-				      struct dev_pagemap *pgmap)
+static inline void init_template_head_page(struct page *template,
+					   unsigned long pfn,
+					   unsigned long zone_idx,
+					   int nid,
+					   struct dev_pagemap *pgmap)
+{
+	generic_init_zone_device_page_slow(template, pfn, zone_idx, nid, pgmap);
+	if (!zone_device_page_init_refcount(pgmap))
+		set_page_count(template, 0);
+}
+
+static inline void init_template_tail_page(struct page *template,
+					   unsigned long pfn,
+					   unsigned long zone_idx,
+					   int nid,
+					   struct dev_pagemap *pgmap,
+					   const struct page *head,
+					   unsigned int order)
 {
 	generic_init_zone_device_page_slow(template, pfn, zone_idx, nid, pgmap);
+	prep_compound_tail(template, head, order);
+	set_page_count(template, 0);
 }
 
 /*
@@ -1125,11 +1139,11 @@ static inline bool zone_device_page_init_optimization_enabled(void)
 {
 	return false;
 }
-static inline void init_template_page(struct page *template,
-				      unsigned long pfn,
-				      unsigned long zone_idx,
-				      int nid,
-				      struct dev_pagemap *pgmap)
+static inline void init_template_head_page(struct page *template,
+					   unsigned long pfn,
+					   unsigned long zone_idx,
+					   int nid,
+					   struct dev_pagemap *pgmap)
 {
 }
 static inline void struct_page_layout_check(void)
@@ -1139,6 +1153,15 @@ static void init_zone_device_page_from_template(struct page *page,
 		unsigned long pfn, const struct page *template)
 {
 }
+static inline void init_template_tail_page(struct page *template,
+					   unsigned long pfn,
+					   unsigned long zone_idx,
+					   int nid,
+					   struct dev_pagemap *pgmap,
+					   const struct page *head,
+					   unsigned int order)
+{
+}
 #endif
 
 /*
@@ -1162,10 +1185,12 @@ static void __ref memmap_init_compound(struct page *head,
 				       unsigned long head_pfn,
 				       unsigned long zone_idx, int nid,
 				       struct dev_pagemap *pgmap,
-				       unsigned long nr_pages)
+				       unsigned long nr_pages,
+				       bool use_template)
 {
 	unsigned long pfn, end_pfn = head_pfn + nr_pages;
 	unsigned int order = pgmap->vmemmap_shift;
+	struct page template;
 
 	/*
 	 * We have to initialize the pages, including setting up page links.
@@ -1174,12 +1199,27 @@ static void __ref memmap_init_compound(struct page *head,
 	 * the pages in the same go.
 	 */
 	__SetPageHead(head);
+
+	/*
+	 * A tail template can be reused for all tail pages in the same compound page
+	 * because shared state for compound tails is pre-set by prep_compound_tail().
+	 * The per-page page->virtual and section in flags are fixed up after copying.
+	 */
+	if (use_template)
+		init_template_tail_page(&template, head_pfn + 1, zone_idx, nid,
+					pgmap, head, order);
+
 	for (pfn = head_pfn + 1; pfn < end_pfn; pfn++) {
 		struct page *page = pfn_to_page(pfn);
 
-		__init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
-		prep_compound_tail(page, head, order);
-		set_page_count(page, 0);
+		if (use_template) {
+			init_zone_device_page_from_template(page, pfn,
+							    &template);
+		} else {
+			__init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
+			prep_compound_tail(page, head, order);
+			set_page_count(page, 0);
+		}
 	}
 	prep_compound_head(head, order);
 }
@@ -1214,7 +1254,8 @@ void __ref memmap_init_zone_device(struct zone *zone,
 
 	if (use_template) {
 		struct_page_layout_check();
-		init_template_page(&template, start_pfn, zone_idx, nid, pgmap);
+		init_template_head_page(&template, start_pfn, zone_idx,
+					nid, pgmap);
 	}
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn += pfns_per_compound) {
@@ -1229,7 +1270,8 @@ void __ref memmap_init_zone_device(struct zone *zone,
 			continue;
 
 		memmap_init_compound(page, pfn, zone_idx, nid, pgmap,
-				     compound_nr_pages(altmap, pgmap));
+				     compound_nr_pages(altmap, pgmap),
+				     use_template);
 	}
 
 	pr_debug("%s initialised %lu pages in %ums\n", __func__,
-- 
2.20.1

From nobody Fri Jun 12 12:46:29 2026
From: "Li Zhe"
Date: Fri, 15 May 2026 16:20:45 +0800
Subject: [PATCH 4/4] mm: use arch store helpers in zone-device template copies
Message-Id: <20260515082045.63029-5-lizhe.67@bytedance.com>
In-Reply-To: <20260515082045.63029-1-lizhe.67@bytedance.com>
References: <20260515082045.63029-1-lizhe.67@bytedance.com>

The template-based fast path still leaves the actual copy sequence up
to the compiler. On x86-64 that can easily degrade back into a runtime
copy loop in the hot path, which leaves performance on the table.

Introduce arch_optimize_store_u64() and arch_optimize_store_drain(),
with a generic fallback and an x86-64 MOVNTI/SFENCE implementation, and
use them in the template copy path. Also open-code the word-at-a-time
copy so the compiler emits fixed-offset stores for the hot path instead
of a runtime loop.

On x86-64, MOVNTI is a better fit for this write-once, streaming
initialization pattern than normal cached stores. It reduces the
write-allocate traffic and cache pollution that a regular store
sequence would otherwise generate while filling large ranges of
struct page.

Refresh the PFN-dependent section bits and page->virtual state in the
reusable template before each copy, instead of patching the destination
page afterwards. This keeps the hot path as a fixed-offset store
sequence and avoids post-copy normal stores to cachelines that were
just written with non-temporal stores.

Because non-temporal stores are not ordered against later normal
stores, drain outstanding stores before memmap_init_compound() updates
compound heads and before memmap_init_zone_device() returns.

Disable the x86-64 override under KASAN or KMSAN so those builds keep
their instrumented stores through the generic fallback.

Tested in a VM with a 100 GB fsdax namespace device configured with
map=dev and a 100 GB devdax namespace (align=2097152) on an Intel Ice
Lake server.

Test procedure:
Rebind the nd_pmem and dax_pmem drivers 30 times and collect the memmap
initialization time from the pr_debug() output of
memmap_init_zone_device().

Base (v7.1-rc3):
  First binding for nd_pmem driver: 1486 ms
  Average of subsequent rebinds: 273.52 ms
  First binding for dax_pmem driver: 1515 ms
  Average of subsequent rebinds: 313.45 ms

With this patch:
  First binding for nd_pmem driver: 1272 ms
  Average of subsequent rebinds: 104.59 ms
  First binding for dax_pmem driver: 1286 ms
  Average of subsequent rebinds: 116.93 ms

This reduces the average rebind time by about 61.8% for nd_pmem and
62.7% for dax_pmem.

Signed-off-by: Li Zhe
---
 arch/x86/include/asm/struct_page_init.h | 28 ++++++++
 include/asm-generic/Kbuild              |  1 +
 include/asm-generic/struct_page_init.h  | 17 +++++
 mm/mm_init.c                            | 89 +++++++++++++++++++++----
 4 files changed, 122 insertions(+), 13 deletions(-)
 create mode 100644 arch/x86/include/asm/struct_page_init.h
 create mode 100644 include/asm-generic/struct_page_init.h

diff --git a/arch/x86/include/asm/struct_page_init.h b/arch/x86/include/asm/struct_page_init.h
new file mode 100644
index 000000000000..de8b4eab44de
--- /dev/null
+++ b/arch/x86/include/asm/struct_page_init.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_STRUCT_PAGE_INIT_H
+#define _ASM_X86_STRUCT_PAGE_INIT_H
+
+#include
+#include
+
+/*
+ * x86-64 guarantees SSE2, so MOVNTI and SFENCE are always available there.
+ *
+ * KASAN/KMSAN rely on compiler-instrumented stores. Keep the x86 override
+ * disabled for those configs and fall back to plain stores instead.
+ */
+#if defined(CONFIG_X86_64) && !defined(CONFIG_KASAN) && !defined(CONFIG_KMSAN)
+static __always_inline void arch_optimize_store_u64(u64 *dst, u64 val)
+{
+	asm volatile("movnti %1, %0" : "=m"(*dst) : "r"(val));
+}
+
+static __always_inline void arch_optimize_store_drain(void)
+{
+	asm volatile("sfence" : : : "memory");
+}
+#else
+#include
+#endif
+
+#endif /* _ASM_X86_STRUCT_PAGE_INIT_H */
diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild
index 2c53a1e0b760..3a493fed6803 100644
--- a/include/asm-generic/Kbuild
+++ b/include/asm-generic/Kbuild
@@ -65,3 +65,4 @@ mandatory-y += vermagic.h
 mandatory-y += vga.h
 mandatory-y += video.h
 mandatory-y += word-at-a-time.h
+mandatory-y += struct_page_init.h
diff --git a/include/asm-generic/struct_page_init.h b/include/asm-generic/struct_page_init.h
new file mode 100644
index 000000000000..45a722103a51
--- /dev/null
+++ b/include/asm-generic/struct_page_init.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_GENERIC_STRUCT_PAGE_INIT_H
+#define _ASM_GENERIC_STRUCT_PAGE_INIT_H
+
+#include
+#include
+
+static __always_inline void arch_optimize_store_u64(u64 *dst, u64 val)
+{
+	*dst = val;
+}
+
+static __always_inline void arch_optimize_store_drain(void)
+{
+}
+
+#endif /* _ASM_GENERIC_STRUCT_PAGE_INIT_H */
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 5a9e6ecfa894..a3211666ccd4 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -37,6 +37,7 @@
 #include "shuffle.h"
 
 #include
+#include
 
 #ifndef CONFIG_NUMA
 unsigned long max_mapnr;
@@ -1078,9 +1079,21 @@ static inline bool zone_device_page_init_optimization_enabled(void)
 	return !page_ref_tracepoint_active(page_ref_set);
 }
 
+/*
+ * The fast path copies struct page with fixed-offset u64 stores instead of
+ * a runtime loop. Keep that copy sequence in sync with the struct page
+ * layouts supported by this build.
+ *
+ * The sequence below requires struct page to be u64-aligned and currently
+ * handles layouts from 7 to 12 u64 words (56 to 96 bytes). If a future
+ * layout falls outside that range, fail the build so the store sequence is
+ * updated together with the layout change.
+ */
 static inline void struct_page_layout_check(void)
 {
 	BUILD_BUG_ON(sizeof(struct page) & (sizeof(u64) - 1));
+	BUILD_BUG_ON(sizeof(struct page) < 56);
+	BUILD_BUG_ON(sizeof(struct page) > 96);
 }
 
 static inline void init_template_head_page(struct page *template,
@@ -1108,30 +1121,67 @@ static inline void init_template_tail_page(struct page *template,
 }
 
 /*
- * Initialize parts that differ from the template
+ * 'template' is a reusable page prototype rather than a strictly immutable
+ * object. Most ZONE_DEVICE fields stay constant across the pages covered by
+ * the current template, but section bits and page->virtual may still depend
+ * on the PFN. Refresh those PFN-dependent fields in the template before
+ * copying it into @page.
  */
-static inline void generic_init_zone_device_page_finish(struct page *page,
-					unsigned long pfn)
+static inline void zone_device_page_update_template(struct page *template,
+					unsigned long pfn)
 {
 #ifdef SECTION_IN_PAGE_FLAGS
-	set_page_section(page, pfn_to_section_nr(pfn));
+	set_page_section(template, pfn_to_section_nr(pfn));
 #endif
 #ifdef WANT_PAGE_VIRTUAL
 	if (!is_highmem_idx(ZONE_DEVICE))
-		set_page_address(page, __va(pfn << PAGE_SHIFT));
+		set_page_address(template, __va(pfn << PAGE_SHIFT));
 #endif
 }
 
 static void init_zone_device_page_from_template(struct page *page,
-		unsigned long pfn, const struct page *template)
+		unsigned long pfn, struct page *template)
 {
 	const u64 *src = (const u64 *)template;
 	u64 *dst = (u64 *)page;
-	unsigned int i;
 
-	for (i = 0; i < sizeof(struct page) / sizeof(u64); i++)
-		dst[i] = src[i];
-	generic_init_zone_device_page_finish(page, pfn);
+	/*
+	 * 'template' carries the invariant portion of a ZONE_DEVICE struct
+	 * page. Update the PFN-dependent fields in place before copying it
+	 * to the destination page.
+	 */
+	zone_device_page_update_template(template, pfn);
+
+	/*
+	 * Keep the copy open-coded so the compiler emits fixed-offset stores
+	 * for the hot path instead of a runtime copy loop.
+	 */
+	switch (sizeof(struct page)) {
+	case 96:
+		arch_optimize_store_u64(&dst[11], src[11]);
+		fallthrough;
+	case 88:
+		arch_optimize_store_u64(&dst[10], src[10]);
+		fallthrough;
+	case 80:
+		arch_optimize_store_u64(&dst[9], src[9]);
+		fallthrough;
+	case 72:
+		arch_optimize_store_u64(&dst[8], src[8]);
+		fallthrough;
+	case 64:
+		arch_optimize_store_u64(&dst[7], src[7]);
+		fallthrough;
+	case 56:
+		arch_optimize_store_u64(&dst[6], src[6]);
+		arch_optimize_store_u64(&dst[5], src[5]);
+		arch_optimize_store_u64(&dst[4], src[4]);
+		arch_optimize_store_u64(&dst[3], src[3]);
+		arch_optimize_store_u64(&dst[2], src[2]);
+		arch_optimize_store_u64(&dst[1], src[1]);
+		arch_optimize_store_u64(&dst[0], src[0]);
+	}
+
 	zone_device_page_init_pageblock(page, pfn);
 }
 #else
@@ -1201,9 +1251,10 @@ static void __ref memmap_init_compound(struct page *head,
 	__SetPageHead(head);
 
 	/*
-	 * A tail template can be reused for all tail pages in the same compound page
-	 * because shared state for compound tails is pre-set by prep_compound_tail().
-	 * The per-page page->virtual and section in flags are fixed up after copying.
+	 * All tails of the same compound page share the state established by
+	 * prep_compound_tail(). Reuse one tail template for the whole range
+	 * and refresh only the PFN-dependent fields in that template before
+	 * each copy.
 	 */
 	if (use_template)
 		init_template_tail_page(&template, head_pfn + 1, zone_idx, nid,
@@ -1269,10 +1320,22 @@ void __ref memmap_init_zone_device(struct zone *zone,
 		if (pfns_per_compound == 1)
 			continue;
 
+		/*
+		 * Compound-head setup immediately updates head->flags, so make
+		 * the template copy visible before entering memmap_init_compound().
+		 */
+		if (use_template)
+			arch_optimize_store_drain();
+
 		memmap_init_compound(page, pfn, zone_idx, nid, pgmap,
 				     compound_nr_pages(altmap, pgmap),
 				     use_template);
 	}
+	/*
+	 * Drain any remaining non-temporal stores before returning.
+	 */
+	if (use_template)
+		arch_optimize_store_drain();
 
 	pr_debug("%s initialised %lu pages in %ums\n", __func__,
 		nr_pages, jiffies_to_msecs(jiffies - start));
-- 
2.20.1
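
For anyone who wants to try the open-coded copy pattern outside the kernel,
the following is a minimal userspace sketch, not part of this patch. It
assumes a hypothetical 64-byte `struct fake_page` in place of struct page,
and `store_u64()` and `copy_from_template()` are illustrative names only.
The plain store mirrors the asm-generic fallback; MOVNTI and SFENCE are not
modelled here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct page: 8 u64 words (64 bytes). */
struct fake_page {
	uint64_t words[8];
};

/* Same idea as the BUILD_BUG_ON() checks: u64-aligned, 56..96 bytes. */
static_assert(sizeof(struct fake_page) % sizeof(uint64_t) == 0,
	      "layout must be a multiple of u64");
static_assert(sizeof(struct fake_page) >= 56 &&
	      sizeof(struct fake_page) <= 96,
	      "layout outside the handled range");

/* Plain-store fallback, analogous to the asm-generic helper. */
static inline void store_u64(uint64_t *dst, uint64_t val)
{
	*dst = val;
}

/*
 * Copy the template with fixed-offset stores. The switch operand is a
 * compile-time constant, so only the cases at or below the real size
 * are reachable and the compiler can drop the rest.
 */
static void copy_from_template(struct fake_page *page,
			       const struct fake_page *tmpl)
{
	const uint64_t *src = (const uint64_t *)tmpl;
	uint64_t *dst = (uint64_t *)page;

	switch (sizeof(struct fake_page)) {
	case 96:
		store_u64(&dst[11], src[11]);
		/* fall through */
	case 88:
		store_u64(&dst[10], src[10]);
		/* fall through */
	case 80:
		store_u64(&dst[9], src[9]);
		/* fall through */
	case 72:
		store_u64(&dst[8], src[8]);
		/* fall through */
	case 64:
		store_u64(&dst[7], src[7]);
		/* fall through */
	case 56:
		store_u64(&dst[6], src[6]);
		store_u64(&dst[5], src[5]);
		store_u64(&dst[4], src[4]);
		store_u64(&dst[3], src[3]);
		store_u64(&dst[2], src[2]);
		store_u64(&dst[1], src[1]);
		store_u64(&dst[0], src[0]);
	}
}
```

With the 64-byte layout assumed above, the switch enters at case 64 and
falls through to case 56, so all eight words are written. Changing the
array length in `struct fake_page` to any value from 7 to 12 selects a
different entry point without touching the copy routine.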