From nobody Mon Jun 8 19:49:30 2026
X-Mailing-List: linux-kernel@vger.kernel.org
In-Reply-To: <20260527033636.28231-1-lizhe.67@bytedance.com>
Subject: [PATCH v3 1/8] mm: fix stale ZONE_DEVICE refcount comment
Date: Wed, 27 May 2026 11:36:29 +0800
Message-Id: <20260527033636.28231-2-lizhe.67@bytedance.com>
From: "Li Zhe"

The comment in __init_zone_device_page() still uses the old
MEMORY_TYPE_* names and implies that FS_DAX pages regain a refcount of 1
in the free path. That no longer matches the code.

Update the comment to describe the current policy correctly:
MEMORY_DEVICE_GENERIC pages regain a refcount of 1 in the free path,
while the remaining ZONE_DEVICE types start from 0 here and raise the
count again when the allocator or driver hands the page out.

No functional change intended.
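For illustration only, the refcount policy described above can be modelled in a few lines of userspace C. This is a sketch and not kernel code. enum memory_type is copied locally with illustrative values, and initial_refcount() and refcount_after_free() are hypothetical helpers that stand in for the memmap-init and free paths.

```c
#include <assert.h>

/* Local copy of the ZONE_DEVICE types named above (userspace model only). */
enum memory_type {
	MEMORY_DEVICE_PRIVATE = 1,
	MEMORY_DEVICE_COHERENT,
	MEMORY_DEVICE_FS_DAX,
	MEMORY_DEVICE_GENERIC,
	MEMORY_DEVICE_PCI_P2PDMA,
};

/* Hypothetical: refcount a page is left with once memmap init is done. */
static int initial_refcount(enum memory_type type)
{
	int count = 1;	/* __init_single_page() starts every page at 1 */

	switch (type) {
	case MEMORY_DEVICE_FS_DAX:
	case MEMORY_DEVICE_PRIVATE:
	case MEMORY_DEVICE_COHERENT:
	case MEMORY_DEVICE_PCI_P2PDMA:
		count = 0;	/* raised again when the page is handed out */
		break;
	case MEMORY_DEVICE_GENERIC:
		break;		/* keep 1: the free path restores it anyway */
	}
	return count;
}

/* Hypothetical: refcount a page holds after its last reference is dropped. */
static int refcount_after_free(enum memory_type type)
{
	return type == MEMORY_DEVICE_GENERIC ? 1 : 0;
}
```

The model makes the invariant explicit. For every type, the count a page starts with after init equals the count it holds after being freed. GENERIC pages sit at 1, and all other ZONE_DEVICE types sit at 0 until they are handed out.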
Signed-off-by: Li Zhe
---
 mm/mm_init.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index f9f8e1af921c..35de3b6a186d 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1027,13 +1027,9 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 	}
 
 	/*
-	 * ZONE_DEVICE pages other than MEMORY_TYPE_GENERIC are released
-	 * directly to the driver page allocator which will set the page count
-	 * to 1 when allocating the page.
-	 *
-	 * MEMORY_TYPE_GENERIC and MEMORY_TYPE_FS_DAX pages automatically have
-	 * their refcount reset to one whenever they are freed (ie. after
-	 * their refcount drops to 0).
+	 * MEMORY_DEVICE_GENERIC pages regain a refcount of 1 in the free
+	 * path. The remaining ZONE_DEVICE types start from 0 here and raise
+	 * the count again when the allocator or driver hands the page out.
 	 */
 	switch (pgmap->type) {
 	case MEMORY_DEVICE_FS_DAX:
-- 
2.20.1

From nobody Mon Jun 8 19:49:30 2026
Message-Id: <20260527033636.28231-3-lizhe.67@bytedance.com>
Subject: [PATCH v3 2/8] mm: factor zone-device page init helpers out of __init_zone_device_page
Date: Wed, 27 May 2026 11:36:30 +0800
From: "Li Zhe"

__init_zone_device_page(), called from memmap_init_zone_device(),
currently mixes refcount policy, core ZONE_DEVICE page setup, and
pageblock metadata handling in a single helper.

Factor the refcount-reset predicate into pagemap_resets_refcount(),
move the common page initialization into __zone_device_page_init(),
split pageblock handling into zone_device_page_init_pageblock(), and
wrap the existing slow path in zone_device_page_init_slow().

This keeps the slow-path behaviour unchanged and gives later patches
reusable helper boundaries.

No functional change intended.

Signed-off-by: Li Zhe
Reviewed-by: Mike Rapoport (Microsoft)
---
 mm/mm_init.c | 62 +++++++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 43 insertions(+), 19 deletions(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index 35de3b6a186d..2e5899c5cf35 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -987,11 +987,38 @@ static void __init memmap_init(void)
 }
 
 #ifdef CONFIG_ZONE_DEVICE
-static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
+/*
+ * Return true when the free path for this pagemap type restores the page
+ * refcount to 1, so memmap_init_zone_device() can keep the count set by
+ * __init_single_page(). Otherwise initialize the refcount to 0 and leave
+ * it to the allocator or pgmap callbacks to raise it when the page is
+ * handed out again.
+ */
+static inline bool pagemap_resets_refcount(const struct dev_pagemap *pgmap)
+{
+	/*
+	 * MEMORY_DEVICE_GENERIC pages regain a refcount of 1 in the free
+	 * path. The remaining ZONE_DEVICE types start from 0 here and raise
+	 * the count again when the allocator or driver hands the page out.
+	 */
+	switch (pgmap->type) {
+	case MEMORY_DEVICE_FS_DAX:
+	case MEMORY_DEVICE_PRIVATE:
+	case MEMORY_DEVICE_COHERENT:
+	case MEMORY_DEVICE_PCI_P2PDMA:
+		return false;
+	case MEMORY_DEVICE_GENERIC:
+		return true;
+	default:
+		WARN_ONCE(1, "Unknown memory type!");
+		return true;
+	}
+}
+
+static void __ref __zone_device_page_init(struct page *page, unsigned long pfn,
 					  unsigned long zone_idx, int nid,
 					  struct dev_pagemap *pgmap)
 {
-
 	__init_single_page(page, pfn, zone_idx, nid);
 
 	/*
@@ -1010,7 +1037,11 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 	 */
 	page_folio(page)->pgmap = pgmap;
 	page->zone_device_data = NULL;
+}
 
+static void __ref zone_device_page_init_pageblock(struct page *page,
+		unsigned long pfn)
+{
 	/*
 	 * Mark the block movable so that blocks are reserved for
 	 * movable at startup. This will force kernel allocations
@@ -1025,23 +1056,16 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 		init_pageblock_migratetype(page, MIGRATE_MOVABLE, false);
 		cond_resched();
 	}
+}
 
-	/*
-	 * MEMORY_DEVICE_GENERIC pages regain a refcount of 1 in the free
-	 * path. The remaining ZONE_DEVICE types start from 0 here and raise
-	 * the count again when the allocator or driver hands the page out.
-	 */
-	switch (pgmap->type) {
-	case MEMORY_DEVICE_FS_DAX:
-	case MEMORY_DEVICE_PRIVATE:
-	case MEMORY_DEVICE_COHERENT:
-	case MEMORY_DEVICE_PCI_P2PDMA:
+static void __ref zone_device_page_init_slow(struct page *page,
+		unsigned long pfn, unsigned long zone_idx, int nid,
+		struct dev_pagemap *pgmap)
+{
+	__zone_device_page_init(page, pfn, zone_idx, nid, pgmap);
+	if (!pagemap_resets_refcount(pgmap))
 		set_page_count(page, 0);
-		break;
-
-	case MEMORY_DEVICE_GENERIC:
-		break;
-	}
+	zone_device_page_init_pageblock(page, pfn);
 }
 
 /*
@@ -1080,7 +1104,7 @@ static void __ref memmap_init_compound(struct page *head,
 	for (pfn = head_pfn + 1; pfn < end_pfn; pfn++) {
 		struct page *page = pfn_to_page(pfn);
 
-		__init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
+		zone_device_page_init_slow(page, pfn, zone_idx, nid, pgmap);
 		prep_compound_tail(page, head, order);
 		set_page_count(page, 0);
 	}
@@ -1116,7 +1140,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 	for (pfn = start_pfn; pfn < end_pfn; pfn += pfns_per_compound) {
 		struct page *page = pfn_to_page(pfn);
 
-		__init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
+		zone_device_page_init_slow(page, pfn, zone_idx, nid, pgmap);
 
 		if (pfns_per_compound == 1)
 			continue;
-- 
2.20.1

From nobody Mon Jun 8 19:49:30 2026
Subject: [PATCH v3 3/8] mm: add a set_page_section_from_pfn() helper
Message-Id: <20260527033636.28231-4-lizhe.67@bytedance.com>
From: "Li Zhe"
Date: Wed, 27 May 2026 11:36:31 +0800

Callers that want to update section bits from a PFN currently need to
open-code:

  set_page_section(page, pfn_to_section_nr(pfn));

and guard that sequence with #ifdef SECTION_IN_PAGE_FLAGS.

Add set_page_section_from_pfn() to wrap that update in one place. When
section bits are stored in page flags, the helper derives the section
number from the PFN and updates the page flags. Otherwise it degrades
to a no-op.

Convert set_page_links() to use the new helper so later ZONE_DEVICE
fast-path patches can also update section bits without open-coding
SECTION_IN_PAGE_FLAGS at each callsite.

This keeps the PFN-to-section translation local to the configurations
that actually store section bits in struct page flags, and avoids
exposing that detail to generic callers.

No functional change intended.
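A minimal userspace sketch of the helper's shape follows. It assumes illustrative bit widths: PFN_SECTION_SHIFT, SECTIONS_WIDTH and SECTIONS_PGSHIFT are made-up values here, and struct page carries only a plain 64-bit flags word. page_section() is a hypothetical reader added for the example. Removing the SECTION_IN_PAGE_FLAGS define turns both setters into no-ops, and callers of set_page_section_from_pfn() compile unchanged either way.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative widths only; the real values depend on the kernel config. */
#define SECTION_IN_PAGE_FLAGS 1
#define PFN_SECTION_SHIFT 15
#define SECTIONS_WIDTH 8
#define SECTIONS_PGSHIFT (64 - SECTIONS_WIDTH)
#define SECTIONS_MASK ((UINT64_C(1) << SECTIONS_WIDTH) - 1)

struct page {
	uint64_t flags;	/* stand-in for page->flags.f */
};

static inline unsigned long pfn_to_section_nr(unsigned long pfn)
{
	return pfn >> PFN_SECTION_SHIFT;
}

#ifdef SECTION_IN_PAGE_FLAGS
static inline void set_page_section(struct page *page, unsigned long section)
{
	page->flags &= ~(SECTIONS_MASK << SECTIONS_PGSHIFT);
	page->flags |= ((uint64_t)section & SECTIONS_MASK) << SECTIONS_PGSHIFT;
}

/* PFN-to-section translation happens only when the bits are stored. */
static inline void set_page_section_from_pfn(struct page *page,
					     unsigned long pfn)
{
	set_page_section(page, pfn_to_section_nr(pfn));
}

/* Hypothetical reader, only for the example. */
static inline unsigned long page_section(const struct page *page)
{
	return (unsigned long)((page->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK);
}
#else
/* Section bits are not kept in page flags: both setters are no-ops. */
static inline void set_page_section(struct page *page, unsigned long section)
{
	(void)page;
	(void)section;
}

static inline void set_page_section_from_pfn(struct page *page,
					     unsigned long pfn)
{
	(void)page;
	(void)pfn;
}

static inline unsigned long page_section(const struct page *page)
{
	(void)page;
	return 0;
}
#endif
```

With these values, PFN 0x12345 falls in section 2. Updating the section leaves the other flag bits untouched.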
Signed-off-by: Li Zhe
Reviewed-by: Mike Rapoport (Microsoft)
---
 include/linux/mm.h | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index af23453e9dbd..bf84e698385c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2507,11 +2507,26 @@ static inline void set_page_section(struct page *page, unsigned long section)
 	page->flags.f |= (section & SECTIONS_MASK) << SECTIONS_PGSHIFT;
 }
 
+static inline void set_page_section_from_pfn(struct page *page,
+					     unsigned long pfn)
+{
+	set_page_section(page, pfn_to_section_nr(pfn));
+}
+
 static inline unsigned long memdesc_section(memdesc_flags_t mdf)
 {
 	return (mdf.f >> SECTIONS_PGSHIFT) & SECTIONS_MASK;
 }
 #else /* !SECTION_IN_PAGE_FLAGS */
+static inline void set_page_section(struct page *page, unsigned long section)
+{
+}
+
+static inline void set_page_section_from_pfn(struct page *page,
+					     unsigned long pfn)
+{
+}
+
 static inline unsigned long memdesc_section(memdesc_flags_t mdf)
 {
 	return 0;
@@ -2734,9 +2749,7 @@ static inline void set_page_links(struct page *page, enum zone_type zone,
 {
 	set_page_zone(page, zone);
 	set_page_node(page, node);
-#ifdef SECTION_IN_PAGE_FLAGS
-	set_page_section(page, pfn_to_section_nr(pfn));
-#endif
+	set_page_section_from_pfn(page, pfn);
 }
 
 /**
-- 
2.20.1

From nobody Mon Jun 8 19:49:30 2026
Subject: [PATCH v3 4/8] mm: add a template-based fast path for zone-device page init
Message-Id: <20260527033636.28231-5-lizhe.67@bytedance.com>
Date: Wed, 27 May 2026 11:36:32 +0800
From: "Li Zhe"

memmap_init_zone_device() repeats nearly identical head-page
initialization for each PFN. Prepare one reusable ZONE_DEVICE head-page
template through the existing slow path, refresh the PFN-dependent
fields in that template before each copy, and memcpy() it into each
destination page.

The optimized path sets _refcount by copying the template rather than
through set_page_count(), so the page_ref_set tracepoint would not fire
for the destination pages. Keep the fast path disabled while that
tracepoint is enabled.

This patch accelerates the pfns_per_compound == 1 case. Compound tails
are handled in the next patch.

Tested in a VM with a 100 GB fsdax namespace device configured with
map=dev on an Intel Ice Lake server. This test exercises the nd_pmem
rebind path (pfns_per_compound == 1).

Test procedure:
  Rebind the nd_pmem driver 30 times and collect the memmap
  initialization time from the pr_debug() output of
  memmap_init_zone_device().

Base (v7.1-rc3):
  First binding:                  1486 ms
  Average of subsequent rebinds:  273.52 ms

With this patch and its prerequisites applied:
  First binding:                  1422 ms
  Average of subsequent rebinds:  246.65 ms

This reduces the average rebind time from 273.52 ms to 246.65 ms, or
about 10%.
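The following hedged userspace model illustrates the template idea. struct fake_page, slow_init(), refresh_template(), same_page() and the two init_range_*() functions are hypothetical, and the bit layout is invented. The model initializes a range once page by page and once through a refreshed template, and the two results should match field by field. It leaves out pageblock handling and the page_ref_set tracepoint check that the patch uses to disable the fast path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct page; field names are illustrative. */
struct fake_page {
	uint64_t flags;		/* zone/node bits plus PFN-dependent section bits */
	int refcount;
	const void *pgmap;
	void *zone_device_data;
	uintptr_t virt;		/* models page->virtual under WANT_PAGE_VIRTUAL */
};

#define FAKE_PAGE_SHIFT		12
#define FAKE_SECTION_SHIFT	15	/* assumed PFN-to-section shift */
#define FAKE_SECTION_PGSHIFT	56
#define FAKE_SECTION_MASK	UINT64_C(0xff)
#define FAKE_ZONE_NODE_BITS	UINT64_C(0x3)	/* pretend zone/node encoding */

static uint64_t section_bits(unsigned long pfn)
{
	return ((uint64_t)(pfn >> FAKE_SECTION_SHIFT) & FAKE_SECTION_MASK)
		<< FAKE_SECTION_PGSHIFT;
}

/* "Slow path": initialize every field of one page from scratch. */
static void slow_init(struct fake_page *p, unsigned long pfn,
		      const void *pgmap, bool resets_refcount)
{
	memset(p, 0, sizeof(*p));
	p->flags = FAKE_ZONE_NODE_BITS | section_bits(pfn);
	p->refcount = resets_refcount ? 1 : 0;
	p->pgmap = pgmap;
	p->virt = (uintptr_t)pfn << FAKE_PAGE_SHIFT;
}

/* Refresh only the fields that depend on the PFN. */
static void refresh_template(struct fake_page *tmpl, unsigned long pfn)
{
	tmpl->flags &= ~(FAKE_SECTION_MASK << FAKE_SECTION_PGSHIFT);
	tmpl->flags |= section_bits(pfn);
	tmpl->virt = (uintptr_t)pfn << FAKE_PAGE_SHIFT;
}

static void init_range_slow(struct fake_page *pages, unsigned long start_pfn,
			    unsigned long n, const void *pgmap, bool resets)
{
	for (unsigned long i = 0; i < n; i++)
		slow_init(&pages[i], start_pfn + i, pgmap, resets);
}

/* Fast path: build the template once, then refresh and memcpy per page. */
static void init_range_fast(struct fake_page *pages, unsigned long start_pfn,
			    unsigned long n, const void *pgmap, bool resets)
{
	struct fake_page tmpl;

	slow_init(&tmpl, start_pfn, pgmap, resets);
	for (unsigned long i = 0; i < n; i++) {
		refresh_template(&tmpl, start_pfn + i);
		memcpy(&pages[i], &tmpl, sizeof(tmpl));
	}
}

/* Field-by-field comparison (avoids relying on padding bytes). */
static bool same_page(const struct fake_page *x, const struct fake_page *y)
{
	return x->flags == y->flags && x->refcount == y->refcount &&
	       x->pgmap == y->pgmap &&
	       x->zone_device_data == y->zone_device_data &&
	       x->virt == y->virt;
}
```

Under the assumed 15-bit shift, a range that starts just below PFN 32768 crosses a section boundary. That is the case where a template that was never refreshed would produce stale section bits.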
Signed-off-by: Li Zhe
---
 mm/mm_init.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 62 insertions(+), 1 deletion(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index 2e5899c5cf35..53c0241c66b7 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1068,6 +1068,56 @@ static void __ref zone_device_page_init_slow(struct page *page,
 	zone_device_page_init_pageblock(page, pfn);
 }
 
+static inline bool zone_device_page_init_optimization_enabled(void)
+{
+	/*
+	 * The template fast path copies a preinitialized struct page image.
+	 * Skip it when the page_ref_set tracepoint is enabled.
+	 */
+	return !page_ref_tracepoint_active(page_ref_set);
+}
+
+static inline void zone_device_template_page_init(struct page *template,
+						  unsigned long pfn,
+						  unsigned long zone_idx,
+						  int nid,
+						  struct dev_pagemap *pgmap)
+{
+	__zone_device_page_init(template, pfn, zone_idx, nid, pgmap);
+	if (!pagemap_resets_refcount(pgmap))
+		set_page_count(template, 0);
+}
+
+/*
+ * 'template' is a reusable page prototype rather than a strictly immutable
+ * object. Most ZONE_DEVICE fields stay constant across the pages covered by
+ * the current template, but section bits and page->virtual may still depend
+ * on the PFN. Refresh those PFN-dependent fields in the template before
+ * copying it into @page.
+ */
+static inline void zone_device_page_update_template(struct page *template,
+						    unsigned long pfn)
+{
+	set_page_section_from_pfn(template, pfn);
+#ifdef WANT_PAGE_VIRTUAL
+	if (!is_highmem_idx(ZONE_DEVICE))
+		set_page_address(template, __va(pfn << PAGE_SHIFT));
+#endif
+}
+
+static void zone_device_page_init_from_template(struct page *page,
+		unsigned long pfn, struct page *template)
+{
+	/*
+	 * 'template' carries the invariant portion of a ZONE_DEVICE struct
+	 * page. Update the PFN-dependent fields in place before copying it
+	 * to the destination page.
+	 */
+	zone_device_page_update_template(template, pfn);
+	memcpy(page, template, sizeof(*page));
+	zone_device_page_init_pageblock(page, pfn);
+}
+
 /*
  * With compound page geometry and when struct pages are stored in ram most
  * tail pages are reused. Consequently, the amount of unique struct pages to
@@ -1116,6 +1166,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 				   unsigned long nr_pages,
 				   struct dev_pagemap *pgmap)
 {
+	bool use_template = zone_device_page_init_optimization_enabled();
 	unsigned long pfn, end_pfn = start_pfn + nr_pages;
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	struct vmem_altmap *altmap = pgmap_altmap(pgmap);
@@ -1123,6 +1174,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 	unsigned long zone_idx = zone_idx(zone);
 	unsigned long start = jiffies;
 	int nid = pgdat->node_id;
+	struct page template;
 
 	if (WARN_ON_ONCE(!pgmap || zone_idx != ZONE_DEVICE))
 		return;
@@ -1137,10 +1189,19 @@ void __ref memmap_init_zone_device(struct zone *zone,
 		nr_pages = end_pfn - start_pfn;
 	}
 
+	if (use_template)
+		zone_device_template_page_init(&template, start_pfn, zone_idx,
+					       nid, pgmap);
+
 	for (pfn = start_pfn; pfn < end_pfn; pfn += pfns_per_compound) {
 		struct page *page = pfn_to_page(pfn);
 
-		zone_device_page_init_slow(page, pfn, zone_idx, nid, pgmap);
+		if (use_template)
+			zone_device_page_init_from_template(page, pfn,
+							    &template);
+		else
+			zone_device_page_init_slow(page, pfn, zone_idx,
+						   nid, pgmap);
 
 		if (pfns_per_compound == 1)
 			continue;
-- 
2.20.1

From nobody Mon Jun 8 19:49:30 2026
Date: Wed, 27 May 2026 11:36:33 +0800
Message-Id: <20260527033636.28231-6-lizhe.67@bytedance.com>
From: "Li Zhe"
Subject: [PATCH v3 5/8] mm: extend the template fast path to zone-device compound tails

The template fast path from the previous patch only accelerates head
pages. Compound tails in memmap_init_compound() still go through the
slow path one by one.

Build separate head and tail templates and reuse one prepared tail
template across the tail pages in a compound range. Head pages preserve
the existing refcount policy, while compound tails always start with a
refcount of 0 after prep_compound_tail().

This extends the template-copy fast path to pfns_per_compound > 1
without changing the existing slow path. Tail-page PFN-dependent fields
are refreshed in the reusable tail template before each copy.

Tested in a VM with a 100 GB devdax namespace (align=2097152) on an
Intel Ice Lake server. This test exercises the dax_pmem rebind path and
measures memmap initialization latency.

Test procedure:
  Unbind and rebind the dax_pmem driver 30 times, and collect the
  memmap initialization time from the pr_debug() output of
  memmap_init_zone_device().

Base (v7.1-rc3):
  First binding:                  1515 ms
  Average of subsequent rebinds:  313.45 ms

With this patch and its prerequisites applied:
  First binding:                  1422 ms
  Average of subsequent rebinds:  240.42 ms

This reduces the average rebind time from 313.45 ms to 240.42 ms, or
about 23.3%.
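A similar hedged model covers compound tails. Every tail shares the link to its head and a refcount of 0, so the sketch copies one prepared tail template across the range and refreshes only the PFN-dependent field. struct fake_page, base_init(), prep_tail() and the two init_tails_*() functions are hypothetical stand-ins. The head page itself is left to the caller. compound_head is encoded as the head address with bit 0 set, as the kernel does for tail pages.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct page; only what the sketch needs. */
struct fake_page {
	unsigned long section;		/* PFN-dependent field */
	int refcount;
	uintptr_t compound_head;	/* (head | 1) on tail pages, 0 otherwise */
};

#define FAKE_SECTION_SHIFT 15		/* assumed PFN-to-section shift */

/* Common per-page init; like __init_single_page(), starts at refcount 1. */
static void base_init(struct fake_page *p, unsigned long pfn)
{
	memset(p, 0, sizeof(*p));
	p->section = pfn >> FAKE_SECTION_SHIFT;
	p->refcount = 1;
}

/* Tail-specific state: link to the head and force the refcount to 0. */
static void prep_tail(struct fake_page *p, const struct fake_page *head)
{
	p->compound_head = (uintptr_t)head | 1;
	p->refcount = 0;
}

/* Slow path: full initialization for every tail page. */
static void init_tails_slow(struct fake_page *head, unsigned long head_pfn,
			    unsigned long nr_pages)
{
	for (unsigned long i = 1; i < nr_pages; i++) {
		base_init(&head[i], head_pfn + i);
		prep_tail(&head[i], head);
	}
}

/* Fast path: one tail template, refresh the PFN-dependent field, copy. */
static void init_tails_fast(struct fake_page *head, unsigned long head_pfn,
			    unsigned long nr_pages)
{
	struct fake_page tmpl;

	base_init(&tmpl, head_pfn + 1);
	prep_tail(&tmpl, head);
	for (unsigned long i = 1; i < nr_pages; i++) {
		tmpl.section = (head_pfn + i) >> FAKE_SECTION_SHIFT;
		memcpy(&head[i], &tmpl, sizeof(tmpl));
	}
}
```

In this model, prep_tail() runs once on the template rather than once per tail page. The remaining per-tail work is one field update and a memcpy().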
Signed-off-by: Li Zhe
Reviewed-by: Mike Rapoport (Microsoft)
---
 mm/mm_init.c | 45 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 36 insertions(+), 9 deletions(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index 53c0241c66b7..d5ccb49a048f 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1077,17 +1077,25 @@ static inline bool zone_device_page_init_optimization_enabled(void)
 	return !page_ref_tracepoint_active(page_ref_set);
 }
 
-static inline void zone_device_template_page_init(struct page *template,
-						  unsigned long pfn,
-						  unsigned long zone_idx,
-						  int nid,
-						  struct dev_pagemap *pgmap)
+static inline void zone_device_template_head_page_init(struct page *template,
+		unsigned long pfn, unsigned long zone_idx, int nid,
+		struct dev_pagemap *pgmap)
 {
 	__zone_device_page_init(template, pfn, zone_idx, nid, pgmap);
 	if (!pagemap_resets_refcount(pgmap))
 		set_page_count(template, 0);
 }
 
+static inline void zone_device_template_tail_page_init(struct page *template,
+		unsigned long pfn, unsigned long zone_idx, int nid,
+		struct dev_pagemap *pgmap, const struct page *head,
+		unsigned int order)
+{
+	__zone_device_page_init(template, pfn, zone_idx, nid, pgmap);
+	prep_compound_tail(template, head, order);
+	set_page_count(template, 0);
+}
+
 /*
  * 'template' is a reusable page prototype rather than a strictly immutable
  * object. Most ZONE_DEVICE fields stay constant across the pages covered by
@@ -1139,10 +1147,12 @@ static void __ref memmap_init_compound(struct page *head,
 				       unsigned long head_pfn,
 				       unsigned long zone_idx, int nid,
 				       struct dev_pagemap *pgmap,
-				       unsigned long nr_pages)
+				       unsigned long nr_pages,
+				       bool use_template)
 {
 	unsigned long pfn, end_pfn = head_pfn + nr_pages;
 	unsigned int order = pgmap->vmemmap_shift;
+	struct page template;
 
 	/*
 	 * We have to initialize the pages, including setting up page links.
@@ -1151,9 +1161,25 @@ static void __ref memmap_init_compound(struct page *head,
 	 * the pages in the same go.
 	 */
 	__SetPageHead(head);
+
+	/*
+	 * All tails of the same compound page share the state established by
+	 * prep_compound_tail(). Reuse one tail template for the whole range and
+	 * refresh only the PFN-dependent fields in that template before each copy.
+	 */
+	if (use_template)
+		zone_device_template_tail_page_init(&template, head_pfn + 1,
+						    zone_idx, nid, pgmap,
+						    head, order);
+
 	for (pfn = head_pfn + 1; pfn < end_pfn; pfn++) {
 		struct page *page = pfn_to_page(pfn);
 
+		if (use_template) {
+			zone_device_page_init_from_template(page, pfn,
+							    &template);
+			continue;
+		}
 		zone_device_page_init_slow(page, pfn, zone_idx, nid, pgmap);
 		prep_compound_tail(page, head, order);
 		set_page_count(page, 0);
@@ -1190,8 +1216,8 @@ void __ref memmap_init_zone_device(struct zone *zone,
 	}
 
 	if (use_template)
-		zone_device_template_page_init(&template, start_pfn, zone_idx,
-					       nid, pgmap);
+		zone_device_template_head_page_init(&template, start_pfn,
+						    zone_idx, nid, pgmap);
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn += pfns_per_compound) {
 		struct page *page = pfn_to_page(pfn);
@@ -1207,7 +1233,8 @@ void __ref memmap_init_zone_device(struct zone *zone,
 			continue;
 
 		memmap_init_compound(page, pfn, zone_idx, nid, pgmap,
-				     compound_nr_pages(altmap, pgmap));
+				     compound_nr_pages(altmap, pgmap),
+				     use_template);
 	}
 
 	pr_debug("%s initialised %lu pages in %ums\n", __func__,
-- 
2.20.1

From nobody Mon Jun 8 19:49:30 2026
b=a8zO7jDNv4bFaO4nHpmhI7owgp8xoB6Mp+VapnwQV49Q5IsXbHAVIBXroXrBUO0yMH5v5tzexhNEBbgkC0Bah/mzmbV4MN9Ul/7ab85K/G9/+de6oe+zoXYm4WKuQvq3qwZJrPo2M6Ix/aiOf+9jKJ5auyx0kZWdwSCVVniMIZI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779853226; c=relaxed/simple; bh=/B9bHCzKxO2WBOFU4VOAAcAC/QSpAytvHDnH12PBs1Y=; h=Date:Mime-Version:In-Reply-To:References:Cc:Content-Type:From: Message-Id:To:Subject; b=o6UWvj8I9gHm1Sphp3jJUizLZCye5dP8ZEKVAyw8/e6OU7HZfrduelSvKw3sqaj+273QTvMyewA/pt6rLm++MMi14JIafwNTDocglcFo4cvkIIpwIECzEeF5zkm6kn3DhT73nfYaGo1DOLOyVtFNyhseve7osndM7oUm5ASgNEY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=Q6/FJunI; arc=none smtp.client-ip=209.127.231.111 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="Q6/FJunI" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1779853213; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; bh=F7+PWjKsjsPUzsocI71Frs9HaxpL6NKJS1KqgSbi3hg=; b=Q6/FJunI3E3ldgk6G6CS8xZcUrSUIeLAp8D7H0/U0QGv72Eg7Pg18pLR5I436RPhCG6xXk bhpgSKowZ6br4VQWkDBbJ3L03tU6jZ4BgwoBsEV2XxVwhmDkK7Etd+taAiGb+ogDNS5O1z bXsUAWFjHxxgu2rJNYDmZfPXuiSrwTincCbEM/ngWhRZSGMm4+W0v+e1ieuGwEhXaCgOLd D67mLm19hNsA0WwwRbNveN/TRKmSVR/qXsxqXQil24CqgK75o9iQN3g0dIF+r+F4eATCyv O1DEwX6d1ovyFbnU0Vurnj3nctAHjFemzh+Lgg2R2EJYwckwSTZOpgq98GTPbw== Date: Wed, 27 May 2026 11:36:34 +0800 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: 
List-Unsubscribe: Mime-Version: 1.0 In-Reply-To: <20260527033636.28231-1-lizhe.67@bytedance.com> References: <20260527033636.28231-1-lizhe.67@bytedance.com> Content-Transfer-Encoding: quoted-printable Cc: , , , , , "Li Zhe" From: "Li Zhe" Message-Id: <20260527033636.28231-7-lizhe.67@bytedance.com> X-Mailer: git-send-email 2.45.2 X-Lms-Return-Path: To: , , , , , , , , , X-Original-From: Li Zhe Subject: [PATCH v3 6/8] string: introduce memcpy_streaming() helpers Content-Type: text/plain; charset="utf-8" Introduce a generic memcpy_streaming() interface for write-once copy sites that can fall back to memcpy() when no architecture-specific optimization is available, or when an architecture-specific backend cannot safely handle a given transfer. Add memcpy_streaming_drain() alongside it so callers can separate the copy primitive from any required ordering point. On x86, use memcpy_flushcache() and sfence only for aligned transfers that can stay entirely on the non-temporal store path; otherwise fall back to memcpy() so the generic API does not expose flushcache semantics on cached head/tail fragments. Callers are responsible for invoking memcpy_streaming_drain() before later normal stores that must be ordered after the streaming copy. Signed-off-by: Li Zhe --- arch/x86/include/asm/string_64.h | 40 ++++++++++++++++++++++++++++++++ include/linux/string.h | 20 ++++++++++++++++ 2 files changed, 60 insertions(+) diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string= _64.h index 4635616863f5..0b57e9e6f3db 100644 --- a/arch/x86/include/asm/string_64.h +++ b/arch/x86/include/asm/string_64.h @@ -100,6 +100,46 @@ static __always_inline void memcpy_flushcache(void *ds= t, const void *src, size_t } __memcpy_flushcache(dst, src, cnt); } + +/* + * Only reuse memcpy_flushcache() for transfers that can stay entirely + * on its non-temporal store path. 
Fall back to memcpy() for zero-length + * copies and for unaligned transfers so the generic streaming API does + * not expose flushcache semantics on cached head/tail fragments. + */ +static __always_inline int memcpy_flushcache_nt_safe(const void *dst, + const void *src, + size_t cnt) +{ + unsigned long d =3D (unsigned long)dst; + unsigned long s =3D (unsigned long)src; + + if (!cnt) + return 0; + + if (cnt >=3D 8) + return !(d & 7) && !(s & 7) && !(cnt & 7); + + return cnt =3D=3D 4 && !(d & 3) && !(s & 3); +} + +#define __HAVE_ARCH_MEMCPY_STREAMING 1 +static __always_inline void memcpy_streaming(void *dst, const void *src, + size_t cnt) +{ + if (!cnt) + return; + + if (memcpy_flushcache_nt_safe(dst, src, cnt)) + memcpy_flushcache(dst, src, cnt); + else + memcpy(dst, src, cnt); +} + +static __always_inline void memcpy_streaming_drain(void) +{ + asm volatile("sfence" : : : "memory"); +} #endif =20 #endif /* __KERNEL__ */ diff --git a/include/linux/string.h b/include/linux/string.h index b850bd91b3d8..a4c2d4347f58 100644 --- a/include/linux/string.h +++ b/include/linux/string.h @@ -281,6 +281,26 @@ static inline void memcpy_flushcache(void *dst, const = void *src, size_t cnt) } #endif =20 +#ifndef __HAVE_ARCH_MEMCPY_STREAMING +/* + * memcpy_streaming() is for write-once copy sites that may use + * non-temporal stores on some architectures. Callers must follow it + * with memcpy_streaming_drain() before later normal stores that need to + * be ordered after the streaming copy. Implementations may fall back to + * memcpy() when a specialized backend cannot safely handle the given + * transfer, and backends that use regular cached stores can make the + * drain a no-op. 
+ */
+static inline void memcpy_streaming(void *dst, const void *src, size_t cnt)
+{
+	memcpy(dst, src, cnt);
+}
+
+static inline void memcpy_streaming_drain(void)
+{
+}
+#endif
+
 void *memchr_inv(const void *s, int c, size_t n);
 char *strreplace(char *str, char old, char new);
 
-- 
2.20.1

From: "Li Zhe"
Date: Wed, 27 May 2026 11:36:35 +0800
Subject: [PATCH v3 7/8] x86/string: extend memcpy_flushcache() fixed-size fastpaths
Message-Id: <20260527033636.28231-8-lizhe.67@bytedance.com>

Small constant-sized flushcache copies currently fall back to __memcpy_flushcache() unless they are exactly 4, 8, or 16 bytes. Factor the existing inline movnti sequences into small helpers and extend the fixed-size fastpath coverage to multiples of 8 bytes from 24 to 96 bytes when source and destination are both 8-byte aligned.

This keeps common struct-page-sized copies on the inline path for the upcoming memcpy_streaming() user, while still falling back to __memcpy_flushcache() for unaligned or uncommon sizes. Zero-length copies return immediately.

Issue the fixed-size stores in ascending address order so write-combining sees a forward stream.
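A minimal user-space sketch of the size dispatch described above, assuming the alignment check in memcpy_flushcache_nt_safe() has already passed. `plan_nt_stores()` is a hypothetical helper, not part of the patch: instead of issuing movnti it records the byte offset of each store in issue order, and it returns 0 wherever the posted memcpy_flushcache_small() would defer to __memcpy_flushcache().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of memcpy_flushcache_small(), assuming the 4/8-byte
 * alignment test already passed. Writes the offset of each store, in issue
 * order, to offs[] (which must hold at least 13 entries) and returns the
 * number of stores, or 0 when the real code would fall back to
 * __memcpy_flushcache().
 */
static size_t plan_nt_stores(size_t cnt, size_t offs[])
{
	size_t base = 0, n = 0;

	/* Exactly 4 or 8 bytes: a single movntil or movntiq. */
	if (cnt == 4 || cnt == 8) {
		offs[n++] = 0;
		return n;
	}

	/*
	 * memcpy_flushcache_nt_safe() rejects zero, the other sizes below 8,
	 * and any length that is not a multiple of 8.
	 */
	if (cnt < 8 || (cnt & 7))
		return 0;

	/* An odd 8-byte word is stored first, at the lowest address. */
	if (cnt & 8) {
		offs[n++] = 0;
		base = 8;
		cnt -= 8;
	}

	/* Remainders handled by the second switch in the patch. */
	switch (cnt) {
	case 16: case 32: case 48: case 64: case 80: case 96:
		break;
	default:
		return 0;
	}

	/* The 16/32/64-byte helpers expand to ascending 8-byte stores. */
	for (size_t off = 0; off < cnt; off += 8)
		offs[n++] = base + off;

	return n;
}
```

Tracing the posted switch this way, the inline path covers 4 bytes and every multiple of 8 from 8 through 104 bytes: 104 (8 + 96) also stays inline, while 112 and larger fall back. The recorded offsets are strictly ascending, which is the forward stream the changelog asks for.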
Signed-off-by: Li Zhe
---
 arch/x86/include/asm/string_64.h | 125 ++++++++++++++++++++++++++-----
 1 file changed, 107 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 0b57e9e6f3db..8e6fca0185ee 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -82,24 +82,6 @@ int strcmp(const char *cs, const char *ct);
 #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
 #define __HAVE_ARCH_MEMCPY_FLUSHCACHE 1
 void __memcpy_flushcache(void *dst, const void *src, size_t cnt);
-static __always_inline void memcpy_flushcache(void *dst, const void *src, size_t cnt)
-{
-	if (__builtin_constant_p(cnt)) {
-		switch (cnt) {
-			case 4:
-				asm ("movntil %1, %0" : "=m"(*(u32 *)dst) : "r"(*(u32 *)src));
-				return;
-			case 8:
-				asm ("movntiq %1, %0" : "=m"(*(u64 *)dst) : "r"(*(u64 *)src));
-				return;
-			case 16:
-				asm ("movntiq %1, %0" : "=m"(*(u64 *)dst) : "r"(*(u64 *)src));
-				asm ("movntiq %1, %0" : "=m"(*(u64 *)(dst + 8)) : "r"(*(u64 *)(src + 8)));
-				return;
-		}
-	}
-	__memcpy_flushcache(dst, src, cnt);
-}
 
 /*
  * Only reuse memcpy_flushcache() for transfers that can stay entirely
@@ -123,6 +105,113 @@ static __always_inline int memcpy_flushcache_nt_safe(const void *dst,
 	return cnt == 4 && !(d & 3) && !(s & 3);
 }
 
+static __always_inline void memcpy_flushcache_4(void *dst, const void *src)
+{
+	asm volatile("movntil %1, %0"
+		     : "=m"(*(u32 *)dst)
+		     : "r"(*(const u32 *)src)
+		     : "memory");
+}
+
+static __always_inline void memcpy_flushcache_8(void *dst, const void *src)
+{
+	asm volatile("movntiq %1, %0"
+		     : "=m"(*(u64 *)dst)
+		     : "r"(*(const u64 *)src)
+		     : "memory");
+}
+
+static __always_inline void memcpy_flushcache_16(void *dst,
+						 const void *src)
+{
+	memcpy_flushcache_8(dst, src);
+	memcpy_flushcache_8(dst + 8, src + 8);
+}
+
+static __always_inline void memcpy_flushcache_32(void *dst,
+						 const void *src)
+{
+	memcpy_flushcache_16(dst, src);
+	memcpy_flushcache_16(dst + 16, src + 16);
+}
+
+static __always_inline void memcpy_flushcache_64(void *dst,
+						 const void *src)
+{
+	memcpy_flushcache_32(dst, src);
+	memcpy_flushcache_32(dst + 32, src + 32);
+}
+
+/*
+ * Keep common fixed-size copies on the inline movnti path when they can
+ * stay entirely on aligned non-temporal stores. Issue the stores in
+ * ascending address order so write-combining sees a forward stream.
+ */
+static __always_inline int memcpy_flushcache_small(void *dst,
+						   const void *src,
+						   size_t cnt)
+{
+	char *d = dst;
+	const char *s = src;
+
+	if (!memcpy_flushcache_nt_safe(dst, src, cnt))
+		return 0;
+
+	switch (cnt) {
+	case 4:
+		memcpy_flushcache_4(d, s);
+		return 1;
+	case 8:
+		memcpy_flushcache_8(d, s);
+		return 1;
+	}
+
+	if (cnt & 8) {
+		memcpy_flushcache_8(d, s);
+		d += 8;
+		s += 8;
+		cnt -= 8;
+	}
+
+	switch (cnt) {
+	case 16:
+		memcpy_flushcache_16(d, s);
+		return 1;
+	case 32:
+		memcpy_flushcache_32(d, s);
+		return 1;
+	case 48:
+		memcpy_flushcache_32(d, s);
+		memcpy_flushcache_16(d + 32, s + 32);
+		return 1;
+	case 64:
+		memcpy_flushcache_64(d, s);
+		return 1;
+	case 80:
+		memcpy_flushcache_64(d, s);
+		memcpy_flushcache_16(d + 64, s + 64);
+		return 1;
+	case 96:
+		memcpy_flushcache_64(d, s);
+		memcpy_flushcache_32(d + 64, s + 64);
+		return 1;
+	}
+
+	return 0;
+}
+
+static __always_inline void memcpy_flushcache(void *dst, const void *src,
+					      size_t cnt)
+{
+	if (!cnt)
+		return;
+
+	if (__builtin_constant_p(cnt) && memcpy_flushcache_small(dst, src, cnt))
+		return;
+
+	__memcpy_flushcache(dst, src, cnt);
+}
+
 #define __HAVE_ARCH_MEMCPY_STREAMING 1
 static __always_inline void memcpy_streaming(void *dst, const void *src,
 					     size_t cnt)
-- 
2.20.1

From: "Li Zhe"
Date: Wed, 27 May 2026 11:36:36 +0800
Subject: [PATCH v3 8/8] mm: use memcpy_streaming() in zone-device template copies
Message-Id: <20260527033636.28231-9-lizhe.67@bytedance.com>

The template fast path still leaves the actual copy sequence up to the compiler. Use the streaming-copy helpers introduced in the previous patches for the ZONE_DEVICE template-copy path so common mm code can request a write-once copy primitive without embedding arch-specific store layout in the generic layer.

ZONE_DEVICE memmap initialization is a write-once path: each struct page is populated once and is not expected to be reused from cache immediately afterwards. A regular cached copy can therefore incur write-allocate traffic and pollute the cache without much benefit. Using memcpy_streaming() lets this path use an architecture-optimized streaming copy where available, while still degrading to memcpy() on architectures that do not provide a specialized implementation.

Keep pageblock-aligned PFNs on memcpy() so pageblock initialization can immediately read back page metadata without introducing a read-after-streaming dependency. For the remaining PFNs, use memcpy_streaming() so the hot path can avoid write-allocate traffic while still leaving unsupported or unsuitable cases to the fallback implementation.
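The x86 dispatch behind memcpy_streaming() and memcpy_streaming_drain() (patch 6/8) can be approximated in user space. This is a sketch, not the kernel code: `toy_nt_safe()` copies the predicate of memcpy_flushcache_nt_safe(), and the SSE2 intrinsics `_mm_stream_si32()`, `_mm_stream_si64()` and `_mm_sfence()` stand in for movntil, movntiq and sfence on x86-64. Rejected transfers, and every non-x86-64 build, take plain memcpy(), mirroring the generic fallback in include/linux/string.h. All `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__)
#include <immintrin.h>	/* _mm_stream_si32, _mm_stream_si64, _mm_sfence */
#endif

/* Same test as memcpy_flushcache_nt_safe() in patch 6/8. */
static int toy_nt_safe(const void *dst, const void *src, size_t cnt)
{
	uintptr_t d = (uintptr_t)dst;
	uintptr_t s = (uintptr_t)src;

	if (!cnt)
		return 0;
	if (cnt >= 8)
		return !(d & 7) && !(s & 7) && !(cnt & 7);
	return cnt == 4 && !(d & 3) && !(s & 3);
}

/* Hypothetical user-space stand-in for memcpy_streaming(). */
static void toy_memcpy_streaming(void *dst, const void *src, size_t cnt)
{
	if (!cnt)
		return;
#if defined(__x86_64__)
	if (toy_nt_safe(dst, src, cnt)) {
		const char *s = src;
		char *d = dst;

		if (cnt == 4) {
			int v;

			memcpy(&v, s, sizeof(v));
			_mm_stream_si32((int *)d, v);	/* movnti, 32-bit */
			return;
		}
		/* Forward sequence of 8-byte non-temporal stores. */
		for (size_t off = 0; off < cnt; off += 8) {
			long long v;

			memcpy(&v, s + off, sizeof(v));
			_mm_stream_si64((long long *)(d + off), v);
		}
		return;
	}
#endif
	memcpy(dst, src, cnt);	/* cached fallback */
}

/* Orders earlier non-temporal stores before any later stores. */
static void toy_memcpy_streaming_drain(void)
{
#if defined(__x86_64__)
	_mm_sfence();
#endif
}
```

Short, odd-length or misaligned copies never mix non-temporal and cached stores within one call; they are copied with memcpy() instead, which is the trade-off the series describes for head/tail fragments.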
When the streaming backend uses non-temporal stores, order them before entering memmap_init_compound(), before prep_compound_head() updates the overlapping compound metadata, and before returning from memmap_init_zone_device(). Keep sanitized builds on the slow path so KASAN/KMSAN retain their instrumented stores.

Tested in a VM with a 100 GB fsdax namespace device configured with map=dev and a 100 GB devdax namespace (align=2097152) on an Intel Ice Lake server.

Test procedure: rebind the nd_pmem and dax_pmem drivers 30 times and collect the memmap initialization time from the pr_debug() output of memmap_init_zone_device().

                                         Base (v7.1-rc3)   With this series
nd_pmem, first binding                   1486 ms           1285 ms
nd_pmem, average of subsequent rebinds   273.52 ms         114.31 ms
dax_pmem, first binding                  1515 ms           1331 ms
dax_pmem, average of subsequent rebinds  313.45 ms         99.37 ms

This reduces the average rebind time by about 58.2% for nd_pmem and 68.3% for dax_pmem.

Signed-off-by: Li Zhe
---
 mm/mm_init.c | 47 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index d5ccb49a048f..1f56765b92e1 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1070,11 +1070,21 @@ static void __ref zone_device_page_init_slow(struct page *page,
 
 static inline bool zone_device_page_init_optimization_enabled(void)
 {
+	/*
+	 * Keep sanitized builds on the slow path so their stores stay
+	 * instrumented.
+	 */
+	if (IS_ENABLED(CONFIG_KASAN) || IS_ENABLED(CONFIG_KMSAN))
+		return false;
+
 	/*
 	 * The template fast path copies a preinitialized struct page image.
 	 * Skip it when the page_ref_set tracepoint is enabled.
 	 */
-	return !page_ref_tracepoint_active(page_ref_set);
+	if (page_ref_tracepoint_active(page_ref_set))
+		return false;
+
+	return true;
 }
 
 static inline void zone_device_template_head_page_init(struct page *template,
@@ -1120,9 +1130,19 @@ static void zone_device_page_init_from_template(struct page *page,
 	 * 'template' carries the invariant portion of a ZONE_DEVICE struct
 	 * page. Update the PFN-dependent fields in place before copying it
 	 * to the destination page.
+	 *
+	 * pageblock-aligned pages immediately feed
+	 * init_pageblock_migratetype(), which reads back page metadata via
+	 * helpers like page_zone(page). Avoid a read-after-streaming
+	 * dependency for these rare pages by using regular cached stores
+	 * instead of non-temporal ones.
 	 */
 	zone_device_page_update_template(template, pfn);
-	memcpy(page, template, sizeof(*page));
+	if (unlikely(pageblock_aligned(pfn)))
+		memcpy(page, template, sizeof(*page));
+	else
+		memcpy_streaming(page, template, sizeof(*page));
+
 	zone_device_page_init_pageblock(page, pfn);
 }
 
@@ -1184,6 +1204,15 @@ static void __ref memmap_init_compound(struct page *head,
 		prep_compound_tail(page, head, order);
 		set_page_count(page, 0);
 	}
+
+	/*
+	 * prep_compound_head() updates compound metadata in struct folio fields
+	 * that alias the first tail-page descriptors. When the tail pages above
+	 * were populated with non-temporal stores, order those writes before the
+	 * overlapping metadata updates below.
+	 */
+	if (use_template)
+		memcpy_streaming_drain();
 	prep_compound_head(head, order);
 }
 
@@ -1232,10 +1261,24 @@ void __ref memmap_init_zone_device(struct zone *zone,
 		if (pfns_per_compound == 1)
 			continue;
 
+		/*
+		 * memmap_init_compound() immediately updates compound-head
+		 * metadata. If the head-page template copy above used
+		 * non-temporal stores, order them before entering the
+		 * compound setup path.
+		 */
+		if (use_template)
+			memcpy_streaming_drain();
+
 		memmap_init_compound(page, pfn, zone_idx, nid, pgmap,
 				     compound_nr_pages(altmap, pgmap),
 				     use_template);
 	}
+	/*
+	 * Drain any remaining non-temporal stores before returning.
+	 */
+	if (use_template)
+		memcpy_streaming_drain();
 
 	pr_debug("%s initialised %lu pages in %ums\n", __func__,
 		 nr_pages, jiffies_to_msecs(jiffies - start));
-- 
2.20.1
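The overall shape of the template copy path in this series can be illustrated with a user-space model. Everything here is hypothetical: `struct toy_page` is a tiny stand-in for struct page, `TOY_PAGEBLOCK_NR` is an arbitrary pageblock size, and the streaming helpers use the generic memcpy() fallback so the sketch runs anywhere. It builds one template, refreshes only the PFN-dependent field before each copy, keeps pageblock-aligned PFNs on the cached memcpy() path, and issues a single drain before the pages are consumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGEBLOCK_NR 512UL	/* hypothetical pageblock size, in pages */

/* Hypothetical, heavily simplified stand-in for struct page. */
struct toy_page {
	unsigned long flags;	/* invariant across the range */
	unsigned long pgmap;	/* invariant: owning pagemap cookie */
	unsigned long pfn_field;	/* PFN-dependent */
	int refcount;		/* invariant: starts at 0 */
};

/* Generic fallback semantics: cached copy, drain is a no-op. */
static void toy_memcpy_streaming(void *d, const void *s, size_t n)
{
	memcpy(d, s, n);
}

static void toy_memcpy_streaming_drain(void)
{
}

static bool toy_pageblock_aligned(unsigned long pfn)
{
	return (pfn & (TOY_PAGEBLOCK_NR - 1)) == 0;
}

/* Slow path: initialize every field of one page. */
static void toy_init_slow(struct toy_page *p, unsigned long pfn,
			  unsigned long flags, unsigned long pgmap)
{
	memset(p, 0, sizeof(*p));
	p->flags = flags;
	p->pgmap = pgmap;
	p->pfn_field = pfn;
	p->refcount = 0;
}

/* Template path: build once, patch the PFN field, copy per page. */
static void toy_init_range(struct toy_page *pages, unsigned long start_pfn,
			   unsigned long nr, unsigned long flags,
			   unsigned long pgmap)
{
	struct toy_page template;

	toy_init_slow(&template, start_pfn, flags, pgmap);
	for (unsigned long i = 0; i < nr; i++) {
		unsigned long pfn = start_pfn + i;

		template.pfn_field = pfn;	/* refresh PFN-dependent state */
		if (toy_pageblock_aligned(pfn))	/* read back soon: keep cached */
			memcpy(&pages[i], &template, sizeof(template));
		else
			toy_memcpy_streaming(&pages[i], &template,
					     sizeof(template));
	}
	toy_memcpy_streaming_drain();	/* order copies before later users */
}
```

Comparing the result field by field against the slow path should find identical pages. With a real non-temporal backend, the only difference would be which copies bypass the cache, which is why the series places drains before any code that reads or overwrites the freshly copied descriptors.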