From nobody Thu Apr 16 20:49:12 2026
Date: Wed, 25 Feb 2026 16:34:26 +0000
From: Brendan Jackman
To: Borislav Petkov, Dave Hansen, Peter Zijlstra, Andrew Morton,
 David Hildenbrand, Lorenzo Stoakes, Vlastimil Babka, Wei Xu,
 Johannes Weiner, Zi Yan
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org,
 rppt@kernel.org, Sumit Garg, derkling@google.com, reijiw@google.com,
 Will Deacon, rientjes@google.com, "Kalyazin, Nikita", patrick.roy@linux.dev,
 "Itazuri, Takahiro", Andy Lutomirski, David Kaplan, Thomas Gleixner,
 Brendan Jackman, Yosry Ahmed
Subject: [PATCH RFC 01/19] x86/mm: split out preallocate_sub_pgd()
Message-ID: <20260225-page_alloc-unmapped-v1-1-e8808a03cd66@google.com>
In-Reply-To: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>
References: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>

This code will be needed elsewhere in a following patch. Split out the
trivial code move for easy review.

This changes the logging slightly: instead of panic() directly reporting
the level of the failure, there is now a generic panic message which is
preceded by a separate warning that reports the level of the failure.
This is a simple way to have this helper suit the needs of its new user
as well as the existing one.

Other than logging, no functional change intended.

Signed-off-by: Brendan Jackman
---
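A reviewer aid, not part of the commit message: given the format strings
in the diff below, a preallocation failure would now log along these
lines (modulo any pr_fmt() prefix in effect), whereas before the level
was embedded in the panic message itself:

    Failed to preallocate p4d
    Kernel panic - not syncing: Failed to pre-allocate pagetables for vmalloc area
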
 arch/x86/include/asm/pgalloc.h | 33 +++++++++++++++++++++++++++++++
 arch/x86/mm/init_64.c          | 44 +++++++-----------------------------
 2 files changed, 40 insertions(+), 37 deletions(-)

diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index c88691b15f3c6..3541b86c9c6b0 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -2,6 +2,7 @@
 #ifndef _ASM_X86_PGALLOC_H
 #define _ASM_X86_PGALLOC_H
 
+#include <linux/printk.h>
 #include <linux/threads.h>
 #include <linux/mm.h>           /* for struct page */
 #include <linux/pagemap.h>
@@ -128,6 +129,38 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
         ___pud_free_tlb(tlb, pud);
 }
 
+/* Allocate a pagetable pointed to by the top hardware level. */
+static inline int preallocate_sub_pgd(struct mm_struct *mm, unsigned long addr)
+{
+        const char *lvl;
+        p4d_t *p4d;
+        pud_t *pud;
+
+        lvl = "p4d";
+        p4d = p4d_alloc(mm, pgd_offset_pgd(mm->pgd, addr), addr);
+        if (!p4d)
+                goto failed;
+
+        if (pgtable_l5_enabled())
+                return 0;
+
+        /*
+         * On 4-level systems, the P4D layer is folded away and
+         * the above code does no preallocation. Below, go down
+         * to the pud _software_ level to ensure the second
+         * hardware level is allocated on 4-level systems too.
+         */
+        lvl = "pud";
+        pud = pud_alloc(mm, p4d, addr);
+        if (!pud)
+                goto failed;
+        return 0;
+
+failed:
+        pr_warn_ratelimited("Failed to preallocate %s\n", lvl);
+        return -ENOMEM;
+}
+
 #if CONFIG_PGTABLE_LEVELS > 4
 static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, p4d_t *p4d)
 {
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index df2261fa4f985..79806386dc42f 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1318,46 +1318,16 @@ static void __init register_page_bootmem_info(void)
 static void __init preallocate_vmalloc_pages(void)
 {
         unsigned long addr;
-        const char *lvl;
 
         for (addr = VMALLOC_START; addr <= VMEMORY_END; addr = ALIGN(addr + 1, PGDIR_SIZE)) {
-                pgd_t *pgd = pgd_offset_k(addr);
-                p4d_t *p4d;
-                pud_t *pud;
-
-                lvl = "p4d";
-                p4d = p4d_alloc(&init_mm, pgd, addr);
-                if (!p4d)
-                        goto failed;
-
-                if (pgtable_l5_enabled())
-                        continue;
-
-                /*
-                 * The goal here is to allocate all possibly required
-                 * hardware page tables pointed to by the top hardware
-                 * level.
-                 *
-                 * On 4-level systems, the P4D layer is folded away and
-                 * the above code does no preallocation. Below, go down
-                 * to the pud _software_ level to ensure the second
-                 * hardware level is allocated on 4-level systems too.
-                 */
-                lvl = "pud";
-                pud = pud_alloc(&init_mm, p4d, addr);
-                if (!pud)
-                        goto failed;
+                if (preallocate_sub_pgd(&init_mm, addr)) {
+                        /*
+                         * The pages have to be there now or they will be
+                         * missing in process page-tables later.
+                         */
+                        panic("Failed to pre-allocate pagetables for vmalloc area\n");
+                }
         }
-
-        return;
-
-failed:
-
-        /*
-         * The pages have to be there now or they will be missing in
-         * process page-tables later.
-         */
-        panic("Failed to pre-allocate %s pages for vmalloc area\n", lvl);
 }
 
 void __init arch_mm_preinit(void)
-- 
2.51.2

From nobody Thu Apr 16 20:49:12 2026
Date: Wed, 25 Feb 2026 16:34:27 +0000
From: Brendan Jackman
To: Borislav Petkov, Dave Hansen, Peter Zijlstra, Andrew Morton,
 David Hildenbrand, Lorenzo Stoakes, Vlastimil Babka, Wei Xu,
 Johannes Weiner, Zi Yan
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org,
 rppt@kernel.org, Sumit Garg, derkling@google.com, reijiw@google.com,
 Will Deacon, rientjes@google.com, "Kalyazin, Nikita", patrick.roy@linux.dev,
 "Itazuri, Takahiro", Andy Lutomirski, David Kaplan, Thomas Gleixner,
 Brendan Jackman, Yosry Ahmed
Subject: [PATCH RFC 02/19] x86/mm: Generalize LDT remap into "mm-local region"
Message-ID: <20260225-page_alloc-unmapped-v1-2-e8808a03cd66@google.com>
In-Reply-To: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>
References: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>

Various security features benefit from having process-local address
mappings. Examples include no-direct-map guest_memfd [2] and significant
optimizations for ASI [1].

As pointed out by Andy in [0], x86 already has a PGD entry that is local
to the mm, which is used for the LDT. So, simply redefine that entry's
region as "the mm-local region" and then redefine the LDT region as a
sub-region of that.
With the currently-envisaged use cases, there will be many situations
where almost no processes have any need for the mm-local region.
Therefore, avoid its overhead (memory cost of pagetables, alloc/free
overhead during fork/exit) for processes that don't use it by requiring
its users to explicitly initialize it via the new mm_local_* API.
Freeing the pagetables in this region is left to the mm_local_* API
implementation and deferred until process exit.

This means that the LDT remap code can be simplified:

1. map_ldt_struct_to_user() is now a NOP on 64-bit, since the mm-local
   region is defined as already being mapped into the user pagetables.

2. free_ldt_pgtables() is no longer required at all; it's handled by the
   core mm teardown logic in both PAE and KPTI cases now.

3. The sanity-check logic is unified: in both cases just walk to the PMD
   and use the presence of that as the proxy for whether an LDT mapping
   is present. This requires an extra null-check since the page walk
   will generally terminate early in the KPTI case.

TODO: Agh, this is broken under PAE, looks like I had totally forgotten
that KPTI supports 32-bit? Even though there is 32-bit KPTI code
modified here. Oops.

[0] https://lore.kernel.org/linux-mm/CALCETrXHbS9VXfZ80kOjiTrreM2EbapYeGp68mvJPbosUtorYA@mail.gmail.com/
[1] https://linuxasi.dev/
[2] https://lore.kernel.org/all/20250924151101.2225820-1-patrick.roy@campus.lmu.de

Signed-off-by: Brendan Jackman
---
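For illustration, a minimal sketch of a hypothetical in-kernel user of
the new API; only the mm_local_* names are from this patch, the function
itself is made up:

    static int example_mm_local_user(struct mm_struct *mm)
    {
            /* Idempotent; call from process context. */
            int err = mm_local_region_init(mm);

            if (err)
                    return err;

            WARN_ON_ONCE(!mm_local_region_used(mm));
            WARN_ON_ONCE(!is_mm_local_addr(MM_LOCAL_BASE_ADDR));

            /*
             * Per-mm kernel mappings can now be installed between
             * MM_LOCAL_BASE_ADDR and MM_LOCAL_END_ADDR (the LDT remap
             * occupies the first PMD_SIZE of the range). The pagetables
             * are freed automatically via mm_local_region_free() at
             * process exit.
             */
            return 0;
    }
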
 Documentation/arch/x86/x86_64/mm.rst    |   4 +-
 arch/x86/Kconfig                        |   2 +
 arch/x86/include/asm/mmu_context.h      |  71 +++++++++++++++-
 arch/x86/include/asm/pgtable_64_types.h |  13 ++-
 arch/x86/kernel/ldt.c                   | 137 +++++++++++-----------------
 arch/x86/mm/pgtable.c                   |   3 +
 include/linux/mm.h                      |  13 +++
 include/linux/mm_types.h                |   2 +
 kernel/fork.c                           |   1 +
 mm/Kconfig                              |   7 ++
 10 files changed, 155 insertions(+), 98 deletions(-)

diff --git a/Documentation/arch/x86/x86_64/mm.rst b/Documentation/arch/x86/x86_64/mm.rst
index a6cf05d51bd8c..fa2bb7bab6a42 100644
--- a/Documentation/arch/x86/x86_64/mm.rst
+++ b/Documentation/arch/x86/x86_64/mm.rst
@@ -53,7 +53,7 @@ Complete virtual memory map with 4-level page tables
  ____________________________________________________________|___________________________________________________________
                   |            |                  |         |
  ffff800000000000 |  -128   TB | ffff87ffffffffff |    8 TB | ... guard hole, also reserved for hypervisor
- ffff880000000000 |  -120   TB | ffff887fffffffff |  0.5 TB | LDT remap for PTI
+ ffff880000000000 |  -120   TB | ffff887fffffffff |  0.5 TB | MM-local kernel data. Includes LDT remap for PTI
  ffff888000000000 |  -119.5 TB | ffffc87fffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
  ffffc88000000000 |   -55.5 TB | ffffc8ffffffffff |  0.5 TB | ... unused hole
  ffffc90000000000 |   -55   TB | ffffe8ffffffffff |   32 TB | vmalloc/ioremap space (vmalloc_base)
@@ -123,7 +123,7 @@ Complete virtual memory map with 5-level page tables
  ____________________________________________________________|___________________________________________________________
                   |            |                  |         |
  ff00000000000000 |   -64   PB | ff0fffffffffffff |    4 PB | ... guard hole, also reserved for hypervisor
- ff10000000000000 |   -60   PB | ff10ffffffffffff | 0.25 PB | LDT remap for PTI
+ ff10000000000000 |   -60   PB | ff10ffffffffffff | 0.25 PB | MM-local kernel data. Includes LDT remap for PTI
  ff11000000000000 |  -59.75 PB | ff90ffffffffffff |   32 PB | direct mapping of all physical memory (page_offset_base)
  ff91000000000000 |  -27.75 PB | ff9fffffffffffff | 3.75 PB | ... unused hole
  ffa0000000000000 |   -24   PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e2df1b147184a..5bf68dcea3fee 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -133,6 +133,7 @@ config X86
         select ARCH_SUPPORTS_RT
         select ARCH_SUPPORTS_AUTOFDO_CLANG
         select ARCH_SUPPORTS_PROPELLER_CLANG    if X86_64
+        select ARCH_SUPPORTS_MM_LOCAL_REGION    if X86_64
         select ARCH_USE_BUILTIN_BSWAP
         select ARCH_USE_CMPXCHG_LOCKREF         if X86_CX8
         select ARCH_USE_MEMTEST
@@ -2320,6 +2321,7 @@ config CMDLINE_OVERRIDE
 config MODIFY_LDT_SYSCALL
         bool "Enable the LDT (local descriptor table)" if EXPERT
         default y
+        select MM_LOCAL_REGION if MITIGATION_PAGE_TABLE_ISOLATION
         help
           Linux can allow user programs to install a per-process x86
           Local Descriptor Table (LDT) using the modify_ldt(2) system
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 1acafb1c6a932..9016fe525bb62 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -8,8 +8,10 @@
 
 #include <trace/events/tlb.h>
 
+#include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 #include <asm/paravirt.h>
+#include <asm/tlb.h>
 #include <asm/debugreg.h>
 #include <asm/gsseg.h>
 
@@ -59,7 +61,6 @@ static inline void init_new_context_ldt(struct mm_struct *mm)
 }
 int ldt_dup_context(struct mm_struct *oldmm, struct mm_struct *mm);
 void destroy_context_ldt(struct mm_struct *mm);
-void ldt_arch_exit_mmap(struct mm_struct *mm);
 #else   /* CONFIG_MODIFY_LDT_SYSCALL */
 static inline void init_new_context_ldt(struct mm_struct *mm) { }
 static inline int ldt_dup_context(struct mm_struct *oldmm,
@@ -68,7 +69,6 @@ static inline int ldt_dup_context(struct mm_struct *oldmm,
         return 0;
 }
 static inline void destroy_context_ldt(struct mm_struct *mm) { }
-static inline void ldt_arch_exit_mmap(struct mm_struct *mm) { }
 #endif
 
 #ifdef CONFIG_MODIFY_LDT_SYSCALL
@@ -226,10 +226,75 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
         return ldt_dup_context(oldmm, mm);
 }
 
+#ifdef CONFIG_MM_LOCAL_REGION
+static inline void mm_local_region_free(struct mm_struct *mm)
+{
+        if (mm_local_region_used(mm)) {
+                struct mmu_gather tlb;
+                unsigned long start = MM_LOCAL_BASE_ADDR;
+                unsigned long end = MM_LOCAL_END_ADDR;
+
+                /*
+                 * Although free_pgd_range() is intended for freeing user
+                 * page-tables, it also works out for kernel mappings on x86.
+                 * We use tlb_gather_mmu_fullmm() to avoid confusing the
+                 * range-tracking logic in __tlb_adjust_range().
+                 */
+                tlb_gather_mmu_fullmm(&tlb, mm);
+                free_pgd_range(&tlb, start, end, start, end);
+                tlb_finish_mmu(&tlb);
+
+                mm_flags_clear(MMF_LOCAL_REGION_USED, mm);
+        }
+}
+
+/* Do initial setup of the mm-local region. Call from process context. */
+static inline int mm_local_region_init(struct mm_struct *mm)
+{
+        int err;
+
+        err = preallocate_sub_pgd(mm, MM_LOCAL_BASE_ADDR);
+        if (err)
+                return err;
+
+#ifdef CONFIG_MITIGATION_PAGE_TABLE_ISOLATION
+        /*
+         * The mm-local region is shared with userspace. This is useful for
+         * the LDT remap. It assumes nothing gets mapped in here that needs
+         * to be protected from Meltdown-type attacks from the current
+         * process.
+         *
+         * Note this can be called multiple times, also concurrently - it
+         * assumes the set_pgd() is idempotent.
+         */
+        if (boot_cpu_has(X86_FEATURE_PTI)) {
+                pgd_t *pgd = pgd_offset(mm, LDT_BASE_ADDR);
+
+                set_pgd(kernel_to_user_pgdp(pgd), *pgd);
+        }
+#endif
+
+        mm_flags_set(MMF_LOCAL_REGION_USED, mm);
+
+        return 0;
+}
+
+static inline bool is_mm_local_addr(unsigned long addr)
+{
+        return addr >= MM_LOCAL_BASE_ADDR && addr < MM_LOCAL_END_ADDR;
+}
+#else
+static inline void mm_local_region_free(struct mm_struct *mm) { }
+
+static inline bool is_mm_local_addr(unsigned long addr)
+{
+        return false;
+}
+#endif /* CONFIG_MM_LOCAL_REGION */
+
 static inline void arch_exit_mmap(struct mm_struct *mm)
 {
         paravirt_arch_exit_mmap(mm);
-        ldt_arch_exit_mmap(mm);
+        mm_local_region_free(mm);
 }
 
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 7eb61ef6a185f..cfb51b65b5ce9 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -5,8 +5,11 @@
 #include <asm/sparsemem.h>
 
 #ifndef __ASSEMBLER__
+#include <linux/kernel.h>
 #include <linux/types.h>
 #include <asm/kaslr.h>
+#include <linux/sizes.h>
+#include <linux/threads.h>
 
 /*
  * These are used to make use of C type-checking..
@@ -100,9 +103,13 @@ extern unsigned int ptrs_per_p4d;
 #define GUARD_HOLE_BASE_ADDR    (GUARD_HOLE_PGD_ENTRY << PGDIR_SHIFT)
 #define GUARD_HOLE_END_ADDR     (GUARD_HOLE_BASE_ADDR + GUARD_HOLE_SIZE)
 
-#define LDT_PGD_ENTRY           -240UL
-#define LDT_BASE_ADDR           (LDT_PGD_ENTRY << PGDIR_SHIFT)
-#define LDT_END_ADDR            (LDT_BASE_ADDR + PGDIR_SIZE)
+#define MM_LOCAL_PGD_ENTRY      -240UL
+#define MM_LOCAL_BASE_ADDR      (MM_LOCAL_PGD_ENTRY << PGDIR_SHIFT)
+#define MM_LOCAL_END_ADDR       ((MM_LOCAL_PGD_ENTRY + 1) << PGDIR_SHIFT)
+
+#define LDT_BASE_ADDR           MM_LOCAL_BASE_ADDR
+#define LDT_REMAP_SIZE          PMD_SIZE
+#define LDT_END_ADDR            (LDT_BASE_ADDR + LDT_REMAP_SIZE)
 
 #define __VMALLOC_BASE_L4       0xffffc90000000000UL
 #define __VMALLOC_BASE_L5       0xffa0000000000000UL
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index 0f19ef355f5f1..86cf9704e4d57 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -31,6 +31,8 @@
 
 #include <xen/xen.h>
 
+/* LDTs are double-buffered, the buffers are called slots. */
+#define LDT_NUM_SLOTS   2
 /* This is a multiple of PAGE_SIZE. */
 #define LDT_SLOT_STRIDE (LDT_ENTRIES * LDT_ENTRY_SIZE)
 
@@ -186,31 +188,30 @@ static struct ldt_struct *alloc_ldt_struct(unsigned int num_entries)
 
 #ifdef CONFIG_MITIGATION_PAGE_TABLE_ISOLATION
 
-static void do_sanity_check(struct mm_struct *mm,
-                            bool had_kernel_mapping,
-                            bool had_user_mapping)
+#ifdef CONFIG_X86_PAE
+
+static void map_ldt_struct_to_user(struct mm_struct *mm)
 {
-        if (mm->context.ldt) {
-                /*
-                 * We already had an LDT. The top-level entry should already
-                 * have been allocated and synchronized with the usermode
-                 * tables.
-                 */
-                WARN_ON(!had_kernel_mapping);
-                if (boot_cpu_has(X86_FEATURE_PTI))
-                        WARN_ON(!had_user_mapping);
-        } else {
-                /*
-                 * This is the first time we're mapping an LDT for this process.
-                 * Sync the pgd to the usermode tables.
-                 */
-                WARN_ON(had_kernel_mapping);
-                if (boot_cpu_has(X86_FEATURE_PTI))
-                        WARN_ON(had_user_mapping);
-        }
+        pgd_t *k_pgd = pgd_offset(mm, LDT_BASE_ADDR);
+        pgd_t *u_pgd = kernel_to_user_pgdp(k_pgd);
+        pmd_t *k_pmd, *u_pmd;
+
+        k_pmd = pgd_to_pmd_walk(k_pgd, LDT_BASE_ADDR);
+        u_pmd = pgd_to_pmd_walk(u_pgd, LDT_BASE_ADDR);
+
+        BUILD_BUG_ON(LDT_SLOT_STRIDE * LDT_NUM_SLOTS > PMD_SIZE);
+        if (boot_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
+                set_pmd(u_pmd, *k_pmd);
 }
 
-#ifdef CONFIG_X86_PAE
+#else /* !CONFIG_X86_PAE */
+
+static void map_ldt_struct_to_user(struct mm_struct *mm)
+{
+        /* Nothing to do; the whole mm-local region is shared with userspace. */
+}
+
+#endif /* CONFIG_X86_PAE */
 
 static pmd_t *pgd_to_pmd_walk(pgd_t *pgd, unsigned long va)
 {
@@ -231,19 +232,6 @@ static pmd_t *pgd_to_pmd_walk(pgd_t *pgd, unsigned long va)
         return pmd_offset(pud, va);
 }
 
-static void map_ldt_struct_to_user(struct mm_struct *mm)
-{
-        pgd_t *k_pgd = pgd_offset(mm, LDT_BASE_ADDR);
-        pgd_t *u_pgd = kernel_to_user_pgdp(k_pgd);
-        pmd_t *k_pmd, *u_pmd;
-
-        k_pmd = pgd_to_pmd_walk(k_pgd, LDT_BASE_ADDR);
-        u_pmd = pgd_to_pmd_walk(u_pgd, LDT_BASE_ADDR);
-
-        if (boot_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
-                set_pmd(u_pmd, *k_pmd);
-}
-
 static void sanity_check_ldt_mapping(struct mm_struct *mm)
 {
         pgd_t *k_pgd = pgd_offset(mm, LDT_BASE_ADDR);
@@ -253,33 +241,29 @@ static void sanity_check_ldt_mapping(struct mm_struct *mm)
 
         k_pmd = pgd_to_pmd_walk(k_pgd, LDT_BASE_ADDR);
         u_pmd = pgd_to_pmd_walk(u_pgd, LDT_BASE_ADDR);
-        had_kernel = (k_pmd->pmd != 0);
-        had_user   = (u_pmd->pmd != 0);
+        had_kernel = k_pmd && (k_pmd->pmd != 0);
+        had_user   = u_pmd && (u_pmd->pmd != 0);
 
-        do_sanity_check(mm, had_kernel, had_user);
+        if (mm->context.ldt) {
+                /*
+                 * We already had an LDT. The top-level entry should already
+                 * have been allocated and synchronized with the usermode
+                 * tables.
+                 */
+                WARN_ON(!had_kernel);
+                if (boot_cpu_has(X86_FEATURE_PTI))
+                        WARN_ON(!had_user);
+        } else {
+                /*
+                 * This is the first time we're mapping an LDT for this process.
+                 * Sync the pgd to the usermode tables.
+                 */
+                WARN_ON(had_kernel);
+                if (boot_cpu_has(X86_FEATURE_PTI))
+                        WARN_ON(had_user);
+        }
 }
 
-#else /* !CONFIG_X86_PAE */
-
-static void map_ldt_struct_to_user(struct mm_struct *mm)
-{
-        pgd_t *pgd = pgd_offset(mm, LDT_BASE_ADDR);
-
-        if (boot_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
-                set_pgd(kernel_to_user_pgdp(pgd), *pgd);
-}
-
-static void sanity_check_ldt_mapping(struct mm_struct *mm)
-{
-        pgd_t *pgd = pgd_offset(mm, LDT_BASE_ADDR);
-        bool had_kernel = (pgd->pgd != 0);
-        bool had_user   = (kernel_to_user_pgdp(pgd)->pgd != 0);
-
-        do_sanity_check(mm, had_kernel, had_user);
-}
-
-#endif /* CONFIG_X86_PAE */
-
 /*
  * If PTI is enabled, this maps the LDT into the kernelmode and
  * usermode tables for the given mm.
@@ -295,6 +279,8 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
         if (!boot_cpu_has(X86_FEATURE_PTI))
                 return 0;
 
+        mm_local_region_init(mm);
+
         /*
          * Any given ldt_struct should have map_ldt_struct() called at most
          * once.
@@ -390,28 +376,6 @@ static void unmap_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt)
 }
 #endif /* CONFIG_MITIGATION_PAGE_TABLE_ISOLATION */
 
-static void free_ldt_pgtables(struct mm_struct *mm)
-{
-#ifdef CONFIG_MITIGATION_PAGE_TABLE_ISOLATION
-        struct mmu_gather tlb;
-        unsigned long start = LDT_BASE_ADDR;
-        unsigned long end = LDT_END_ADDR;
-
-        if (!boot_cpu_has(X86_FEATURE_PTI))
-                return;
-
-        /*
-         * Although free_pgd_range() is intended for freeing user
-         * page-tables, it also works out for kernel mappings on x86.
-         * We use tlb_gather_mmu_fullmm() to avoid confusing the
-         * range-tracking logic in __tlb_adjust_range().
-         */
-        tlb_gather_mmu_fullmm(&tlb, mm);
-        free_pgd_range(&tlb, start, end, start, end);
-        tlb_finish_mmu(&tlb);
-#endif
-}
-
 /* After calling this, the LDT is immutable. */
 static void finalize_ldt_struct(struct ldt_struct *ldt)
 {
@@ -472,7 +436,6 @@ int ldt_dup_context(struct mm_struct *old_mm, struct mm_struct *mm)
 
         retval = map_ldt_struct(mm, new_ldt, 0);
         if (retval) {
-                free_ldt_pgtables(mm);
                 free_ldt_struct(new_ldt);
                 goto out_unlock;
         }
@@ -494,11 +457,6 @@ void destroy_context_ldt(struct mm_struct *mm)
         mm->context.ldt = NULL;
 }
 
-void ldt_arch_exit_mmap(struct mm_struct *mm)
-{
-        free_ldt_pgtables(mm);
-}
-
 static int read_ldt(void __user *ptr, unsigned long bytecount)
 {
         struct mm_struct *mm = current->mm;
@@ -645,10 +603,9 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
                 /*
                  * This only can fail for the first LDT setup. If an LDT is
                  * already installed then the PTE page is already
-                 * populated. Mop up a half populated page table.
+                 * populated.
                  */
-                if (!WARN_ON_ONCE(old_ldt))
-                        free_ldt_pgtables(mm);
+                WARN_ON_ONCE(!old_ldt);
                 free_ldt_struct(new_ldt);
                 goto out_unlock;
         }
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 2e5ecfdce73c3..492248cfadc08 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -375,6 +375,9 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 
 void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 {
+        /* Should be cleaned up in the mmap exit path. */
+        VM_WARN_ON_ONCE(mm_local_region_used(mm));
+
         pgd_mop_up_pmds(mm, pgd);
         pgd_dtor(pgd);
         paravirt_pgd_free(mm, pgd);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5be3d8a8f806d..118399694ee20 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -904,6 +904,19 @@ static inline void mm_flags_clear_all(struct mm_struct *mm)
         bitmap_zero(ACCESS_PRIVATE(&mm->flags, __mm_flags), NUM_MM_FLAG_BITS);
 }
 
+#ifdef CONFIG_MM_LOCAL_REGION
+static inline bool mm_local_region_used(struct mm_struct *mm)
+{
+        return mm_flags_test(MMF_LOCAL_REGION_USED, mm);
+}
+#else
+static inline bool mm_local_region_used(struct mm_struct *mm)
+{
+        VM_WARN_ON_ONCE(mm_flags_test(MMF_LOCAL_REGION_USED, mm));
+        return false;
+}
+#endif
+
 extern const struct vm_operations_struct vma_dummy_vm_ops;
 
 static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3cc8ae7228860..dbad8df91f153 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1919,6 +1919,8 @@ enum {
 #define MMF_TOPDOWN             31      /* mm searches top down by default */
 #define MMF_TOPDOWN_MASK        BIT(MMF_TOPDOWN)
 
+#define MMF_LOCAL_REGION_USED   32
+
 #define MMF_INIT_LEGACY_MASK    (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
                                  MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK |\
                                  MMF_VM_MERGE_ANY_MASK | MMF_TOPDOWN_MASK)
diff --git a/kernel/fork.c b/kernel/fork.c
index 65113a304518a..ee8a9450f0f1d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1139,6 +1139,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 fail_nocontext:
         mm_free_id(mm);
 fail_noid:
+        WARN_ON_ONCE(mm_local_region_used(mm));
         mm_free_pgd(mm);
 fail_nopgd:
         futex_hash_free(mm);
diff --git a/mm/Kconfig b/mm/Kconfig
index ebd8ea353687e..15f4da9ba8f4a 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1471,6 +1471,13 @@ config LAZY_MMU_MODE_KUNIT_TEST
 
           If unsure, say N.
 
+config ARCH_SUPPORTS_MM_LOCAL_REGION
+        def_bool n
+
+config MM_LOCAL_REGION
+        def_bool n
+        depends on ARCH_SUPPORTS_MM_LOCAL_REGION
+
 source "mm/damon/Kconfig"
 
 endmenu
-- 
2.51.2
From nobody Thu Apr 16 20:49:12 2026
Date: Wed, 25 Feb 2026 16:34:28 +0000
From: Brendan Jackman
To: Borislav Petkov, Dave Hansen, Peter Zijlstra, Andrew Morton,
 David Hildenbrand, Lorenzo Stoakes, Vlastimil Babka, Wei Xu,
 Johannes Weiner, Zi Yan
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org,
 rppt@kernel.org, Sumit Garg, derkling@google.com, reijiw@google.com,
 Will Deacon, rientjes@google.com, "Kalyazin, Nikita", patrick.roy@linux.dev,
 "Itazuri, Takahiro", Andy Lutomirski, David Kaplan, Thomas Gleixner,
 Brendan Jackman, Yosry Ahmed
Subject: [PATCH RFC 03/19] x86/tlb: Expose some flush function declarations to modules
Message-ID: <20260225-page_alloc-unmapped-v1-3-e8808a03cd66@google.com>
In-Reply-To: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>
References: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>

In commit bfe3d8f6313d ("x86/tlb: Restrict access to tlbstate") some
low-level logic (the important detail here is flush_tlb_info) was hidden
from modules, along with functions associated with that data.

Later, the set of functions defined here changed and there are now a
bunch of flush_tlb_*() functions that do not depend on x86 internals
like flush_tlb_info.

This leads to some build fragility: KVM (which can be a module) cares
about TLB flushing and includes {linux->asm}/mmu_context.h, which
includes asm/tlb.h and asm/tlbflush.h. This x86 TLB code expects these
helpers to be defined (e.g. tlb_flush() calls flush_tlb_mm_range()).

Modules probably shouldn't call these helpers - luckily this is already
enforced by the lack of EXPORT_SYMBOL(). Therefore keep things simple
and just expose the declarations anyway to prevent build failures.

Signed-off-by: Brendan Jackman
---
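To make the enforcement concrete (illustration only, not from the
patch): a module can now compile a call like the one below against the
declaration, but it still cannot actually use the helper, since modpost
or module load fails without an EXPORT_SYMBOL() for flush_tlb_mm_range():

    #include <asm/tlbflush.h>

    static void example_module_flush(struct mm_struct *mm)
    {
            flush_tlb_mm(mm);       /* expands to flush_tlb_mm_range(...) */
    }
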
 arch/x86/include/asm/tlbflush.h | 43 +++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 5a3cdc439e38d..ee49724403ef9 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -229,7 +229,6 @@ struct flush_tlb_info {
         u8                      trim_cpumask;
 };
 
-void flush_tlb_local(void);
 void flush_tlb_one_user(unsigned long addr);
 void flush_tlb_one_kernel(unsigned long addr);
 void flush_tlb_multi(const struct cpumask *cpumask,
@@ -303,26 +302,6 @@ static inline void mm_clear_asid_transition(struct mm_struct *mm) { }
 static inline bool mm_in_asid_transition(struct mm_struct *mm) { return false; }
 #endif /* CONFIG_BROADCAST_TLB_FLUSH */
 
-#define flush_tlb_mm(mm)                                                \
-        flush_tlb_mm_range(mm, 0UL, TLB_FLUSH_ALL, 0UL, true)
-
-#define flush_tlb_range(vma, start, end)                                \
-        flush_tlb_mm_range((vma)->vm_mm, start, end,                    \
-                           ((vma)->vm_flags & VM_HUGETLB)               \
-                                ? huge_page_shift(hstate_vma(vma))      \
-                                : PAGE_SHIFT, true)
-
-extern void flush_tlb_all(void);
-extern void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
-                                unsigned long end, unsigned int stride_shift,
-                                bool freed_tables);
-extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
-
-static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long a)
-{
-        flush_tlb_mm_range(vma->vm_mm, a, a + PAGE_SIZE, PAGE_SHIFT, false);
-}
-
 static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm)
 {
         bool should_defer = false;
@@ -487,4 +466,26 @@ static inline void __native_tlb_flush_global(unsigned long cr4)
         native_write_cr4(cr4 ^ X86_CR4_PGE);
         native_write_cr4(cr4);
 }
+
+#define flush_tlb_mm(mm)                                                \
+        flush_tlb_mm_range(mm, 0UL, TLB_FLUSH_ALL, 0UL, true)
+
+#define flush_tlb_range(vma, start, end)                                \
+        flush_tlb_mm_range((vma)->vm_mm, start, end,                    \
+                           ((vma)->vm_flags & VM_HUGETLB)               \
+                                ? huge_page_shift(hstate_vma(vma))      \
+                                : PAGE_SHIFT, true)
+
+void flush_tlb_local(void);
+extern void flush_tlb_all(void);
+extern void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
+                                unsigned long end, unsigned int stride_shift,
+                                bool freed_tables);
+extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
+
+static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long a)
+{
+        flush_tlb_mm_range(vma->vm_mm, a, a + PAGE_SIZE, PAGE_SHIFT, false);
+}
+
 #endif /* _ASM_X86_TLBFLUSH_H */
-- 
2.51.2
From nobody Thu Apr 16 20:49:12 2026
Date: Wed, 25 Feb 2026 16:34:29 +0000
From: Brendan Jackman
To: Borislav Petkov, Dave Hansen, Peter Zijlstra, Andrew Morton,
 David Hildenbrand, Lorenzo Stoakes, Vlastimil Babka, Wei Xu,
 Johannes Weiner, Zi Yan
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org,
 rppt@kernel.org, Sumit Garg, derkling@google.com, reijiw@google.com,
 Will Deacon, rientjes@google.com, "Kalyazin, Nikita", patrick.roy@linux.dev,
 "Itazuri, Takahiro", Andy Lutomirski, David Kaplan, Thomas Gleixner,
 Brendan Jackman, Yosry Ahmed
Subject: [PATCH RFC 04/19] x86/mm: introduce the mermap
Message-ID: <20260225-page_alloc-unmapped-v1-4-e8808a03cd66@google.com>
In-Reply-To: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>
References: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>

The mermap provides a fast way to create ephemeral mm-local mappings of
physical pages. The purpose of this is to access pages that have been
removed from the direct map. Potential use cases are:

1. For zeroing __GFP_UNMAPPED pages (added in a later patch).

2. For populating guest_memfd pages that are protected by the
   GUEST_MEMFD_NO_DIRECT_MAP feature [0].

3. For efficient access of pages protected by Address Space
   Isolation [1].

[0] https://lore.kernel.org/all/20250924151101.2225820-1-patrick.roy@campus.lmu.de/
[1] https://linuxasi.dev

The details of this mechanism are described in the API comments.
However, the key idea is to use CPU-local virtual regions to avoid the
need for synchronization.
On x86, this can also be used to prevent TLB shootdowns. Because the
virtual region is CPU-local, allocating from the mermap disables
migration. The caller is forbidden to use the returned value from any
other context, and migration is re-enabled when it's freed.

One might notice that mermap_get() bears a strong similarity to
kmap_local_page(). The most important differences between mermap_get()
and kmap_local_page() are:

1. mermap_get() allows mapping variable sizes while kmap_local_page()
   specifically maps a single order-0 page.

2. As a consequence of 1 (combined with the need for mermap_get() to be
   an extremely simple allocator), mermap_get() should be expected to
   fail, while kmap_local_page() is guaranteed to work up to a certain
   degree of nesting.

3. While the mappings provided by kmap_local_page() are _logically_
   local to the calling context (it's a bug for software to access them
   from elsewhere), they are _physically_ installed into the shared
   kernel pagetables. This means their locality doesn't provide any
   protection from hardware attacks. In contrast, the mermap is
   physically local to the creating mm, taking advantage of the new
   mm-local kernel address region.

So that the mermap is available even in contexts where failure is not
tolerable, there is also a _reserved() variant, which is fixed at
allocating a single base page. This is useful, for example, for zeroing
__GFP_UNMAPPED pages, where handling failure would be extremely
inconvenient. The _reserved() variant is simply implemented by leaving
one base-page space unavailable for non-_reserved allocations, and
requiring an atomic context.

This mechanism obviously requires manipulating pagetables. The kernel
doesn't have a "library" that is 100% suitable for the mermap's needs
here. This is resolved with a hack, namely exploiting
apply_to[_existing]_page_range(), which is _almost_ suitable for the
requirements. This will need some later refactoring (perhaps creating
the "library") to resolve the hacks it introduces, which are:

1. It introduces an indirect branch, which is likely to be pretty slow
   on some platforms.

2. It uses a magic sentinel pagetable value, instead of pte_none(), for
   unmapped regions, to trick apply_to_existing_page_range() into
   operating on them (while still ensuring no pagetable allocations
   take place).

Signed-off-by: Brendan Jackman
---
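For illustration, a sketch of the expected calling pattern, using only
the API added here; the caller, page source and copy are hypothetical:

    static int example_write_unmapped_page(struct page *page, const void *src)
    {
            /* The mermap forbids _PAGE_GLOBAL, so mask it off PAGE_KERNEL. */
            pgprot_t prot = __pgprot(pgprot_val(PAGE_KERNEL) & ~_PAGE_GLOBAL);
            struct mermap_alloc *alloc;
            int err;

            /* One-time, process-context setup for this mm. */
            err = mermap_mm_prepare(current->mm);
            if (err)
                    return err;

            /* May fail; the allocator is deliberately simple. */
            alloc = mermap_get(page, PAGE_SIZE, prot);
            if (!alloc)
                    return -ENOMEM;

            memcpy(mermap_addr(alloc), src, PAGE_SIZE);

            /* Unmaps the region and re-enables migration. */
            mermap_put(alloc);
            return 0;
    }

In atomic contexts where failure cannot be handled,
mermap_get_reserved() is the single-base-page fallback; it requires
preemption to stay disabled until the mapping is released.
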
 arch/x86/Kconfig                        |   1 +
 arch/x86/include/asm/mermap.h           |  23 +++
 arch/x86/include/asm/pgtable_64_types.h |   8 +-
 include/linux/mermap.h                  |  63 +++++++
 include/linux/mermap_types.h            |  43 +++++
 include/linux/mm_types.h                |   4 +
 kernel/fork.c                           |   5 +
 mm/Kconfig                              |  11 ++
 mm/Makefile                             |   1 +
 mm/mermap.c                             | 319 ++++++++++++++++++++++++++++++++
 mm/pgalloc-track.h                      |   6 +
 11 files changed, 483 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5bf68dcea3fee..c8b5b787ab5fb 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -37,6 +37,7 @@ config X86_64
         select ZONE_DMA32
         select EXECMEM if DYNAMIC_FTRACE
         select ACPI_MRRM if ACPI
+        select ARCH_SUPPORTS_MERMAP
 
 config FORCE_DYNAMIC_FTRACE
         def_bool y
diff --git a/arch/x86/include/asm/mermap.h b/arch/x86/include/asm/mermap.h
new file mode 100644
index 0000000000000..9d7614716b718
--- /dev/null
+++ b/arch/x86/include/asm/mermap.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_MERMAP_H
+#define _ASM_X86_MERMAP_H
+
+#include <asm/tlbflush.h>
+
+static inline void arch_mermap_flush_tlb(void)
+{
+        /*
+         * No shootdown allowed, IRQs may be off. Luckily other CPUs are not
+         * allowed to access our region so the stale mappings are harmless, as
+         * long as they still point to data belonging to this process.
+         */
+        __flush_tlb_all();
+}
+
+static inline bool arch_mermap_pgprot_allowed(pgprot_t prot)
+{
+        /* Mermap is mm-local so global mappings would be a bug. */
+        return !(pgprot_val(prot) & _PAGE_GLOBAL);
+}
+
+#endif /* _ASM_X86_MERMAP_H */
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index cfb51b65b5ce9..b1d0bd6813cc7 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -105,12 +105,18 @@ extern unsigned int ptrs_per_p4d;
 
 #define MM_LOCAL_PGD_ENTRY      -240UL
 #define MM_LOCAL_BASE_ADDR      (MM_LOCAL_PGD_ENTRY << PGDIR_SHIFT)
-#define MM_LOCAL_END_ADDR       ((MM_LOCAL_PGD_ENTRY + 1) << PGDIR_SHIFT)
+#define MM_LOCAL_START_ADDR     ((MM_LOCAL_PGD_ENTRY) << PGDIR_SHIFT)
+#define MM_LOCAL_END_ADDR       (MM_LOCAL_START_ADDR + (1UL << PGDIR_SHIFT))
 
 #define LDT_BASE_ADDR           MM_LOCAL_BASE_ADDR
 #define LDT_REMAP_SIZE          PMD_SIZE
 #define LDT_END_ADDR            (LDT_BASE_ADDR + LDT_REMAP_SIZE)
 
+#define MERMAP_BASE_ADDR        LDT_END_ADDR
+#define MERMAP_CPU_REGION_SIZE  PMD_SIZE
+#define MERMAP_SIZE             (MERMAP_CPU_REGION_SIZE * NR_CPUS)
+#define MERMAP_END_ADDR         (MERMAP_BASE_ADDR + (NR_CPUS * MERMAP_CPU_REGION_SIZE))
+
 #define __VMALLOC_BASE_L4       0xffffc90000000000UL
 #define __VMALLOC_BASE_L5       0xffa0000000000000UL
 
diff --git a/include/linux/mermap.h b/include/linux/mermap.h
new file mode 100644
index 0000000000000..5457dcb8c9789
--- /dev/null
+++ b/include/linux/mermap.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_MERMAP_H
+#define _LINUX_MERMAP_H
+
+#include <linux/mermap_types.h>
+#include <linux/sched.h>
+
+#ifdef CONFIG_MERMAP
+
+#include <asm/mermap.h>
+
+int mermap_mm_prepare(struct mm_struct *mm);
+void mermap_mm_init(struct mm_struct *mm);
+void mermap_mm_teardown(struct mm_struct *mm);
+
+/* Can the mermap be called from this context? */
+static inline bool mermap_ready(void)
+{
+        return in_task() && current->mm && current->mm->mermap.cpu;
+}
+
+struct mermap_alloc *mermap_get(struct page *page, unsigned long size, pgprot_t prot);
+void *mermap_get_reserved(struct page *page, pgprot_t prot);
+void mermap_put(struct mermap_alloc *alloc);
+
+static inline void *mermap_addr(struct mermap_alloc *alloc)
+{
+        return (void *)alloc->base;
+}
+
+/*
+ * arch_mermap_flush_tlb() is called before a part of the local CPU's mermap
+ * region is remapped to a new address. No other CPU is allowed to _access_ that
+ * region, but the region was mapped there.
+ *
+ * This may be called with IRQs off.
+ *
+ * On arm64, this will need to be a broadcast TLB flush. Although the other CPUs
+ * are forbidden to access the region, they can leak the data that was mapped
+ * there via CPU exploits. Violating break-before-make would mean the data
+ * available to these CPU exploits is unpredictable.
+ */
+extern void arch_mermap_flush_tlb(void);
+extern bool arch_mermap_pgprot_allowed(pgprot_t prot);
+
+#if IS_ENABLED(CONFIG_KUNIT)
+struct mermap_alloc *__mermap_get(struct mm_struct *mm, struct page *page,
+                                  unsigned long size, pgprot_t prot, bool use_reserve);
+void __mermap_put(struct mm_struct *mm, struct mermap_alloc *alloc);
+unsigned long mermap_cpu_base(int cpu);
+unsigned long mermap_cpu_end(int cpu);
+#endif
+
+#else /* CONFIG_MERMAP */
+
+static inline int mermap_mm_prepare(struct mm_struct *mm) { return 0; }
+static inline void mermap_mm_init(struct mm_struct *mm) { }
+static inline void mermap_mm_teardown(struct mm_struct *mm) { }
+static inline bool mermap_ready(void) { return false; }
+
+#endif /* CONFIG_MERMAP */
+
+#endif /* _LINUX_MERMAP_H */
diff --git a/include/linux/mermap_types.h b/include/linux/mermap_types.h
new file mode 100644
index 0000000000000..08e43100b790e
--- /dev/null
+++ b/include/linux/mermap_types.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_MERMAP_TYPES_H
+#define _LINUX_MERMAP_TYPES_H
+
+#include <linux/mutex.h>
+#include <linux/percpu-defs.h>
+#include <linux/types.h>
+
+#ifdef CONFIG_MERMAP
+
+/* Tracks an individual allocation in the mermap. */
+struct mermap_alloc {
+        /* Currently allocated. */
+        bool in_use;
+        /* Requires flush before reallocating. */
+        bool need_flush;
+        unsigned long base;
+        /* Non-inclusive. */
+        unsigned long end;
+};
+
+struct mermap_cpu {
+        /* Next address immediately available for alloc (no TLB flush needed). */
+        unsigned long next_addr;
+        struct mermap_alloc allocs[4];
+#ifdef CONFIG_MERMAP_KUNIT_TEST
+        u64 tlb_flushes;
+#endif
+};
+
+struct mermap {
+        struct mutex init_lock;
+        struct mermap_cpu __percpu *cpu;
+};
+
+#else /* CONFIG_MERMAP */
+
+struct mermap {};
+
+#endif /* CONFIG_MERMAP */
+
+#endif /* _LINUX_MERMAP_TYPES_H */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index dbad8df91f153..2760b0972c554 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -7,6 +7,7 @@
 #include <linux/kref.h>
 #include <linux/list.h>
 #include <linux/maple_tree.h>
+#include <linux/mermap_types.h>
 #include <linux/rbtree.h>
 #include <linux/rwsem.h>
 #include <linux/spinlock.h>
@@ -34,6 +35,7 @@
 struct address_space;
 struct futex_private_hash;
 struct mem_cgroup;
+struct mermap;
 
 typedef struct {
         unsigned long f;
@@ -1159,6 +1161,8 @@ struct mm_struct {
                 atomic_t membarrier_state;
 #endif
 
+                struct mermap mermap;
+
                 /**
                  * @mm_users: The number of users including userspace.
                  *
diff --git a/kernel/fork.c b/kernel/fork.c
index ee8a9450f0f1d..5d74b55a42c4c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -13,6 +13,7 @@
  */
 
 #include <linux/anon_inodes.h>
+#include <linux/mermap.h>
 #include <linux/slab.h>
 #include <linux/sched/autogroup.h>
 #include <linux/sched/mm.h>
@@ -1130,6 +1131,9 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 
         mm->user_ns = get_user_ns(user_ns);
         lru_gen_init_mm(mm);
+
+        mermap_mm_init(mm);
+
         return mm;
 
 fail_pcpu:
@@ -1173,6 +1177,7 @@ static inline void __mmput(struct mm_struct *mm)
         ksm_exit(mm);
         khugepaged_exit(mm);    /* must run before exit_mmap */
         exit_mmap(mm);
+        mermap_mm_teardown(mm);
         mm_put_huge_zero_folio(mm);
         set_mm_exe_file(mm, NULL);
         if (!list_empty(&mm->mmlist)) {
diff --git a/mm/Kconfig b/mm/Kconfig
index 15f4da9ba8f4a..06c1c125e9636 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1480,4 +1480,15 @@ config MM_LOCAL_REGION
 
 source "mm/damon/Kconfig"
 
+config ARCH_SUPPORTS_MERMAP
+        bool
+
+config MERMAP
+        bool "Support for epheMERal mappings within the kernel"
+        default COMPILE_TEST
+        depends on ARCH_SUPPORTS_MERMAP
+        select MM_LOCAL_REGION
+        help
+          Support for epheMERal mappings within the kernel.
+
 endmenu
diff --git a/mm/Makefile b/mm/Makefile
index 8ad2ab08244eb..b1ac133fe603e 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -150,3 +150,4 @@ obj-$(CONFIG_SHRINKER_DEBUG) += shrinker_debug.o
 obj-$(CONFIG_EXECMEM) += execmem.o
 obj-$(CONFIG_TMPFS_QUOTA) += shmem_quota.o
 obj-$(CONFIG_LAZY_MMU_MODE_KUNIT_TEST) += tests/lazy_mmu_mode_kunit.o
+obj-$(CONFIG_MERMAP) += mermap.o
diff --git a/mm/mermap.c b/mm/mermap.c
new file mode 100644
index 0000000000000..d65ecfc06b58e
--- /dev/null
+++ b/mm/mermap.c
@@ -0,0 +1,319 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/cleanup.h>
+#include <linux/export.h>
+#include <linux/lockdep.h>
+#include <linux/mermap.h>
+#include <linux/mm.h>
+#include <linux/mutex.h>
+#include <linux/percpu.h>
+#include <linux/preempt.h>
+
+#include <asm/mermap.h>
+
+/*
+ * As a hack to allow using apply_to_existing_page_range() for these mappings,
+ * which skips pte_none() entries, unmap using a special non-"none" sentinel
+ * value.
+ */
+static inline int set_unmapped_pte(pte_t *ptep, unsigned long addr, void *data)
+{
+        pte_t pte = pfn_pte(0, pgprot_nx(PAGE_NONE));
+
+        VM_BUG_ON(pte_none(pte));
+        set_pte(ptep, pte);
+        return 0;
+}
+
+static void __mermap_put(struct mm_struct *mm, struct mermap_alloc *alloc)
+{
+        unsigned long size = PAGE_ALIGN(alloc->end - alloc->base);
+
+        if (WARN_ON_ONCE(!alloc->in_use))
+                return;
+
+        apply_to_page_range(mm, alloc->base, size, set_unmapped_pte, NULL);
+
+        WRITE_ONCE(alloc->in_use, false);
+
+        migrate_enable();
+}
+
+/* Return a region allocated by mermap_get(). */
+void mermap_put(struct mermap_alloc *alloc)
+{
+        __mermap_put(current->mm, alloc);
+}
+EXPORT_SYMBOL(mermap_put);
+
+static inline unsigned long mermap_cpu_base(int cpu)
+{
+        return MERMAP_BASE_ADDR + (cpu * MERMAP_CPU_REGION_SIZE);
+}
+
+/* Non-inclusive */
+static inline unsigned long mermap_cpu_end(int cpu)
+{
+        return MERMAP_BASE_ADDR + ((cpu + 1) * MERMAP_CPU_REGION_SIZE);
+}
+
+static inline void mermap_flush_tlb(int cpu, struct mermap_cpu *mc)
+{
+#ifdef CONFIG_MERMAP_KUNIT_TEST
+        mc->tlb_flushes++;
+#endif
+        arch_mermap_flush_tlb();
+}
+
+/* Call with migration disabled. */
+static inline struct mermap_alloc *mermap_alloc(struct mm_struct *mm,
+                                                unsigned long size, bool use_reserve)
+{
+        int cpu = raw_smp_processor_id();
+        struct mermap_cpu *mc = this_cpu_ptr(mm->mermap.cpu);
+        unsigned long cpu_end = mermap_cpu_end(cpu);
+        struct mermap_alloc *alloc = NULL;
+
+        /*
+         * This is an extremely stupid allocator, there can only ever be a small
+         * number of allocations so everything just works on linear search.
+         *
+         * Allocations are "in order", i.e. if the whole region is free it
+         * allocates from the beginning. If there are any existing allocations
+         * it allocates from right after the last (highest address) one. Any
+         * free space before that goes unused.
+         *
+         * Once an allocation has been freed, the space it occupied must be
+         * flushed from the TLB before it can be reused.
+         *
+         * Visual example of how this is supposed to behave (A for allocated,
+         * T for TLB-flush-pending):
+         *
+         * _______________ Start with everything free.
+         * AaaA___________ Allocate something.
+         * TttT___________ Free it. (Region needs a TLB flush now).
+         * TttTAaaaaaaaA__ Allocate something else.
+         * TttTAaaaaaaaAAA Allocate the remaining space.
+         * TttTTtttttttTAA Free the allocation before last.
+         * ^^^^^^^^^^^^^   This could all be reused now but for simplicity it
+         *                 isn't. Another allocation at this point will fail.
+         * TttTTtttttttTTT Free the last allocation.
+         * _______________ Next time we allocate, first flush the TLB.
+         * AA_____________ Now we're back at the beginning.
+	 */
+
+	if (use_reserve) {
+		if (WARN_ON_ONCE(size != PAGE_SIZE))
+			return NULL;
+		lockdep_assert_preemption_disabled();
+	} else {
+		cpu_end -= PAGE_SIZE;
+	}
+
+	if (WARN_ON_ONCE(!in_task()))
+		return NULL;
+	guard(preempt)();
+
+	/* Out of already-available space? */
+	if (mc->next_addr + size > cpu_end) {
+		unsigned long new_next = mermap_cpu_base(cpu);
+
+		/* Would we have space after a TLB flush? */
+		for (int i = 0; i < ARRAY_SIZE(mc->allocs); i++) {
+			struct mermap_alloc *alloc = &mc->allocs[i];
+
+			/*
+			 * The space between the uppermost allocated alloc->end
+			 * (or the base of the CPU's region if there are no
+			 * current allocations) and mc->next_addr has been
+			 * unmapped in the pagetables, but not flushed from the
+			 * TLB. Set new_next to point to the beginning of that
+			 * space.
+			 */
+			if (READ_ONCE(alloc->in_use))
+				new_next = max(new_next, alloc->end);
+		}
+		if (size > cpu_end - new_next)
+			return NULL;
+
+		mermap_flush_tlb(cpu, mc);
+		mc->next_addr = new_next;
+	}
+
+	/* Find an alloc-tracking structure to use */
+	for (int i = 0; i < ARRAY_SIZE(mc->allocs); i++) {
+		if (!READ_ONCE(mc->allocs[i].in_use)) {
+			alloc = &mc->allocs[i];
+			break;
+		}
+	}
+	if (!alloc)
+		return NULL;
+	alloc->in_use = true;
+	alloc->base = mc->next_addr;
+	alloc->end = alloc->base + size;
+	mc->next_addr += size;
+
+	return alloc;
+}
+
+struct set_pte_ctx {
+	pgprot_t prot;
+	unsigned long next_pfn;
+};
+
+static inline int do_set_pte(pte_t *pte, unsigned long addr, void *data)
+{
+	struct set_pte_ctx *ctx = data;
+
+	set_pte(pte, pfn_pte(ctx->next_pfn, ctx->prot));
+	ctx->next_pfn++;
+
+	return 0;
+}
+
+static struct mermap_alloc *
+__mermap_get(struct mm_struct *mm, struct page *page,
+	     unsigned long size, pgprot_t prot, bool use_reserve)
+{
+	struct mermap_alloc *alloc = NULL;
+	struct set_pte_ctx ctx;
+	int err;
+
+	if (size > MERMAP_CPU_REGION_SIZE || WARN_ON_ONCE(!mm || !mm->mermap.cpu))
+		return NULL;
+	if (WARN_ON_ONCE(!arch_mermap_pgprot_allowed(prot)))
+		return NULL;
+
+	size = PAGE_ALIGN(size);
+
+	migrate_disable();
+
+	alloc = mermap_alloc(mm, size, use_reserve);
+	if (!alloc) {
+		migrate_enable();
+		return NULL;
+	}
+
+	/* This probably wants to be optimised. */
+	ctx.prot = prot;
+	ctx.next_pfn = page_to_pfn(page);
+	err = apply_to_existing_page_range(mm, alloc->base, size, do_set_pte, &ctx);
+	if (err) {
+		WRITE_ONCE(alloc->in_use, false);
+		/* Balance the migrate_disable() above on this failure path too. */
+		migrate_enable();
+		return NULL;
+	}
+
+	return alloc;
+}
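
To make the ownership of migrate_disable() concrete, here is a minimal
sketch of the call pattern that __mermap_get()/__mermap_put() imply.
This is hypothetical caller code, not part of the patch; it assumes
mermap_addr() returns the start of the mapped region, as the comment on
mermap_get() below suggests:

	static int mermap_example_touch(struct page *page)
	{
		struct mermap_alloc *alloc;

		/* On success, migration stays disabled until mermap_put(). */
		alloc = mermap_get(page, PAGE_SIZE, PAGE_KERNEL);
		if (!alloc)
			return -EAGAIN;

		memset(mermap_addr(alloc), 0, PAGE_SIZE);

		/* Unmaps the PTEs and re-enables migration; TLB flush is lazy. */
		mermap_put(alloc);
		return 0;
	}
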
+
+/*
+ * Allocate a region of virtual memory, and map the page into it. This tries
+ * pretty hard to be fast but doesn't try very hard at all to actually succeed.
+ *
+ * The returned region is physically local to the current mm. It is _logically_
+ * local to the current CPU but this is not enforced by hardware so it can be
+ * exploited to mitigate CPU vulns. This means the caller must not map memory
+ * here that doesn't belong to the current process. The caller must also
+ * perform a full TLB flush of the region before freeing the pages that have
+ * been mapped here.
+ *
+ * This may only be called from process context, and the caller must arrange to
+ * first call mermap_mm_prepare(). (It would be possible to support this in
+ * IRQ, but it seems unlikely there's a valid usecase given the TLB flushing
+ * requirements). If it succeeds, it disables migration until you call
+ * mermap_put().
+ *
+ * This is guaranteed not to allocate.
+ *
+ * Use mermap_addr() to get the actual address of the mapped region.
+ */
+struct mermap_alloc *mermap_get(struct page *page, unsigned long size, pgprot_t prot)
+{
+	return __mermap_get(current->mm, page, size, prot, false);
+}
+EXPORT_SYMBOL(mermap_get);
+
+/*
+ * Map a single PAGE_SIZE page via mermap_get(), requiring preemption to be
+ * off until it is freed. Thanks to the reserved slot, this always succeeds
+ * (barring internal warnings).
+ */
+struct mermap_alloc *mermap_get_reserved(struct page *page, pgprot_t prot)
+{
+	lockdep_assert_preemption_disabled();
+	return __mermap_get(current->mm, page, PAGE_SIZE, prot, true);
+}
+EXPORT_SYMBOL(mermap_get_reserved);
+
+/*
+ * Internal - do unconditional (cheap) setup that's done for every mm. This
+ * doesn't actually prepare the mermap for use until someone calls
+ * mermap_mm_prepare().
+ */
+void mermap_mm_init(struct mm_struct *mm)
+{
+	mutex_init(&mm->mermap.init_lock);
+}
+
+/*
+ * Set up the mermap for this mm. The caller doesn't need to call
+ * mermap_mm_teardown(), that's taken care of by the normal mm teardown
+ * mechanism. This is idempotent and thread-safe.
+ */
+int mermap_mm_prepare(struct mm_struct *mm)
+{
+	int err = 0;
+	int cpu;
+
+	guard(mutex)(&mm->mermap.init_lock);
+
+	/* Already done? */
+	if (likely(mm->mermap.cpu))
+		return 0;
+
+	mm->mermap.cpu = alloc_percpu_gfp(struct mermap_cpu,
+					  GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	if (!mm->mermap.cpu)
+		return -ENOMEM;
+
+	/* So we can use this from the page allocator, preallocate pagetables. */
+	mm_flags_set(MMF_LOCAL_REGION_USED, mm);
+	for_each_possible_cpu(cpu) {
+		unsigned long base = mermap_cpu_base(cpu);
+
+		err = apply_to_page_range(mm, base, MERMAP_CPU_REGION_SIZE,
+					  set_unmapped_pte, NULL);
+		if (err) {
+			/*
+			 * Clear .cpu now to inform mermap_ready(). Any partial
+			 * page tables get cleared up by mm teardown.
+			 */
+			free_percpu(mm->mermap.cpu);
+			mm->mermap.cpu = NULL;
+			break;
+		}
+		per_cpu_ptr(mm->mermap.cpu, cpu)->next_addr = base;
+	}
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(mermap_mm_prepare);
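
For illustration, a sketch of how the reserved path might be wired up by
a user (hypothetical caller, not part of this patch; it assumes the page
already belongs to current->mm, as required by the comment on
mermap_get() above):

	static int mermap_example_reserved(struct page *page)
	{
		struct mermap_alloc *alloc;
		int err;

		/* Sleepable, one-time setup; safe to call repeatedly. */
		err = mermap_mm_prepare(current->mm);
		if (err)
			return err;

		preempt_disable();
		alloc = mermap_get_reserved(page, PAGE_KERNEL);
		if (!WARN_ON_ONCE(!alloc)) {
			/* ... use mermap_addr(alloc) ... */
			mermap_put(alloc);
		}
		preempt_enable();
		return 0;
	}
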
+
+/* Clean up mermap state on mm teardown. */
+void mermap_mm_teardown(struct mm_struct *mm)
+{
+	int cpu;
+
+	if (!mm->mermap.cpu)
+		return;
+
+	for_each_possible_cpu(cpu) {
+		struct mermap_cpu *mc = per_cpu_ptr(mm->mermap.cpu, cpu);
+
+		for (int i = 0; i < ARRAY_SIZE(mc->allocs); i++)
+			WARN_ON_ONCE(mc->allocs[i].in_use);
+	}
+
+	free_percpu(mm->mermap.cpu);
+}
diff --git a/mm/pgalloc-track.h b/mm/pgalloc-track.h
index e9e879de8649b..51fc4668d7177 100644
--- a/mm/pgalloc-track.h
+++ b/mm/pgalloc-track.h
@@ -2,6 +2,12 @@
 #ifndef _LINUX_PGALLOC_TRACK_H
 #define _LINUX_PGALLOC_TRACK_H
 
+#include
+#include
+#include
+
+#include "internal.h"
+
 #if defined(CONFIG_MMU)
 static inline p4d_t *p4d_alloc_track(struct mm_struct *mm, pgd_t *pgd,
 				     unsigned long address,

-- 
2.51.2

From nobody Thu Apr 16 20:49:12 2026
Date: Wed, 25 Feb 2026 16:34:30 +0000
Subject: [PATCH RFC 05/19] mm: KUnit tests for the mermap
From: Brendan Jackman
To: Borislav Petkov, Dave Hansen, Peter Zijlstra, Andrew Morton,
 David Hildenbrand, Lorenzo Stoakes, Vlastimil Babka, Wei Xu,
 Johannes Weiner, Zi Yan
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org,
 rppt@kernel.org, Sumit Garg, derkling@google.com, reijiw@google.com,
 Will Deacon, rientjes@google.com, "Kalyazin, Nikita",
 patrick.roy@linux.dev, "Itazuri, Takahiro", Andy Lutomirski,
 David Kaplan, Thomas Gleixner, Brendan Jackman, Yosry Ahmed
Message-ID: <20260225-page_alloc-unmapped-v1-5-e8808a03cd66@google.com>
In-Reply-To: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>
References: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>

Some simple smoke-tests for the mermap. Mainly aiming to test:

1. That there aren't any silly off-by-ones.
2. That the pagetables are not completely broken.
3. That the TLB appears to get flushed basically when expected.

This last point requires a bit of ifdeffery to detect when the flushing
has been performed.

Signed-off-by: Brendan Jackman
---
 include/linux/mermap_types.h |   2 +-
 mm/Kconfig                   |  11 +++
 mm/Makefile                  |   1 +
 mm/mermap.c                  |  14 ++-
 mm/tests/mermap_kunit.c      | 231 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 253 insertions(+), 6 deletions(-)

diff --git a/include/linux/mermap_types.h b/include/linux/mermap_types.h
index 08e43100b790e..6b295251b7b01 100644
--- a/include/linux/mermap_types.h
+++ b/include/linux/mermap_types.h
@@ -23,7 +23,7 @@ struct mermap_cpu {
 	/* Next address immediately available for alloc (no TLB flush needed). */
 	unsigned long next_addr;
 	struct mermap_alloc allocs[4];
-#ifdef CONFIG_MERMAP_KUNIT_TEST
+#if IS_ENABLED(CONFIG_MERMAP_KUNIT_TEST)
 	u64 tlb_flushes;
 #endif
 };
diff --git a/mm/Kconfig b/mm/Kconfig
index 06c1c125e9636..bd49eb9ef2165 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1491,4 +1491,15 @@ config MERMAP
 	help
 	  Support for epheMERal mappings within the kernel.
 
+config MERMAP_KUNIT_TEST
+	tristate "KUnit tests for the mermap" if !KUNIT_ALL_TESTS
+	depends on ARCH_SUPPORTS_MERMAP
+	depends on KUNIT
+	select MERMAP
+	default KUNIT_ALL_TESTS
+	help
+	  KUnit test for the mermap.
+
+	  If unsure, say N.
+ endmenu diff --git a/mm/Makefile b/mm/Makefile index b1ac133fe603e..42c8ca32359ae 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -151,3 +151,4 @@ obj-$(CONFIG_EXECMEM) +=3D execmem.o obj-$(CONFIG_TMPFS_QUOTA) +=3D shmem_quota.o obj-$(CONFIG_LAZY_MMU_MODE_KUNIT_TEST) +=3D tests/lazy_mmu_mode_kunit.o obj-$(CONFIG_MERMAP) +=3D mermap.o +obj-$(CONFIG_MERMAP_KUNIT_TEST) +=3D tests/mermap_kunit.o diff --git a/mm/mermap.c b/mm/mermap.c index d65ecfc06b58e..d840d27cae14c 100644 --- a/mm/mermap.c +++ b/mm/mermap.c @@ -24,7 +24,7 @@ static inline int set_unmapped_pte(pte_t *ptep, unsigned = long addr, void *data) return 0; } =20 -static void __mermap_put(struct mm_struct *mm, struct mermap_alloc *alloc) +VISIBLE_IF_KUNIT void __mermap_put(struct mm_struct *mm, struct mermap_all= oc *alloc) { unsigned long size =3D PAGE_ALIGN(alloc->end - alloc->base); =20 @@ -37,6 +37,7 @@ static void __mermap_put(struct mm_struct *mm, struct mer= map_alloc *alloc) =20 migrate_enable(); } +EXPORT_SYMBOL_IF_KUNIT(__mermap_put); =20 /* Return a region allocated by mermap_get(). */ void mermap_put(struct mermap_alloc *alloc) @@ -45,22 +46,24 @@ void mermap_put(struct mermap_alloc *alloc) } EXPORT_SYMBOL(mermap_put); =20 -static inline unsigned long mermap_cpu_base(int cpu) +VISIBLE_IF_KUNIT inline unsigned long mermap_cpu_base(int cpu) { return MERMAP_BASE_ADDR + (cpu * MERMAP_CPU_REGION_SIZE); =20 } +EXPORT_SYMBOL_IF_KUNIT(mermap_cpu_base); =20 /* Non-inclusive */ -static inline unsigned long mermap_cpu_end(int cpu) +VISIBLE_IF_KUNIT inline unsigned long mermap_cpu_end(int cpu) { return MERMAP_BASE_ADDR + ((cpu + 1) * MERMAP_CPU_REGION_SIZE); =20 } +EXPORT_SYMBOL_IF_KUNIT(mermap_cpu_end); =20 static inline void mermap_flush_tlb(int cpu, struct mermap_cpu *mc) { -#ifdef CONFIG_MERMAP_KUNIT_TEST +#if IS_ENABLED(CONFIG_MERMAP_KUNIT_TEST) mc->tlb_flushes++; #endif arch_mermap_flush_tlb(); @@ -173,7 +176,7 @@ static inline int do_set_pte(pte_t *pte, unsigned long = addr, void *data) return 0; } =20 -static struct mermap_alloc * +VISIBLE_IF_KUNIT struct mermap_alloc * __mermap_get(struct mm_struct *mm, struct page *page, unsigned long size, pgprot_t prot, bool use_reserve) { @@ -207,6 +210,7 @@ __mermap_get(struct mm_struct *mm, struct page *page, =20 return alloc; } +EXPORT_SYMBOL_IF_KUNIT(__mermap_get); =20 /* * Allocate a region of virtual memory, and map the page into it. 
This tri= es diff --git a/mm/tests/mermap_kunit.c b/mm/tests/mermap_kunit.c new file mode 100644 index 0000000000000..ec035b50b8250 --- /dev/null +++ b/mm/tests/mermap_kunit.c @@ -0,0 +1,231 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include +#include +#include +#include + +#include + +#define MERMAP_NR_ALLOCS ARRAY_SIZE(((struct mm_struct *)NULL)->mermap.cpu= ->allocs) + +KUNIT_DEFINE_ACTION_WRAPPER(__free_page_wrapper, __free_page, struct page = *); + +static inline struct page *alloc_page_wrapper(struct kunit *test, gfp_t gf= p) +{ + struct page *page =3D alloc_page(gfp); + + KUNIT_ASSERT_NOT_NULL(test, page); + KUNIT_ASSERT_EQ(test, kunit_add_action_or_reset(test, __free_page_wrapper= , page), 0); + return page; +} + +KUNIT_DEFINE_ACTION_WRAPPER(mmput_wrapper, mmput, struct mm_struct *); + +static inline struct mm_struct *mm_alloc_wrapper(struct kunit *test) +{ + struct mm_struct *mm =3D mm_alloc(); + + KUNIT_ASSERT_NOT_NULL(test, mm); + KUNIT_ASSERT_EQ(test, kunit_add_action_or_reset(test, mmput_wrapper, mm),= 0); + return mm; +} + +static inline struct mm_struct *get_mm(struct kunit *test) +{ + struct mm_struct *mm =3D mm_alloc_wrapper(test); + + KUNIT_ASSERT_EQ(test, mermap_mm_prepare(mm), 0); + return mm; +} + +struct __mermap_put_args { + struct mm_struct *mm; + struct mermap_alloc *alloc; + unsigned long size; +}; + +static inline void __mermap_put_wrapper(void *ctx) +{ + struct __mermap_put_args *args =3D (struct __mermap_put_args *)ctx; + + __mermap_put(args->mm, args->alloc); +} + +/* Call __mermap_get() with use_reserve=3Dfalse, deal with cleanup. */ +static inline struct __mermap_put_args * +__mermap_get_wrapper(struct kunit *test, struct mm_struct *mm, + struct page *page, unsigned long size, pgprot_t prot) +{ + struct __mermap_put_args *args =3D + kunit_kmalloc(test, sizeof(struct __mermap_put_args), GFP_KERNEL); + + KUNIT_ASSERT_NOT_NULL(test, args); + args->mm =3D mm; + args->alloc =3D __mermap_get(mm, page, size, prot, false); + args->size =3D size; + + if (args->alloc) { + int err =3D kunit_add_action_or_reset(test, __mermap_put_wrapper, args); + + KUNIT_ASSERT_EQ(test, err, 0); + } + + return args; +} + +/* Do the cleanup from __mermap_get_wrapper, now. */ +static inline void __mermap_put_early(struct kunit *test, struct __mermap_= put_args *args) +{ + kunit_release_action(test, __mermap_put_wrapper, args); +} + +static void test_basic_alloc(struct kunit *test) +{ + struct page *page =3D alloc_page_wrapper(test, GFP_KERNEL); + struct mm_struct *mm =3D get_mm(test); + struct __mermap_put_args *args; + + args =3D __mermap_get_wrapper(test, mm, page, PAGE_SIZE, PAGE_KERNEL); + KUNIT_ASSERT_NOT_NULL(test, args->alloc); +} + +/* Dumb check for off-by-ones. */ +static void test_size(struct kunit *test) +{ + struct page *page =3D alloc_page_wrapper(test, GFP_KERNEL); + struct __mermap_put_args *full, *large, *small, *fail; + struct mm_struct *mm =3D get_mm(test); + unsigned long region_size, large_size; + struct mermap_alloc *alloc; + int cpu; + + migrate_disable(); + cpu =3D raw_smp_processor_id(); + region_size =3D mermap_cpu_end(cpu) - mermap_cpu_base(cpu) - PAGE_SIZE; + large_size =3D region_size - PAGE_SIZE; + + /* Allocate whole region at once. */ + full =3D __mermap_get_wrapper(test, mm, page, region_size, PAGE_KERNEL); + KUNIT_ASSERT_NOT_NULL(test, full->alloc); + __mermap_put_early(test, full); + + /* Allocate larger than region size. 
*/ + fail =3D __mermap_get_wrapper(test, mm, page, region_size + PAGE_SIZE, PA= GE_KERNEL); + KUNIT_ASSERT_NULL(test, fail->alloc); + + /* Tiptoe up to the edge then past it. */ + large =3D __mermap_get_wrapper(test, mm, page, large_size, PAGE_KERNEL); + KUNIT_ASSERT_NOT_NULL(test, large->alloc); + small =3D __mermap_get_wrapper(test, mm, page, PAGE_SIZE, PAGE_KERNEL); + KUNIT_ASSERT_NOT_NULL(test, small->alloc); + fail =3D __mermap_get_wrapper(test, mm, page, PAGE_SIZE, PAGE_KERNEL); + KUNIT_ASSERT_NULL(test, fail->alloc); + + /* Can still allocate the reserved page. */ + local_irq_disable(); + alloc =3D __mermap_get(mm, page, PAGE_SIZE, PAGE_KERNEL, true); + local_irq_enable(); + KUNIT_ASSERT_NOT_NULL(test, alloc); + __mermap_put(mm, alloc); +} + +static void test_multiple_allocs(struct kunit *test) +{ + struct mm_struct *mm =3D get_mm(test); + struct __mermap_put_args *argss[MERMAP_NR_ALLOCS] =3D { }; + struct page *pages[MERMAP_NR_ALLOCS]; + int magic =3D 0xE4A4; + + for (int i =3D 0; i < ARRAY_SIZE(pages); i++) { + pages[i] =3D alloc_page_wrapper(test, GFP_KERNEL); + WRITE_ONCE(*(int *)page_to_virt(pages[i]), magic + i); + } + + for (int i =3D 0; i < ARRAY_SIZE(argss); i++) { + unsigned long base =3D mermap_cpu_base(raw_smp_processor_id()); + unsigned long end =3D mermap_cpu_end(raw_smp_processor_id()); + unsigned long addr; + + argss[i] =3D __mermap_get_wrapper(test, mm, pages[i], PAGE_SIZE, PAGE_KE= RNEL); + KUNIT_ASSERT_NOT_NULL_MSG(test, argss[i], "alloc %d failed", i); + + addr =3D (unsigned long) mermap_addr(argss[i]->alloc); + KUNIT_EXPECT_GE_MSG(test, addr, base, "alloc %d out of range", i); + KUNIT_EXPECT_LT_MSG(test, addr, end, "alloc %d out of range", i); + }; + + /* + * Read through the mappings to try and detect if they point to the + * pages we wrote earlier. + */ + kthread_use_mm(mm); + for (int i =3D 0; i < ARRAY_SIZE(pages); i++) { + int *ptr =3D (int *)mermap_addr(argss[i]->alloc); + + KUNIT_EXPECT_EQ(test, *ptr, magic + i); + } + kthread_unuse_mm(mm); +} + +static void test_tlb_flushed(struct kunit *test) +{ + struct page *page =3D alloc_page_wrapper(test, GFP_KERNEL); + struct mm_struct *mm =3D get_mm(test); + unsigned long addr, prev_addr =3D 0; + /* Avoid running for ever in failure case. */ + unsigned long max_iters =3D 1000000; + struct mermap_cpu *mc; + + migrate_disable(); + mc =3D this_cpu_ptr(mm->mermap.cpu); + + /* + * Allocate until we see an address less than what we had before - assume + * that means a reuse. + */ + for (int i =3D 0; i < max_iters; i++) { + struct mermap_alloc *alloc; + + /* + * Obviously flushing the TLB already is not wrong per se, but + * it's unexpected and probably means there's some bug. + * Use ASSERT to avoid spamming the log in the failure case. + */ + KUNIT_ASSERT_EQ_MSG(test, mc->tlb_flushes, 0, + "unexpected flush before alloc %d", i); + + alloc =3D __mermap_get(mm, page, PAGE_SIZE, PAGE_KERNEL, false); + KUNIT_ASSERT_NOT_NULL_MSG(test, alloc, "alloc %d failed", i); + + addr =3D (unsigned long)mermap_addr(alloc); + __mermap_put(mm, alloc); + if (addr < prev_addr) + break; + + prev_addr =3D addr; + cond_resched(); + } + KUNIT_ASSERT_TRUE_MSG(test, addr < prev_addr, "no address reuse"); + /* Again, more than one flush isn't wrong per se, but probably a bug. 
 */
+	KUNIT_ASSERT_EQ(test, mc->tlb_flushes, 1);
+
+	migrate_enable();
+}
+
+static struct kunit_case mermap_test_cases[] = {
+	KUNIT_CASE(test_basic_alloc),
+	KUNIT_CASE(test_size),
+	KUNIT_CASE(test_multiple_allocs),
+	KUNIT_CASE(test_tlb_flushed),
+	{}
+};
+
+static struct kunit_suite mermap_test_suite = {
+	.name = "mermap",
+	.test_cases = mermap_test_cases,
+};
+kunit_test_suite(mermap_test_suite);
+
+MODULE_DESCRIPTION("Mermap unit tests");
+MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS("EXPORTED_FOR_KUNIT_TESTING");

-- 
2.51.2

From nobody Thu Apr 16 20:49:12 2026
Date: Wed, 25 Feb 2026 16:34:31 +0000
Subject: [PATCH RFC 06/19] mm: introduce for_each_free_list()
From: Brendan Jackman
To: Borislav Petkov, Dave Hansen, Peter Zijlstra, Andrew Morton,
 David Hildenbrand, Lorenzo Stoakes, Vlastimil Babka, Wei Xu,
 Johannes Weiner, Zi Yan
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org,
 rppt@kernel.org, Sumit Garg, derkling@google.com, reijiw@google.com,
 Will Deacon, rientjes@google.com, "Kalyazin, Nikita",
 patrick.roy@linux.dev, "Itazuri, Takahiro", Andy Lutomirski,
 David Kaplan, Thomas Gleixner, Brendan Jackman, Yosry Ahmed
Message-ID: <20260225-page_alloc-unmapped-v1-6-e8808a03cd66@google.com>
In-Reply-To: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>
References: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>

Later patches will rearrange the free areas, but there are a couple of
places that iterate over them with the assumption that they have the
current structure.

Ideally, code outside of mm would not be directly aware of struct
free_area in the first place, but that awareness seems relatively
harmless, so just make the minimal change here.

Instead of letting users manually iterate over the free lists, provide
a macro to do that. Then adopt that macro in a couple of places.
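
A minimal sketch of the conversion this enables, condensed from the
mark_free_pages() hunk below (illustrative only):

	struct list_head *free_list;
	unsigned int order;
	struct page *page;

	/* Replaces for_each_migratetype_order() plus manual indexing: */
	for_each_free_list(free_list, zone, order)
		list_for_each_entry(page, free_list, buddy_list) {
			/* ... visit every free page at every order ... */
		}
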
Signed-off-by: Brendan Jackman
---
 include/linux/mmzone.h  |  7 +++++--
 kernel/power/snapshot.c |  8 ++++----
 mm/mm_init.c            | 11 +++++++----
 3 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 3e51190a55e4c..fc4d499fbbd2b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -123,9 +123,12 @@ static inline bool migratetype_is_mergeable(int mt)
 	return mt < MIGRATE_PCPTYPES;
 }
 
-#define for_each_migratetype_order(order, type) \
+#define for_each_free_list(list, zone, order) \
 	for (order = 0; order < NR_PAGE_ORDERS; order++) \
-		for (type = 0; type < MIGRATE_TYPES; type++)
+		for (unsigned int type = 0; \
+		     list = &zone->free_area[order].free_list[type], \
+		     type < MIGRATE_TYPES; \
+		     type++)
 
 extern int page_group_by_mobility_disabled;
 
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 0a946932d5c17..29a053d447c31 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1244,8 +1244,9 @@ unsigned int snapshot_additional_pages(struct zone *zone)
 static void mark_free_pages(struct zone *zone)
 {
 	unsigned long pfn, max_zone_pfn, page_count = WD_PAGE_COUNT;
+	struct list_head *free_list;
 	unsigned long flags;
-	unsigned int order, t;
+	unsigned int order;
 	struct page *page;
 
 	if (zone_is_empty(zone))
@@ -1269,9 +1270,8 @@ static void mark_free_pages(struct zone *zone)
 			swsusp_unset_page_free(page);
 	}
 
-	for_each_migratetype_order(order, t) {
-		list_for_each_entry(page,
-				&zone->free_area[order].free_list[t], buddy_list) {
+	for_each_free_list(free_list, zone, order) {
+		list_for_each_entry(page, free_list, buddy_list) {
 			unsigned long i;
 
 			pfn = page_to_pfn(page);
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 61d983d23f553..a748fb6d6555d 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1432,11 +1432,14 @@ static void __meminit zone_init_internals(struct zone *zone, enum zone_type idx,
 
 static void __meminit zone_init_free_lists(struct zone *zone)
 {
-	unsigned int order, t;
-	for_each_migratetype_order(order, t) {
-		INIT_LIST_HEAD(&zone->free_area[order].free_list[t]);
+	struct list_head *list;
+	unsigned int order;
+
+	for_each_free_list(list, zone, order)
+		INIT_LIST_HEAD(list);
+
+	for (order = 0; order < NR_PAGE_ORDERS; order++)
 		zone->free_area[order].nr_free = 0;
-	}
 
 #ifdef CONFIG_UNACCEPTED_MEMORY
 	INIT_LIST_HEAD(&zone->unaccepted_pages);

-- 
2.51.2

From nobody Thu Apr 16 20:49:12 2026
Date: Wed, 25 Feb 2026 16:34:32 +0000
Subject: [PATCH RFC 07/19] mm/page_alloc: don't overload migratetype in
 find_suitable_fallback()
From: Brendan Jackman
To: Borislav Petkov, Dave Hansen, Peter Zijlstra, Andrew Morton,
 David Hildenbrand, Lorenzo Stoakes, Vlastimil Babka, Wei Xu,
 Johannes Weiner, Zi Yan
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org,
 rppt@kernel.org, Sumit Garg, derkling@google.com, reijiw@google.com,
 Will Deacon, rientjes@google.com, "Kalyazin, Nikita",
 patrick.roy@linux.dev, "Itazuri, Takahiro", Andy Lutomirski,
 David Kaplan, Thomas Gleixner, Brendan Jackman, Yosry Ahmed
Message-ID: <20260225-page_alloc-unmapped-v1-7-e8808a03cd66@google.com>
In-Reply-To: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>
References: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>

This function currently returns a signed integer that encodes status
in-band, as negative numbers, along with a migratetype.

This function is about to be updated to a mode where this in-band
signaling no longer makes sense. Therefore, switch to a more
explicit/verbose style that encodes the status and migratetype
separately.

In the spirit of making things more explicit, also create an enum to
avoid using magic integer literals with special meanings. This enables
documenting the values at their definition instead of in one of the
callers.

Signed-off-by: Brendan Jackman
---
 mm/compaction.c |  3 ++-
 mm/internal.h   | 14 +++++++++++---
 mm/page_alloc.c | 40 +++++++++++++++++++++++-----------------
 3 files changed, 36 insertions(+), 21 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 1e8f8eca318c6..cf65a3425500c 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2323,7 +2323,8 @@ static enum compact_result __compact_finished(struct compact_control *cc)
 		 * Job done if allocation would steal freepages from
 		 * other migratetype buddy lists.
 		 */
-		if (find_suitable_fallback(area, order, migratetype, true) >= 0)
+		if (find_suitable_fallback(area, order, migratetype, true, NULL)
+		    == FALLBACK_FOUND)
 			/*
 			 * Movable pages are OK in any pageblock. If we are
 			 * stealing for a non-movable allocation, make sure
diff --git a/mm/internal.h b/mm/internal.h
index cb0af847d7d99..1d88e56a9dee0 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1028,9 +1028,17 @@ static inline void init_cma_pageblock(struct page *page)
 }
 #endif
 
-
-int find_suitable_fallback(struct free_area *area, unsigned int order,
-			   int migratetype, bool claimable);
+enum fallback_result {
+	/* Found suitable migratetype, *mt_out is valid. */
+	FALLBACK_FOUND,
+	/* No fallback found in requested order. */
+	FALLBACK_EMPTY,
+	/* Passed @claimable, but claiming whole block is a bad idea. */
+	FALLBACK_NOCLAIM,
+};
+enum fallback_result
+find_suitable_fallback(struct free_area *area, unsigned int order,
+		       int migratetype, bool claimable, unsigned int *mt_out);
 
 static inline bool free_area_empty(struct free_area *area, int migratetype)
 {
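
For reference, the new calling convention condensed into a sketch, based
on the __rmqueue_claim() hunk below (illustrative only):

	unsigned int fallback_mt;
	enum fallback_result result;

	result = find_suitable_fallback(area, current_order,
					start_migratetype, true, &fallback_mt);
	if (result == FALLBACK_EMPTY)
		continue;	/* No block at this order, try the next one. */
	if (result == FALLBACK_NOCLAIM)
		break;		/* Orders too low to claim a whole block. */
	page = get_page_from_free_area(area, fallback_mt);
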
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fcc32737f451e..1cd74a5901ded 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2280,25 +2280,29 @@ static bool should_try_claim_block(unsigned int order, int start_mt)
  * we would do this whole-block claiming. This would help to reduce
  * fragmentation due to mixed migratetype pages in one pageblock.
  */
-int find_suitable_fallback(struct free_area *area, unsigned int order,
-			   int migratetype, bool claimable)
+enum fallback_result
+find_suitable_fallback(struct free_area *area, unsigned int order,
+		       int migratetype, bool claimable, unsigned int *mt_out)
 {
 	int i;
 
 	if (claimable && !should_try_claim_block(order, migratetype))
-		return -2;
+		return FALLBACK_NOCLAIM;
 
 	if (area->nr_free == 0)
-		return -1;
+		return FALLBACK_EMPTY;
 
 	for (i = 0; i < MIGRATE_PCPTYPES - 1 ; i++) {
 		int fallback_mt = fallbacks[migratetype][i];
 
-		if (!free_area_empty(area, fallback_mt))
-			return fallback_mt;
+		if (!free_area_empty(area, fallback_mt)) {
+			if (mt_out)
+				*mt_out = fallback_mt;
+			return FALLBACK_FOUND;
+		}
 	}
 
-	return -1;
+	return FALLBACK_EMPTY;
 }
 
 /*
@@ -2408,16 +2412,16 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
 	 */
 	for (current_order = MAX_PAGE_ORDER; current_order >= min_order;
 				--current_order) {
-		area = &(zone->free_area[current_order]);
-		fallback_mt = find_suitable_fallback(area, current_order,
-				start_migratetype, true);
+		enum fallback_result result;
 
-		/* No block in that order */
-		if (fallback_mt == -1)
+		area = &(zone->free_area[current_order]);
+		result = find_suitable_fallback(area, current_order,
+				start_migratetype, true, &fallback_mt);
+
+		if (result == FALLBACK_EMPTY)
 			continue;
 
-		/* Advanced into orders too low to claim, abort */
-		if (fallback_mt == -2)
+		if (result == FALLBACK_NOCLAIM)
 			break;
 
 		page = get_page_from_free_area(area, fallback_mt);
@@ -2447,10 +2451,12 @@ __rmqueue_steal(struct zone *zone, int order, int start_migratetype)
 	int fallback_mt;
 
 	for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) {
+		enum fallback_result result;
+
 		area = &(zone->free_area[current_order]);
-		fallback_mt = find_suitable_fallback(area, current_order,
-				start_migratetype, false);
-		if (fallback_mt == -1)
+		result = find_suitable_fallback(area, current_order, start_migratetype,
+						false, &fallback_mt);
+		if (result == FALLBACK_EMPTY)
 			continue;
 
 		page = get_page_from_free_area(area, fallback_mt);

-- 
2.51.2

From nobody Thu Apr 16 20:49:12 2026
Date: Wed, 25 Feb 2026 16:34:33 +0000
Subject: [PATCH RFC 08/19] mm: introduce freetype_t
From: Brendan Jackman
To: Borislav Petkov, Dave Hansen, Peter Zijlstra, Andrew Morton,
 David Hildenbrand, Lorenzo Stoakes, Vlastimil Babka, Wei Xu,
 Johannes Weiner, Zi Yan
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org,
 rppt@kernel.org, Sumit Garg, derkling@google.com, reijiw@google.com,
 Will Deacon, rientjes@google.com, "Kalyazin, Nikita",
 patrick.roy@linux.dev, "Itazuri, Takahiro", Andy Lutomirski,
 David Kaplan, Thomas Gleixner, Brendan Jackman, Yosry Ahmed
Message-ID: <20260225-page_alloc-unmapped-v1-8-e8808a03cd66@google.com>
In-Reply-To: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>
References: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>

This is preparation for teaching the page allocator to break up free
pages according to
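
To make the type-safety point concrete, a small sketch of what call
sites look like with the struct typedef (illustrative only, using the
helpers introduced below):

	freetype_t ft = gfp_freetype(GFP_HIGHUSER_MOVABLE);
	freetype_t other = migrate_to_freetype(MIGRATE_MOVABLE, 0);

	/* "if (ft == other)" does not compile; a helper is needed: */
	if (freetypes_equal(ft, other)) {
		/* ... */
	}

	/* Code that only cares about mobility unwraps the migratetype: */
	if (is_migrate_movable(free_to_migratetype(ft))) {
		/* ... */
	}
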
properties that have nothing to do with mobility. For example it can be
used to allocate pages that are non-present in the physmap, or pages
that are sensitive in ASI. For these usecases, certain allocator
behaviours are desirable:

- A "pool" of pages with the given property is usually available, so
  that pages can be provided with the correct sensitivity without
  zeroing/TLB flushing.

- Pages are physically grouped by the property, so that large
  allocations rarely have to alter the pagetables due to ASI.

- The properties can be forced to vary only at a certain fixed address
  granularity, so that the pagetables can all be pre-allocated. This is
  desirable because the page allocator will be changing mappings:
  pre-allocation is a straightforward way to avoid recursive
  allocations (of pagetables).

It seems that the existing infrastructure for grouping pages by
mobility, i.e. pageblocks and migratetypes, serves this purpose pretty
nicely. However, overloading migratetype itself for this purpose looks
like a road to maintenance hell. In particular, as soon as such
properties become orthogonal to migratetypes, it would start to require
"doubling" the migratetypes.

Therefore, introduce a new higher-level concept, called "freetype"
(because it is used to index "free"lists) that can encode extra
properties, orthogonally to mobility, via flags.

Since freetypes and migratetypes would be very easy to mix up, freetypes
are (at least for now) stored in a struct typedef similar to atomic_t.
This provides type-safety, but comes at the expense of being pretty
annoying to code with. For instance, freetype_t cannot be compared with
the == operator. Once this code matures, if the freetype/migratetype
distinction gets less confusing, it might be wise to drop this struct
and just use ints.

Because this will eventually be needed from pageblock-flags.h, put this
in its own header instead of directly in mmzone.h.

To try and reduce review pain for such a churny patch, first introduce
freetypes as nothing but an indirection over migratetypes. The helpers
concerned with the flags are defined, but only as stubs. Convert
everything over to using freetypes wherever they are needed to index
freelists, but maintain references to migratetypes in code that really
only cares specifically about mobility.

Signed-off-by: Brendan Jackman
---
 include/linux/freetype.h |  38 +++++
 include/linux/gfp.h      |  16 +-
 include/linux/mmzone.h   |  49 +++++-
 mm/compaction.c          |  35 +++--
 mm/internal.h            |  17 ++-
 mm/page_alloc.c          | 388 +++++++++++++++++++++++++++++----------------
 mm/page_isolation.c      |   2 +-
 mm/page_owner.c          |   7 +-
 mm/page_reporting.c      |   4 +-
 mm/show_mem.c            |   4 +-
 10 files changed, 370 insertions(+), 190 deletions(-)

diff --git a/include/linux/freetype.h b/include/linux/freetype.h
new file mode 100644
index 0000000000000..9f857d10bb5db
--- /dev/null
+++ b/include/linux/freetype.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_FREETYPE_H
+#define _LINUX_FREETYPE_H
+
+#include
+
+/*
+ * A freetype is the index used to identify free lists. This consists of a
+ * migratetype, and other bits which encode orthogonal properties of memory.
+ */
+typedef struct {
+	int migratetype;
+} freetype_t;
+
+/*
+ * Return a dense linear index for freetypes that have lists in the free area.
+ * Return -1 for other freetypes.
+ */
+static inline int freetype_idx(freetype_t freetype)
+{
+	return freetype.migratetype;
+}
+
+/* No freetype flags actually exist yet.
*/ +#define NR_FREETYPE_IDXS MIGRATE_TYPES + +static inline unsigned int freetype_flags(freetype_t freetype) +{ + /* No flags supported yet. */ + return 0; +} + +static inline bool freetypes_equal(freetype_t a, freetype_t b) +{ + return a.migratetype =3D=3D b.migratetype; +} + +#endif /* _LINUX_FREETYPE_H */ diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 23240208a91fc..f189bee7a974c 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -17,8 +17,10 @@ struct mempolicy; #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE) #define GFP_MOVABLE_SHIFT 3 =20 -static inline int gfp_migratetype(const gfp_t gfp_flags) +static inline freetype_t gfp_freetype(const gfp_t gfp_flags) { + int migratetype; + VM_WARN_ON((gfp_flags & GFP_MOVABLE_MASK) =3D=3D GFP_MOVABLE_MASK); BUILD_BUG_ON((1UL << GFP_MOVABLE_SHIFT) !=3D ___GFP_MOVABLE); BUILD_BUG_ON((___GFP_MOVABLE >> GFP_MOVABLE_SHIFT) !=3D MIGRATE_MOVABLE); @@ -26,11 +28,15 @@ static inline int gfp_migratetype(const gfp_t gfp_flags) BUILD_BUG_ON(((___GFP_MOVABLE | ___GFP_RECLAIMABLE) >> GFP_MOVABLE_SHIFT) !=3D MIGRATE_HIGHATOMIC); =20 - if (unlikely(page_group_by_mobility_disabled)) - return MIGRATE_UNMOVABLE; + if (unlikely(page_group_by_mobility_disabled)) { + migratetype =3D MIGRATE_UNMOVABLE; + } else { + /* Group based on mobility */ + migratetype =3D (__force unsigned long)(gfp_flags & GFP_MOVABLE_MASK) + >> GFP_MOVABLE_SHIFT; + } =20 - /* Group based on mobility */ - return (__force unsigned long)(gfp_flags & GFP_MOVABLE_MASK) >> GFP_MOVAB= LE_SHIFT; + return migrate_to_freetype(migratetype, 0); } #undef GFP_MOVABLE_MASK #undef GFP_MOVABLE_SHIFT diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index fc4d499fbbd2b..66a4cfc2afcb0 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -5,6 +5,7 @@ #ifndef __ASSEMBLY__ #ifndef __GENERATING_BOUNDS_H =20 +#include #include #include #include @@ -125,24 +126,62 @@ static inline bool migratetype_is_mergeable(int mt) =20 #define for_each_free_list(list, zone, order) \ for (order =3D 0; order < NR_PAGE_ORDERS; order++) \ - for (unsigned int type =3D 0; \ - list =3D &zone->free_area[order].free_list[type], \ - type < MIGRATE_TYPES; \ - type++) \ + for (unsigned int idx =3D 0; \ + list =3D &zone->free_area[order].free_list[idx], \ + idx < NR_FREETYPE_IDXS; \ + idx++) + +static inline freetype_t migrate_to_freetype(enum migratetype mt, + unsigned int flags) +{ + freetype_t freetype; + + /* No flags supported yet. */ + VM_WARN_ON_ONCE(flags); + + freetype.migratetype =3D mt; + return freetype; +} + +static inline enum migratetype free_to_migratetype(freetype_t freetype) +{ + return freetype.migratetype; +} + +/* Convenience helper, return the freetype modified to have the migratetyp= e. 
*/ +static inline freetype_t freetype_with_migrate(freetype_t freetype, + enum migratetype migratetype) +{ + return migrate_to_freetype(migratetype, freetype_flags(freetype)); +} =20 extern int page_group_by_mobility_disabled; =20 +freetype_t get_pfnblock_freetype(const struct page *page, unsigned long pf= n); + #define get_pageblock_migratetype(page) \ get_pfnblock_migratetype(page, page_to_pfn(page)) =20 +#define get_pageblock_freetype(page) \ + get_pfnblock_freetype(page, page_to_pfn(page)) + #define folio_migratetype(folio) \ get_pageblock_migratetype(&folio->page) =20 struct free_area { - struct list_head free_list[MIGRATE_TYPES]; + struct list_head free_list[NR_FREETYPE_IDXS]; unsigned long nr_free; }; =20 +static inline +struct list_head *free_area_list(struct free_area *area, freetype_t type) +{ + int idx =3D freetype_idx(type); + + VM_BUG_ON(idx < 0); + return &area->free_list[idx]; +} + struct pglist_data; =20 #ifdef CONFIG_NUMA diff --git a/mm/compaction.c b/mm/compaction.c index cf65a3425500c..2b26bd9405035 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1359,7 +1359,7 @@ isolate_migratepages_range(struct compact_control *cc= , unsigned long start_pfn, static bool suitable_migration_source(struct compact_control *cc, struct page *page) { - int block_mt; + freetype_t block_ft; =20 if (pageblock_skip_persistent(page)) return false; @@ -1367,12 +1367,12 @@ static bool suitable_migration_source(struct compac= t_control *cc, if ((cc->mode !=3D MIGRATE_ASYNC) || !cc->direct_compaction) return true; =20 - block_mt =3D get_pageblock_migratetype(page); + block_ft =3D get_pageblock_freetype(page); =20 - if (cc->migratetype =3D=3D MIGRATE_MOVABLE) - return is_migrate_movable(block_mt); + if (free_to_migratetype(cc->freetype) =3D=3D MIGRATE_MOVABLE) + return is_migrate_movable(free_to_migratetype(block_ft)); else - return block_mt =3D=3D cc->migratetype; + return freetypes_equal(block_ft, cc->freetype); } =20 /* Returns true if the page is within a block suitable for migration to */ @@ -1963,7 +1963,8 @@ static unsigned long fast_find_migrateblock(struct co= mpact_control *cc) * reduces the risk that a large movable pageblock is freed for * an unmovable/reclaimable small allocation. 
*/ - if (cc->direct_compaction && cc->migratetype !=3D MIGRATE_MOVABLE) + if (cc->direct_compaction && + free_to_migratetype(cc->freetype) !=3D MIGRATE_MOVABLE) return pfn; =20 /* @@ -2234,7 +2235,7 @@ static bool should_proactive_compact_node(pg_data_t *= pgdat) static enum compact_result __compact_finished(struct compact_control *cc) { unsigned int order; - const int migratetype =3D cc->migratetype; + const freetype_t freetype =3D cc->freetype; int ret; =20 /* Compaction run completes if the migrate and free scanner meet */ @@ -2309,25 +2310,27 @@ static enum compact_result __compact_finished(struc= t compact_control *cc) for (order =3D cc->order; order < NR_PAGE_ORDERS; order++) { struct free_area *area =3D &cc->zone->free_area[order]; =20 - /* Job done if page is free of the right migratetype */ - if (!free_area_empty(area, migratetype)) + /* Job done if page is free of the right freetype */ + if (!free_area_empty(area, freetype)) return COMPACT_SUCCESS; =20 #ifdef CONFIG_CMA /* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */ - if (migratetype =3D=3D MIGRATE_MOVABLE && - !free_area_empty(area, MIGRATE_CMA)) + if (free_to_migratetype(freetype) =3D=3D MIGRATE_MOVABLE && + !free_area_empty(area, freetype_with_migrate(cc->freetype, + MIGRATE_CMA))) return COMPACT_SUCCESS; #endif /* * Job done if allocation would steal freepages from - * other migratetype buddy lists. + * other freetype buddy lists. */ - if (find_suitable_fallback(area, order, migratetype, true, NULL) + if (find_suitable_fallback(area, order, freetype, true, NULL) =3D=3D FALLBACK_FOUND) /* - * Movable pages are OK in any pageblock. If we are - * stealing for a non-movable allocation, make sure + * Movable pages are OK in any pageblock of the right + * sensitivity. If we are * stealing for a + * non-movable allocation, make sure * we finish compacting the current pageblock first * (which is assured by the above migrate_pfn align * check) so it is as free as possible and we won't @@ -2532,7 +2535,7 @@ compact_zone(struct compact_control *cc, struct captu= re_control *capc) INIT_LIST_HEAD(&cc->freepages[order]); INIT_LIST_HEAD(&cc->migratepages); =20 - cc->migratetype =3D gfp_migratetype(cc->gfp_mask); + cc->freetype =3D gfp_freetype(cc->gfp_mask); =20 if (!is_via_compact_memory(cc->order)) { ret =3D compaction_suit_allocation_order(cc->zone, cc->order, diff --git a/mm/internal.h b/mm/internal.h index 1d88e56a9dee0..cac292dcd394f 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include @@ -658,7 +659,7 @@ struct alloc_context { struct zonelist *zonelist; nodemask_t *nodemask; struct zoneref *preferred_zoneref; - int migratetype; + freetype_t freetype; =20 /* * highest_zoneidx represents highest usable zone index of @@ -809,8 +810,8 @@ static inline void clear_zone_contiguous(struct zone *z= one) } =20 extern int __isolate_free_page(struct page *page, unsigned int order); -extern void __putback_isolated_page(struct page *page, unsigned int order, - int mt); +void __putback_isolated_page(struct page *page, unsigned int order, + freetype_t freetype); extern void memblock_free_pages(unsigned long pfn, unsigned int order); extern void __free_pages_core(struct page *page, unsigned int order, enum meminit_context context); @@ -968,7 +969,7 @@ struct compact_control { short search_order; /* order to start a fast search at */ const gfp_t gfp_mask; /* gfp mask of a direct compactor */ int order; /* order a direct compactor needs */ - int migratetype; /* migratetype 
of direct compactor */ + freetype_t freetype; /* freetype of direct compactor */ const unsigned int alloc_flags; /* alloc flags of a direct compactor */ const int highest_zoneidx; /* zone index of a direct compactor */ enum migrate_mode mode; /* Async or sync migration mode */ @@ -1029,7 +1030,7 @@ static inline void init_cma_pageblock(struct page *pa= ge) #endif =20 enum fallback_result { - /* Found suitable migratetype, *mt_out is valid. */ + /* Found suitable fallback, *ft_out is valid. */ FALLBACK_FOUND, /* No fallback found in requested order. */ FALLBACK_EMPTY, @@ -1038,11 +1039,11 @@ enum fallback_result { }; enum fallback_result find_suitable_fallback(struct free_area *area, unsigned int order, - int migratetype, bool claimable, unsigned int *mt_out); + freetype_t freetype, bool claimable, freetype_t *ft_out); =20 -static inline bool free_area_empty(struct free_area *area, int migratetype) +static inline bool free_area_empty(struct free_area *area, freetype_t free= type) { - return list_empty(&area->free_list[migratetype]); + return list_empty(free_area_list(area, freetype)); } =20 /* mm/util.c */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1cd74a5901ded..66d4843da8512 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -457,6 +457,37 @@ bool get_pfnblock_bit(const struct page *page, unsigne= d long pfn, return test_bit(bitidx + pb_bit, bitmap_word); } =20 +/** + * __get_pfnblock_freetype - Return the freetype of a pageblock, optionally + * ignoring the fact that it's currently isolated. + * @page: The page within the block of interest + * @pfn: The target page frame number + * @ignore_iso: If isolated, return the migratetype that the block had bef= ore + * isolation. + */ +__always_inline freetype_t +__get_pfnblock_freetype(const struct page *page, unsigned long pfn, + bool ignore_iso) +{ + int mt =3D get_pfnblock_migratetype(page, pfn); + + return migrate_to_freetype(mt, 0); +} + +/** + * get_pfnblock_migratetype - Return the freetype of a pageblock + * @page: The page within the block of interest + * @pfn: The target page frame number + * + * Return: The freetype of the pageblock + */ +__always_inline freetype_t +get_pfnblock_freetype(const struct page *page, unsigned long pfn) +{ + return __get_pfnblock_freetype(page, pfn, 0); +} + + /** * get_pfnblock_migratetype - Return the migratetype of a pageblock * @page: The page within the block of interest @@ -768,8 +799,11 @@ static inline struct capture_control *task_capc(struct= zone *zone) =20 static inline bool compaction_capture(struct capture_control *capc, struct page *page, - int order, int migratetype) + int order, freetype_t freetype) { + enum migratetype migratetype =3D free_to_migratetype(freetype); + enum migratetype capc_mt; + if (!capc || order !=3D capc->cc->order) return false; =20 @@ -778,6 +812,8 @@ compaction_capture(struct capture_control *capc, struct= page *page, is_migrate_isolate(migratetype)) return false; =20 + capc_mt =3D free_to_migratetype(capc->cc->freetype); + /* * Do not let lower order allocations pollute a movable pageblock * unless compaction is also requesting movable pages. @@ -786,12 +822,12 @@ compaction_capture(struct capture_control *capc, stru= ct page *page, * have trouble finding a high-order free page. 
*/ if (order < pageblock_order && migratetype =3D=3D MIGRATE_MOVABLE && - capc->cc->migratetype !=3D MIGRATE_MOVABLE) + capc_mt !=3D MIGRATE_MOVABLE) return false; =20 - if (migratetype !=3D capc->cc->migratetype) + if (migratetype !=3D capc_mt) trace_mm_page_alloc_extfrag(page, capc->cc->order, order, - capc->cc->migratetype, migratetype); + capc_mt, migratetype); =20 capc->page =3D page; return true; @@ -805,7 +841,7 @@ static inline struct capture_control *task_capc(struct = zone *zone) =20 static inline bool compaction_capture(struct capture_control *capc, struct page *page, - int order, int migratetype) + int order, freetype_t freetype) { return false; } @@ -830,23 +866,28 @@ static inline void account_freepages(struct zone *zon= e, int nr_pages, =20 /* Used for pages not on another list */ static inline void __add_to_free_list(struct page *page, struct zone *zone, - unsigned int order, int migratetype, + unsigned int order, freetype_t freetype, bool tail) { struct free_area *area =3D &zone->free_area[order]; int nr_pages =3D 1 << order; =20 - VM_WARN_ONCE(get_pageblock_migratetype(page) !=3D migratetype, - "page type is %d, passed migratetype is %d (nr=3D%d)\n", - get_pageblock_migratetype(page), migratetype, nr_pages); + if (IS_ENABLED(CONFIG_DEBUG_VM)) { + freetype_t block_ft =3D get_pageblock_freetype(page); + + VM_WARN_ONCE(!freetypes_equal(block_ft, freetype), + "page type is %d/%#x, passed type is %d/%3x (nr=3D%d)\n", + block_ft.migratetype, freetype_flags(block_ft), + freetype.migratetype, freetype_flags(freetype), nr_pages); + } =20 if (tail) - list_add_tail(&page->buddy_list, &area->free_list[migratetype]); + list_add_tail(&page->buddy_list, free_area_list(area, freetype)); else - list_add(&page->buddy_list, &area->free_list[migratetype]); + list_add(&page->buddy_list, free_area_list(area, freetype)); area->nr_free++; =20 - if (order >=3D pageblock_order && !is_migrate_isolate(migratetype)) + if (order >=3D pageblock_order && !is_migrate_isolate(free_to_migratetype= (freetype))) __mod_zone_page_state(zone, NR_FREE_PAGES_BLOCKS, nr_pages); } =20 @@ -856,17 +897,25 @@ static inline void __add_to_free_list(struct page *pa= ge, struct zone *zone, * allocation again (e.g., optimization for memory onlining). 
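 * Callers are expected to change only the migratetype half here; the
 * freetype flags of the old and new freetype should match (compare the
 * VM_BUG_ON in move_freepages_block()).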
*/ static inline void move_to_free_list(struct page *page, struct zone *zone, - unsigned int order, int old_mt, int new_mt) + unsigned int order, + freetype_t old_ft, freetype_t new_ft) { struct free_area *area =3D &zone->free_area[order]; + int old_mt =3D free_to_migratetype(old_ft); + int new_mt =3D free_to_migratetype(new_ft); int nr_pages =3D 1 << order; =20 /* Free page moving can fail, so it happens before the type update */ - VM_WARN_ONCE(get_pageblock_migratetype(page) !=3D old_mt, - "page type is %d, passed migratetype is %d (nr=3D%d)\n", - get_pageblock_migratetype(page), old_mt, nr_pages); + if (IS_ENABLED(CONFIG_DEBUG_VM)) { + freetype_t block_ft =3D get_pageblock_freetype(page); =20 - list_move_tail(&page->buddy_list, &area->free_list[new_mt]); + VM_WARN_ONCE(!freetypes_equal(block_ft, old_ft), + "page type is %d/%#x, passed freetype is %d/%#x (nr=3D%d)\n", + block_ft.migratetype, freetype_flags(block_ft), + old_ft.migratetype, freetype_flags(old_ft), nr_pages); + } + + list_move_tail(&page->buddy_list, free_area_list(area, new_ft)); =20 account_freepages(zone, -nr_pages, old_mt); account_freepages(zone, nr_pages, new_mt); @@ -909,9 +958,9 @@ static inline void del_page_from_free_list(struct page = *page, struct zone *zone, } =20 static inline struct page *get_page_from_free_area(struct free_area *area, - int migratetype) + freetype_t freetype) { - return list_first_entry_or_null(&area->free_list[migratetype], + return list_first_entry_or_null(free_area_list(area, freetype), struct page, buddy_list); } =20 @@ -978,9 +1027,10 @@ static void change_pageblock_range(struct page *pageb= lock_page, static inline void __free_one_page(struct page *page, unsigned long pfn, struct zone *zone, unsigned int order, - int migratetype, fpi_t fpi_flags) + freetype_t freetype, fpi_t fpi_flags) { struct capture_control *capc =3D task_capc(zone); + int migratetype =3D free_to_migratetype(freetype); unsigned long buddy_pfn =3D 0; unsigned long combined_pfn; struct page *buddy; @@ -989,16 +1039,17 @@ static inline void __free_one_page(struct page *page, VM_BUG_ON(!zone_is_initialized(zone)); VM_BUG_ON_PAGE(page->flags.f & PAGE_FLAGS_CHECK_AT_PREP, page); =20 - VM_BUG_ON(migratetype =3D=3D -1); + VM_BUG_ON(freetype.migratetype =3D=3D -1); VM_BUG_ON_PAGE(pfn & ((1 << order) - 1), page); VM_BUG_ON_PAGE(bad_range(zone, page), page); =20 account_freepages(zone, 1 << order, migratetype); =20 while (order < MAX_PAGE_ORDER) { - int buddy_mt =3D migratetype; + freetype_t buddy_ft =3D freetype; + enum migratetype buddy_mt =3D free_to_migratetype(buddy_ft); =20 - if (compaction_capture(capc, page, order, migratetype)) { + if (compaction_capture(capc, page, order, freetype)) { account_freepages(zone, -(1 << order), migratetype); return; } @@ -1014,7 +1065,8 @@ static inline void __free_one_page(struct page *page, * pageblock isolation could cause incorrect freepage or CMA * accounting or HIGHATOMIC accounting. 
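 * Only the migratetype half of the freshly looked-up buddy freetype
 * feeds the mergeability checks below.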
*/ - buddy_mt =3D get_pfnblock_migratetype(buddy, buddy_pfn); + buddy_ft =3D get_pfnblock_freetype(buddy, buddy_pfn); + buddy_mt =3D free_to_migratetype(buddy_ft); =20 if (migratetype !=3D buddy_mt && (!migratetype_is_mergeable(migratetype) || @@ -1056,7 +1108,7 @@ static inline void __free_one_page(struct page *page, else to_tail =3D buddy_merge_likely(pfn, buddy_pfn, page, order); =20 - __add_to_free_list(page, zone, order, migratetype, to_tail); + __add_to_free_list(page, zone, order, freetype, to_tail); =20 /* Notify page reporting subsystem of freed page */ if (!(fpi_flags & FPI_SKIP_REPORT_NOTIFY)) @@ -1517,19 +1569,20 @@ static void free_pcppages_bulk(struct zone *zone, i= nt count, nr_pages =3D 1 << order; do { unsigned long pfn; - int mt; + freetype_t ft; =20 page =3D list_last_entry(list, struct page, pcp_list); pfn =3D page_to_pfn(page); - mt =3D get_pfnblock_migratetype(page, pfn); + ft =3D get_pfnblock_freetype(page, pfn); =20 /* must delete to avoid corrupting pcp list */ list_del(&page->pcp_list); count -=3D nr_pages; pcp->count -=3D nr_pages; =20 - __free_one_page(page, pfn, zone, order, mt, FPI_NONE); - trace_mm_page_pcpu_drain(page, order, mt); + __free_one_page(page, pfn, zone, order, ft, FPI_NONE); + trace_mm_page_pcpu_drain(page, order, + free_to_migratetype(ft)); } while (count > 0 && !list_empty(list)); } =20 @@ -1550,9 +1603,9 @@ static void split_large_buddy(struct zone *zone, stru= ct page *page, order =3D pageblock_order; =20 do { - int mt =3D get_pfnblock_migratetype(page, pfn); + freetype_t ft =3D get_pfnblock_freetype(page, pfn); =20 - __free_one_page(page, pfn, zone, order, mt, fpi); + __free_one_page(page, pfn, zone, order, ft, fpi); pfn +=3D 1 << order; if (pfn =3D=3D end) break; @@ -1730,7 +1783,7 @@ struct page *__pageblock_pfn_to_page(unsigned long st= art_pfn, * -- nyc */ static inline unsigned int expand(struct zone *zone, struct page *page, in= t low, - int high, int migratetype) + int high, freetype_t freetype) { unsigned int size =3D 1 << high; unsigned int nr_added =3D 0; @@ -1749,7 +1802,7 @@ static inline unsigned int expand(struct zone *zone, = struct page *page, int low, if (set_page_guard(zone, &page[size], high)) continue; =20 - __add_to_free_list(&page[size], zone, high, migratetype, false); + __add_to_free_list(&page[size], zone, high, freetype, false); set_buddy_order(&page[size], high); nr_added +=3D size; } @@ -1759,12 +1812,13 @@ static inline unsigned int expand(struct zone *zone= , struct page *page, int low, =20 static __always_inline void page_del_and_expand(struct zone *zone, struct page *page, int low, - int high, int migratetype) + int high, freetype_t freetype) { + enum migratetype migratetype =3D free_to_migratetype(freetype); int nr_pages =3D 1 << high; =20 __del_page_from_free_list(page, zone, high, migratetype); - nr_pages -=3D expand(zone, page, low, high, migratetype); + nr_pages -=3D expand(zone, page, low, high, freetype); account_freepages(zone, -nr_pages, migratetype); } =20 @@ -1917,7 +1971,7 @@ static void prep_new_page(struct page *page, unsigned= int order, gfp_t gfp_flags */ static __always_inline struct page *__rmqueue_smallest(struct zone *zone, unsigned int order, - int migratetype) + freetype_t freetype) { unsigned int current_order; struct free_area *area; @@ -1925,13 +1979,15 @@ struct page *__rmqueue_smallest(struct zone *zone, = unsigned int order, =20 /* Find a page of the appropriate size in the preferred list */ for (current_order =3D order; current_order < NR_PAGE_ORDERS; ++current_o= rder) { + enum 
migratetype migratetype =3D free_to_migratetype(freetype); + area =3D &(zone->free_area[current_order]); - page =3D get_page_from_free_area(area, migratetype); + page =3D get_page_from_free_area(area, freetype); if (!page) continue; =20 page_del_and_expand(zone, page, order, current_order, - migratetype); + freetype); trace_mm_page_alloc_zone_locked(page, order, migratetype, pcp_allowed_order(order) && migratetype < MIGRATE_PCPTYPES); @@ -1956,13 +2012,18 @@ static int fallbacks[MIGRATE_PCPTYPES][MIGRATE_PCPTYPES - 1] =3D { =20 #ifdef CONFIG_CMA static __always_inline struct page *__rmqueue_cma_fallback(struct zone *zone, - unsigned int order) + unsigned int order, unsigned int ft_flags) { - return __rmqueue_smallest(zone, order, MIGRATE_CMA); + freetype_t freetype =3D migrate_to_freetype(MIGRATE_CMA, ft_flags); + + return __rmqueue_smallest(zone, order, freetype); } #else static inline struct page *__rmqueue_cma_fallback(struct zone *zone, - unsigned int order) { return NULL; } + unsigned int order, unsigned int ft_flags) +{ + return NULL; +} #endif =20 /* @@ -1970,7 +2031,7 @@ static inline struct page *__rmqueue_cma_fallback(struct zone *zone, * change the block type. */ static int __move_freepages_block(struct zone *zone, unsigned long start_pfn, - int old_mt, int new_mt) + freetype_t old_ft, freetype_t new_ft) { struct page *page; unsigned long pfn, end_pfn; @@ -1993,7 +2054,7 @@ static int __move_freepages_block(struct zone *zone, unsigned long start_pfn, =20 order =3D buddy_order(page); =20 - move_to_free_list(page, zone, order, old_mt, new_mt); + move_to_free_list(page, zone, order, old_ft, new_ft); =20 pfn +=3D 1 << order; pages_moved +=3D 1 << order; @@ -2053,7 +2114,7 @@ static bool prep_move_freepages_block(struct zone *zone, struct page *page, } =20 static int move_freepages_block(struct zone *zone, struct page *page, - int old_mt, int new_mt) + freetype_t old_ft, freetype_t new_ft) { unsigned long start_pfn; int res; @@ -2061,8 +2122,11 @@ static int move_freepages_block(struct zone *zone, struct page *page, if (!prep_move_freepages_block(zone, page, &start_pfn, NULL, NULL)) return -1; =20 - res =3D __move_freepages_block(zone, start_pfn, old_mt, new_mt); - set_pageblock_migratetype(pfn_to_page(start_pfn), new_mt); + VM_BUG_ON(freetype_flags(old_ft) !=3D freetype_flags(new_ft)); + + res =3D __move_freepages_block(zone, start_pfn, old_ft, new_ft); + set_pageblock_migratetype(pfn_to_page(start_pfn), + free_to_migratetype(new_ft)); =20 return res; =20 @@ -2130,8 +2194,7 @@ static bool __move_freepages_block_isolate(struct zone *zone, struct page *page, bool isolate) { unsigned long start_pfn, buddy_pfn; - int from_mt; - int to_mt; + freetype_t from_ft, to_ft; struct page *buddy; =20 if (isolate =3D=3D get_pageblock_isolate(page)) { @@ -2161,18 +2224,15 @@ static bool __move_freepages_block_isolate(struct zone *zone, } =20 move: - /* Use MIGRATETYPE_MASK to get non-isolate migratetype */ if (isolate) { - from_mt =3D __get_pfnblock_flags_mask(page, page_to_pfn(page), - MIGRATETYPE_MASK); - to_mt =3D MIGRATE_ISOLATE; + from_ft =3D __get_pfnblock_freetype(page, page_to_pfn(page), true); + to_ft =3D freetype_with_migrate(from_ft, MIGRATE_ISOLATE); } else { - from_mt =3D MIGRATE_ISOLATE; - to_mt =3D __get_pfnblock_flags_mask(page, page_to_pfn(page), - MIGRATETYPE_MASK); + to_ft =3D __get_pfnblock_freetype(page, page_to_pfn(page), true); + from_ft =3D freetype_with_migrate(to_ft, MIGRATE_ISOLATE); } =20 - __move_freepages_block(zone, start_pfn, from_mt, to_mt); + 
__move_freepages_block(zone, start_pfn, from_ft, to_ft); toggle_pageblock_isolate(pfn_to_page(start_pfn), isolate); =20 return true; @@ -2276,15 +2336,16 @@ static bool should_try_claim_block(unsigned int ord= er, int start_mt) =20 /* * Check whether there is a suitable fallback freepage with requested orde= r. - * If claimable is true, this function returns fallback_mt only if + * If claimable is true, this function returns a fallback only if * we would do this whole-block claiming. This would help to reduce * fragmentation due to mixed migratetype pages in one pageblock. */ enum fallback_result find_suitable_fallback(struct free_area *area, unsigned int order, - int migratetype, bool claimable, unsigned int *mt_out) + freetype_t freetype, bool claimable, freetype_t *ft_out) { int i; + enum migratetype migratetype =3D free_to_migratetype(freetype); =20 if (claimable && !should_try_claim_block(order, migratetype)) return FALLBACK_NOCLAIM; @@ -2294,10 +2355,18 @@ find_suitable_fallback(struct free_area *area, unsi= gned int order, =20 for (i =3D 0; i < MIGRATE_PCPTYPES - 1 ; i++) { int fallback_mt =3D fallbacks[migratetype][i]; + /* + * Fallback to different migratetypes, but currently always with + * the same freetype flags. + */ + freetype_t fallback_ft =3D freetype_with_migrate(freetype, fallback_mt); =20 - if (!free_area_empty(area, fallback_mt)) { - if (mt_out) - *mt_out =3D fallback_mt; + if (freetype_idx(fallback_ft) < 0) + continue; + + if (!free_area_empty(area, fallback_ft)) { + if (ft_out) + *ft_out =3D fallback_ft; return FALLBACK_FOUND; } } @@ -2314,20 +2383,22 @@ find_suitable_fallback(struct free_area *area, unsi= gned int order, */ static struct page * try_to_claim_block(struct zone *zone, struct page *page, - int current_order, int order, int start_type, - int block_type, unsigned int alloc_flags) + int current_order, int order, freetype_t start_type, + freetype_t block_type, unsigned int alloc_flags) { int free_pages, movable_pages, alike_pages; + int block_mt =3D free_to_migratetype(block_type); + int start_mt =3D free_to_migratetype(start_type); unsigned long start_pfn; =20 /* Take ownership for orders >=3D pageblock_order */ if (current_order >=3D pageblock_order) { unsigned int nr_added; =20 - del_page_from_free_list(page, zone, current_order, block_type); - change_pageblock_range(page, current_order, start_type); + del_page_from_free_list(page, zone, current_order, block_mt); + change_pageblock_range(page, current_order, start_mt); nr_added =3D expand(zone, page, order, current_order, start_type); - account_freepages(zone, nr_added, start_type); + account_freepages(zone, nr_added, start_mt); return page; } =20 @@ -2349,7 +2420,7 @@ try_to_claim_block(struct zone *zone, struct page *pa= ge, * For movable allocation, it's the number of movable pages which * we just obtained. For other types it's a bit more tricky. */ - if (start_type =3D=3D MIGRATE_MOVABLE) { + if (start_mt =3D=3D MIGRATE_MOVABLE) { alike_pages =3D movable_pages; } else { /* @@ -2359,7 +2430,7 @@ try_to_claim_block(struct zone *zone, struct page *pa= ge, * vice versa, be conservative since we can't distinguish the * exact migratetype of non-movable pages. 
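 * For example, when an unmovable request claims a movable block, every
 * page that is neither free nor movable gets counted as alike.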
*/ - if (block_type =3D=3D MIGRATE_MOVABLE) + if (block_mt =3D=3D MIGRATE_MOVABLE) alike_pages =3D pageblock_nr_pages - (free_pages + movable_pages); else @@ -2372,7 +2443,7 @@ try_to_claim_block(struct zone *zone, struct page *pa= ge, if (free_pages + alike_pages >=3D (1 << (pageblock_order-1)) || page_group_by_mobility_disabled) { __move_freepages_block(zone, start_pfn, block_type, start_type); - set_pageblock_migratetype(pfn_to_page(start_pfn), start_type); + set_pageblock_migratetype(pfn_to_page(start_pfn), start_mt); return __rmqueue_smallest(zone, order, start_type); } =20 @@ -2388,14 +2459,13 @@ try_to_claim_block(struct zone *zone, struct page *= page, * condition simpler. */ static __always_inline struct page * -__rmqueue_claim(struct zone *zone, int order, int start_migratetype, +__rmqueue_claim(struct zone *zone, int order, freetype_t start_freetype, unsigned int alloc_flags) { struct free_area *area; int current_order; int min_order =3D order; struct page *page; - int fallback_mt; =20 /* * Do not steal pages from freelists belonging to other pageblocks @@ -2412,11 +2482,13 @@ __rmqueue_claim(struct zone *zone, int order, int s= tart_migratetype, */ for (current_order =3D MAX_PAGE_ORDER; current_order >=3D min_order; --current_order) { + int start_mt =3D free_to_migratetype(start_freetype); enum fallback_result result; + freetype_t fallback_ft; =20 area =3D &(zone->free_area[current_order]); - result =3D find_suitable_fallback(area, current_order, - start_migratetype, true, &fallback_mt); + result =3D find_suitable_fallback(area, current_order, start_freetype, + true, &fallback_ft); =20 if (result =3D=3D FALLBACK_EMPTY) continue; @@ -2424,13 +2496,13 @@ __rmqueue_claim(struct zone *zone, int order, int s= tart_migratetype, if (result =3D=3D FALLBACK_NOCLAIM) break; =20 - page =3D get_page_from_free_area(area, fallback_mt); + page =3D get_page_from_free_area(area, fallback_ft); page =3D try_to_claim_block(zone, page, current_order, order, - start_migratetype, fallback_mt, + start_freetype, fallback_ft, alloc_flags); if (page) { trace_mm_page_alloc_extfrag(page, order, current_order, - start_migratetype, fallback_mt); + start_mt, free_to_migratetype(fallback_ft)); return page; } } @@ -2443,26 +2515,27 @@ __rmqueue_claim(struct zone *zone, int order, int s= tart_migratetype, * the block as its current migratetype, potentially causing fragmentation. 
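 * Unlike __rmqueue_claim() above, the pageblock keeps its freetype here;
 * only the stolen pages themselves move between freelists.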
*/ static __always_inline struct page * -__rmqueue_steal(struct zone *zone, int order, int start_migratetype) +__rmqueue_steal(struct zone *zone, int order, freetype_t start_freetype) { struct free_area *area; int current_order; struct page *page; - int fallback_mt; =20 for (current_order =3D order; current_order < NR_PAGE_ORDERS; current_ord= er++) { enum fallback_result result; + freetype_t fallback_ft; =20 area =3D &(zone->free_area[current_order]); - result =3D find_suitable_fallback(area, current_order, start_migratetype, - false, &fallback_mt); + result =3D find_suitable_fallback(area, current_order, start_freetype, + false, &fallback_ft); if (result =3D=3D FALLBACK_EMPTY) continue; =20 - page =3D get_page_from_free_area(area, fallback_mt); - page_del_and_expand(zone, page, order, current_order, fallback_mt); + page =3D get_page_from_free_area(area, fallback_ft); + page_del_and_expand(zone, page, order, current_order, fallback_ft); trace_mm_page_alloc_extfrag(page, order, current_order, - start_migratetype, fallback_mt); + free_to_migratetype(start_freetype), + free_to_migratetype(fallback_ft)); return page; } =20 @@ -2481,7 +2554,7 @@ enum rmqueue_mode { * Call me with the zone->lock already held. */ static __always_inline struct page * -__rmqueue(struct zone *zone, unsigned int order, int migratetype, +__rmqueue(struct zone *zone, unsigned int order, freetype_t freetype, unsigned int alloc_flags, enum rmqueue_mode *mode) { struct page *page; @@ -2495,7 +2568,8 @@ __rmqueue(struct zone *zone, unsigned int order, int = migratetype, if (alloc_flags & ALLOC_CMA && zone_page_state(zone, NR_FREE_CMA_PAGES) > zone_page_state(zone, NR_FREE_PAGES) / 2) { - page =3D __rmqueue_cma_fallback(zone, order); + page =3D __rmqueue_cma_fallback(zone, order, + freetype_flags(freetype)); if (page) return page; } @@ -2512,13 +2586,14 @@ __rmqueue(struct zone *zone, unsigned int order, in= t migratetype, */ switch (*mode) { case RMQUEUE_NORMAL: - page =3D __rmqueue_smallest(zone, order, migratetype); + page =3D __rmqueue_smallest(zone, order, freetype); if (page) return page; fallthrough; case RMQUEUE_CMA: if (alloc_flags & ALLOC_CMA) { - page =3D __rmqueue_cma_fallback(zone, order); + page =3D __rmqueue_cma_fallback(zone, order, + freetype_flags(freetype)); if (page) { *mode =3D RMQUEUE_CMA; return page; @@ -2526,7 +2601,7 @@ __rmqueue(struct zone *zone, unsigned int order, int = migratetype, } fallthrough; case RMQUEUE_CLAIM: - page =3D __rmqueue_claim(zone, order, migratetype, alloc_flags); + page =3D __rmqueue_claim(zone, order, freetype, alloc_flags); if (page) { /* Replenished preferred freelist, back to normal mode. 
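 * (Subsequent __rmqueue() calls sharing this rmqueue_mode then retry
 * __rmqueue_smallest() on their own freetype first.)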
*/ *mode =3D RMQUEUE_NORMAL; @@ -2535,7 +2610,7 @@ __rmqueue(struct zone *zone, unsigned int order, int = migratetype, fallthrough; case RMQUEUE_STEAL: if (!(alloc_flags & ALLOC_NOFRAGMENT)) { - page =3D __rmqueue_steal(zone, order, migratetype); + page =3D __rmqueue_steal(zone, order, freetype); if (page) { *mode =3D RMQUEUE_STEAL; return page; @@ -2552,7 +2627,7 @@ __rmqueue(struct zone *zone, unsigned int order, int = migratetype, */ static int rmqueue_bulk(struct zone *zone, unsigned int order, unsigned long count, struct list_head *list, - int migratetype, unsigned int alloc_flags) + freetype_t freetype, unsigned int alloc_flags) { enum rmqueue_mode rmqm =3D RMQUEUE_NORMAL; unsigned long flags; @@ -2565,7 +2640,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned i= nt order, spin_lock_irqsave(&zone->lock, flags); } for (i =3D 0; i < count; ++i) { - struct page *page =3D __rmqueue(zone, order, migratetype, + struct page *page =3D __rmqueue(zone, order, freetype, alloc_flags, &rmqm); if (unlikely(page =3D=3D NULL)) break; @@ -2863,7 +2938,7 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, str= uct zone *zone, * reacquired. Return true if pcp is locked, false otherwise. */ static bool free_frozen_page_commit(struct zone *zone, - struct per_cpu_pages *pcp, struct page *page, int migratetype, + struct per_cpu_pages *pcp, struct page *page, freetype_t freetype, unsigned int order, fpi_t fpi_flags, unsigned long *UP_flags) { int high, batch; @@ -2880,7 +2955,7 @@ static bool free_frozen_page_commit(struct zone *zone, */ pcp->alloc_factor >>=3D 1; __count_vm_events(PGFREE, 1 << order); - pindex =3D order_to_pindex(migratetype, order); + pindex =3D order_to_pindex(free_to_migratetype(freetype), order); list_add(&page->pcp_list, &pcp->lists[pindex]); pcp->count +=3D 1 << order; =20 @@ -2975,6 +3050,7 @@ static void __free_frozen_pages(struct page *page, un= signed int order, struct zone *zone; unsigned long pfn =3D page_to_pfn(page); int migratetype; + freetype_t freetype; =20 if (!pcp_allowed_order(order)) { __free_pages_ok(page, order, fpi_flags); @@ -2992,13 +3068,14 @@ static void __free_frozen_pages(struct page *page, = unsigned int order, * excessively into the page allocator */ zone =3D page_zone(page); - migratetype =3D get_pfnblock_migratetype(page, pfn); + freetype =3D get_pfnblock_freetype(page, pfn); + migratetype =3D free_to_migratetype(freetype); if (unlikely(migratetype >=3D MIGRATE_PCPTYPES)) { if (unlikely(is_migrate_isolate(migratetype))) { free_one_page(zone, page, pfn, order, fpi_flags); return; } - migratetype =3D MIGRATE_MOVABLE; + freetype =3D freetype_with_migrate(freetype, MIGRATE_MOVABLE); } =20 if (unlikely((fpi_flags & FPI_TRYLOCK) && IS_ENABLED(CONFIG_PREEMPT_RT) @@ -3008,7 +3085,7 @@ static void __free_frozen_pages(struct page *page, un= signed int order, } pcp =3D pcp_spin_trylock(zone->per_cpu_pageset, UP_flags); if (pcp) { - if (!free_frozen_page_commit(zone, pcp, page, migratetype, + if (!free_frozen_page_commit(zone, pcp, page, freetype, order, fpi_flags, &UP_flags)) return; pcp_spin_unlock(pcp, UP_flags); @@ -3066,10 +3143,12 @@ void free_unref_folios(struct folio_batch *folios) struct zone *zone =3D folio_zone(folio); unsigned long pfn =3D folio_pfn(folio); unsigned int order =3D (unsigned long)folio->private; + freetype_t freetype; int migratetype; =20 folio->private =3D NULL; - migratetype =3D get_pfnblock_migratetype(&folio->page, pfn); + freetype =3D get_pfnblock_freetype(&folio->page, pfn); + migratetype =3D free_to_migratetype(freetype); =20 
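+ /* Only the migratetype half matters for the isolate/pcp checks below. */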
/* Different zone requires a different pcp lock */ if (zone !=3D locked_zone || @@ -3108,11 +3187,12 @@ void free_unref_folios(struct folio_batch *folios) * to the MIGRATE_MOVABLE pcp list. */ if (unlikely(migratetype >=3D MIGRATE_PCPTYPES)) - migratetype =3D MIGRATE_MOVABLE; + freetype =3D freetype_with_migrate(freetype, + MIGRATE_MOVABLE); =20 trace_mm_page_free_batched(&folio->page); if (!free_frozen_page_commit(zone, pcp, &folio->page, - migratetype, order, FPI_NONE, &UP_flags)) { + freetype, order, FPI_NONE, &UP_flags)) { pcp =3D NULL; locked_zone =3D NULL; } @@ -3180,14 +3260,16 @@ int __isolate_free_page(struct page *page, unsigned int order) if (order >=3D pageblock_order - 1) { struct page *endpage =3D page + (1 << order) - 1; for (; page < endpage; page +=3D pageblock_nr_pages) { - int mt =3D get_pageblock_migratetype(page); + freetype_t old_ft =3D get_pageblock_freetype(page); + freetype_t new_ft =3D freetype_with_migrate(old_ft, + MIGRATE_MOVABLE); + /* * Only change normal pageblocks (i.e., they can merge * with others) */ - if (migratetype_is_mergeable(mt)) - move_freepages_block(zone, page, mt, - MIGRATE_MOVABLE); + if (migratetype_is_mergeable(free_to_migratetype(old_ft))) + move_freepages_block(zone, page, old_ft, new_ft); } } =20 @@ -3203,7 +3285,8 @@ int __isolate_free_page(struct page *page, unsigned int order) * This function is meant to return a page pulled from the free lists via * __isolate_free_page back to the free lists they were pulled from. */ -void __putback_isolated_page(struct page *page, unsigned int order, int mt) +void __putback_isolated_page(struct page *page, unsigned int order, + freetype_t freetype) { struct zone *zone =3D page_zone(page); =20 @@ -3211,7 +3294,7 @@ void __putback_isolated_page(struct page *page, unsigned int order, int mt) lockdep_assert_held(&zone->lock); =20 /* Return isolated page to tail of freelist. */ - __free_one_page(page, page_to_pfn(page), zone, order, mt, + __free_one_page(page, page_to_pfn(page), zone, order, freetype, FPI_SKIP_REPORT_NOTIFY | FPI_TO_TAIL); } =20 @@ -3244,10 +3327,12 @@ static inline void zone_statistics(struct zone *preferred_zone, struct zone *z, static __always_inline struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, unsigned int order, unsigned int alloc_flags, - int migratetype) + freetype_t freetype) { struct page *page; unsigned long flags; + freetype_t ft_high =3D freetype_with_migrate(freetype, + MIGRATE_HIGHATOMIC); =20 do { page =3D NULL; @@ -3258,11 +3343,11 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, spin_lock_irqsave(&zone->lock, flags); } if (alloc_flags & ALLOC_HIGHATOMIC) - page =3D __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC); + page =3D __rmqueue_smallest(zone, order, ft_high); if (!page) { enum rmqueue_mode rmqm =3D RMQUEUE_NORMAL; =20 - page =3D __rmqueue(zone, order, migratetype, alloc_flags, &rmqm); + page =3D __rmqueue(zone, order, freetype, alloc_flags, &rmqm); =20 /* * If the allocation fails, allow OOM handling and @@ -3271,7 +3356,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, * high-order atomic allocation in the future. 
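 * Note that ft_high above preserves the caller's freetype flags and
 * only swaps the migratetype to MIGRATE_HIGHATOMIC.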
*/ if (!page && (alloc_flags & (ALLOC_OOM|ALLOC_NON_BLOCK))) - page =3D __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC); + page =3D __rmqueue_smallest(zone, order, ft_high); =20 if (!page) { spin_unlock_irqrestore(&zone->lock, flags); @@ -3340,7 +3425,7 @@ static int nr_pcp_alloc(struct per_cpu_pages *pcp, st= ruct zone *zone, int order) /* Remove page from the per-cpu list, caller must protect the list */ static inline struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order, - int migratetype, + freetype_t freetype, unsigned int alloc_flags, struct per_cpu_pages *pcp, struct list_head *list) @@ -3354,7 +3439,7 @@ struct page *__rmqueue_pcplist(struct zone *zone, uns= igned int order, =20 alloced =3D rmqueue_bulk(zone, order, batch, list, - migratetype, alloc_flags); + freetype, alloc_flags); =20 pcp->count +=3D alloced << order; if (unlikely(list_empty(list))) @@ -3372,7 +3457,7 @@ struct page *__rmqueue_pcplist(struct zone *zone, uns= igned int order, /* Lock and remove page from the per-cpu list */ static struct page *rmqueue_pcplist(struct zone *preferred_zone, struct zone *zone, unsigned int order, - int migratetype, unsigned int alloc_flags) + freetype_t freetype, unsigned int alloc_flags) { struct per_cpu_pages *pcp; struct list_head *list; @@ -3390,8 +3475,8 @@ static struct page *rmqueue_pcplist(struct zone *pref= erred_zone, * frees. */ pcp->free_count >>=3D 1; - list =3D &pcp->lists[order_to_pindex(migratetype, order)]; - page =3D __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, li= st); + list =3D &pcp->lists[order_to_pindex(free_to_migratetype(freetype), order= )]; + page =3D __rmqueue_pcplist(zone, order, freetype, alloc_flags, pcp, list); pcp_spin_unlock(pcp, UP_flags); if (page) { __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order); @@ -3416,19 +3501,19 @@ static inline struct page *rmqueue(struct zone *preferred_zone, struct zone *zone, unsigned int order, gfp_t gfp_flags, unsigned int alloc_flags, - int migratetype) + freetype_t freetype) { struct page *page; =20 if (likely(pcp_allowed_order(order))) { page =3D rmqueue_pcplist(preferred_zone, zone, order, - migratetype, alloc_flags); + freetype, alloc_flags); if (likely(page)) goto out; } =20 page =3D rmqueue_buddy(preferred_zone, zone, order, alloc_flags, - migratetype); + freetype); =20 out: /* Separate test+clear to avoid unnecessary atomics */ @@ -3450,7 +3535,7 @@ struct page *rmqueue(struct zone *preferred_zone, static void reserve_highatomic_pageblock(struct page *page, int order, struct zone *zone) { - int mt; + freetype_t ft, ft_high; unsigned long max_managed, flags; =20 /* @@ -3472,13 +3557,14 @@ static void reserve_highatomic_pageblock(struct pag= e *page, int order, goto out_unlock; =20 /* Yoink! 
*/ - mt =3D get_pageblock_migratetype(page); + ft =3D get_pageblock_freetype(page); /* Only reserve normal pageblocks (i.e., they can merge with others) */ - if (!migratetype_is_mergeable(mt)) + if (!migratetype_is_mergeable(free_to_migratetype(ft))) goto out_unlock; =20 + ft_high =3D freetype_with_migrate(ft, MIGRATE_HIGHATOMIC); if (order < pageblock_order) { - if (move_freepages_block(zone, page, mt, MIGRATE_HIGHATOMIC) =3D=3D -1) + if (move_freepages_block(zone, page, ft, ft_high) =3D=3D -1) goto out_unlock; zone->nr_reserved_highatomic +=3D pageblock_nr_pages; } else { @@ -3523,9 +3609,11 @@ static bool unreserve_highatomic_pageblock(const str= uct alloc_context *ac, spin_lock_irqsave(&zone->lock, flags); for (order =3D 0; order < NR_PAGE_ORDERS; order++) { struct free_area *area =3D &(zone->free_area[order]); + freetype_t ft_high =3D freetype_with_migrate(ac->freetype, + MIGRATE_HIGHATOMIC); unsigned long size; =20 - page =3D get_page_from_free_area(area, MIGRATE_HIGHATOMIC); + page =3D get_page_from_free_area(area, ft_high); if (!page) continue; =20 @@ -3552,14 +3640,14 @@ static bool unreserve_highatomic_pageblock(const st= ruct alloc_context *ac, */ if (order < pageblock_order) ret =3D move_freepages_block(zone, page, - MIGRATE_HIGHATOMIC, - ac->migratetype); + ft_high, + ac->freetype); else { move_to_free_list(page, zone, order, - MIGRATE_HIGHATOMIC, - ac->migratetype); + ft_high, + ac->freetype); change_pageblock_range(page, order, - ac->migratetype); + free_to_migratetype(ac->freetype)); ret =3D 1; } /* @@ -3665,18 +3753,18 @@ bool __zone_watermark_ok(struct zone *z, unsigned i= nt order, unsigned long mark, continue; =20 for (mt =3D 0; mt < MIGRATE_PCPTYPES; mt++) { - if (!free_area_empty(area, mt)) + if (!free_area_empty(area, migrate_to_freetype(mt, 0))) return true; } =20 #ifdef CONFIG_CMA if ((alloc_flags & ALLOC_CMA) && - !free_area_empty(area, MIGRATE_CMA)) { + !free_area_empty(area, migrate_to_freetype(MIGRATE_CMA, 0))) { return true; } #endif if ((alloc_flags & (ALLOC_HIGHATOMIC|ALLOC_OOM)) && - !free_area_empty(area, MIGRATE_HIGHATOMIC)) { + !free_area_empty(area, migrate_to_freetype(MIGRATE_HIGHATOMIC, 0))) { return true; } } @@ -3800,7 +3888,7 @@ static inline unsigned int gfp_to_alloc_flags_cma(gfp= _t gfp_mask, unsigned int alloc_flags) { #ifdef CONFIG_CMA - if (gfp_migratetype(gfp_mask) =3D=3D MIGRATE_MOVABLE) + if (free_to_migratetype(gfp_freetype(gfp_mask)) =3D=3D MIGRATE_MOVABLE) alloc_flags |=3D ALLOC_CMA; #endif return alloc_flags; @@ -3963,7 +4051,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int o= rder, int alloc_flags, =20 try_this_zone: page =3D rmqueue(zonelist_zone(ac->preferred_zoneref), zone, order, - gfp_mask, alloc_flags, ac->migratetype); + gfp_mask, alloc_flags, ac->freetype); if (page) { prep_new_page(page, order, gfp_mask, alloc_flags); =20 @@ -4732,6 +4820,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int o= rder, int reserve_flags; bool compact_first =3D false; bool can_retry_reserves =3D true; + enum migratetype migratetype =3D free_to_migratetype(ac->freetype); =20 if (unlikely(nofail)) { /* @@ -4762,8 +4851,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int o= rder, * try prevent permanent fragmentation by migrating from blocks of the * same migratetype. 
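 * (Only the migratetype extracted from ac->freetype above matters for
 * this heuristic; the freetype flags play no part in it.)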
*/ - if (can_compact && (costly_order || (order > 0 && - ac->migratetype !=3D MIGRATE_MOVABLE))) { + if (can_compact && (costly_order || (order > 0 && migratetype !=3D MIGRAT= E_MOVABLE))) { compact_first =3D true; compact_priority =3D INIT_COMPACT_PRIORITY; } @@ -5007,7 +5095,7 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask= , unsigned int order, ac->highest_zoneidx =3D gfp_zone(gfp_mask); ac->zonelist =3D node_zonelist(preferred_nid, gfp_mask); ac->nodemask =3D nodemask; - ac->migratetype =3D gfp_migratetype(gfp_mask); + ac->freetype =3D gfp_freetype(gfp_mask); =20 if (cpusets_enabled()) { *alloc_gfp |=3D __GFP_HARDWALL; @@ -5172,7 +5260,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int = preferred_nid, goto failed; =20 /* Attempt the batch allocation */ - pcp_list =3D &pcp->lists[order_to_pindex(ac.migratetype, 0)]; + pcp_list =3D &pcp->lists[order_to_pindex(free_to_migratetype(ac.freetype)= , 0)]; while (nr_populated < nr_pages) { =20 /* Skip existing pages */ @@ -5181,7 +5269,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int = preferred_nid, continue; } =20 - page =3D __rmqueue_pcplist(zone, 0, ac.migratetype, alloc_flags, + page =3D __rmqueue_pcplist(zone, 0, ac.freetype, alloc_flags, pcp, pcp_list); if (unlikely(!page)) { /* Try and allocate at least one page */ @@ -5275,7 +5363,8 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, u= nsigned int order, page =3D NULL; } =20 - trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype); + trace_mm_page_alloc(page, order, alloc_gfp, + free_to_migratetype(ac.freetype)); kmsan_alloc_page(page, order, alloc_gfp); =20 return page; @@ -7500,11 +7589,11 @@ EXPORT_SYMBOL(is_free_buddy_page); =20 #ifdef CONFIG_MEMORY_FAILURE static inline void add_to_free_list(struct page *page, struct zone *zone, - unsigned int order, int migratetype, + unsigned int order, freetype_t freetype, bool tail) { - __add_to_free_list(page, zone, order, migratetype, tail); - account_freepages(zone, 1 << order, migratetype); + __add_to_free_list(page, zone, order, freetype, tail); + account_freepages(zone, 1 << order, free_to_migratetype(freetype)); } =20 /* @@ -7513,7 +7602,7 @@ static inline void add_to_free_list(struct page *page= , struct zone *zone, */ static void break_down_buddy_pages(struct zone *zone, struct page *page, struct page *target, int low, int high, - int migratetype) + freetype_t freetype) { unsigned long size =3D 1 << high; struct page *current_buddy; @@ -7532,7 +7621,7 @@ static void break_down_buddy_pages(struct zone *zone,= struct page *page, if (set_page_guard(zone, current_buddy, high)) continue; =20 - add_to_free_list(current_buddy, zone, high, migratetype, false); + add_to_free_list(current_buddy, zone, high, freetype, false); set_buddy_order(current_buddy, high); } } @@ -7555,13 +7644,13 @@ bool take_page_off_buddy(struct page *page) =20 if (PageBuddy(page_head) && page_order >=3D order) { unsigned long pfn_head =3D page_to_pfn(page_head); - int migratetype =3D get_pfnblock_migratetype(page_head, - pfn_head); + freetype_t freetype =3D get_pfnblock_freetype(page_head, + pfn_head); =20 del_page_from_free_list(page_head, zone, page_order, - migratetype); + free_to_migratetype(freetype)); break_down_buddy_pages(zone, page_head, page, 0, - page_order, migratetype); + page_order, freetype); SetPageHWPoisonTakenOff(page); ret =3D true; break; @@ -7585,10 +7674,10 @@ bool put_page_back_buddy(struct page *page) spin_lock_irqsave(&zone->lock, flags); if (put_page_testzero(page)) { unsigned long pfn =3D 
page_to_pfn(page); - int migratetype =3D get_pfnblock_migratetype(page, pfn); + freetype_t freetype =3D get_pfnblock_freetype(page, pfn); =20 ClearPageHWPoisonTakenOff(page); - __free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE); + __free_one_page(page, pfn, zone, 0, freetype, FPI_NONE); if (TestClearPageHWPoison(page)) { ret =3D true; } @@ -7829,7 +7918,8 @@ struct page *alloc_frozen_pages_nolock_noprof(gfp_t g= fp_flags, int nid, unsigned __free_frozen_pages(page, order, FPI_TRYLOCK); page =3D NULL; } - trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype); + trace_mm_page_alloc(page, order, alloc_gfp, + free_to_migratetype(ac.freetype)); kmsan_alloc_page(page, order, alloc_gfp); return page; } diff --git a/mm/page_isolation.c b/mm/page_isolation.c index c48ff5c002449..bec964b77b8e9 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -276,7 +276,7 @@ static void unset_migratetype_isolate(struct page *page) WARN_ON_ONCE(!pageblock_unisolate_and_move_free_pages(zone, page)); } else { clear_pageblock_isolate(page); - __putback_isolated_page(page, order, get_pageblock_migratetype(page)); + __putback_isolated_page(page, order, get_pageblock_freetype(page)); } zone->nr_isolate_pageblock--; out: diff --git a/mm/page_owner.c b/mm/page_owner.c index b6a394a130ecd..32e870225aa8e 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -481,7 +481,8 @@ void pagetypeinfo_showmixedcount_print(struct seq_file = *m, goto ext_put_continue; =20 page_owner =3D get_page_owner(page_ext); - page_mt =3D gfp_migratetype(page_owner->gfp_mask); + page_mt =3D free_to_migratetype( + gfp_freetype(page_owner->gfp_mask)); if (pageblock_mt !=3D page_mt) { if (is_migrate_cma(pageblock_mt)) count[MIGRATE_MOVABLE]++; @@ -566,7 +567,7 @@ print_page_owner(char __user *buf, size_t count, unsign= ed long pfn, =20 /* Print information relevant to grouping pages by mobility */ pageblock_mt =3D get_pageblock_migratetype(page); - page_mt =3D gfp_migratetype(page_owner->gfp_mask); + page_mt =3D free_to_migratetype(gfp_freetype(page_owner->gfp_mask)); ret +=3D scnprintf(kbuf + ret, count - ret, "PFN 0x%lx type %s Block %lu type %s Flags %pGp\n", pfn, @@ -617,7 +618,7 @@ void __dump_page_owner(const struct page *page) =20 page_owner =3D get_page_owner(page_ext); gfp_mask =3D page_owner->gfp_mask; - mt =3D gfp_migratetype(gfp_mask); + mt =3D free_to_migratetype(gfp_freetype(gfp_mask)); =20 if (!test_bit(PAGE_EXT_OWNER, &page_ext->flags)) { pr_alert("page_owner info is not present (never set?)\n"); diff --git a/mm/page_reporting.c b/mm/page_reporting.c index 8a03effda7494..403e5080ebcd0 100644 --- a/mm/page_reporting.c +++ b/mm/page_reporting.c @@ -113,10 +113,10 @@ page_reporting_drain(struct page_reporting_dev_info *= prdev, */ do { struct page *page =3D sg_page(sg); - int mt =3D get_pageblock_migratetype(page); + freetype_t ft =3D get_pageblock_freetype(page); unsigned int order =3D get_order(sg->length); =20 - __putback_isolated_page(page, order, mt); + __putback_isolated_page(page, order, ft); =20 /* If the pages were not reported due to error skip flagging */ if (!reported) diff --git a/mm/show_mem.c b/mm/show_mem.c index 24078ac3e6bca..84bd3e6440117 100644 --- a/mm/show_mem.c +++ b/mm/show_mem.c @@ -373,7 +373,9 @@ static void show_free_areas(unsigned int filter, nodema= sk_t *nodemask, int max_z =20 types[order] =3D 0; for (type =3D 0; type < MIGRATE_TYPES; type++) { - if (!free_area_empty(area, type)) + freetype_t ft =3D migrate_to_freetype(type, 0); + + if (!free_area_empty(area, ft)) types[order] |=3D 1 
<< type; } } --=20 2.51.2

From nobody Thu Apr 16 20:49:12 2026
Date: Wed, 25 Feb 2026 16:34:34 +0000
In-Reply-To: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>
References: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>
Message-ID: <20260225-page_alloc-unmapped-v1-9-e8808a03cd66@google.com>
Subject: [PATCH RFC 09/19] mm: move migratetype definitions to freetype.h
From: Brendan Jackman
To: Borislav Petkov , Dave Hansen , Peter Zijlstra , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , Vlastimil Babka , Wei Xu , Johannes Weiner , Zi Yan
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, rppt@kernel.org, Sumit Garg , derkling@google.com, reijiw@google.com, Will Deacon , rientjes@google.com, "Kalyazin, Nikita" , patrick.roy@linux.dev, "Itazuri, Takahiro" , Andy Lutomirski , David Kaplan , Thomas Gleixner , Brendan Jackman , Yosry Ahmed

Since migratetypes are a sub-element of freetypes, move the pure definitions into the new freetype.h. This enables referring to these raw types from pageblock-flags.h.

Signed-off-by: Brendan Jackman --- include/linux/freetype.h | 84 ++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/mmzone.h | 73 ----------------------------------------- 2 files changed, 84 insertions(+), 73 deletions(-) diff --git a/include/linux/freetype.h b/include/linux/freetype.h index 9f857d10bb5db..11bd6d2b94349 100644 --- a/include/linux/freetype.h +++ b/include/linux/freetype.h @@ -3,6 +3,66 @@ #define _LINUX_FREETYPE_H =20 #include +#include + +/* + * A migratetype is the part of a freetype that encodes the mobility + * requirements for the allocations the freelist is intended to serve. + * + * It's also currently overloaded to encode page isolation state. + */ +enum migratetype { + MIGRATE_UNMOVABLE, + MIGRATE_MOVABLE, + MIGRATE_RECLAIMABLE, + MIGRATE_PCPTYPES, /* the number of types on the pcp lists */ + MIGRATE_HIGHATOMIC =3D MIGRATE_PCPTYPES, +#ifdef CONFIG_CMA + /* + * MIGRATE_CMA migration type is designed to mimic the way + * ZONE_MOVABLE works. Only movable pages can be allocated + * from MIGRATE_CMA pageblocks and page allocator never + * implicitly change migration type of MIGRATE_CMA pageblock. + * + * The way to use it is to change migratetype of a range of + * pageblocks to MIGRATE_CMA which can be done by + * __free_pageblock_cma() function. + */ + MIGRATE_CMA, + __MIGRATE_TYPE_END =3D MIGRATE_CMA, +#else + __MIGRATE_TYPE_END =3D MIGRATE_HIGHATOMIC, +#endif +#ifdef CONFIG_MEMORY_ISOLATION + MIGRATE_ISOLATE, /* can't allocate from here */ +#endif + MIGRATE_TYPES +}; + +/* In mm/page_alloc.c; keep in sync also with show_migration_types() there */ +extern const char * const migratetype_names[MIGRATE_TYPES]; + +#ifdef CONFIG_CMA +# define is_migrate_cma(migratetype) unlikely((migratetype) =3D=3D MIGRATE_CMA) +#else +# define is_migrate_cma(migratetype) false +#endif + +static inline bool is_migrate_movable(int mt) +{ + return is_migrate_cma(mt) || mt =3D=3D MIGRATE_MOVABLE; +} + +/* + * Check whether a migratetype can be merged with another migratetype.
+ * + * It is only mergeable when it can fall back to other migratetypes for + * allocation. See fallbacks[MIGRATE_TYPES][3] in page_alloc.c. + */ +static inline bool migratetype_is_mergeable(int mt) +{ + return mt < MIGRATE_PCPTYPES; +} =20 /* * A freetype is the index used to identify free lists. This consists of a @@ -35,4 +95,28 @@ static inline bool freetypes_equal(freetype_t a, freetyp= e_t b) return a.migratetype =3D=3D b.migratetype; } =20 +static inline freetype_t migrate_to_freetype(enum migratetype mt, + unsigned int flags) +{ + freetype_t freetype; + + /* No flags supported yet. */ + VM_WARN_ON_ONCE(flags); + + freetype.migratetype =3D mt; + return freetype; +} + +static inline enum migratetype free_to_migratetype(freetype_t freetype) +{ + return freetype.migratetype; +} + +/* Convenience helper, return the freetype modified to have the migratetyp= e. */ +static inline freetype_t freetype_with_migrate(freetype_t freetype, + enum migratetype migratetype) +{ + return migrate_to_freetype(migratetype, freetype_flags(freetype)); +} + #endif /* _LINUX_FREETYPE_H */ diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 66a4cfc2afcb0..301328cbb8449 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -62,39 +62,7 @@ */ #define PAGE_ALLOC_COSTLY_ORDER 3 =20 -enum migratetype { - MIGRATE_UNMOVABLE, - MIGRATE_MOVABLE, - MIGRATE_RECLAIMABLE, - MIGRATE_PCPTYPES, /* the number of types on the pcp lists */ - MIGRATE_HIGHATOMIC =3D MIGRATE_PCPTYPES, #ifdef CONFIG_CMA - /* - * MIGRATE_CMA migration type is designed to mimic the way - * ZONE_MOVABLE works. Only movable pages can be allocated - * from MIGRATE_CMA pageblocks and page allocator never - * implicitly change migration type of MIGRATE_CMA pageblock. - * - * The way to use it is to change migratetype of a range of - * pageblocks to MIGRATE_CMA which can be done by - * __free_pageblock_cma() function. - */ - MIGRATE_CMA, - __MIGRATE_TYPE_END =3D MIGRATE_CMA, -#else - __MIGRATE_TYPE_END =3D MIGRATE_HIGHATOMIC, -#endif -#ifdef CONFIG_MEMORY_ISOLATION - MIGRATE_ISOLATE, /* can't allocate from here */ -#endif - MIGRATE_TYPES -}; - -/* In mm/page_alloc.c; keep in sync also with show_migration_types() there= */ -extern const char * const migratetype_names[MIGRATE_TYPES]; - -#ifdef CONFIG_CMA -# define is_migrate_cma(migratetype) unlikely((migratetype) =3D=3D MIGRAT= E_CMA) # define is_migrate_cma_page(_page) (get_pageblock_migratetype(_page) =3D= =3D MIGRATE_CMA) /* * __dump_folio() in mm/debug.c passes a folio pointer to on-stack struct = folio, @@ -103,27 +71,10 @@ extern const char * const migratetype_names[MIGRATE_TY= PES]; # define is_migrate_cma_folio(folio, pfn) \ (get_pfnblock_migratetype(&folio->page, pfn) =3D=3D MIGRATE_CMA) #else -# define is_migrate_cma(migratetype) false # define is_migrate_cma_page(_page) false # define is_migrate_cma_folio(folio, pfn) false #endif =20 -static inline bool is_migrate_movable(int mt) -{ - return is_migrate_cma(mt) || mt =3D=3D MIGRATE_MOVABLE; -} - -/* - * Check whether a migratetype can be merged with another migratetype. - * - * It is only mergeable when it can fall back to other migratetypes for - * allocation. See fallbacks[MIGRATE_TYPES][3] in page_alloc.c. 
- */ -static inline bool migratetype_is_mergeable(int mt) -{ - return mt < MIGRATE_PCPTYPES; -} - #define for_each_free_list(list, zone, order) \ for (order =3D 0; order < NR_PAGE_ORDERS; order++) \ for (unsigned int idx =3D 0; \ @@ -131,30 +82,6 @@ static inline bool migratetype_is_mergeable(int mt) idx < NR_FREETYPE_IDXS; \ idx++) =20 -static inline freetype_t migrate_to_freetype(enum migratetype mt, - unsigned int flags) -{ - freetype_t freetype; - - /* No flags supported yet. */ - VM_WARN_ON_ONCE(flags); - - freetype.migratetype =3D mt; - return freetype; -} - -static inline enum migratetype free_to_migratetype(freetype_t freetype) -{ - return freetype.migratetype; -} - -/* Convenience helper, return the freetype modified to have the migratetype. */ -static inline freetype_t freetype_with_migrate(freetype_t freetype, - enum migratetype migratetype) -{ - return migrate_to_freetype(migratetype, freetype_flags(freetype)); -} - extern int page_group_by_mobility_disabled; =20 freetype_t get_pfnblock_freetype(const struct page *page, unsigned long pfn); --=20 2.51.2

From nobody Thu Apr 16 20:49:12 2026
Date: Wed, 25 Feb 2026 16:34:35 +0000
In-Reply-To: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>
References: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>
Message-ID: <20260225-page_alloc-unmapped-v1-10-e8808a03cd66@google.com>
Subject: [PATCH RFC 10/19] mm: add definitions for allocating unmapped pages
From: Brendan Jackman
To: Borislav Petkov , Dave Hansen , Peter Zijlstra , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , Vlastimil Babka , Wei Xu , Johannes Weiner , Zi Yan
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, rppt@kernel.org, Sumit Garg , derkling@google.com, reijiw@google.com, Will Deacon , rientjes@google.com, "Kalyazin, Nikita" , patrick.roy@linux.dev, "Itazuri, Takahiro" , Andy Lutomirski , David Kaplan , Thomas Gleixner , Brendan Jackman , Yosry Ahmed

Create __GFP_UNMAPPED, which requests pages that are not present in the direct map. Since this feature has a cost (e.g. more freelists), it's behind a Kconfig option. Unlike other conditionally-defined GFP flags, it doesn't fall back to being 0. This prevents building code that uses __GFP_UNMAPPED but doesn't depend on the necessary Kconfig option, since that would lead to invisible security issues.

Create a freetype flag recording that pages on freelists carrying it are unmapped. This is currently only needed for MIGRATE_UNMOVABLE pages, so the freetype encoding remains trivial. Also create the corresponding pageblock flag to record the same thing.

To keep the patches from being overwhelming, the actual implementation is added separately; this patch is just the types, Kconfig boilerplate, etc.
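For illustration only, a caller would eventually look roughly like the sketch below. (An editorial sketch under assumptions: the allocation path only becomes functional in later patches of this series, GFP_KERNEL is just an example base mask, and __GFP_MOVABLE / __GFP_RECLAIMABLE remain unsupported in combination with the new flag.)

	struct page *page;

	/* Request a page that the allocator keeps out of the direct map. */
	page = alloc_pages(GFP_KERNEL | __GFP_UNMAPPED, 0);
	if (!page)
		return -ENOMEM;

	/*
	 * Access the page via a separate mapping. If the caller itself
	 * changes direct map presence, it must restore it before:
	 */
	__free_pages(page, 0);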
Signed-off-by: Brendan Jackman
---
 include/linux/freetype.h       | 41 +++++++++++++++++++++++++++++++++------
 include/linux/gfp_types.h      | 26 ++++++++++++++++++++++++++
 include/trace/events/mmflags.h |  9 ++++++++-
 mm/Kconfig                     |  4 ++++
 4 files changed, 71 insertions(+), 9 deletions(-)

diff --git a/include/linux/freetype.h b/include/linux/freetype.h
index 11bd6d2b94349..3b0d41b8c857f 100644
--- a/include/linux/freetype.h
+++ b/include/linux/freetype.h
@@ -2,6 +2,7 @@
 #ifndef _LINUX_FREETYPE_H
 #define _LINUX_FREETYPE_H
 
+#include
 #include
 #include
 
@@ -64,30 +65,56 @@ static inline bool migratetype_is_mergeable(int mt)
 	return mt < MIGRATE_PCPTYPES;
 }
 
+enum {
+	/* Defined unconditionally as a hack to avoid a zero-width bitfield. */
+	FREETYPE_UNMAPPED_BIT,
+	NUM_FREETYPE_FLAGS,
+};
+
 /*
  * A freetype is the index used to identify free lists. This consists of a
  * migratetype, and other bits which encode orthogonal properties of memory.
  */
 typedef struct {
-	int migratetype;
+	unsigned int migratetype : order_base_2(MIGRATE_TYPES);
+	unsigned int flags : NUM_FREETYPE_FLAGS;
 } freetype_t;
 
+#ifdef CONFIG_PAGE_ALLOC_UNMAPPED
+#define FREETYPE_UNMAPPED BIT(FREETYPE_UNMAPPED_BIT)
+#define NUM_UNMAPPED_FREETYPES 1
+#else
+#define FREETYPE_UNMAPPED 0
+#define NUM_UNMAPPED_FREETYPES 0
+#endif
+
 /*
  * Return a dense linear index for freetypes that have lists in the free area.
  * Return -1 for other freetypes.
  */
 static inline int freetype_idx(freetype_t freetype)
 {
+	/* For FREETYPE_UNMAPPED, only MIGRATE_UNMOVABLE has an index. */
+	if (freetype.flags & FREETYPE_UNMAPPED) {
+		VM_WARN_ON_ONCE(freetype.flags & ~FREETYPE_UNMAPPED);
+		if (!IS_ENABLED(CONFIG_PAGE_ALLOC_UNMAPPED))
+			return -1;
+		if (freetype.migratetype != MIGRATE_UNMOVABLE)
+			return -1;
+		return MIGRATE_TYPES;
+	}
+	/* No other flags are supported. */
+	VM_WARN_ON_ONCE(freetype.flags);
+
 	return freetype.migratetype;
 }
 
-/* No freetype flags actually exist yet. */
-#define NR_FREETYPE_IDXS MIGRATE_TYPES
+/* One for each migratetype, plus one for MIGRATE_UNMOVABLE-FREETYPE_UNMAPPED */
+#define NR_FREETYPE_IDXS (MIGRATE_TYPES + NUM_UNMAPPED_FREETYPES)
 
 static inline unsigned int freetype_flags(freetype_t freetype)
 {
-	/* No flags supported yet. */
-	return 0;
+	return freetype.flags;
 }
 
 static inline bool freetypes_equal(freetype_t a, freetype_t b)
@@ -100,10 +127,8 @@ static inline freetype_t migrate_to_freetype(enum migratetype mt,
 {
 	freetype_t freetype;
 
-	/* No flags supported yet. */
-	VM_WARN_ON_ONCE(flags);
-
 	freetype.migratetype = mt;
+	freetype.flags = flags;
 	return freetype;
 }
 
diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index 814bb2892f99b..1f4f49bacb5b4 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -56,6 +56,9 @@ enum {
 	___GFP_NOLOCKDEP_BIT,
 #endif
 	___GFP_NO_OBJ_EXT_BIT,
+#ifdef CONFIG_PAGE_ALLOC_UNMAPPED
+	___GFP_UNMAPPED_BIT,
+#endif
 	___GFP_LAST_BIT
 };
 
@@ -97,6 +100,10 @@ enum {
 #define ___GFP_NOLOCKDEP 0
 #endif
 #define ___GFP_NO_OBJ_EXT BIT(___GFP_NO_OBJ_EXT_BIT)
+#ifdef CONFIG_PAGE_ALLOC_UNMAPPED
+#define ___GFP_UNMAPPED BIT(___GFP_UNMAPPED_BIT)
+/* No #else - __GFP_UNMAPPED should never be a nop. Break the build if it isn't supported. */
+#endif
 
 /*
  * Physical address zone modifiers (see linux/mmzone.h - low four bits)
@@ -293,6 +300,25 @@ enum {
 /* Disable lockdep for GFP context tracking */
 #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
 
+/*
+ * Allocate pages that aren't present in the direct map. If the caller
+ * changes direct-map presence, it must restore the previous state before
+ * freeing the page. (This is true regardless of __GFP_UNMAPPED.)
+ *
+ * This uses the mermap (when __GFP_ZERO), so it's only valid to allocate
+ * with this flag where that's valid, namely from process context after the
+ * mermap has been initialised for that process. This also means that the
+ * allocator leaves behind stale TLB entries in the mermap region. The
+ * caller is responsible for ensuring they are flushed as needed.
+ *
+ * This is currently incompatible with __GFP_MOVABLE and __GFP_RECLAIMABLE,
+ * but only because of allocator implementation details; if a usecase
+ * arises, this restriction could be dropped.
+ */
+#ifdef CONFIG_PAGE_ALLOC_UNMAPPED
+#define __GFP_UNMAPPED ((__force gfp_t)___GFP_UNMAPPED)
+#endif
+
 /* Room for N __GFP_FOO bits */
 #define __GFP_BITS_SHIFT ___GFP_LAST_BIT
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index a6e5a44c9b429..bb365da355b3a 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -61,11 +61,18 @@
 # define TRACE_GFP_FLAGS_SLAB
 #endif
 
+#ifdef CONFIG_PAGE_ALLOC_UNMAPPED
+# define TRACE_GFP_FLAGS_UNMAPPED TRACE_GFP_EM(UNMAPPED)
+#else
+# define TRACE_GFP_FLAGS_UNMAPPED
+#endif
+
 #define TRACE_GFP_FLAGS \
 	TRACE_GFP_FLAGS_GENERAL \
 	TRACE_GFP_FLAGS_KASAN \
 	TRACE_GFP_FLAGS_LOCKDEP \
-	TRACE_GFP_FLAGS_SLAB
+	TRACE_GFP_FLAGS_SLAB \
+	TRACE_GFP_FLAGS_UNMAPPED
 
 #undef TRACE_GFP_EM
 #define TRACE_GFP_EM(a) TRACE_DEFINE_ENUM(___GFP_##a##_BIT);
diff --git a/mm/Kconfig b/mm/Kconfig
index bd49eb9ef2165..ccf1cda90cf4a 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1503,3 +1503,7 @@ config MERMAP_KUNIT_TEST
 	  If unsure, say N.
 
 endmenu
+
+config PAGE_ALLOC_UNMAPPED
+	bool "Support allocating pages that aren't in the direct map" if COMPILE_TEST
+	default COMPILE_TEST
-- 
2.51.2
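For reviewers who want to sanity-check the dense index layout that
freetype_idx() establishes above, here is a standalone userspace sketch
(the enum values are simplified stand-ins for the kernel's migratetype
constants, chosen purely for illustration):

#include <stdio.h>

/* Simplified stand-ins for the kernel's migratetype enum. */
enum { UNMOVABLE, MOVABLE, RECLAIMABLE, PCPTYPES, HIGHATOMIC, MIGRATE_TYPES = 6 };
#define FT_UNMAPPED 0x1	/* mirrors FREETYPE_UNMAPPED */

/* Mirrors freetype_idx(): plain migratetypes map to themselves, the
 * (UNMOVABLE, UNMAPPED) combination gets the one extra slot, and every
 * other flagged combination has no freelist, hence -1. */
static int ft_idx(int mt, unsigned int flags)
{
	if (flags & FT_UNMAPPED)
		return mt == UNMOVABLE ? MIGRATE_TYPES : -1;
	return mt;
}

int main(void)
{
	printf("unmovable       -> %d\n", ft_idx(UNMOVABLE, 0));           /* 0 */
	printf("movable         -> %d\n", ft_idx(MOVABLE, 0));             /* 1 */
	printf("unmovable+unmap -> %d\n", ft_idx(UNMOVABLE, FT_UNMAPPED)); /* 6 */
	printf("movable+unmap   -> %d\n", ft_idx(MOVABLE, FT_UNMAPPED));   /* -1 */
	return 0;
}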
From nobody Thu Apr 16 20:49:12 2026
Date: Wed, 25 Feb 2026 16:34:36 +0000
From: Brendan Jackman
Message-ID: <20260225-page_alloc-unmapped-v1-11-e8808a03cd66@google.com>
In-Reply-To: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>
Subject: [PATCH RFC 11/19] mm: rejig pageblock mask definitions

A later patch will complicate the definition of these masks; this
preparatory patch makes that change easier to review.

- More masks will be needed, so add a PAGEBLOCK_ prefix to the names to
  avoid polluting the "global namespace" too much.

- This makes MIGRATETYPE_AND_ISO_MASK start to look pretty long. That
  global mask only exists for quite a specific purpose, so just drop it
  and take advantage of the newly-defined PAGEBLOCK_ISO_MASK.
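For a quick feel for the resulting layout (a standalone sketch; the bit
positions here are assumptions standing in for the real enum
pageblock_bits values, see the hunk below):

#include <stdio.h>

#define BIT(nr) (1UL << (nr))

/* Assumed positions: PB_migrate_0..2 occupy bits 0-2 and
 * PB_migrate_isolate is taken as bit 3, purely for illustration. */
#define PAGEBLOCK_MIGRATETYPE_MASK (BIT(0) | BIT(1) | BIT(2))
#define PAGEBLOCK_ISO_MASK         BIT(3)

int main(void)
{
	/* Call sites now compose the old MIGRATETYPE_AND_ISO_MASK on demand: */
	unsigned long mask = PAGEBLOCK_MIGRATETYPE_MASK | PAGEBLOCK_ISO_MASK;

	printf("migratetype mask: %#lx\n", PAGEBLOCK_MIGRATETYPE_MASK); /* 0x7 */
	printf("combined mask:    %#lx\n", mask);                       /* 0xf */
	return 0;
}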
Signed-off-by: Brendan Jackman
---
 include/linux/pageblock-flags.h |  6 +++---
 mm/page_alloc.c                 | 12 ++++++------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
index e046278a01fa8..9a6c3ea17684d 100644
--- a/include/linux/pageblock-flags.h
+++ b/include/linux/pageblock-flags.h
@@ -36,12 +36,12 @@ enum pageblock_bits {
 
 #define NR_PAGEBLOCK_BITS (roundup_pow_of_two(__NR_PAGEBLOCK_BITS))
 
-#define MIGRATETYPE_MASK (BIT(PB_migrate_0)|BIT(PB_migrate_1)|BIT(PB_migrate_2))
+#define PAGEBLOCK_MIGRATETYPE_MASK (BIT(PB_migrate_0)|BIT(PB_migrate_1)|BIT(PB_migrate_2))
 
 #ifdef CONFIG_MEMORY_ISOLATION
-#define MIGRATETYPE_AND_ISO_MASK (MIGRATETYPE_MASK | BIT(PB_migrate_isolate))
+#define PAGEBLOCK_ISO_MASK BIT(PB_migrate_isolate)
 #else
-#define MIGRATETYPE_AND_ISO_MASK MIGRATETYPE_MASK
+#define PAGEBLOCK_ISO_MASK 0
 #endif
 
 #if defined(CONFIG_HUGETLB_PAGE)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 66d4843da8512..9635433c7d711 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -397,7 +397,7 @@ get_pfnblock_bitmap_bitidx(const struct page *page, unsigned long pfn,
 #else
 	BUILD_BUG_ON(NR_PAGEBLOCK_BITS != 4);
 #endif
-	BUILD_BUG_ON(__MIGRATE_TYPE_END > MIGRATETYPE_MASK);
+	BUILD_BUG_ON(__MIGRATE_TYPE_END > PAGEBLOCK_MIGRATETYPE_MASK);
 	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
 
 	bitmap = get_pageblock_bitmap(page, pfn);
@@ -501,7 +501,7 @@ get_pfnblock_freetype(const struct page *page, unsigned long pfn)
 __always_inline enum migratetype
 get_pfnblock_migratetype(const struct page *page, unsigned long pfn)
 {
-	unsigned long mask = MIGRATETYPE_AND_ISO_MASK;
+	unsigned long mask = PAGEBLOCK_MIGRATETYPE_MASK | PAGEBLOCK_ISO_MASK;
 	unsigned long flags;
 
 	flags = __get_pfnblock_flags_mask(page, pfn, mask);
@@ -510,7 +510,7 @@ get_pfnblock_migratetype(const struct page *page, unsigned long pfn)
 	if (flags & BIT(PB_migrate_isolate))
 		return MIGRATE_ISOLATE;
 #endif
-	return flags & MIGRATETYPE_MASK;
+	return flags & PAGEBLOCK_MIGRATETYPE_MASK;
 }
 
 /**
@@ -598,11 +598,11 @@ static void set_pageblock_migratetype(struct page *page,
 	}
 	VM_WARN_ONCE(get_pageblock_isolate(page),
 		     "Use clear_pageblock_isolate() to unisolate pageblock");
-	/* MIGRATETYPE_AND_ISO_MASK clears PB_migrate_isolate if it is set */
+	/* PAGEBLOCK_ISO_MASK clears PB_migrate_isolate if it is set */
 #endif
 	__set_pfnblock_flags_mask(page, page_to_pfn(page),
 				  (unsigned long)migratetype,
-				  MIGRATETYPE_AND_ISO_MASK);
+				  PAGEBLOCK_MIGRATETYPE_MASK | PAGEBLOCK_ISO_MASK);
 }
 
 void __meminit init_pageblock_migratetype(struct page *page,
@@ -628,7 +628,7 @@ void __meminit init_pageblock_migratetype(struct page *page,
 		flags |= BIT(PB_migrate_isolate);
 #endif
 	__set_pfnblock_flags_mask(page, page_to_pfn(page), flags,
-				  MIGRATETYPE_AND_ISO_MASK);
+				  PAGEBLOCK_MIGRATETYPE_MASK | PAGEBLOCK_ISO_MASK);
 }
 
 #ifdef CONFIG_DEBUG_VM
-- 
2.51.2
From nobody Thu Apr 16 20:49:12 2026
Date: Wed, 25 Feb 2026 16:34:37 +0000
From: Brendan Jackman
In-Reply-To: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>
Message-ID: <20260225-page_alloc-unmapped-v1-12-e8808a03cd66@google.com>
Subject: [PATCH RFC 12/19] mm: encode freetype flags in pageblock flags

In preparation for implementing allocation from FREETYPE_UNMAPPED
lists.

Since it works nicely with the existing allocator logic, and also
offers a simple way to amortize TLB flushing costs, __GFP_UNMAPPED will
be implemented by changing mappings at pageblock granularity.
Therefore, encode the mapping state in the pageblock flags. Also add
the necessary logic to record this from a freetype, and to reconstruct
a freetype from the pageblock flags.

Signed-off-by: Brendan Jackman
---
 include/linux/pageblock-flags.h | 10 ++++++++++
 mm/page_alloc.c                 | 33 +++++++++++++++++++++++++--------
 2 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
index 9a6c3ea17684d..b634280050071 100644
--- a/include/linux/pageblock-flags.h
+++ b/include/linux/pageblock-flags.h
@@ -11,6 +11,8 @@
 #ifndef PAGEBLOCK_FLAGS_H
 #define PAGEBLOCK_FLAGS_H
 
+#include
+#include
 #include
 
 /* Bit indices that affect a whole block of pages */
@@ -18,6 +20,9 @@ enum pageblock_bits {
 	PB_migrate_0,
 	PB_migrate_1,
 	PB_migrate_2,
+	PB_freetype_flags,
+	PB_freetype_flags_end = PB_freetype_flags + NUM_FREETYPE_FLAGS - 1,
+
 	PB_compact_skip,/* If set the block is skipped by compaction */
 
 #ifdef CONFIG_MEMORY_ISOLATION
@@ -37,6 +42,7 @@ enum pageblock_bits {
 #define NR_PAGEBLOCK_BITS (roundup_pow_of_two(__NR_PAGEBLOCK_BITS))
 
 #define PAGEBLOCK_MIGRATETYPE_MASK (BIT(PB_migrate_0)|BIT(PB_migrate_1)|BIT(PB_migrate_2))
+#define PAGEBLOCK_FREETYPE_FLAGS_MASK (((1 << NUM_FREETYPE_FLAGS) - 1) << PB_freetype_flags)
 
 #ifdef CONFIG_MEMORY_ISOLATION
 #define PAGEBLOCK_ISO_MASK BIT(PB_migrate_isolate)
@@ -44,6 +50,10 @@ enum pageblock_bits {
 #define PAGEBLOCK_ISO_MASK 0
 #endif
 
+#define PAGEBLOCK_FREETYPE_MASK (PAGEBLOCK_MIGRATETYPE_MASK | \
+				 PAGEBLOCK_ISO_MASK | \
+				 PAGEBLOCK_FREETYPE_FLAGS_MASK)
+
 #if defined(CONFIG_HUGETLB_PAGE)
 
 #ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9635433c7d711..b79f81b64d9d7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -392,11 +392,8 @@ get_pfnblock_bitmap_bitidx(const struct page *page, unsigned long pfn,
 	unsigned long *bitmap;
 	unsigned long word_bitidx;
 
-#ifdef CONFIG_MEMORY_ISOLATION
-	BUILD_BUG_ON(NR_PAGEBLOCK_BITS != 8);
-#else
-	BUILD_BUG_ON(NR_PAGEBLOCK_BITS != 4);
-#endif
+	/* NR_PAGEBLOCK_BITS must divide word size. */
+	BUILD_BUG_ON(NR_PAGEBLOCK_BITS != 4 && NR_PAGEBLOCK_BITS != 8);
 	BUILD_BUG_ON(__MIGRATE_TYPE_END > PAGEBLOCK_MIGRATETYPE_MASK);
 	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
 
@@ -469,9 +466,20 @@ __always_inline freetype_t
 __get_pfnblock_freetype(const struct page *page, unsigned long pfn,
 			bool ignore_iso)
 {
-	int mt = get_pfnblock_migratetype(page, pfn);
+	unsigned long mask = PAGEBLOCK_FREETYPE_MASK;
+	enum migratetype migratetype;
+	unsigned int ft_flags;
+	unsigned long flags;
 
-	return migrate_to_freetype(mt, 0);
+	flags = __get_pfnblock_flags_mask(page, pfn, mask);
+	ft_flags = (flags & PAGEBLOCK_FREETYPE_FLAGS_MASK) >> PB_freetype_flags;
+
+	migratetype = flags & PAGEBLOCK_MIGRATETYPE_MASK;
+#ifdef CONFIG_MEMORY_ISOLATION
+	if (!ignore_iso && flags & BIT(PB_migrate_isolate))
+		migratetype = MIGRATE_ISOLATE;
+#endif
+	return migrate_to_freetype(migratetype, ft_flags);
 }
 
 /**
@@ -605,6 +613,15 @@ static void set_pageblock_migratetype(struct page *page,
 				  PAGEBLOCK_MIGRATETYPE_MASK | PAGEBLOCK_ISO_MASK);
 }
 
+static inline void set_pageblock_freetype_flags(struct page *page,
+						unsigned int ft_flags)
+{
+	unsigned int flags = ft_flags << PB_freetype_flags;
+
+	__set_pfnblock_flags_mask(page, page_to_pfn(page), flags,
+				  PAGEBLOCK_FREETYPE_FLAGS_MASK);
+}
+
 void __meminit init_pageblock_migratetype(struct page *page,
 					  enum migratetype migratetype,
 					  bool isolate)
@@ -628,7 +645,7 @@ void __meminit init_pageblock_migratetype(struct page *page,
 		flags |= BIT(PB_migrate_isolate);
 #endif
 	__set_pfnblock_flags_mask(page, page_to_pfn(page), flags,
-				  PAGEBLOCK_MIGRATETYPE_MASK | PAGEBLOCK_ISO_MASK);
+				  PAGEBLOCK_FREETYPE_MASK);
 }
 
 #ifdef CONFIG_DEBUG_VM
-- 
2.51.2
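The encode/decode round trip this patch wires up can be checked with a
standalone sketch (the bit positions and widths below are assumptions
chosen for illustration; the real values come from enum pageblock_bits):

#include <assert.h>

#define PB_FT_FLAGS_SHIFT 3	/* PB_freetype_flags, assumed at bit 3 */
#define NUM_FT_FLAGS      1
#define MT_MASK           0x7u	/* like PAGEBLOCK_MIGRATETYPE_MASK */
#define FT_FLAGS_MASK     (((1u << NUM_FT_FLAGS) - 1) << PB_FT_FLAGS_SHIFT)

/* Pack (migratetype, freetype flags) into one pageblock word... */
static unsigned int encode(unsigned int mt, unsigned int ft_flags)
{
	return (mt & MT_MASK) | (ft_flags << PB_FT_FLAGS_SHIFT);
}

/* ...and recover both halves, as __get_pfnblock_freetype() does. */
static void decode(unsigned int word, unsigned int *mt, unsigned int *ft_flags)
{
	*mt = word & MT_MASK;
	*ft_flags = (word & FT_FLAGS_MASK) >> PB_FT_FLAGS_SHIFT;
}

int main(void)
{
	unsigned int mt, fl;

	decode(encode(/*MIGRATE_UNMOVABLE*/ 0, /*FREETYPE_UNMAPPED*/ 1), &mt, &fl);
	assert(mt == 0 && fl == 1);	/* the round trip is lossless */
	return 0;
}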
From nobody Thu Apr 16 20:49:12 2026
Date: Wed, 25 Feb 2026 16:34:38 +0000
From: Brendan Jackman
Message-ID: <20260225-page_alloc-unmapped-v1-13-e8808a03cd66@google.com>
In-Reply-To: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>
Subject: [PATCH RFC 13/19] mm/page_alloc: remove ifdefs from pindex helpers

The ifdefs are not technically needed here; everything they guard is
always defined. They aren't doing much harm right now, but a following
patch will complicate these functions. Switching to IS_ENABLED() makes
the code a bit less tiresome to read.
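For readers less familiar with the idiom being adopted, a generic
illustration (not taken from the patch; CONFIG_FOO, do_foo() and LIMIT
are invented placeholders). IS_ENABLED() keeps both branches visible to
the compiler, so the disabled one is still parsed and type-checked
before being optimised away, whereas an #ifdef'd-out branch can
silently bit-rot:

/* Before: the #else arm is invisible whenever CONFIG_FOO is enabled. */
#ifdef CONFIG_FOO
	do_foo(x);
#else
	VM_BUG_ON(x > LIMIT);
#endif

/* After: both arms always compile; the dead one is eliminated. */
	if (IS_ENABLED(CONFIG_FOO))
		do_foo(x);
	else
		VM_BUG_ON(x > LIMIT);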
Signed-off-by: Brendan Jackman
---
 mm/page_alloc.c | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b79f81b64d9d7..fa12fff2182c7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -731,19 +731,17 @@ static void bad_page(struct page *page, const char *reason)
 
 static inline unsigned int order_to_pindex(int migratetype, int order)
 {
+	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+		bool movable = migratetype == MIGRATE_MOVABLE;
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	bool movable;
-	if (order > PAGE_ALLOC_COSTLY_ORDER) {
-		VM_BUG_ON(order != HPAGE_PMD_ORDER);
+		if (order > PAGE_ALLOC_COSTLY_ORDER) {
+			VM_BUG_ON(order != HPAGE_PMD_ORDER);
 
-		movable = migratetype == MIGRATE_MOVABLE;
-
-		return NR_LOWORDER_PCP_LISTS + movable;
+			return NR_LOWORDER_PCP_LISTS + movable;
+		}
+	} else {
+		VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
 	}
-#else
-	VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
-#endif
 
 	return (MIGRATE_PCPTYPES * order) + migratetype;
 }
@@ -752,12 +750,12 @@ static inline int pindex_to_order(unsigned int pindex)
 {
 	int order = pindex / MIGRATE_PCPTYPES;
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	if (pindex >= NR_LOWORDER_PCP_LISTS)
-		order = HPAGE_PMD_ORDER;
-#else
-	VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
-#endif
+	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+		if (pindex >= NR_LOWORDER_PCP_LISTS)
+			order = HPAGE_PMD_ORDER;
+	} else {
+		VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
+	}
 
 	return order;
 }
-- 
2.51.2
From nobody Thu Apr 16 20:49:12 2026
Date: Wed, 25 Feb 2026 16:34:39 +0000
From: Brendan Jackman
Message-ID: <20260225-page_alloc-unmapped-v1-14-e8808a03cd66@google.com>
In-Reply-To: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>
Subject: [PATCH RFC 14/19] mm/page_alloc: separate pcplists by freetype flags

The normal freelists are already separated by this flag, so now update
the pcplists accordingly.

This follows the most "obvious" design, where __GFP_UNMAPPED is
supported at arbitrary orders. If necessary, it would be possible to
avoid the proliferation of pcplists by restricting the orders that can
be allocated from them with FREETYPE_UNMAPPED.

On the other hand, there's currently no usecase for movable/reclaimable
unmapped memory, and constraining the migratetype doesn't have any
tricky plumbing implications. So, take advantage of that and assume
that FREETYPE_UNMAPPED implies MIGRATE_UNMOVABLE.
Overall, this just takes the existing space of pindices and tacks
another bank on the end. For !THP this is just 4 more lists; with THP
there is a single additional list for hugepages.

Signed-off-by: Brendan Jackman
---
 include/linux/mmzone.h | 11 ++++++++++-
 mm/page_alloc.c        | 44 +++++++++++++++++++++++++++++++++-----------
 2 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 301328cbb8449..fc242b4090441 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -692,8 +692,17 @@ enum zone_watermarks {
 #else
 #define NR_PCP_THP 0
 #endif
+/*
+ * FREETYPE_UNMAPPED can currently only be used with MIGRATE_UNMOVABLE, so
+ * for those there's no need to encode the migratetype in the pindex.
+ */
+#ifdef CONFIG_PAGE_ALLOC_UNMAPPED
+#define NR_UNMAPPED_PCP_LISTS (PAGE_ALLOC_COSTLY_ORDER + 1 + !!NR_PCP_THP)
+#else
+#define NR_UNMAPPED_PCP_LISTS 0
+#endif
 #define NR_LOWORDER_PCP_LISTS (MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1))
-#define NR_PCP_LISTS (NR_LOWORDER_PCP_LISTS + NR_PCP_THP)
+#define NR_PCP_LISTS (NR_LOWORDER_PCP_LISTS + NR_PCP_THP + NR_UNMAPPED_PCP_LISTS)
 
 /*
  * Flags used in pcp->flags field.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fa12fff2182c7..14098474afd07 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -729,18 +729,30 @@ static void bad_page(struct page *page, const char *reason)
 	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
 }
 
-static inline unsigned int order_to_pindex(int migratetype, int order)
+static inline unsigned int order_to_pindex(freetype_t freetype, int order)
 {
+	int migratetype = free_to_migratetype(freetype);
+
+	VM_BUG_ON(migratetype >= MIGRATE_PCPTYPES);
+	VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER &&
+		  (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) || order != HPAGE_PMD_ORDER));
+
+	/* FREETYPE_UNMAPPED currently always means MIGRATE_UNMOVABLE. */
+	if (freetype_flags(freetype) & FREETYPE_UNMAPPED) {
+		int order_offset = order;
+
+		VM_BUG_ON(migratetype != MIGRATE_UNMOVABLE);
+		if (order > PAGE_ALLOC_COSTLY_ORDER)
+			order_offset = PAGE_ALLOC_COSTLY_ORDER + 1;
+
+		return NR_LOWORDER_PCP_LISTS + NR_PCP_THP + order_offset;
+	}
+
 	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
 		bool movable = migratetype == MIGRATE_MOVABLE;
 
-		if (order > PAGE_ALLOC_COSTLY_ORDER) {
-			VM_BUG_ON(order != HPAGE_PMD_ORDER);
-
+		if (order > PAGE_ALLOC_COSTLY_ORDER)
 			return NR_LOWORDER_PCP_LISTS + movable;
-		}
 	}
 
 	return (MIGRATE_PCPTYPES * order) + migratetype;
@@ -748,8 +760,18 @@ static inline int pindex_to_order(unsigned int pindex)
 {
-	int order = pindex / MIGRATE_PCPTYPES;
+	unsigned int unmapped_base = NR_LOWORDER_PCP_LISTS + NR_PCP_THP;
+	int order;
 
+	if (pindex >= unmapped_base) {
+		order = pindex - unmapped_base;
+		if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
+		    order > PAGE_ALLOC_COSTLY_ORDER)
+			return HPAGE_PMD_ORDER;
+		return order;
+	}
+
+	order = pindex / MIGRATE_PCPTYPES;
 	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
 		if (pindex >= NR_LOWORDER_PCP_LISTS)
 			order = HPAGE_PMD_ORDER;
@@ -2970,7 +2992,7 @@ static bool free_frozen_page_commit(struct zone *zone,
 	 */
 	pcp->alloc_factor >>= 1;
 	__count_vm_events(PGFREE, 1 << order);
-	pindex = order_to_pindex(free_to_migratetype(freetype), order);
+	pindex = order_to_pindex(freetype, order);
 	list_add(&page->pcp_list, &pcp->lists[pindex]);
 	pcp->count += 1 << order;
 
@@ -3490,7 +3512,7 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
 	 * frees.
 	 */
 	pcp->free_count >>= 1;
-	list = &pcp->lists[order_to_pindex(free_to_migratetype(freetype), order)];
+	list = &pcp->lists[order_to_pindex(freetype, order)];
 	page = __rmqueue_pcplist(zone, order, freetype, alloc_flags, pcp, list);
 	pcp_spin_unlock(pcp, UP_flags);
 	if (page) {
@@ -5275,7 +5297,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
 		goto failed;
 
 	/* Attempt the batch allocation */
-	pcp_list = &pcp->lists[order_to_pindex(free_to_migratetype(ac.freetype), 0)];
+	pcp_list = &pcp->lists[order_to_pindex(ac.freetype, 0)];
 	while (nr_populated < nr_pages) {
 
 		/* Skip existing pages */
-- 
2.51.2
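To see the resulting pcplist layout concretely, a standalone sketch
using typical values for a !THP configuration (the constants are
hard-coded stand-ins here, purely for illustration):

#include <stdio.h>

#define MIGRATE_PCPTYPES        3
#define PAGE_ALLOC_COSTLY_ORDER 3
#define NR_PCP_THP              0
#define NR_LOWORDER_PCP_LISTS   (MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1))
#define NR_UNMAPPED_PCP_LISTS   (PAGE_ALLOC_COSTLY_ORDER + 1 + !!NR_PCP_THP)
#define NR_PCP_LISTS            (NR_LOWORDER_PCP_LISTS + NR_PCP_THP + NR_UNMAPPED_PCP_LISTS)

/* Mirrors order_to_pindex(): the unmapped lists are a bank tacked on the
 * end, indexed by order alone since the migratetype is always UNMOVABLE. */
static unsigned int pindex(int migratetype, int order, int unmapped)
{
	if (unmapped)
		return NR_LOWORDER_PCP_LISTS + NR_PCP_THP + order;
	return MIGRATE_PCPTYPES * order + migratetype;
}

int main(void)
{
	printf("total pcplists:       %d\n", NR_PCP_LISTS);     /* 16 = 12 + 4 */
	printf("mapped order-0, mt 1: %u\n", pindex(1, 0, 0));  /* 1 */
	printf("unmapped order-2:     %u\n", pindex(0, 2, 1));  /* 14 */
	return 0;
}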
From nobody Thu Apr 16 20:49:12 2026
Date: Wed, 25 Feb 2026 16:34:40 +0000
From: Brendan Jackman
Message-ID: <20260225-page_alloc-unmapped-v1-15-e8808a03cd66@google.com>
In-Reply-To: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>
Subject: [PATCH RFC 15/19] mm/page_alloc: rename ALLOC_NON_BLOCK back to _HARDER
Commit 1ebbb21811b7 ("mm/page_alloc: explicitly define how __GFP_HIGH
non-blocking allocations accesses reserves") renamed ALLOC_HARDER to
ALLOC_NON_BLOCK because the former is "a vague description".

However, vagueness is accurate here; this is a vague flag. It is not
set for __GFP_NOMEMALLOC. It doesn't really mean "allocate without
blocking" but rather "allow dipping into atomic reserves, _because_ of
the need not to block".

A later commit will need an alloc flag that really means "don't block
here", so go back to the flag's old name and update the commentary to
try and give it a slightly clearer meaning.

Signed-off-by: Brendan Jackman
---
 mm/internal.h   | 9 +++++----
 mm/page_alloc.c | 8 ++++----
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index cac292dcd394f..5be53d25c89b7 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1372,9 +1372,10 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_OOM ALLOC_NO_WATERMARKS
 #endif
 
-#define ALLOC_NON_BLOCK		 0x10 /* Caller cannot block. Allow access
-				       * to 25% of the min watermark or
-				       * 62.5% if __GFP_HIGH is set.
+#define ALLOC_HARDER		 0x10 /* Because the caller cannot block,
+				       * allow access to 25% of the min
+				       * watermark or 62.5% if __GFP_HIGH is
+				       * set.
 				       */
 #define ALLOC_MIN_RESERVE	 0x20 /* __GFP_HIGH set. Allow access to 50%
 				       * of the min watermark.
@@ -1391,7 +1392,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
 
 /* Flags that allow allocations below the min watermark. */
-#define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
+#define ALLOC_RESERVES (ALLOC_HARDER|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
 
 enum ttu_flags;
 struct tlbflush_unmap_batch;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 14098474afd07..42b807faca5fe 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3392,7 +3392,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
 			 * reserves as failing now is worse than failing a
 			 * high-order atomic allocation in the future.
 			 */
-			if (!page && (alloc_flags & (ALLOC_OOM|ALLOC_NON_BLOCK)))
+			if (!page && (alloc_flags & (ALLOC_OOM|ALLOC_HARDER)))
 				page = __rmqueue_smallest(zone, order, ft_high);
 
 			if (!page) {
@@ -3755,7 +3755,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 		 * or (GFP_KERNEL & ~__GFP_DIRECT_RECLAIM) do not get
 		 * access to the min reserve.
 		 */
-		if (alloc_flags & ALLOC_NON_BLOCK)
+		if (alloc_flags & ALLOC_HARDER)
 			min -= min / 4;
 	}
 
@@ -4640,7 +4640,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
 	 * The caller may dip into page reserves a bit more if the caller
 	 * cannot run direct reclaim, or if the caller has realtime scheduling
 	 * policy or is asking for __GFP_HIGH memory. GFP_ATOMIC requests will
-	 * set both ALLOC_NON_BLOCK and ALLOC_MIN_RESERVE(__GFP_HIGH).
+	 * set both ALLOC_HARDER and ALLOC_MIN_RESERVE(__GFP_HIGH).
 	 */
 	alloc_flags |= (__force int)
 		(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
@@ -4651,7 +4651,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
 		 * if it can't schedule.
 		 */
 		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
-			alloc_flags |= ALLOC_NON_BLOCK;
+			alloc_flags |= ALLOC_HARDER;
 
 			if (order > 0 && (alloc_flags & ALLOC_MIN_RESERVE))
 				alloc_flags |= ALLOC_HIGHATOMIC;
-- 
2.51.2
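The reserve percentages in the ALLOC_HARDER comment can be checked with
a little standalone arithmetic, mirroring the min -= min/2 and
min -= min/4 steps in __zone_watermark_ok() (the watermark value is
purely illustrative):

#include <stdio.h>

int main(void)
{
	long min = 1024;	/* a zone's min watermark, in pages */

	long harder      = min - min / 4;	/* ALLOC_HARDER alone       -> 768 */
	long high        = min - min / 2;	/* ALLOC_MIN_RESERVE (HIGH) -> 512 */
	long high_harder = high - high / 4;	/* both flags               -> 384 */

	/* ALLOC_HARDER alone may eat 25% of the reserve (down to 768);
	 * combined with __GFP_HIGH it may eat 62.5% (down to 384). */
	printf("%ld %ld %ld\n", harder, high, high_harder);
	return 0;
}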
From nobody Thu Apr 16 20:49:12 2026
Date: Wed, 25 Feb 2026 16:34:41 +0000
From: Brendan Jackman
Message-ID: <20260225-page_alloc-unmapped-v1-16-e8808a03cd66@google.com>
In-Reply-To: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com>
Subject: [PATCH RFC 16/19] mm/page_alloc: introduce ALLOC_NOBLOCK

This flag is set unless we can be sure the caller isn't in an atomic
context. The allocator will soon start needing to call set_direct_map_*
APIs, which cannot be called with IRQs off. It will need to do this
even before direct reclaim is possible.

Despite the fact that, in principle, ALLOC_NOBLOCK is distinct from
__GFP_DIRECT_RECLAIM, infer the former based on whether the caller set
the latter, in order to avoid introducing a GFP flag. This means that,
in practice, ALLOC_NOBLOCK is just !__GFP_DIRECT_RECLAIM, except that
it is not influenced by gfp_allowed_mask. This could change later,
though.

Call it ALLOC_NOBLOCK in order to try and mitigate confusion vs the
recently-removed ALLOC_NON_BLOCK, which meant something different.

Signed-off-by: Brendan Jackman
---
 mm/internal.h   |  1 +
 mm/page_alloc.c | 29 ++++++++++++++++++++++-------
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 5be53d25c89b7..6f2eacf3d8f2c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1390,6 +1390,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
 #define ALLOC_TRYLOCK		0x400 /* Only use spin_trylock in allocation path */
 #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
+#define ALLOC_NOBLOCK	       0x1000 /* Caller may be atomic */
 
 /* Flags that allow allocations below the min watermark. */
 #define ALLOC_RESERVES (ALLOC_HARDER|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 42b807faca5fe..5576bd6a26b7b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4646,6 +4646,8 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
 		(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
 
 	if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
+		alloc_flags |= ALLOC_NOBLOCK;
+
 		/*
 		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
 		 * if it can't schedule.
@@ -4839,14 +4841,13 @@ check_retry_cpuset(int cpuset_mems_cookie, struct alloc_context *ac)
 
 static inline struct page *
 __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
-		       struct alloc_context *ac)
+		       struct alloc_context *ac, unsigned int alloc_flags)
 {
 	bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
 	bool can_compact = can_direct_reclaim && gfp_compaction_allowed(gfp_mask);
 	bool nofail = gfp_mask & __GFP_NOFAIL;
 	const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
 	struct page *page = NULL;
-	unsigned int alloc_flags;
 	unsigned long did_some_progress;
 	enum compact_priority compact_priority;
 	enum compact_result compact_result;
@@ -4898,7 +4899,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * kswapd needs to be woken up, and to avoid the cost of setting up
 	 * alloc_flags precisely. So we do that now.
 	 */
-	alloc_flags = gfp_to_alloc_flags(gfp_mask, order);
+	alloc_flags |= gfp_to_alloc_flags(gfp_mask, order);
 
 	/*
 	 * We need to recalculate the starting point for the zonelist iterator
@@ -5124,6 +5125,18 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	return page;
 }
 
+static inline unsigned int init_alloc_flags(gfp_t gfp_mask, unsigned int flags)
+{
+	/*
+	 * If the caller allowed __GFP_DIRECT_RECLAIM, they can't be atomic.
+	 * Note this is a separate determination from whether direct reclaim is
+	 * actually allowed; it must happen before applying gfp_allowed_mask.
+	 */
+	if (!(gfp_mask & __GFP_DIRECT_RECLAIM))
+		flags |= ALLOC_NOBLOCK;
+	return flags;
+}
+
 static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
 		int preferred_nid, nodemask_t *nodemask,
 		struct alloc_context *ac, gfp_t *alloc_gfp,
@@ -5205,7 +5218,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
 	struct list_head *pcp_list;
 	struct alloc_context ac;
 	gfp_t alloc_gfp;
-	unsigned int alloc_flags = ALLOC_WMARK_LOW;
+	unsigned int alloc_flags = init_alloc_flags(gfp, ALLOC_WMARK_LOW);
 	int nr_populated = 0, nr_account = 0;
 
 	/*
@@ -5346,7 +5359,7 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
 		int preferred_nid, nodemask_t *nodemask)
 {
 	struct page *page;
-	unsigned int alloc_flags = ALLOC_WMARK_LOW;
+	unsigned int alloc_flags = init_alloc_flags(gfp, ALLOC_WMARK_LOW);
 	gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
 	struct alloc_context ac = { };
 
@@ -5391,7 +5404,7 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
 	 */
 	ac.nodemask = nodemask;
 
-	page = __alloc_pages_slowpath(alloc_gfp, order, &ac);
+	page = __alloc_pages_slowpath(alloc_gfp, order, &ac, alloc_flags);
 
 out:
 	if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
@@ -7911,11 +7924,13 @@ struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned
 	 */
 	gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC |
 			  __GFP_COMP | gfp_flags;
-	unsigned int alloc_flags = ALLOC_TRYLOCK;
+	unsigned int alloc_flags = init_alloc_flags(alloc_gfp, ALLOC_TRYLOCK);
 	struct alloc_context ac = { };
 	struct page *page;
 
 	VM_WARN_ON_ONCE(gfp_flags & ~__GFP_ACCOUNT);
+	VM_WARN_ON_ONCE(!(alloc_flags & ALLOC_NOBLOCK));
+
 	/*
 	 * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
 	 * unsafe in NMI.
If spin_trylock() is called from hard IRQ the current --=20 2.51.2 From nobody Thu Apr 16 20:49:12 2026 Received: from mail-wm1-f73.google.com (mail-wm1-f73.google.com [209.85.128.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 129F33ECBD8 for ; Wed, 25 Feb 2026 16:35:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772037302; cv=none; b=eh4J8Y0TI1utDy3jBtQ73muaAiRXLWJDis1cwjSPcXve49BdzIweUZFK9ejf/7TA3tAPrsRaFwfYHrKnhZ714qWgfVdcEj/apIt0j8IJQmBAuAO+cAeuxzOo+B2DbkwSwRxpGMOG8OGDqy7yz5MkmYjW7YSMewKD4LELpwFyZVU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772037302; c=relaxed/simple; bh=oRyreGtnL62oSkr9fyAS3pkTjjKp6Tff2MuXp+QRjDg=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=mpRtsYc5BpHdm7Pa0KbJXJnU87BoCPrEcAhGL4dsci6BHiTUNlNG3WLdhuPlRZTXcMNedKmlHTqocVItwgwyBP/A550s4mKgSSvSB6VDALX3AHMlX5O8NnA07lgP2ynFY8IvAzB22dII6CrEIr3ifLRrtm0r9RngYQw26KfqjfM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jackmanb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=lOlLck/d; arc=none smtp.client-ip=209.85.128.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jackmanb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="lOlLck/d" Received: by mail-wm1-f73.google.com with SMTP id 5b1f17b1804b1-483786a09b1so73182705e9.3 for ; Wed, 25 Feb 2026 08:35:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1772037299; x=1772642099; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=k/nafNPyke0FSiG9yz/rspM9/pGf7x0iSrG2uzWUqdI=; b=lOlLck/dww2gPCSNktj2Fqyio4/YWx8WxzfavlU5PAP0S+VH4d65HXTDfy23w0YSW8 YUi+zV9f4UmCOlSNRl5YdkAYOQ6HHesUoUoIMCwsmo7d9pCsWaAHZnmMCLBA3J0YG9C0 nVWvZOmSQKXZdZJZzWmtnfdFWQjbUZIjDYon90zWsxBLZOQA7nyea8FMh35Sb3lrSsoO SU9ApiSBF3wpKoVAjoEMyT1BG1Ii3e9ir6525ZFUwpxr75h7K+q0Jw0pfwTzyfg8IUYI ygB9PHJikpNM0nTyXQpLLjfi4BxAL6G7gxRGjplBRV60GpM+Kz1tlzqF4YHQyilPXaFL Heog== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1772037299; x=1772642099; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=k/nafNPyke0FSiG9yz/rspM9/pGf7x0iSrG2uzWUqdI=; b=Zn/Z/fTg1d2qnePOrTyL7l2KeQ6+QdVZ8oxe7NyRU8tT7CxOOIoN1D/OToreA6dp1A GGSQQy4lG4BiF0pChT9b/rctuPIkRWKg14mjcJRCoMFCvSEFZ1IghWHbpxnfVuUiwtW0 rDrfNFM0ueR+OkCk6v+Mpwe+WbRvuuGGGf0L7Mq0CSSYeviYxYjB0+A6HljRCccpqko6 Cvp+dHcPx68FywgF/EppMdpYjLw1sT0YcuoYJ0LeeU3dd81BGumX3I7WjdC6z1wfyGI8 SYmVrVzBiQzKTs2i06PXmhguXnD3nwlEUUmg2lQiwE4xW7FZ95NLfGs7pqsdiZ5EZfm1 Tj9w== X-Forwarded-Encrypted: i=1; AJvYcCWbmn0jci+ME2g4nAtO/px0fozFdnmNZHCTwMkCiJ/jvKIwJBuvKARQpFEqkr7phBpS3KhVPYAYkkusWGQ=@vger.kernel.org X-Gm-Message-State: AOJu0YykAZ8BYM1rY6G0l1ga8MmCRruXiqAutP2r/875yVgZX8QfmXAm dg1dvPgpLiO1mw62X3e5Nh7b1HOs+zL57rGnYq9JgNzz28nNNTT5FuWOkfEhJSf9ekr2vkUmrOK 
zdIXdYkkfW6lfWA== X-Received: from wmhk20.prod.google.com ([2002:a05:600c:4094:b0:483:6a60:3501]) (user=jackmanb job=prod-delivery.src-stubby-dispatcher) by 2002:a05:600c:1d0a:b0:483:9139:4c29 with SMTP id 5b1f17b1804b1-483a95b58e6mr296400295e9.2.1772037299176; Wed, 25 Feb 2026 08:34:59 -0800 (PST) Date: Wed, 25 Feb 2026 16:34:41 +0000 In-Reply-To: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com> X-Mailer: b4 0.14.3 Message-ID: <20260225-page_alloc-unmapped-v1-17-e8808a03cd66@google.com> Subject: [PATCH RFC 17/19] mm/page_alloc: implement __GFP_UNMAPPED allocations From: Brendan Jackman To: Borislav Petkov , Dave Hansen , Peter Zijlstra , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , Vlastimil Babka , Wei Xu , Johannes Weiner , Zi Yan Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, rppt@kernel.org, Sumit Garg , derkling@google.com, reijiw@google.com, Will Deacon , rientjes@google.com, "Kalyazin, Nikita" , patrick.roy@linux.dev, "Itazuri, Takahiro" , Andy Lutomirski , David Kaplan , Thomas Gleixner , Brendan Jackman , Yosry Ahmed Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable

Currently __GFP_UNMAPPED allocations will always fail because, although the lists exist to hold them, there is no way to actually create an unmapped page block. This commit adds one, along with the logic to map the block back again when needed.

Doing this at pageblock granularity ensures that the pageblock flags can be used to infer which freetype a page belongs to. It also provides nice batching of TLB flushes and avoids creating unnecessary TLB fragmentation in the physmap.

There are some functional requirements for flipping a block:

- Unmapping requires a TLB shootdown, meaning IRQs must be enabled.

- Because the main use case of this feature is to protect against CPU exploits, when a block is mapped it needs to be zeroed to ensure no residual data is available to attackers. Zeroing a block with a spinlock held seems undesirable.

- Updating the pagetables might require allocating a pagetable to break down a huge page. This would deadlock if the zone lock was held.

This makes allocations that need to change sensitivity _somewhat_ similar to those that need to fall back to a different migratetype. But the locking requirements mean that this can't just be squashed into the existing "fallback" allocator logic; instead, a new allocator path just for this purpose is needed.

The new path is assumed to be much cheaper than the really heavyweight stuff like compaction and reclaim, but at present it is treated as less desirable than the mobility-related "fallback" and "stealing" logic. This might turn out to need revision (in particular, maybe it's a problem that __rmqueue_steal(), which causes fragmentation, happens before __rmqueue_direct_map()), but that should be treated as a subsequent optimisation project.

This currently forbids __GFP_ZERO; that is just to keep the patch from getting too large, and the next patch will remove this restriction.
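For illustration, the caller-side usage this enables would look roughly like the sketch below. This is not part of the patch; error handling is elided, and GFP_KERNEL is just an example of a suitable base flag combination (gfp_freetype() requires the result to be MIGRATE_UNMOVABLE):

	/*
	 * Ask for a page that is left unmapped in the direct map. If no
	 * unmapped page is already free, the allocator flips a whole
	 * pageblock out of the physmap (see __rmqueue_direct_map() below).
	 */
	struct page *page =3D alloc_pages(GFP_KERNEL | __GFP_UNMAPPED, 0);

	if (page)
		__free_pages(page, 0);	/* the pageblock may be mapped back later */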
Signed-off-by: Brendan Jackman --- include/linux/gfp.h | 11 +++- mm/Kconfig | 4 +- mm/page_alloc.c | 163 ++++++++++++++++++++++++++++++++++++++++++++++++= ---- 3 files changed, 164 insertions(+), 14 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index f189bee7a974c..8abc9f4b1e7e6 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -20,6 +20,7 @@ struct mempolicy; static inline freetype_t gfp_freetype(const gfp_t gfp_flags) { int migratetype; + unsigned int ft_flags =3D 0; =20 VM_WARN_ON((gfp_flags & GFP_MOVABLE_MASK) =3D=3D GFP_MOVABLE_MASK); BUILD_BUG_ON((1UL << GFP_MOVABLE_SHIFT) !=3D ___GFP_MOVABLE); @@ -36,7 +37,15 @@ static inline freetype_t gfp_freetype(const gfp_t gfp_fl= ags) >> GFP_MOVABLE_SHIFT; } =20 - return migrate_to_freetype(migratetype, 0); +#ifdef CONFIG_PAGE_ALLOC_UNMAPPED + if (gfp_flags & __GFP_UNMAPPED) { + if (WARN_ON_ONCE(migratetype !=3D MIGRATE_UNMOVABLE)) + migratetype =3D MIGRATE_UNMOVABLE; + ft_flags |=3D FREETYPE_UNMAPPED; + } +#endif + + return migrate_to_freetype(migratetype, ft_flags); } #undef GFP_MOVABLE_MASK #undef GFP_MOVABLE_SHIFT diff --git a/mm/Kconfig b/mm/Kconfig index ccf1cda90cf4a..3200ea8836432 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1502,8 +1502,8 @@ config MERMAP_KUNIT_TEST =20 If unsure, say N. =20 -endmenu - config PAGE_ALLOC_UNMAPPED bool "Support allocating pages that aren't in the direct map" if COMPILE_= TEST default COMPILE_TEST + +endmenu diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 5576bd6a26b7b..f7754080dd25b 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -34,6 +34,7 @@ #include #include #include +#include #include #include #include @@ -1037,6 +1038,26 @@ static void change_pageblock_range(struct page *page= block_page, } } =20 +/* + * Can pages of these two freetypes be combined into a single higher-order= free + * page? + */ +static inline bool can_merge_freetypes(freetype_t a, freetype_t b) +{ + if (freetypes_equal(a, b)) + return true; + + if (!migratetype_is_mergeable(free_to_migratetype(a)) || + !migratetype_is_mergeable(free_to_migratetype(b))) + return false; + + /* + * Mustn't "just" merge pages with different freetype flags, changing + * those requires updating pagetables. + */ + return freetype_flags(a) =3D=3D freetype_flags(b); +} + /* * Freeing function for a buddy system allocator. * @@ -1105,9 +1126,7 @@ static inline void __free_one_page(struct page *page, buddy_ft =3D get_pfnblock_freetype(buddy, buddy_pfn); buddy_mt =3D free_to_migratetype(buddy_ft); =20 - if (migratetype !=3D buddy_mt && - (!migratetype_is_mergeable(migratetype) || - !migratetype_is_mergeable(buddy_mt))) + if (!can_merge_freetypes(freetype, buddy_ft)) goto done_merging; } =20 @@ -1124,7 +1143,9 @@ static inline void __free_one_page(struct page *page, /* * Match buddy type. This ensures that an * expand() down the line puts the sub-blocks - * on the right freelists. + * on the right freelists. Freetype flags are + * already set correctly because of + * can_merge_freetypes(). */ change_pageblock_range(buddy, order, migratetype); } @@ -3361,6 +3382,117 @@ static inline void zone_statistics(struct zone *pre= ferred_zone, struct zone *z, #endif } =20 +#ifdef CONFIG_PAGE_ALLOC_UNMAPPED +/* Try to allocate a page by mapping/unmapping a block from the direct map= . 
*/ +static inline struct page * +__rmqueue_direct_map(struct zone *zone, unsigned int request_order, + unsigned int alloc_flags, freetype_t freetype) +{ + unsigned int ft_flags_other =3D freetype_flags(freetype) ^ FREETYPE_UNMAP= PED; + freetype_t ft_other =3D migrate_to_freetype(free_to_migratetype(freetype), + ft_flags_other); + bool want_mapped =3D !(freetype_flags(freetype) & FREETYPE_UNMAPPED); + enum rmqueue_mode rmqm =3D RMQUEUE_NORMAL; + unsigned long irq_flags; + int nr_pageblocks; + struct page *page; + int alloc_order; + int err; + + if (freetype_idx(ft_other) < 0) + return NULL; + + /* + * Might need a TLB shootdown. Even if IRQs are on this isn't + * safe if the caller holds a lock (in case the other CPUs need that + * lock to handle the shootdown IPI). + */ + if (alloc_flags & ALLOC_NOBLOCK) + return NULL; + + if (!can_set_direct_map()) + return NULL; + + lockdep_assert(!irqs_disabled() || unlikely(early_boot_irqs_disabled)); + + /* + * Need to [un]map a whole pageblock (otherwise it might require + * allocating pagetables). First allocate it. + */ + alloc_order =3D max(request_order, pageblock_order); + nr_pageblocks =3D 1 << (alloc_order - pageblock_order); + spin_lock_irqsave(&zone->lock, irq_flags); + page =3D __rmqueue(zone, alloc_order, ft_other, alloc_flags, &rmqm); + spin_unlock_irqrestore(&zone->lock, irq_flags); + if (!page) + return NULL; + + /* + * Now that IRQs are on it's safe to do a TLB shootdown, and now that we + * released the zone lock it's possible to allocate a pagetable if + * needed to split up a huge page. + * + * Note that modifying the direct map may need to allocate pagetables. + * What about unbounded recursion? Here are the assumptions that make it + * safe: + * + * - The direct map starts out fully mapped at boot. (This is not really + * an "assumption" as it's in direct control of page_alloc.c). + * + * - Once pages in the direct map are broken down, they are not + * re-aggregated into larger pages again. + * + * - Pagetables are never allocated with __GFP_UNMAPPED. + * + * Under these assumptions, a pagetable might need to be allocated while + * _unmapping_ stuff from the direct map during a __GFP_UNMAPPED + * allocation. But the allocation of that pagetable never requires + * allocating a further pagetable. + */ + err =3D set_direct_map_valid_noflush(page, + nr_pageblocks << pageblock_order, want_mapped); + if (err =3D=3D -ENOMEM || WARN_ONCE(err, "err=3D%d\n", err)) { + __free_one_page(page, page_to_pfn(page), zone, + alloc_order, freetype, FPI_SKIP_REPORT_NOTIFY); + return NULL; + } + + if (!want_mapped) { + unsigned long start =3D (unsigned long)page_address(page); + unsigned long end =3D start + (nr_pageblocks << (pageblock_order + PAGE_= SHIFT)); + + flush_tlb_kernel_range(start, end); + } + + for (int i =3D 0; i < nr_pageblocks; i++) { + struct page *block_page =3D page + (pageblock_nr_pages * i); + + set_pageblock_freetype_flags(block_page, freetype_flags(freetype)); + } + + if (request_order >=3D alloc_order) + return page; + + /* Free any remaining pages in the block.
*/ + spin_lock_irqsave(&zone->lock, irq_flags); + for (unsigned int i =3D request_order; i < alloc_order; i++) { + struct page *page_to_free =3D page + (1 << i); + + __free_one_page(page_to_free, page_to_pfn(page_to_free), zone, + i, freetype, FPI_SKIP_REPORT_NOTIFY); + } + spin_unlock_irqrestore(&zone->lock, irq_flags); + + return page; +} +#else /* CONFIG_PAGE_ALLOC_UNMAPPED */ +static inline struct page *__rmqueue_direct_map(struct zone *zone, unsigne= d int request_order, + unsigned int alloc_flags, freetype_t freetype) +{ + return NULL; +} +#endif /* CONFIG_PAGE_ALLOC_UNMAPPED */ + static __always_inline struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, unsigned int order, unsigned int alloc_flags, @@ -3394,13 +3526,15 @@ struct page *rmqueue_buddy(struct zone *preferred_z= one, struct zone *zone, */ if (!page && (alloc_flags & (ALLOC_OOM|ALLOC_HARDER))) page =3D __rmqueue_smallest(zone, order, ft_high); - - if (!page) { - spin_unlock_irqrestore(&zone->lock, flags); - return NULL; - } } spin_unlock_irqrestore(&zone->lock, flags); + + /* Try changing direct map, now we've released the zone lock */ + if (!page) + page =3D __rmqueue_direct_map(zone, order, alloc_flags, freetype); + if (!page) + return NULL; + } while (check_new_pages(page, order)); =20 __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order); @@ -3625,6 +3759,8 @@ static void reserve_highatomic_pageblock(struct page = *page, int order, static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, bool force) { + freetype_t ft_high =3D freetype_with_migrate(ac->freetype, + MIGRATE_HIGHATOMIC); struct zonelist *zonelist =3D ac->zonelist; unsigned long flags; struct zoneref *z; @@ -3633,6 +3769,9 @@ static bool unreserve_highatomic_pageblock(const stru= ct alloc_context *ac, int order; int ret; =20 + if (freetype_idx(ft_high) < 0) + return false; + for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->highest_zoneidx, ac->nodemask) { /* @@ -3646,8 +3785,6 @@ static bool unreserve_highatomic_pageblock(const stru= ct alloc_context *ac, spin_lock_irqsave(&zone->lock, flags); for (order =3D 0; order < NR_PAGE_ORDERS; order++) { struct free_area *area =3D &(zone->free_area[order]); - freetype_t ft_high =3D freetype_with_migrate(ac->freetype, - MIGRATE_HIGHATOMIC); unsigned long size; =20 page =3D get_page_from_free_area(area, ft_high); @@ -5147,6 +5284,10 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mas= k, unsigned int order, ac->nodemask =3D nodemask; ac->freetype =3D gfp_freetype(gfp_mask); =20 + /* Not implemented yet. 
*/ + if (freetype_flags(ac->freetype) & FREETYPE_UNMAPPED && gfp_mask & __GFP_= ZERO) + return false; + if (cpusets_enabled()) { *alloc_gfp |=3D __GFP_HARDWALL; /* --=20 2.51.2 From nobody Thu Apr 16 20:49:12 2026 Received: from mail-wr1-f73.google.com (mail-wr1-f73.google.com [209.85.221.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2FC6F3ED126 for ; Wed, 25 Feb 2026 16:35:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772037305; cv=none; b=XJT+Nf1/rDJTRQ1N94I1Kjj3B77A3z7JKhLTmexDBgH/yf5xeMefOxYRSNSB0mEm72th+I94CJ7RvAjsBt4srXq7S4OJyAEWReuNSExSemuRXmixvOBq1+9e/hcLWbOSSStFTdNTCc18DynUpVqesfgPTn3UhzxmD4Sv4J20PGI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772037305; c=relaxed/simple; bh=ctPEukN6fm0ljH0l7Bo/HryqNSf7WZZpoSOg0AzOQys=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=kgwtUY9yFfxQJRV3nq3KFLVdZPBEsWSWSMwssHjfLgf4J1VQ6Y4pjAyiRimaw+g7Q4vQcKTZjBYWk3IsoCmagHtBLXHWl44WLAXiRkklrd5lEZfIJQuMtSDWUTHEg9UElmcX7B8swSqKl91Tq+JAIEk7r7AZwU/zxUFGElwaWPY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jackmanb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=mMx+AMCg; arc=none smtp.client-ip=209.85.221.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jackmanb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="mMx+AMCg" Received: by mail-wr1-f73.google.com with SMTP id ffacd0b85a97d-4398ad5d81dso1105893f8f.0 for ; Wed, 25 Feb 2026 08:35:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1772037302; x=1772642102; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=8IxPs+shi9o6AXlEGCilkkLX6RGxoXvE/7sr5zHVQPU=; b=mMx+AMCgp3sLytjXSrEK/pC2Xez2H547V1cIYYdXjf5lqPVSYD3UKaVaTOTP3rJnJX /kiBYGoB+HSB6x4mOdN/BYvZScWyZIZcWkjb27VP1BulqyfzdMyiAhRZaeFAKtR4/nr9 dELMoF9C2+bbikknl/K/sXCJODTHa7oZK2obsP61yNxOrOxblOFlNFDa1I36Xplu8rH3 5uu70akLFpq41BIqVDEZ5aood1tFelrF0MtgbTZLTys562Hsoysmcf33c1WaSR3zqbLN uCXIpqV489gQsql8I09z18f3hxywL5jdEY4C/rSmpimsAFC5XCJxVuPaBHHxq0R6D5Qg KzrQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1772037302; x=1772642102; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=8IxPs+shi9o6AXlEGCilkkLX6RGxoXvE/7sr5zHVQPU=; b=ZWZH0xgZrtSNjE+JL+FvftNmJl6QG2f6v0ImmgUma8qug7pMKlAZiVOu6DLRgfGXhc rgwZhFsHeRd357t6ObTGuVGEeHmvC2gKPCD7zz6/6D2dOLCzw5dYNXKcJyoRgcrgVo7V CFR7WHUUnmMlZGi2IZGUyPEqLyn+xoQKreqiWX9MSUcwS++4+FGkHFu5UlDiOBeoHyDs nSkQmHaV49RmIgVqqKpQtjmSltNb0d6TzzkfTjElssPrGLJ0QKfdJUn8TMHUYE/qwV6c V5v5WyY9UDxuCUxded87HlUbpu/JuHPb0msSUj4WwOxal1uKEF1d1Jx70hAN0v3E0d3V fkwQ== X-Forwarded-Encrypted: i=1; AJvYcCUfiVM3JHtFUeBKPKACfeZFNEoaX4h0zsfuNdLGdmDeFVVPIpf8wLs8fOm9AMXfaVR/IcbzUl25g3xwU3c=@vger.kernel.org X-Gm-Message-State: 
AOJu0Yx39GA+1YxHo2MXyKzxSguoclTfQpq98j3V85yPkeeJjv6VPoMS IiuEaKDAN+JjBFzq+E+WNPWY5HnwkRGzu1PpzLfIvfJ8NCdekivwKjrrTvbMhsfaS00v3YgPCo4 PCQ4OTrDeyV1SYw== X-Received: from wrph15.prod.google.com ([2002:adf:f4cf:0:b0:42f:b132:61e6]) (user=jackmanb job=prod-delivery.src-stubby-dispatcher) by 2002:a5d:64c6:0:b0:435:b755:c67e with SMTP id ffacd0b85a97d-439942fbd3cmr2235545f8f.49.1772037300603; Wed, 25 Feb 2026 08:35:00 -0800 (PST) Date: Wed, 25 Feb 2026 16:34:43 +0000 In-Reply-To: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com> X-Mailer: b4 0.14.3 Message-ID: <20260225-page_alloc-unmapped-v1-18-e8808a03cd66@google.com> Subject: [PATCH RFC 18/19] mm/page_alloc: implement __GFP_UNMAPPED|__GFP_ZERO allocations From: Brendan Jackman To: Borislav Petkov , Dave Hansen , Peter Zijlstra , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , Vlastimil Babka , Wei Xu , Johannes Weiner , Zi Yan Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, rppt@kernel.org, Sumit Garg , derkling@google.com, reijiw@google.com, Will Deacon , rientjes@google.com, "Kalyazin, Nikita" , patrick.roy@linux.dev, "Itazuri, Takahiro" , Andy Lutomirski , David Kaplan , Thomas Gleixner , Brendan Jackman , Yosry Ahmed Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable

The pages being zeroed here are unmapped, so they can't be zeroed via the direct map. Temporarily mapping them in the direct map is not possible because:

- In general this requires allocating pagetables.

- Unmapping them would require a TLB shootdown, which can't be done in general from the allocator (x86 requires IRQs on).

Therefore, use the new mermap mechanism to zero these pages.

The main mermap API is expected to fail very often. To avoid failing allocations when that happens, fall back to the special mermap_get_reserved() variant, which is less efficient.
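The resulting zeroing logic follows roughly the pattern sketched below (this mirrors the clear_page_mermap() helper in the diff that follows, with the KASAN and IRQ details omitted; mermap_get()/mermap_get_reserved()/mermap_put() come from the mermap patches earlier in this series):

	/* Fast path: one temporary mapping covering the whole range. */
	void *mermap =3D mermap_get(page, numpages << PAGE_SHIFT, PAGE_KERNEL_NOGLOBAL);

	if (mermap) {
		for (int i =3D 0; i < numpages; i++)
			clear_page(mermap_addr(mermap) + (i << PAGE_SHIFT));
		mermap_put(mermap);
		return;
	}

	/* Slow path: a per-CPU reserved slot that cannot fail, one page at a time. */
	for (int i =3D 0; i < numpages; i++) {
		mermap =3D mermap_get_reserved(page + i, PAGE_KERNEL);
		clear_page(mermap_addr(mermap));
		mermap_put(mermap);
	}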
Signed-off-by: Brendan Jackman --- arch/x86/include/asm/pgtable_types.h | 2 + mm/Kconfig | 12 +++++- mm/page_alloc.c | 76 +++++++++++++++++++++++++++++++-= ---- 3 files changed, 79 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pg= table_types.h index 2ec250ba467e2..c3d73bdfff1fa 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -223,6 +223,7 @@ enum page_cache_mode { #define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX| 0| 0|___G) #define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0| 0| 0|___G) #define __PAGE_KERNEL (__PP|__RW| 0|___A|__NX|___D| 0|___G) +#define __PAGE_KERNEL_NOGLOBAL (__PP|__RW| 0|___A|__NX|___D| 0| 0) #define __PAGE_KERNEL_EXEC (__PP|__RW| 0|___A| 0|___D| 0|___G) #define __PAGE_KERNEL_NOCACHE (__PP|__RW| 0|___A|__NX|___D| 0|___G| _= _NC) #define __PAGE_KERNEL_VVAR (__PP| 0|_USR|___A|__NX| 0| 0|___G) @@ -245,6 +246,7 @@ enum page_cache_mode { #define __pgprot_mask(x) __pgprot((x) & __default_kernel_pte_mask) =20 #define PAGE_KERNEL __pgprot_mask(__PAGE_KERNEL | _ENC) +#define PAGE_KERNEL_NOGLOBAL __pgprot_mask(__PAGE_KERNEL_NOGLOBAL | _ENC) #define PAGE_KERNEL_NOENC __pgprot_mask(__PAGE_KERNEL | 0) #define PAGE_KERNEL_RO __pgprot_mask(__PAGE_KERNEL_RO | _ENC) #define PAGE_KERNEL_EXEC __pgprot_mask(__PAGE_KERNEL_EXEC | _ENC) diff --git a/mm/Kconfig b/mm/Kconfig index 3200ea8836432..134c6aab6fc50 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1503,7 +1503,15 @@ config MERMAP_KUNIT_TEST If unsure, say N. =20 config PAGE_ALLOC_UNMAPPED - bool "Support allocating pages that aren't in the direct map" if COMPILE_= TEST - default COMPILE_TEST + bool "Support allocating pages that aren't in the direct map" if COMPILE_= TEST || KUNIT + default COMPILE_TEST || KUNIT + depends on MERMAP + +config PAGE_ALLOC_KUNIT_TESTS + tristate "KUnit tests for the page allocator" if !KUNIT_ALL_TESTS + depends on KUNIT + default KUNIT_ALL_TESTS + help + Builds KUnit tests for the page allocator. =20 endmenu diff --git a/mm/page_alloc.c b/mm/page_alloc.c index f7754080dd25b..9b35e91dadeb5 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -14,6 +14,7 @@ * (lots of bits borrowed from Ingo Molnar & Andrew Morton) */ =20 +#include #include #include #include @@ -1365,15 +1366,72 @@ static inline bool should_skip_kasan_poison(struct = page *page) return page_kasan_tag(page) =3D=3D KASAN_TAG_KERNEL; } =20 -static void kernel_init_pages(struct page *page, int numpages) +#ifdef CONFIG_PAGE_ALLOC_UNMAPPED +static inline bool pageblock_unmapped(struct page *page) { - int i; + return freetype_flags(get_pageblock_freetype(page)) & FREETYPE_UNMAPPED; +} =20 - /* s390's use of memset() could override KASAN redzones. */ - kasan_disable_current(); - for (i =3D 0; i < numpages; i++) - clear_highpage_kasan_tagged(page + i); - kasan_enable_current(); +static inline void clear_page_mermap(struct page *page, unsigned int numpa= ges) +{ + void *mermap; + + BUILD_BUG_ON(IS_ENABLED(CONFIG_HIGHMEM)); + + /* Fast path: single mapping (may fail under preemption). */ + mermap =3D mermap_get(page, numpages << PAGE_SHIFT, PAGE_KERNEL_NOGLOBAL); + if (mermap) { + void *buf =3D kasan_reset_tag(mermap_addr(mermap)); + + for (int i =3D 0; i < numpages; i++) + clear_page(buf + (i << PAGE_SHIFT)); + mermap_put(mermap); + return; + } + + /* Slow path, map each page individually (always succeeds). 
*/ + for (int i =3D 0; i < numpages; i++) { + unsigned long flags; + + local_irq_save(flags); + mermap =3D mermap_get_reserved(page + i, PAGE_KERNEL); + clear_page(kasan_reset_tag(mermap_addr(mermap))); + mermap_put(mermap); + local_irq_restore(flags); + } +} +#else +static inline bool pageblock_unmapped(struct page *page) +{ + return false; +} + +static inline void clear_page_mermap(struct page *page, unsigned int numpa= ges) +{ + BUG(); +} +#endif + +static void kernel_init_pages(struct page *page, unsigned int numpages) +{ + int num_blocks =3D DIV_ROUND_UP(numpages, pageblock_nr_pages); + + for (int block =3D 0; block < num_blocks; block++) { + struct page *block_page =3D page + (block << pageblock_order); + bool unmapped =3D pageblock_unmapped(block_page); + + /* s390's use of memset() could override KASAN redzones. */ + kasan_disable_current(); + if (unmapped) { + clear_page_mermap(page, numpages); + } else { + for (int i =3D 0; i < min(numpages, pageblock_nr_pages); i++) + clear_highpage_kasan_tagged(block_page + i); + } + kasan_enable_current(); + + numpages -=3D pageblock_nr_pages; + } } =20 #ifdef CONFIG_MEM_ALLOC_PROFILING @@ -5284,8 +5342,8 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask= , unsigned int order, ac->nodemask =3D nodemask; ac->freetype =3D gfp_freetype(gfp_mask); =20 - /* Not implemented yet. */ - if (freetype_flags(ac->freetype) & FREETYPE_UNMAPPED && gfp_mask & __GFP_= ZERO) + if (freetype_flags(ac->freetype) & FREETYPE_UNMAPPED && + WARN_ON(!mermap_ready())) return false; =20 if (cpusets_enabled()) { --=20 2.51.2 From nobody Thu Apr 16 20:49:12 2026 Received: from mail-wm1-f73.google.com (mail-wm1-f73.google.com [209.85.128.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9C2B83ED131 for ; Wed, 25 Feb 2026 16:35:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772037306; cv=none; b=uvmOOrnwSI4X934zCkO/FiSSMyF8M9YBIfOzT0kJ4vYa/LBUcMC+pcrhwLVoxjHO8jt+Smov10ACrFjUcovw8NWx35gIt7r7mJuwg7OfwfPiyAPtluUdUTMSsNNLg1Cy6S5xVG4H4jYkm5n23xHAx5GtPBptK5KIOK20d4H/mtE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772037306; c=relaxed/simple; bh=NF190AX2o/AFmn5FoCrUiTf78od2nfp5QnPxyfiDmOo=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=lIZ46nepxgEx9CccjQ/Aj5IRB17yAPPMrpmA4vYMgHIDu1k6aEO0uEjNmA9cdH1LBKUh9bjF9qwJTRoT9YNscxPB27WH5og1HKETKNGh0rMRXFozKnsfjpU3vN6OXiX5PRsjXGGQc9VnTJd3yBYM/J9JMegl4UUX6PPlSD62Cg0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jackmanb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=Q0YL4csp; arc=none smtp.client-ip=209.85.128.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jackmanb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Q0YL4csp" Received: by mail-wm1-f73.google.com with SMTP id 5b1f17b1804b1-4837bfcfe0dso77219345e9.1 for ; Wed, 25 Feb 2026 08:35:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; 
t=1772037302; x=1772642102; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=ONJbbWNmB3g9j0UlyPhep4oJzEzZ23P3tEKa2pmUkxk=; b=Q0YL4cspajsc4RxFs+dB+d1M+IdP3obYRGyBjcN2Y9gZ9wtzz+cBqZEdpGZnIYgj+D pBIh9y8xknZ8avBgsN6wUtrYAyzfnheOKhWpiNiM7p+cubNa6mDSmjZ+CY15zUsg1HUw Nf5v5IJ7gYLT9lMXinrlkwQLdBAguUK+tWzPkxO+tzrG4GB0srtXV1WFSz4AmJGkM7r9 dFP/fiYMxGE/NJi7HkQWE5jsN6eRx/5t2Njlp60/wsVYIIm0G2YVH+ivrU4XJYNvu2Pe LKlTu/u5WkiGYbYND8Zmw353KsHY/9J4UXBX+hRY0lHmuPbYXnaMQxtol1Gj3G1LZa/x p9Yg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1772037302; x=1772642102; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ONJbbWNmB3g9j0UlyPhep4oJzEzZ23P3tEKa2pmUkxk=; b=Y/oZN3QIQQgGAPniuxSUKk+ICMvb75ajSOPLknzH3pXGfGFMI8zdWL1rj2gF8+xSFy iU03V1SZaX2eYn+bk3/YsLl946JlBNkN+XVwiHP5Symd8BBgdLNNyddRCe1n6UCtrBCP O0bQnydzDqQT8XtyMTVDXEeKZlc5hUMjEPnDhkuplBZmFP9X5XNXEsINUs82a7vOOUMX JChuUT03r68oRx2jZcVnsOzSSYKukiLm5jZR0cJpagvvHEp7eBYuCAAUAPrU59lpixuL loOBLTMSEFBZUN79Iy24oSt66KMPKTVxC7dNiTlMqj+H9xCenqDympEWW3ptOyz5rU6j HJaw== X-Forwarded-Encrypted: i=1; AJvYcCUu9yjkdtej7MHDMmYZt/2lS/fwaHpKsNPCVi1I2NYeoaUHMXPjoMArKaaIa6lJHpY/T92jOXi40B0Lutw=@vger.kernel.org X-Gm-Message-State: AOJu0YzCxZ1Jb9cX3/sSV4SlcdY/bzqrzguwlHs+eFYMwcjixCDOM2Qf KqyWD2nNb/HNihrkP0hchSx9f8qBkzBo0aqDzRzELlsu9RMqqJf4PVROb69B0UTKk2wWfd33JVt 8HRtqUsFhWxJ+vA== X-Received: from wmap23.prod.google.com ([2002:a7b:cc97:0:b0:480:6ccb:80fd]) (user=jackmanb job=prod-delivery.src-stubby-dispatcher) by 2002:a05:600c:348b:b0:483:612d:7a5c with SMTP id 5b1f17b1804b1-483c21a117cmr15454305e9.25.1772037301990; Wed, 25 Feb 2026 08:35:01 -0800 (PST) Date: Wed, 25 Feb 2026 16:34:44 +0000 In-Reply-To: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260225-page_alloc-unmapped-v1-0-e8808a03cd66@google.com> X-Mailer: b4 0.14.3 Message-ID: <20260225-page_alloc-unmapped-v1-19-e8808a03cd66@google.com> Subject: [PATCH RFC 19/19] mm: Minimal KUnit tests for some new page_alloc logic From: Brendan Jackman To: Borislav Petkov , Dave Hansen , Peter Zijlstra , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , Vlastimil Babka , Wei Xu , Johannes Weiner , Zi Yan Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, rppt@kernel.org, Sumit Garg , derkling@google.com, reijiw@google.com, Will Deacon , rientjes@google.com, "Kalyazin, Nikita" , patrick.roy@linux.dev, "Itazuri, Takahiro" , Andy Lutomirski , David Kaplan , Thomas Gleixner , Brendan Jackman , Yosry Ahmed Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Add a simple smoke test for __GFP_UNMAPPED that tries to exercise flipping pageblocks between mapped/unmapped state. Also add some basic tests for some freelist-indexing helpers. 
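The central property checked for the index helpers is that order_to_pindex() and pindex_to_order() round-trip, roughly (a simplification of the real test below):

	/* For each valid (freetype, order) pair the pcplists can hold: */
	unsigned int pindex =3D order_to_pindex(ft, order);

	KUNIT_EXPECT_EQ(test, pindex_to_order(pindex), order);

A bitmap is additionally used to check that no pindex is assigned twice.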
Simplest way to run these on x86: tools/testing/kunit/kunit.py run --arch=3Dx86_64 "page_alloc.*" \ --kconfig_add CONFIG_MERMAP=3Dy --kconfig_add CONFIG_PAGE_ALLOC_UNMAPPED= =3Dy Signed-off-by: Brendan Jackman --- kernel/panic.c | 2 + mm/Kconfig | 2 +- mm/Makefile | 1 + mm/init-mm.c | 3 + mm/internal.h | 6 ++ mm/page_alloc.c | 11 +- mm/tests/page_alloc_kunit.c | 250 ++++++++++++++++++++++++++++++++++++++++= ++++ 7 files changed, 271 insertions(+), 4 deletions(-) diff --git a/kernel/panic.c b/kernel/panic.c index c78600212b6c1..1a170d907eab1 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -39,6 +39,7 @@ #include #include #include +#include =20 #define PANIC_TIMER_STEP 100 #define PANIC_BLINK_SPD 18 @@ -900,6 +901,7 @@ unsigned long get_taint(void) { return tainted_mask; } +EXPORT_SYMBOL_IF_KUNIT(get_taint); =20 /** * add_taint: add a taint flag if not already set. diff --git a/mm/Kconfig b/mm/Kconfig index 134c6aab6fc50..27ce037cf82f5 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1507,7 +1507,7 @@ config PAGE_ALLOC_UNMAPPED default COMPILE_TEST || KUNIT depends on MERMAP =20 -config PAGE_ALLOC_KUNIT_TESTS +config PAGE_ALLOC_KUNIT_TEST tristate "KUnit tests for the page allocator" if !KUNIT_ALL_TESTS depends on KUNIT default KUNIT_ALL_TESTS diff --git a/mm/Makefile b/mm/Makefile index 42c8ca32359ae..073a93b83acee 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -152,3 +152,4 @@ obj-$(CONFIG_TMPFS_QUOTA) +=3D shmem_quota.o obj-$(CONFIG_LAZY_MMU_MODE_KUNIT_TEST) +=3D tests/lazy_mmu_mode_kunit.o obj-$(CONFIG_MERMAP) +=3D mermap.o obj-$(CONFIG_MERMAP_KUNIT_TEST) +=3D tests/mermap_kunit.o +obj-$(CONFIG_PAGE_ALLOC_KUNIT_TEST) +=3D tests/page_alloc_kunit.o diff --git a/mm/init-mm.c b/mm/init-mm.c index c5556bb9d5f01..31103356da654 100644 --- a/mm/init-mm.c +++ b/mm/init-mm.c @@ -13,6 +13,8 @@ #include #include =20 +#include + #ifndef INIT_MM_CONTEXT #define INIT_MM_CONTEXT(name) #endif @@ -50,6 +52,7 @@ struct mm_struct init_mm =3D { .flexible_array =3D MM_STRUCT_FLEXIBLE_ARRAY_INIT, INIT_MM_CONTEXT(init_mm) }; +EXPORT_SYMBOL_IF_KUNIT(init_mm); =20 void setup_initial_init_mm(void *start_code, void *end_code, void *end_data, void *brk) diff --git a/mm/internal.h b/mm/internal.h index 6f2eacf3d8f2c..e37cb6cb8a9a2 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1781,4 +1781,10 @@ static inline int io_remap_pfn_range_complete(struct= vm_area_struct *vma, return remap_pfn_range_complete(vma, addr, pfn, size, prot); } =20 +#if IS_ENABLED(CONFIG_KUNIT) +unsigned int order_to_pindex(freetype_t freetype, int order); +int pindex_to_order(unsigned int pindex); +bool pcp_allowed_order(unsigned int order); +#endif /* IS_ENABLED(CONFIG_KUNIT) */ + #endif /* __MM_INTERNAL_H */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 9b35e91dadeb5..7f930eb454501 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -57,6 +57,7 @@ #include #include #include +#include #include "internal.h" #include "shuffle.h" #include "page_reporting.h" @@ -496,6 +497,7 @@ get_pfnblock_freetype(const struct page *page, unsigned= long pfn) { return __get_pfnblock_freetype(page, pfn, 0); } +EXPORT_SYMBOL_IF_KUNIT(get_pfnblock_freetype); =20 =20 /** @@ -731,7 +733,7 @@ static void bad_page(struct page *page, const char *rea= son) add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE); } =20 -static inline unsigned int order_to_pindex(freetype_t freetype, int order) +VISIBLE_IF_KUNIT inline unsigned int order_to_pindex(freetype_t freetype, = int order) { int migratetype =3D free_to_migratetype(freetype); =20 @@ -759,8 +761,9 @@ static inline 
unsigned int order_to_pindex(freetype_t f= reetype, int order) =20 return (MIGRATE_PCPTYPES * order) + migratetype; } +EXPORT_SYMBOL_IF_KUNIT(order_to_pindex); =20 -static inline int pindex_to_order(unsigned int pindex) +VISIBLE_IF_KUNIT int pindex_to_order(unsigned int pindex) { unsigned int unmapped_base =3D NR_LOWORDER_PCP_LISTS + NR_PCP_THP; int order; @@ -783,8 +786,9 @@ static inline int pindex_to_order(unsigned int pindex) =20 return order; } +EXPORT_SYMBOL_IF_KUNIT(pindex_to_order); =20 -static inline bool pcp_allowed_order(unsigned int order) +VISIBLE_IF_KUNIT inline bool pcp_allowed_order(unsigned int order) { if (order <=3D PAGE_ALLOC_COSTLY_ORDER) return true; @@ -794,6 +798,7 @@ static inline bool pcp_allowed_order(unsigned int order) #endif return false; } +EXPORT_SYMBOL_IF_KUNIT(pcp_allowed_order); =20 /* * Higher-order pages are called "compound pages". They are structured th= usly: diff --git a/mm/tests/page_alloc_kunit.c b/mm/tests/page_alloc_kunit.c new file mode 100644 index 0000000000000..bd55d0bc35ac9 --- /dev/null +++ b/mm/tests/page_alloc_kunit.c @@ -0,0 +1,250 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#include "internal.h" + +struct free_pages_ctx { + unsigned int order; + struct list_head pages; +}; + +static inline void action_many__free_pages(void *context) +{ + struct free_pages_ctx *ctx =3D context; + struct page *page, *tmp; + + list_for_each_entry_safe(page, tmp, &ctx->pages, lru) + __free_pages(page, ctx->order); +} + +/* + * Allocate a bunch of pages with the same order and GFP flags, transparen= tly + * take care of error handling and cleanup. Does this all via a single KUn= it + * resource, i.e. has a fixed memory overhead. + */ +static inline struct free_pages_ctx * +do_many_alloc_pages(struct kunit *test, gfp_t gfp, + unsigned int order, unsigned int count) +{ + struct free_pages_ctx *ctx =3D kunit_kzalloc( + test, sizeof(struct free_pages_ctx), GFP_KERNEL); + + KUNIT_ASSERT_NOT_NULL(test, ctx); + INIT_LIST_HEAD(&ctx->pages); + ctx->order =3D order; + + for (int i =3D 0; i < count; i++) { + struct page *page =3D alloc_pages(gfp, order); + + if (!page) { + struct page *page, *tmp; + + list_for_each_entry_safe(page, tmp, &ctx->pages, lru) + __free_pages(page, order); + + KUNIT_FAIL_AND_ABORT(test, + "Failed to alloc order %d page (GFP *%pG) iter %d", + order, &gfp, i); + } + list_add(&page->lru, &ctx->pages); + } + + KUNIT_ASSERT_EQ(test, + kunit_add_action_or_reset(test, action_many__free_pages, ctx), 0); + return ctx; +} + +#ifdef CONFIG_PAGE_ALLOC_UNMAPPED + +static const gfp_t gfp_params_array[] =3D { + 0, + __GFP_ZERO, +}; + +static void gfp_param_get_desc(const gfp_t *gfp, char *desc) +{ + snprintf(desc, KUNIT_PARAM_DESC_SIZE, "%pGg", gfp); +} + +KUNIT_ARRAY_PARAM(gfp, gfp_params_array, gfp_param_get_desc); + +/* Do some allocations that force the allocator to map/unmap some blocks. = */ +static void test_alloc_map_unmap(struct kunit *test) +{ + unsigned long page_majority; + struct free_pages_ctx *ctx; + const gfp_t *gfp_extra =3D test->param_value; + gfp_t gfp =3D GFP_KERNEL | __GFP_THISNODE | __GFP_UNMAPPED | *gfp_extra; + struct page *page; + + kunit_attach_mm(); + mermap_mm_prepare(current->mm); + + /* No cleanup here - assuming kthread "belongs" to this test. */ + set_cpus_allowed_ptr(current, cpumask_of_node(numa_node_id())); + + /* + * First allocate more than half of the memory in the node as + * unmapped. 
Assuming the memory starts out mapped, this should + * exercise the unmap. + */ + page_majority =3D (node_present_pages(numa_node_id()) / 2) + 1; + ctx =3D do_many_alloc_pages(test, gfp, 0, page_majority); + + /* Check pages are unmapped */ + list_for_each_entry(page, &ctx->pages, lru) { + freetype_t ft =3D get_pfnblock_freetype(page, page_to_pfn(page)); + + /* + * Logically it should be an EXPECT, but that would + * cause heavy log spam on failure so use ASSERT for + * concision. + */ + KUNIT_ASSERT_FALSE(test, kernel_page_present(page)); + KUNIT_ASSERT_TRUE(test, freetype_flags(ft) & FREETYPE_UNMAPPED); + } + + /* + * Now free them again and allocate the same amount without + * __GFP_UNMAPPED. This will exercise the mapping logic. + */ + kunit_release_action(test, action_many__free_pages, ctx); + gfp &=3D ~__GFP_UNMAPPED; + ctx =3D do_many_alloc_pages(test, gfp, 0, page_majority); + + /* Check pages are mapped. */ + list_for_each_entry(page, &ctx->pages, lru) + KUNIT_ASSERT_TRUE(test, kernel_page_present(page)); +} + +#endif /* CONFIG_PAGE_ALLOC_UNMAPPED */ + +static void __test_pindex_helpers(struct kunit *test, unsigned long *bitma= p, + int mt, unsigned int ftflags, unsigned int order) +{ + freetype_t ft =3D migrate_to_freetype(mt, ftflags); + unsigned int pindex; + int got_order; + + if (!pcp_allowed_order(order)) + return; + + if (mt >=3D MIGRATE_PCPTYPES) + return; + + if (freetype_idx(ft) < 0) + return; + + pindex =3D order_to_pindex(ft, order); + + KUNIT_ASSERT_LT_MSG(test, pindex, NR_PCP_LISTS, + "invalid pindex %d (order %d mt %d flags %#x)", + pindex, order, mt, ftflags); + KUNIT_EXPECT_TRUE_MSG(test, test_bit(pindex, bitmap), + "pindex %d reused (order %d mt %d flags %#x)", + pindex, order, mt, ftflags); + + /* + * For THP, two migratetypes map to the same pindex, + * just manually exclude one of those cases. + */ + if (!(IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && + order =3D=3D HPAGE_PMD_ORDER && + mt =3D=3D min(MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE))) + clear_bit(pindex, bitmap); + + got_order =3D pindex_to_order(pindex); + KUNIT_EXPECT_EQ_MSG(test, order, got_order, + "roundtrip failed, got %d want %d (pindex %d mt %d flags %#x)", + got_order, order, pindex, mt, ftflags); +} + +/* This just checks for basic arithmetic errors. */ +static void test_pindex_helpers(struct kunit *test) +{ + unsigned long bitmap[bitmap_size(NR_PCP_LISTS)]; + + /* Bit means "pindex not yet used". */ + bitmap_fill(bitmap, NR_PCP_LISTS); + + for (unsigned int order =3D 0; order < NR_PAGE_ORDERS; order++) { + for (int mt =3D 0; mt < MIGRATE_TYPES; mt++) { + __test_pindex_helpers(test, bitmap, mt, 0, order); + if (FREETYPE_UNMAPPED) + __test_pindex_helpers(test, bitmap, mt, + FREETYPE_UNMAPPED, order); + } + } + + KUNIT_EXPECT_TRUE_MSG(test, bitmap_empty(bitmap, NR_PCP_LISTS), + "unused pindices: %*pbl", NR_PCP_LISTS, bitmap); +} + +static void __test_freetype_idx(struct kunit *test, unsigned int order, + int migratetype, unsigned int ftflags, + unsigned long *bitmap) +{ + freetype_t ft =3D migrate_to_freetype(migratetype, ftflags); + int idx =3D freetype_idx(ft); + + if (idx =3D=3D -1) + return; + KUNIT_ASSERT_GE(test, idx, 0); + KUNIT_ASSERT_LT(test, idx, NR_FREETYPE_IDXS); + + KUNIT_EXPECT_LT_MSG(test, idx, NR_PCP_LISTS, + "invalid idx %d (order %d mt %d flags %#x)", + idx, order, migratetype, ftflags); + clear_bit(idx, bitmap); +} + +static void test_freetype_idx(struct kunit *test) +{ + unsigned long bitmap[bitmap_size(NR_FREETYPE_IDXS)]; + + /* Bit means "pindex not yet used". 
*/ + bitmap_fill(bitmap, NR_FREETYPE_IDXS); + + for (unsigned int order =3D 0; order < NR_PAGE_ORDERS; order++) { + for (int mt =3D 0; mt < MIGRATE_TYPES; mt++) { + __test_freetype_idx(test, order, mt, 0, bitmap); + if (FREETYPE_UNMAPPED) + __test_freetype_idx(test, order, mt, + FREETYPE_UNMAPPED, bitmap); + } + } + + KUNIT_EXPECT_TRUE_MSG(test, bitmap_empty(bitmap, NR_FREETYPE_IDXS), + "unused idxs: %*pbl", NR_FREETYPE_IDXS, bitmap); +} + +static struct kunit_case test_cases[] =3D { +#ifdef CONFIG_PAGE_ALLOC_UNMAPPED + KUNIT_CASE_PARAM(test_alloc_map_unmap, gfp_gen_params), +#endif + KUNIT_CASE(test_pindex_helpers), + KUNIT_CASE(test_freetype_idx), + {} +}; + +static struct kunit_suite test_suite =3D { + .name =3D "page_alloc", + .test_cases =3D test_cases, +}; + +kunit_test_suite(test_suite); + +MODULE_LICENSE("GPL"); +MODULE_IMPORT_NS("EXPORTED_FOR_KUNIT_TESTING"); --=20 2.51.2