From: David Carlier
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org,
    syzbot+fd95a72470f5a44e464c@syzkaller.appspotmail.com,
    David Carlier, David Hildenbrand, Lorenzo Stoakes,
    "Liam R. Howlett", Vlastimil Babka, Mike Rapoport,
    Suren Baghdasaryan, Michal Hocko, Kevin Tian, Lu Baolu,
    Jason Gunthorpe, linux-kernel@vger.kernel.org
Subject: [PATCH v4] mm: pgtable: free kernel page tables via RCU to fix ptdump UAF
Date: Sat, 13 Jun 2026 20:35:47 +0100
Message-ID: <20260613193547.183867-1-devnexen@gmail.com>
X-Mailer: git-send-email 2.53.0
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

ptdump walks the kernel page tables holding only the init_mm mmap lock
and the memory hotplug lock. Neither of those stops vmalloc or ioremap
from freeing a kernel PTE page underneath the walk. When
vmap_try_huge_pmd() installs a huge mapping it collapses the existing
PTE table and frees it through pmd_free_pte_page(), and on x86 that
happens without the init_mm mmap lock. syzbot caught the resulting use
after free in ptdump_pte_entry() reading a page table that had already
been freed.

pagetable_free_kernel() used to free the page immediately on
configurations without CONFIG_ASYNC_KERNEL_PGTABLE_FREE, and on the
async ones it only batched a TLB flush before freeing. In both cases a
lockless walker could still be dereferencing the page.

Defer the free by a grace period instead.
pagetable_free_kernel() now hands every kernel page table to call_rcu(),
so the page stays valid until any walk that may have observed it has
finished. The async path keeps doing its TLB flush first and then queues
the RCU free per page.

On the read side, walk_page_range_debug() takes the RCU read lock around
the kernel walk through the new walk_kernel_page_table_range_rcu()
helper. A walker either sees the cleared PMD and skips the page, or
keeps it alive until it drops the lock. The plain
walk_kernel_page_table_range() stays as it is for callers that already
own their range and cannot race a free, such as the arm64 page table
split paths.

Fixes: 5ba2f0a15564 ("mm: introduce deferred freeing for kernel page tables")
Reported-by: syzbot+fd95a72470f5a44e464c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6a287988.39669fcc.33b062.00a0.GAE@google.com/T/
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: David Carlier
---
v4: defer the free in both the async and non async configs, not just
    the async one. Move the walk under a named
    walk_kernel_page_table_range_rcu() helper instead of open coding
    rcu_read_lock() in walk_page_range_debug().
v3: take rcu_read_lock() in the init_mm branch of
    walk_page_range_debug() rather than inside the lockless walker,
    which the arm64 split paths also use with GFP_PGTABLE_KERNEL and
    can sleep.
v2: use call_rcu() instead of synchronize_rcu().
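
For reviewers less familiar with the ordering argument in the commit
message, here is a small userspace sketch of it. It is a toy model and
not kernel code: every toy_* name is made up for illustration, the grace
period is approximated by a single-threaded reader counter, and it waits
for all readers rather than only those that started before the free was
queued. It only shows that a table handed to a call_rcu()-style queue
stays readable until the walker that observed it has left its read-side
section.

```c
/*
 * Toy userspace model of RCU-deferred freeing. NOT kernel code: all
 * toy_* names are hypothetical, and the "grace period" is simulated
 * with a plain reader counter in a single thread.
 */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
};

struct toy_pt {				/* stands in for struct ptdesc */
	int entries[4];
	struct toy_rcu_head rcu;
};

static int toy_readers;			/* walkers inside a read-side section */
static struct toy_rcu_head *toy_pending;	/* callbacks waiting to run */
static int toy_freed;			/* tables actually released */

static void toy_read_lock(void)   { toy_readers++; }
static void toy_read_unlock(void) { toy_readers--; }

/* Queue the callback instead of freeing right away. */
static void toy_call_rcu(struct toy_rcu_head *head,
			 void (*func)(struct toy_rcu_head *head))
{
	head->func = func;
	head->next = toy_pending;
	toy_pending = head;
}

/* Run queued callbacks only if no reader is active; return how many ran. */
static int toy_run_callbacks(void)
{
	int n = 0;

	if (toy_readers > 0)
		return 0;
	while (toy_pending) {
		struct toy_rcu_head *head = toy_pending;

		toy_pending = head->next;
		head->func(head);
		n++;
	}
	return n;
}

/* Recover the enclosing table from its embedded head, then free it. */
static void toy_pt_free_rcu(struct toy_rcu_head *head)
{
	struct toy_pt *pt = (struct toy_pt *)
		((char *)head - offsetof(struct toy_pt, rcu));

	free(pt);
	toy_freed++;
}

/* A walker holds the read lock while an updater unlinks and frees a table. */
static int toy_demo(void)
{
	struct toy_pt *pt = calloc(1, sizeof(*pt));
	struct toy_pt *slot;		/* stands in for the PMD entry */
	struct toy_pt *seen;
	int val;

	assert(pt);
	pt->entries[0] = 42;
	slot = pt;

	toy_read_lock();
	seen = slot;			/* walker observed the table */

	slot = NULL;			/* updater clears the "PMD" ... */
	toy_call_rcu(&pt->rcu, toy_pt_free_rcu);	/* ... and defers the free */

	assert(toy_run_callbacks() == 0);	/* reader active: nothing freed */
	val = seen->entries[0];		/* memory is still valid here */
	toy_read_unlock();

	assert(toy_run_callbacks() == 1);	/* no readers left: freed now */
	assert(slot == NULL);
	return val;
}
```

Calling toy_demo() from a main() returns 42, the value the walker reads
after the free was requested. With an immediate free() in place of
toy_call_rcu(), that same read would be a use after free, which is the
window the patch closes for ptdump.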
---
 include/linux/mm.h   |  7 -------
 mm/pagewalk.c        | 18 ++++++++++++++++--
 mm/pgtable-generic.c | 21 ++++++++++++++++++++-
 3 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 485df9c2dbdd..79408a17a1b0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3695,14 +3695,7 @@ static inline void __pagetable_free(struct ptdesc *pt)
 	__free_pages(page, compound_order(page));
 }
 
-#ifdef CONFIG_ASYNC_KERNEL_PGTABLE_FREE
 void pagetable_free_kernel(struct ptdesc *pt);
-#else
-static inline void pagetable_free_kernel(struct ptdesc *pt)
-{
-	__pagetable_free(pt);
-}
-#endif
 /**
  * pagetable_free - Free pagetables
  * @pt: The page table descriptor
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 3ae2586ff45b..5b5807a88394 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -664,6 +664,19 @@ int walk_kernel_page_table_range_lockless(unsigned long start, unsigned long end
 	return walk_pgd_range(start, end, &walk);
 }
 
+static int walk_kernel_page_table_range_rcu(unsigned long start, unsigned long end,
+		const struct mm_walk_ops *ops, pgd_t *pgd,
+		void *private)
+{
+	int err;
+
+	rcu_read_lock();
+	err = walk_kernel_page_table_range(start, end, ops, pgd, private);
+	rcu_read_unlock();
+
+	return err;
+}
+
 /**
  * walk_page_range_debug - walk a range of pagetables not backed by a vma
  * @mm: mm_struct representing the target process of page table walk
@@ -693,8 +706,9 @@ int walk_page_range_debug(struct mm_struct *mm, unsigned long start,
 
 	/* For convenience, we allow traversal of kernel mappings. */
 	if (mm == &init_mm)
-		return walk_kernel_page_table_range(start, end, ops,
-						    pgd, private);
+		return walk_kernel_page_table_range_rcu(start, end, ops, pgd,
+							private);
+
 	if (start >= end || !walk.mm)
 		return -EINVAL;
 	if (!check_ops_safe(ops))
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index b91b1a98029c..d45a556b4021 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -410,6 +410,13 @@ pte_t *pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
 	goto again;
 }
 
+static void kernel_pgtable_free_rcu(struct rcu_head *head)
+{
+	struct ptdesc *pt = container_of(head, struct ptdesc, pt_rcu_head);
+
+	__pagetable_free(pt);
+}
+
 #ifdef CONFIG_ASYNC_KERNEL_PGTABLE_FREE
 static void kernel_pgtable_work_func(struct work_struct *work);
 
@@ -434,8 +441,15 @@ static void kernel_pgtable_work_func(struct work_struct *work)
 	spin_unlock(&kernel_pgtable_work.lock);
 
 	iommu_sva_invalidate_kva_range(PAGE_OFFSET, TLB_FLUSH_ALL);
+
+	/*
+	 * Lockless kernel page table walkers (ptdump, and any other user of
+	 * walk_kernel_page_table_range_lockless()) dereference these pages
+	 * under rcu_read_lock(). Free them after a grace period so a walker
+	 * cannot still be reading a page we release.
+	 */
 	list_for_each_entry_safe(pt, next, &page_list, pt_list)
-		__pagetable_free(pt);
+		call_rcu(&pt->pt_rcu_head, kernel_pgtable_free_rcu);
 }
 
 void pagetable_free_kernel(struct ptdesc *pt)
@@ -446,4 +460,9 @@ void pagetable_free_kernel(struct ptdesc *pt)
 
 	schedule_work(&kernel_pgtable_work.work);
 }
+#else
+void pagetable_free_kernel(struct ptdesc *pt)
+{
+	call_rcu(&pt->pt_rcu_head, kernel_pgtable_free_rcu);
+}
 #endif
-- 
2.53.0