From nobody Thu Apr  2 20:21:59 2026
From: Gregory Price <gourry@gourry.net>
To: linux-mm@kvack.org, akpm@linux-foundation.org, hughd@google.com
Cc: david@kernel.org, ljs@kernel.org, Liam.Howlett@oracle.com,
    vbabka@kernel.org, rppt@kernel.org, surenb@google.com,
    mhocko@suse.com, baolin.wang@linux.alibaba.com,
    linux-kernel@vger.kernel.org, kernel-team@meta.com,
    stable@vger.kernel.org
Subject: [PATCH] mm/shmem: use invalidate_lock to fix hole-punch race
Date: Thu, 26 Mar 2026 11:26:11 -0500
Message-ID: <20260326162611.693539-1-gourry@gourry.net>

Inflating a VM's balloon while vhost-user-net fork+exec's a helper
triggers "still mapped when deleted" on the memfd backing guest RAM:

  BUG: Bad page cache in process __balloon  pfn:6520704
  page dumped because: still mapped when deleted
  ...
  shmem_undo_range+0x3fa/0x570
  shmem_fallocate+0x366/0x4d0
  vfs_fallocate+0x13c/0x310

This BUG also resulted in guests seeing stale mappings backed by a
zeroed page, causing guest kernel panics. I was unable to trace that
specific interaction, but it appears to be related to THP splitting.

Two races allow PTEs to be re-installed for a folio that fallocate is
about to remove from the page cache:

Race 1 - fault-around (filemap_map_pages):

  fallocate                 fault-around             fork
  ---------                 ------------             ----
  set i_private
  unmap_mapping_range()
  # zaps PTEs
                            filemap_map_pages()
                            # re-maps folio!
                                                     dup_mmap()
                                                     # child VMA in tree
  shmem_undo_range()
    lock folio
    unmap_mapping_folio()
    # child VMA: no PTE,
    # skip
                                                     copy_page_range()
                                                     # copies PTE
    # parent VMA: zaps PTE
    filemap_remove_folio()
    # mapcount=1, BUG!

filemap_map_pages() is called directly as .map_pages, bypassing
shmem_fault()'s i_private synchronization.

Race 2 - shmem_fault TOCTOU:

  fallocate                 shmem_fault
  ---------                 -----------
                            check i_private -> NULL
  set i_private
  unmap_mapping_range()
  # zaps PTEs
                            shmem_get_folio_gfp()
                            # finds folio in cache
                            finish_fault()
                            # installs PTE
  shmem_undo_range()
    truncate_inode_folio()
    # mapcount=1, BUG!

Fix both races with the invalidate_lock. This matches the existing
pattern used by secretmem_fault(), udf_page_mkwrite(), and
zonefs_filemap_page_mkwrite(), all of which take invalidate_lock
shared under mmap_lock in their fault handlers.

This also requires removing the rcu_read_lock() from do_fault_around()
so that .map_pages implementations may take sleeping locks. The outer
rcu_read_lock() is redundant for all in-tree .map_pages
implementations: every one either is filemap_map_pages() itself (which
takes rcu_read_lock() internally) or is a thin wrapper around it.
Fixes: d7c1755179b8 ("mm: implement ->map_pages for shmem/tmpfs")
Cc: stable@vger.kernel.org
Signed-off-by: Gregory Price <gourry@gourry.net>
---
 mm/memory.c |  2 --
 mm/shmem.c  | 33 ++++++++++++++++++++++++++++++---
 2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index e44469f9cf65..838583591fdf 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5900,11 +5900,9 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
 		return VM_FAULT_OOM;
 	}
 
-	rcu_read_lock();
 	ret = vmf->vma->vm_ops->map_pages(vmf, vmf->pgoff + from_pte - pte_off,
 					  vmf->pgoff + to_pte - pte_off);
-	rcu_read_unlock();
 
 	return ret;
 }
diff --git a/mm/shmem.c b/mm/shmem.c
index 4ecefe02881d..5c654b86f3cf 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2731,7 +2731,8 @@ static vm_fault_t shmem_falloc_wait(struct vm_fault *vmf, struct inode *inode)
 static vm_fault_t shmem_fault(struct vm_fault *vmf)
 {
 	struct inode *inode = file_inode(vmf->vma->vm_file);
-	gfp_t gfp = mapping_gfp_mask(inode->i_mapping);
+	struct address_space *mapping = inode->i_mapping;
+	gfp_t gfp = mapping_gfp_mask(mapping);
 	struct folio *folio = NULL;
 	vm_fault_t ret = 0;
 	int err;
@@ -2747,8 +2748,15 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
 	}
 
 	WARN_ON_ONCE(vmf->page != NULL);
+	/*
+	 * shmem_fallocate(PUNCH_HOLE) holds invalidate_lock exclusive across
+	 * unmap+truncate. Take it shared here so shmem_fault cannot obtain
+	 * a folio in the process of being punched.
+	 */
+	filemap_invalidate_lock_shared(mapping);
 	err = shmem_get_folio_gfp(inode, vmf->pgoff, 0, &folio, SGP_CACHE,
 				  gfp, vmf, &ret);
+	filemap_invalidate_unlock_shared(mapping);
 	if (err)
 		return vmf_error(err);
 	if (folio) {
@@ -3683,11 +3691,13 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 		inode->i_private = &shmem_falloc;
 		spin_unlock(&inode->i_lock);
 
+		filemap_invalidate_lock(mapping);
 		if ((u64)unmap_end > (u64)unmap_start)
 			unmap_mapping_range(mapping, unmap_start,
 					    1 + unmap_end - unmap_start, 0);
 		shmem_truncate_range(inode, offset, offset + len - 1);
 		/* No need to unmap again: hole-punching leaves COWed pages */
+		filemap_invalidate_unlock(mapping);
 
 		spin_lock(&inode->i_lock);
 		inode->i_private = NULL;
@@ -5268,9 +5278,26 @@ static const struct super_operations shmem_ops = {
 #endif
 };
 
+/*
+ * shmem_fallocate(PUNCH_HOLE) holds invalidate_lock for write across
+ * unmap+truncate. Take it for read here so fault-around cannot re-map
+ * pages being punched.
+ */
+static vm_fault_t shmem_map_pages(struct vm_fault *vmf,
+				  pgoff_t start_pgoff, pgoff_t end_pgoff)
+{
+	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
+	vm_fault_t ret;
+
+	filemap_invalidate_lock_shared(mapping);
+	ret = filemap_map_pages(vmf, start_pgoff, end_pgoff);
+	filemap_invalidate_unlock_shared(mapping);
+	return ret;
+}
+
 static const struct vm_operations_struct shmem_vm_ops = {
 	.fault = shmem_fault,
-	.map_pages = filemap_map_pages,
+	.map_pages = shmem_map_pages,
 #ifdef CONFIG_NUMA
 	.set_policy = shmem_set_policy,
 	.get_policy = shmem_get_policy,
@@ -5282,7 +5309,7 @@ static const struct vm_operations_struct shmem_vm_ops = {
 
 static const struct vm_operations_struct shmem_anon_vm_ops = {
 	.fault = shmem_fault,
-	.map_pages = filemap_map_pages,
+	.map_pages = shmem_map_pages,
 #ifdef CONFIG_NUMA
 	.set_policy = shmem_set_policy,
 	.get_policy = shmem_get_policy,
-- 
2.53.0