From: Rik van Riel <riel@surriel.com>
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, linux-mm@kvack.org, akpm@linux-foundation.org,
	muchun.song@linux.dev, mike.kravetz@oracle.com, leit@meta.com,
	willy@infradead.org, Rik van Riel <riel@surriel.com>
Subject: [PATCH 1/3] hugetlbfs: extend hugetlb_vma_lock to private VMAs
Date: Sat, 30 Sep 2023 20:55:48 -0400
Message-ID: <20231001005659.2185316-2-riel@surriel.com>
In-Reply-To: <20231001005659.2185316-1-riel@surriel.com>

From: Rik van Riel <riel@surriel.com>

Extend the locking scheme used to protect shared hugetlb mappings
from truncate vs. page fault races, in order to protect private
hugetlb mappings (with resv_map) against MADV_DONTNEED.

Add a read-write semaphore to the resv_map data structure, and use
that from the hugetlb_vma_(un)lock_* functions, in preparation for
closing the race between MADV_DONTNEED and page faults.

Signed-off-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 include/linux/hugetlb.h |  6 ++++++
 mm/hugetlb.c            | 41 +++++++++++++++++++++++++++++++++++++----
 2 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 5b2626063f4f..694928fa06a3 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -60,6 +60,7 @@ struct resv_map {
 	long adds_in_progress;
 	struct list_head region_cache;
 	long region_cache_count;
+	struct rw_semaphore rw_sema;
 #ifdef CONFIG_CGROUP_HUGETLB
 	/*
 	 * On private mappings, the counter to uncharge reservations is stored
@@ -1231,6 +1232,11 @@ static inline bool __vma_shareable_lock(struct vm_area_struct *vma)
 	return (vma->vm_flags & VM_MAYSHARE) && vma->vm_private_data;
 }
 
+static inline bool __vma_private_lock(struct vm_area_struct *vma)
+{
+	return (!(vma->vm_flags & VM_MAYSHARE)) && vma->vm_private_data;
+}
+
 /*
  * Safe version of huge_pte_offset() to check the locks. See comments
  * above huge_pte_offset().
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ba6d39b71cb1..ee7497f37098 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -97,6 +97,7 @@ static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
 static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma);
 static void hugetlb_unshare_pmds(struct vm_area_struct *vma,
 		unsigned long start, unsigned long end);
+static struct resv_map *vma_resv_map(struct vm_area_struct *vma);
 
 static inline bool subpool_is_free(struct hugepage_subpool *spool)
 {
@@ -267,6 +268,10 @@ void hugetlb_vma_lock_read(struct vm_area_struct *vma)
 		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
 
 		down_read(&vma_lock->rw_sema);
+	} else if (__vma_private_lock(vma)) {
+		struct resv_map *resv_map = vma_resv_map(vma);
+
+		down_read(&resv_map->rw_sema);
 	}
 }
 
@@ -276,6 +281,10 @@ void hugetlb_vma_unlock_read(struct vm_area_struct *vma)
 		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
 
 		up_read(&vma_lock->rw_sema);
+	} else if (__vma_private_lock(vma)) {
+		struct resv_map *resv_map = vma_resv_map(vma);
+
+		up_read(&resv_map->rw_sema);
 	}
 }
 
@@ -285,6 +294,10 @@ void hugetlb_vma_lock_write(struct vm_area_struct *vma)
 		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
 
 		down_write(&vma_lock->rw_sema);
+	} else if (__vma_private_lock(vma)) {
+		struct resv_map *resv_map = vma_resv_map(vma);
+
+		down_write(&resv_map->rw_sema);
 	}
 }
 
@@ -294,17 +307,27 @@ void hugetlb_vma_unlock_write(struct vm_area_struct *vma)
 		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
 
 		up_write(&vma_lock->rw_sema);
+	} else if (__vma_private_lock(vma)) {
+		struct resv_map *resv_map = vma_resv_map(vma);
+
+		up_write(&resv_map->rw_sema);
 	}
 }
 
 int hugetlb_vma_trylock_write(struct vm_area_struct *vma)
 {
-	struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
 
-	if (!__vma_shareable_lock(vma))
-		return 1;
+	if (__vma_shareable_lock(vma)) {
+		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
 
-	return down_write_trylock(&vma_lock->rw_sema);
+		return down_write_trylock(&vma_lock->rw_sema);
+	} else if (__vma_private_lock(vma)) {
+		struct resv_map *resv_map = vma_resv_map(vma);
+
+		return down_write_trylock(&resv_map->rw_sema);
+	}
+
+	return 1;
 }
 
 void hugetlb_vma_assert_locked(struct vm_area_struct *vma)
@@ -313,6 +336,10 @@ void hugetlb_vma_assert_locked(struct vm_area_struct *vma)
 		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
 
 		lockdep_assert_held(&vma_lock->rw_sema);
+	} else if (__vma_private_lock(vma)) {
+		struct resv_map *resv_map = vma_resv_map(vma);
+
+		lockdep_assert_held(&resv_map->rw_sema);
 	}
 }
 
@@ -345,6 +372,11 @@ static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma)
 		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
 
 		__hugetlb_vma_unlock_write_put(vma_lock);
+	} else if (__vma_private_lock(vma)) {
+		struct resv_map *resv_map = vma_resv_map(vma);
+
+		/* no free for anon vmas, but still need to unlock */
+		up_write(&resv_map->rw_sema);
 	}
 }
 
@@ -1068,6 +1100,7 @@ struct resv_map *resv_map_alloc(void)
 	kref_init(&resv_map->refs);
 	spin_lock_init(&resv_map->lock);
 	INIT_LIST_HEAD(&resv_map->regions);
+	init_rwsem(&resv_map->rw_sema);
 
 	resv_map->adds_in_progress = 0;
 	/*
-- 
2.41.0
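
For readers following along, the race this series closes arises because
MADV_DONTNEED discards faulted-in huge pages while the VMA stays mapped,
so a concurrent page fault can run against the same range. The
hypothetical userspace sketch below (not part of the patch) exercises
exactly that path on a private hugetlb mapping. It assumes a 2MB default
huge page size and at least one page reserved via
/proc/sys/vm/nr_hugepages; MADV_DONTNEED is accepted on hugetlb mappings
on recent kernels.

/* Hypothetical demo, not part of the patch series. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define LEN (2UL * 1024 * 1024)	/* one 2MB huge page, assumed default size */

int main(void)
{
	char *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	memset(p, 0xaa, LEN);	/* fault the private huge page in */

	/*
	 * Discard the page while the mapping stays in place. This is the
	 * zap that resv_map->rw_sema lets the rest of the series
	 * serialize against concurrent page faults.
	 */
	if (madvise(p, LEN, MADV_DONTNEED))
		perror("madvise");

	munmap(p, LEN);
	return 0;
}

Note also that before this change hugetlb_vma_trylock_write() simply
returned 1 for any non-shared VMA without taking a lock; with the
semaphore in place, private VMAs that carry a reservation map now
genuinely contend with the fault path.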