From nobody Mon Jun 15 19:26:53 2026
From: Huang Shijie
CC: Huang Shijie
Subject: [PATCH v2 1/4] mm: use mapping_mapped to simplify the code
Date: Thu, 11 Jun 2026 14:18:57 +0800
Message-ID: <20260611061915.2354307-2-huangsj@hygon.cn>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260611061915.2354307-1-huangsj@hygon.cn>
References: <20260611061915.2354307-1-huangsj@hygon.cn>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Use mapping_mapped() to simplify the code, make the code tidy and clean.

Signed-off-by: Huang Shijie
Reviewed-by: Lorenzo Stoakes
Reviewed-by: Pedro Falcato
---
 fs/hugetlbfs/inode.c | 4 ++--
 mm/memory.c          | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 78d61bf2bd9b..216e1a0dd0b2 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -614,7 +614,7 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 
 	i_size_write(inode, offset);
 	i_mmap_lock_write(mapping);
-	if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
+	if (mapping_mapped(mapping))
 		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0,
 				      ZAP_FLAG_DROP_MARKER);
 	i_mmap_unlock_write(mapping);
@@ -675,7 +675,7 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 
 	/* Unmap users of full pages in the hole.  */
 	if (hole_end > hole_start) {
-		if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
+		if (mapping_mapped(mapping))
 			hugetlb_vmdelete_list(&mapping->i_mmap,
 					      hole_start >> PAGE_SHIFT,
 					      hole_end >> PAGE_SHIFT, 0);
diff --git a/mm/memory.c b/mm/memory.c
index 86a973119bd4..5335077765e2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4386,7 +4386,7 @@ void unmap_mapping_folio(struct folio *folio)
 	details.zap_flags = ZAP_FLAG_DROP_MARKER;
 
 	i_mmap_lock_read(mapping);
-	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
+	if (unlikely(mapping_mapped(mapping)))
 		unmap_mapping_range_tree(&mapping->i_mmap, first_index,
 					 last_index, &details);
 	i_mmap_unlock_read(mapping);
@@ -4416,7 +4416,7 @@ void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
 		last_index = ULONG_MAX;
 
 	i_mmap_lock_read(mapping);
-	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
+	if (unlikely(mapping_mapped(mapping)))
 		unmap_mapping_range_tree(&mapping->i_mmap, first_index,
 					 last_index, &details);
 	i_mmap_unlock_read(mapping);
-- 
2.53.0

From nobody Mon Jun 15 19:26:53 2026
From: Huang Shijie
CC: Huang Shijie
Subject: [PATCH v2 2/4] mm: use get_i_mmap_root to access the file's i_mmap
Date: Thu, 11 Jun 2026 14:18:58 +0800
Message-ID: <20260611061915.2354307-3-huangsj@hygon.cn>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260611061915.2354307-1-huangsj@hygon.cn>
References: <20260611061915.2354307-1-huangsj@hygon.cn>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type:
 text/plain; charset="utf-8"

Do not access the file's i_mmap directly; use get_i_mmap_root() to
access it. This prepares for a later patch in the series.

Signed-off-by: Huang Shijie
---
 arch/arm/mm/fault-armv.c   |  3 ++-
 arch/arm/mm/flush.c        |  3 ++-
 arch/nios2/mm/cacheflush.c |  3 ++-
 arch/parisc/kernel/cache.c |  4 +++-
 fs/dax.c                   |  3 ++-
 fs/hugetlbfs/inode.c       |  6 +++---
 include/linux/fs.h         |  5 +++++
 include/linux/mm.h         |  1 +
 kernel/events/uprobes.c    |  3 ++-
 mm/hugetlb.c               |  7 +++++--
 mm/khugepaged.c            |  6 ++++--
 mm/memory-failure.c        |  8 +++++---
 mm/memory.c                |  4 ++--
 mm/mmap.c                  |  2 +-
 mm/nommu.c                 |  9 +++++----
 mm/pagewalk.c              |  2 +-
 mm/rmap.c                  |  2 +-
 mm/vma.c                   | 14 ++++++++------
 18 files changed, 54 insertions(+), 31 deletions(-)

diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 91e488767783..1b5fe151e805 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -126,6 +126,7 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
 {
 	const unsigned long pmd_start_addr = ALIGN_DOWN(addr, PMD_SIZE);
 	const unsigned long pmd_end_addr = pmd_start_addr + PMD_SIZE;
+	struct rb_root_cached *root = get_i_mmap_root(mapping);
 	struct mm_struct *mm = vma->vm_mm;
 	struct vm_area_struct *mpnt;
 	unsigned long offset;
@@ -140,7 +141,7 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
 	 * cache coherency.
 	 */
 	flush_dcache_mmap_lock(mapping);
-	vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
+	vma_interval_tree_foreach(mpnt, root, pgoff, pgoff) {
 		/*
 		 * If we are using split PTE locks, then we need to take the pte
 		 * lock.  Otherwise we are using shared mm->page_table_lock which
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 4d7ef5cc36b6..01588df81bfc 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -238,6 +238,7 @@ void __flush_dcache_folio(struct address_space *mapping, struct folio *folio)
 static void __flush_dcache_aliases(struct address_space *mapping, struct folio *folio)
 {
 	struct mm_struct *mm = current->active_mm;
+	struct rb_root_cached *root = get_i_mmap_root(mapping);
 	struct vm_area_struct *vma;
 	pgoff_t pgoff, pgoff_end;
 
@@ -251,7 +252,7 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct folio *
 	pgoff_end = pgoff + folio_nr_pages(folio) - 1;
 
 	flush_dcache_mmap_lock(mapping);
-	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff_end) {
+	vma_interval_tree_foreach(vma, root, pgoff, pgoff_end) {
 		unsigned long start, offset, pfn;
 		unsigned int nr;
 
diff --git a/arch/nios2/mm/cacheflush.c b/arch/nios2/mm/cacheflush.c
index 8321182eb927..ab6e064fabe2 100644
--- a/arch/nios2/mm/cacheflush.c
+++ b/arch/nios2/mm/cacheflush.c
@@ -78,11 +78,12 @@ static void flush_aliases(struct address_space *mapping, struct folio *folio)
 	unsigned long flags;
 	pgoff_t pgoff;
 	unsigned long nr = folio_nr_pages(folio);
+	struct rb_root_cached *root = get_i_mmap_root(mapping);
 
 	pgoff = folio->index;
 
 	flush_dcache_mmap_lock_irqsave(mapping, flags);
-	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff + nr - 1) {
+	vma_interval_tree_foreach(vma, root, pgoff, pgoff + nr - 1) {
 		unsigned long start;
 
 		if (vma->vm_mm != mm)
diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
index 0170b69a21d3..f99dffd6cc22 100644
--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -473,6 +473,7 @@ static inline unsigned long get_upa(struct mm_struct *mm, unsigned long addr)
 void flush_dcache_folio(struct folio *folio)
 {
 	struct address_space *mapping = folio_flush_mapping(folio);
+	struct rb_root_cached *root;
 	struct vm_area_struct *vma;
 	unsigned long addr, old_addr = 0;
 	void *kaddr;
@@ -494,6 +495,7 @@ void flush_dcache_folio(struct folio *folio)
 		return;
 
 	pgoff = folio->index;
+	root = get_i_mmap_root(mapping);
 
 	/*
 	 * We have carefully arranged in arch_get_unmapped_area() that
@@ -503,7 +505,7 @@ void flush_dcache_folio(struct folio *folio)
 	 * on machines that support equivalent aliasing
 	 */
 	flush_dcache_mmap_lock_irqsave(mapping, flags);
-	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff + nr - 1) {
+	vma_interval_tree_foreach(vma, root, pgoff, pgoff + nr - 1) {
 		unsigned long offset = pgoff - vma->vm_pgoff;
 		unsigned long pfn = folio_pfn(folio);
 
diff --git a/fs/dax.c b/fs/dax.c
index 6d175cd47a99..d402edc3c1b8 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1138,6 +1138,7 @@ static int dax_writeback_one(struct xa_state *xas, struct dax_device *dax_dev,
 		struct address_space *mapping, void *entry)
 {
 	unsigned long pfn, index, count, end;
+	struct rb_root_cached *root = get_i_mmap_root(mapping);
 	long ret = 0;
 	struct vm_area_struct *vma;
 
@@ -1201,7 +1202,7 @@ static int dax_writeback_one(struct xa_state *xas, struct dax_device *dax_dev,
 
 	/* Walk all mappings of a given index of a file and writeprotect them */
 	i_mmap_lock_read(mapping);
-	vma_interval_tree_foreach(vma, &mapping->i_mmap, index, end) {
+	vma_interval_tree_foreach(vma, root, index, end) {
 		pfn_mkclean_range(pfn, count, index, vma);
 		cond_resched();
 	}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 216e1a0dd0b2..da5b41ea5bdd 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -380,7 +380,7 @@ static void hugetlb_unmap_file_folio(struct hstate *h,
 					struct address_space *mapping,
 					struct folio *folio, pgoff_t index)
 {
-	struct rb_root_cached *root = &mapping->i_mmap;
+	struct rb_root_cached *root = get_i_mmap_root(mapping);
 	struct hugetlb_vma_lock *vma_lock;
 	unsigned long pfn = folio_pfn(folio);
 	struct vm_area_struct *vma;
@@ -615,7 +615,7 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 	i_size_write(inode, offset);
 	i_mmap_lock_write(mapping);
 	if (mapping_mapped(mapping))
-		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0,
+		hugetlb_vmdelete_list(get_i_mmap_root(mapping), pgoff, 0,
 				      ZAP_FLAG_DROP_MARKER);
 	i_mmap_unlock_write(mapping);
 	remove_inode_hugepages(inode, offset, LLONG_MAX);
@@ -676,7 +676,7 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 	/* Unmap users of full pages in the hole.  */
 	if (hole_end > hole_start) {
 		if (mapping_mapped(mapping))
-			hugetlb_vmdelete_list(&mapping->i_mmap,
+			hugetlb_vmdelete_list(get_i_mmap_root(mapping),
 					      hole_start >> PAGE_SHIFT,
 					      hole_end >> PAGE_SHIFT, 0);
 	}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 11559c513dfb..cd46615b8f53 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -556,6 +556,11 @@ static inline int mapping_mapped(const struct address_space *mapping)
 	return	!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root);
 }
 
+static inline struct rb_root_cached *get_i_mmap_root(struct address_space *mapping)
+{
+	return &mapping->i_mmap;
+}
+
 /*
  * Might pages of this file have been modified in userspace?
  * Note that i_mmap_writable counts all VM_SHARED, VM_MAYWRITE vmas: do_mmap
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 06bbe9eba636..0a45c6a8b9f2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4041,6 +4041,7 @@ struct vm_area_struct *vma_interval_tree_iter_first(struct rb_root_cached *root,
 struct vm_area_struct *vma_interval_tree_iter_next(struct vm_area_struct *node,
 				unsigned long start, unsigned long last);
 
+/* Please use get_i_mmap_root() to get the @root */
 #define vma_interval_tree_foreach(vma, root, start, last)		\
 	for (vma = vma_interval_tree_iter_first(root, start, last);	\
 	     vma; vma = vma_interval_tree_iter_next(vma, start, last))
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 4084e926e284..d8561a42aec8 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1201,6 +1201,7 @@ static inline struct map_info *free_map_info(struct map_info *info)
 static struct map_info *
 build_map_info(struct address_space *mapping, loff_t offset, bool is_register)
 {
+	struct rb_root_cached *root = get_i_mmap_root(mapping);
 	unsigned long pgoff = offset >> PAGE_SHIFT;
 	struct vm_area_struct *vma;
 	struct map_info *curr = NULL;
@@ -1210,7 +1211,7 @@ build_map_info(struct address_space *mapping, loff_t offset, bool is_register)
 
  again:
 	i_mmap_lock_read(mapping);
-	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
+	vma_interval_tree_foreach(vma, root, pgoff, pgoff) {
 		if (!valid_vma(vma, is_register))
 			continue;
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4b80b167cc9c..8bc49d57a116 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5360,6 +5360,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
 	struct hstate *h = hstate_vma(vma);
 	struct vm_area_struct *iter_vma;
 	struct address_space *mapping;
+	struct rb_root_cached *root;
 	pgoff_t pgoff;
 
 	/*
@@ -5370,6 +5371,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
 	pgoff = ((address - vma->vm_start) >> PAGE_SHIFT) +
 			vma->vm_pgoff;
 	mapping = vma->vm_file->f_mapping;
+	root = get_i_mmap_root(mapping);
 
 	/*
 	 * Take the mapping lock for the duration of the table walk. As
@@ -5377,7 +5379,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * __unmap_hugepage_range() is called as the lock is already held
 	 */
 	i_mmap_lock_write(mapping);
-	vma_interval_tree_foreach(iter_vma, &mapping->i_mmap, pgoff, pgoff) {
+	vma_interval_tree_foreach(iter_vma, root, pgoff, pgoff) {
 		/* Do not unmap the current VMA */
 		if (iter_vma == vma)
 			continue;
@@ -6850,6 +6852,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, pud_t *pud)
 {
 	struct address_space *mapping = vma->vm_file->f_mapping;
+	struct rb_root_cached *root = get_i_mmap_root(mapping);
 	pgoff_t idx = ((addr - vma->vm_start) >> PAGE_SHIFT) +
 			vma->vm_pgoff;
 	struct vm_area_struct *svma;
@@ -6858,7 +6861,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 	pte_t *pte;
 
 	i_mmap_lock_read(mapping);
-	vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
+	vma_interval_tree_foreach(svma, root, idx, idx) {
 		if (svma == vma)
 			continue;
 
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b8452dbdb043..0f577e4a2ccd 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1773,10 +1773,11 @@ static bool file_backed_vma_is_retractable(struct vm_area_struct *vma)
 
 static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 {
+	struct rb_root_cached *root = get_i_mmap_root(mapping);
 	struct vm_area_struct *vma;
 
 	i_mmap_lock_read(mapping);
-	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
+	vma_interval_tree_foreach(vma, root, pgoff, pgoff) {
 		struct mmu_notifier_range range;
 		struct mm_struct *mm;
 		unsigned long addr;
@@ -2194,7 +2195,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 			 * not be able to observe any missing pages due to the
 			 * previously inserted retry entries.
 			 */
-			vma_interval_tree_foreach(vma, &mapping->i_mmap, start, end) {
+			vma_interval_tree_foreach(vma, get_i_mmap_root(mapping),
+						  start, end) {
 				if (userfaultfd_missing(vma)) {
 					result = SCAN_EXCEED_NONE_PTE;
 					goto immap_locked;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ee42d4361309..85196d9bb26c 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -598,7 +598,7 @@ static void collect_procs_file(const struct folio *folio,
 
 		if (!t)
 			continue;
-		vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff,
+		vma_interval_tree_foreach(vma, get_i_mmap_root(mapping), pgoff,
 				      pgoff) {
 			/*
 			 * Send early kill signal to tasks where a vma covers
@@ -650,7 +650,8 @@ static void collect_procs_fsdax(const struct page *page,
 		t = task_early_kill(tsk, true);
 		if (!t)
 			continue;
-		vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
+		vma_interval_tree_foreach(vma, get_i_mmap_root(mapping), pgoff,
+					  pgoff) {
 			if (vma->vm_mm == t->mm)
 				add_to_kill_fsdax(t, page, vma, to_kill, pgoff);
 		}
@@ -2251,7 +2252,8 @@ static void collect_procs_pfn(struct pfn_address_space *pfn_space,
 		t = task_early_kill(tsk, true);
 		if (!t)
 			continue;
-		vma_interval_tree_foreach(vma, &mapping->i_mmap, 0, ULONG_MAX) {
+		vma_interval_tree_foreach(vma, get_i_mmap_root(mapping),
+					  0, ULONG_MAX) {
 			pgoff_t pgoff;
 
 			if (vma->vm_mm == t->mm &&
diff --git a/mm/memory.c b/mm/memory.c
index 5335077765e2..9ea5d6c8ef4d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4387,7 +4387,7 @@ void unmap_mapping_folio(struct folio *folio)
 
 	i_mmap_lock_read(mapping);
 	if (unlikely(mapping_mapped(mapping)))
-		unmap_mapping_range_tree(&mapping->i_mmap, first_index,
+		unmap_mapping_range_tree(get_i_mmap_root(mapping), first_index,
 					 last_index, &details);
 	i_mmap_unlock_read(mapping);
 }
@@ -4417,7 +4417,7 @@ void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
 
 	i_mmap_lock_read(mapping);
 	if (unlikely(mapping_mapped(mapping)))
-		unmap_mapping_range_tree(&mapping->i_mmap, first_index,
+		unmap_mapping_range_tree(get_i_mmap_root(mapping), first_index,
 					 last_index, &details);
 	i_mmap_unlock_read(mapping);
 }
diff --git a/mm/mmap.c b/mm/mmap.c
index 5754d1c36462..d714fdb357e5 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1831,7 +1831,7 @@ __latent_entropy int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
 			flush_dcache_mmap_lock(mapping);
 			/* insert tmp into the share list, just after mpnt */
 			vma_interval_tree_insert_after(tmp, mpnt,
-					&mapping->i_mmap);
+					get_i_mmap_root(mapping));
 			flush_dcache_mmap_unlock(mapping);
 			i_mmap_unlock_write(mapping);
 		}
diff --git a/mm/nommu.c b/mm/nommu.c
index ed3934bc2de4..0f18ffc658e9 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -569,7 +569,7 @@ static void setup_vma_to_mm(struct vm_area_struct *vma, struct mm_struct *mm)
 
 		i_mmap_lock_write(mapping);
 		flush_dcache_mmap_lock(mapping);
-		vma_interval_tree_insert(vma, &mapping->i_mmap);
+		vma_interval_tree_insert(vma, get_i_mmap_root(mapping));
 		flush_dcache_mmap_unlock(mapping);
 		i_mmap_unlock_write(mapping);
 	}
@@ -585,7 +585,7 @@ static void cleanup_vma_from_mm(struct vm_area_struct *vma)
 
 		i_mmap_lock_write(mapping);
 		flush_dcache_mmap_lock(mapping);
-		vma_interval_tree_remove(vma, &mapping->i_mmap);
+		vma_interval_tree_remove(vma, get_i_mmap_root(mapping));
 		flush_dcache_mmap_unlock(mapping);
 		i_mmap_unlock_write(mapping);
 	}
@@ -1804,6 +1804,7 @@ EXPORT_SYMBOL_GPL(copy_remote_vm_str);
 int nommu_shrink_inode_mappings(struct inode *inode, size_t size,
 				size_t newsize)
 {
+	struct rb_root_cached *root = get_i_mmap_root(inode->i_mapping);
 	struct vm_area_struct *vma;
 	struct vm_region *region;
 	pgoff_t low, high;
@@ -1816,7 +1817,7 @@ int nommu_shrink_inode_mappings(struct inode *inode, size_t size,
 	i_mmap_lock_read(inode->i_mapping);
 
 	/* search for VMAs that fall within the dead zone */
-	vma_interval_tree_foreach(vma, &inode->i_mapping->i_mmap, low, high) {
+	vma_interval_tree_foreach(vma, root, low, high) {
 		/* found one - only interested if it's shared out of the page
 		 * cache */
 		if (vma->vm_flags & VM_SHARED) {
@@ -1832,7 +1833,7 @@ int nommu_shrink_inode_mappings(struct inode *inode, size_t size,
 	 * we don't check for any regions that start beyond the EOF as there
 	 * shouldn't be any
 	 */
-	vma_interval_tree_foreach(vma, &inode->i_mapping->i_mmap, 0, ULONG_MAX) {
+	vma_interval_tree_foreach(vma, root, 0, ULONG_MAX) {
 		if (!(vma->vm_flags & VM_SHARED))
 			continue;
 
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 3ae2586ff45b..8df1b5077951 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -810,7 +810,7 @@ int walk_page_mapping(struct address_space *mapping, pgoff_t first_index,
 		return -EINVAL;
 
 	lockdep_assert_held(&mapping->i_mmap_rwsem);
-	vma_interval_tree_foreach(vma, &mapping->i_mmap, first_index,
+	vma_interval_tree_foreach(vma, get_i_mmap_root(mapping), first_index,
 				  first_index + nr - 1) {
 		/* Clip to the vma */
 		vba = vma->vm_pgoff;
diff --git a/mm/rmap.c b/mm/rmap.c
index 99e1b3dc390b..6cfcdb96071f 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -3051,7 +3051,7 @@ static void __rmap_walk_file(struct folio *folio, struct address_space *mapping,
 		i_mmap_lock_read(mapping);
 	}
 lookup:
-	vma_interval_tree_foreach(vma, &mapping->i_mmap,
+	vma_interval_tree_foreach(vma, get_i_mmap_root(mapping),
 			pgoff_start, pgoff_end) {
 		unsigned long address = vma_address(vma, pgoff_start, nr_pages);
 
diff --git a/mm/vma.c b/mm/vma.c
index d90791b00a7b..6159650c1b42 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -234,7 +234,7 @@ static void __vma_link_file(struct vm_area_struct *vma,
 		mapping_allow_writable(mapping);
 
 	flush_dcache_mmap_lock(mapping);
-	vma_interval_tree_insert(vma, &mapping->i_mmap);
+	vma_interval_tree_insert(vma, get_i_mmap_root(mapping));
 	flush_dcache_mmap_unlock(mapping);
 }
 
@@ -248,7 +248,7 @@ static void __remove_shared_vm_struct(struct vm_area_struct *vma,
 		mapping_unmap_writable(mapping);
 
 	flush_dcache_mmap_lock(mapping);
-	vma_interval_tree_remove(vma, &mapping->i_mmap);
+	vma_interval_tree_remove(vma, get_i_mmap_root(mapping));
 	flush_dcache_mmap_unlock(mapping);
 }
 
@@ -319,10 +319,11 @@ static void vma_prepare(struct vma_prepare *vp)
 
 	if (vp->file) {
 		flush_dcache_mmap_lock(vp->mapping);
-		vma_interval_tree_remove(vp->vma, &vp->mapping->i_mmap);
+		vma_interval_tree_remove(vp->vma,
+					 get_i_mmap_root(vp->mapping));
 		if (vp->adj_next)
 			vma_interval_tree_remove(vp->adj_next,
-						 &vp->mapping->i_mmap);
+						 get_i_mmap_root(vp->mapping));
 	}
 
 }
@@ -341,8 +342,9 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
 	if (vp->file) {
 		if (vp->adj_next)
 			vma_interval_tree_insert(vp->adj_next,
-						 &vp->mapping->i_mmap);
-		vma_interval_tree_insert(vp->vma, &vp->mapping->i_mmap);
+						 get_i_mmap_root(vp->mapping));
+		vma_interval_tree_insert(vp->vma,
+					 get_i_mmap_root(vp->mapping));
 		flush_dcache_mmap_unlock(vp->mapping);
 	}
 
-- 
2.53.0

From nobody Mon Jun 15 19:26:53 2026
From: Huang Shijie
CC: Huang Shijie
Subject: [PATCH v2 3/4] mm/fs: split the file's i_mmap tree
Date: Thu, 11 Jun 2026 14:18:59 +0800
Message-ID: <20260611061915.2354307-4-huangsj@hygon.cn>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260611061915.2354307-1-huangsj@hygon.cn>
References: <20260611061915.2354307-1-huangsj@hygon.cn>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

UnixBench includes an "execl" test, which exercises the execve system
call. For example, a Hygon server has 12 NUMA nodes and 384 CPUs.
When we test our server with "./Run -c 384 execl", the result is not
good enough. The i_mmap locks are heavily contended for "libc.so" and
"ld.so". The i_mmap tree for "libc.so" can hold over 6000 VMAs, and
these VMAs can be on different NUMA nodes. The insert/remove operations
do not run quickly enough.

In order to reduce contention on the i_mmap lock, this patch does the
following:
   1.) Split the single i_mmap tree into several sibling trees:
       each tree has its own lock.
       CONFIG_SPLIT_I_MMAP is used to turn this feature on/off.

   2.) Introduce a new field "tree_idx" in vm_area_struct to save the
       sibling tree index for this VMA.

   3.) Introduce a new field "vma_count" for address_space.
       The new mapping_mapped() will use it.

   4.) Rewrite vma_interval_tree_foreach().

   5.) Rewrite the lock functions.

After this patch, the VMA insert/remove operations work faster, and we
get over 400% performance improvement with the above test.

Signed-off-by: Huang Shijie
---
 fs/Kconfig               |   8 ++
 fs/hugetlbfs/inode.c     |  20 ++++-
 fs/inode.c               |  75 ++++++++++++++++-
 include/linux/fs.h       | 174 ++++++++++++++++++++++++++++++++++++++-
 include/linux/mm.h       |  80 ++++++++++++++++++
 include/linux/mm_types.h |   3 +
 mm/internal.h            |   3 +-
 mm/mmap.c                |  11 ++-
 mm/nommu.c               |  23 ++++--
 mm/pagewalk.c            |   2 +-
 mm/vma.c                 |  72 +++++++++++-----
 mm/vma_init.c            |   3 +
 12 files changed, 436 insertions(+), 38 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 43cb06de297f..e24804f70432 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -9,6 +9,14 @@ menu "File systems"
 config DCACHE_WORD_ACCESS
 	bool
 
+config SPLIT_I_MMAP
+	bool "Split the file's i_mmap to several trees"
+	default n
+	help
+	  Split the file's i_mmap into several trees; each tree has a separate
+	  lock. This reduces lock contention on the file's i_mmap tree,
+	  but it costs more memory per inode.
+
 config VALIDATE_FS_PARSER
 	bool "Validate filesystem parameter description"
 	help
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index da5b41ea5bdd..68d8308418dd 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -891,6 +891,23 @@ static struct inode *hugetlbfs_get_root(struct super_block *sb,
  */
 static struct lock_class_key hugetlbfs_i_mmap_rwsem_key;
 
+#ifdef CONFIG_SPLIT_I_MMAP
+static void hugetlbfs_lockdep_set_class(struct address_space *mapping)
+{
+	int i;
+
+	for (i = 0; i < split_tree_num; i++) {
+		lockdep_set_class(&mapping->i_mmap[i]->rwsem,
+					&hugetlbfs_i_mmap_rwsem_key);
+	}
+}
+#else
+static void hugetlbfs_lockdep_set_class(struct address_space *mapping)
+{
+	lockdep_set_class(&mapping->i_mmap_rwsem, &hugetlbfs_i_mmap_rwsem_key);
+}
+#endif
+
 static struct inode *hugetlbfs_get_inode(struct super_block *sb,
 					struct mnt_idmap *idmap,
 					struct inode *dir,
@@ -915,8 +932,7 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
 
 	inode->i_ino = get_next_ino();
 	inode_init_owner(idmap, inode, dir, mode);
-	lockdep_set_class(&inode->i_mapping->i_mmap_rwsem,
-			&hugetlbfs_i_mmap_rwsem_key);
+	hugetlbfs_lockdep_set_class(inode->i_mapping);
 	inode->i_mapping->a_ops = &hugetlbfs_aops;
 	simple_inode_init_ts(inode);
 	info->resv_map = resv_map;
diff --git a/fs/inode.c b/fs/inode.c
index 62c579a0cf7d..cb67ae83f5b3 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -214,6 +214,70 @@ static int no_open(struct inode *inode, struct file *file)
 	return -ENXIO;
 }
 
+#ifdef CONFIG_SPLIT_I_MMAP
+int split_tree_num;
+static int split_tree_align __maybe_unused = 32;
+
+static void __init init_split_tree_num(void)
+{
+#ifdef CONFIG_NUMA
+	split_tree_num = nr_node_ids;
+#else
+	split_tree_num = ALIGN(nr_cpu_ids, split_tree_align);
+#endif
+}
+
+static void free_mapping_i_mmap(struct address_space *mapping)
+{
+	int i;
+
+	if (!mapping->i_mmap)
+		return;
+
+	for (i = 0; i < split_tree_num; i++)
+		kfree(mapping->i_mmap[i]);
+	kfree(mapping->i_mmap);
+	mapping->i_mmap = NULL;
+}
+
+static int init_mapping_i_mmap(struct address_space *mapping, gfp_t gfp)
+{
+	struct i_mmap_tree *tree;
+	int i;
+
+	/* The extra one is used as terminator in vma_interval_tree_foreach() */
+	mapping->i_mmap = kzalloc(sizeof(tree) * (split_tree_num + 1), gfp);
+	if (!mapping->i_mmap)
+		return -ENOMEM;
+
+	for (i = 0; i < split_tree_num; i++) {
+		tree = kzalloc_node(sizeof(*tree), gfp, i);
+		if (!tree)
+			goto nomem;
+
+		tree->root = RB_ROOT_CACHED;
+		init_rwsem(&tree->rwsem);
+
+		mapping->i_mmap[i] = tree;
+	}
+	return 0;
+nomem:
+	free_mapping_i_mmap(mapping);
+	return -ENOMEM;
+}
+#else
+static int init_mapping_i_mmap(struct address_space *mapping, gfp_t gfp)
+{
+	mapping->i_mmap = RB_ROOT_CACHED;
+	init_rwsem(&mapping->i_mmap_rwsem);
+	return 0;
+}
+
+static void free_mapping_i_mmap(struct address_space *mapping) { }
+static void __init init_split_tree_num(void) {}
+#endif
+
 /**
  * inode_init_always_gfp - perform inode structure initialisation
  * @sb: superblock inode belongs to
@@ -302,9 +366,14 @@ int inode_init_always_gfp(struct super_block *sb, struct inode *inode, gfp_t gfp
 #endif
 	inode->i_flctx = NULL;
 
-	if (unlikely(security_inode_alloc(inode, gfp)))
+	if (init_mapping_i_mmap(mapping, gfp))
 		return -ENOMEM;
 
+	if (unlikely(security_inode_alloc(inode, gfp))) {
+		free_mapping_i_mmap(mapping);
+		return -ENOMEM;
+	}
+
 	this_cpu_inc(nr_inodes);
 
 	return 0;
@@ -380,6 +449,7 @@ void __destroy_inode(struct inode *inode)
 	if (inode->i_default_acl && !is_uncached_acl(inode->i_default_acl))
 		posix_acl_release(inode->i_default_acl);
 #endif
+	free_mapping_i_mmap(&inode->i_data);
 	this_cpu_dec(nr_inodes);
 }
 EXPORT_SYMBOL(__destroy_inode);
@@ -480,9 +550,7 @@ EXPORT_SYMBOL(inc_nlink);
 static void __address_space_init_once(struct address_space *mapping)
 {
 	xa_init_flags(&mapping->i_pages, XA_FLAGS_LOCK_IRQ | XA_FLAGS_ACCOUNT);
-	init_rwsem(&mapping->i_mmap_rwsem);
 	spin_lock_init(&mapping->i_private_lock);
-	mapping->i_mmap = RB_ROOT_CACHED;
 }
 
 void address_space_init_once(struct address_space *mapping)
@@ -2619,6 +2687,7 @@ void __init inode_init(void)
 					&i_hash_mask,
 					0,
 					0);
+	init_split_tree_num();
 }
 
 void init_special_inode(struct inode *inode, umode_t mode, dev_t rdev)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index cd46615b8f53..f4b3645b61df 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -450,6 +450,25 @@ struct mapping_metadata_bhs {
 	struct list_head list;	/* The list of bhs (b_assoc_buffers) */
 };
 
+#ifdef CONFIG_SPLIT_I_MMAP
+/*
+ * struct i_mmap_tree - A single sibling tree of the file's split i_mmap.
+ * @root: The red/black interval tree root.
+ * @rwsem: Protects insert/remove operations on this sibling tree.
+ * @vma_count: Number of VMAs in this sibling tree.
+ *
+ * When CONFIG_SPLIT_I_MMAP is enabled, the file's single i_mmap tree is
+ * split into split_tree_num sibling trees, each with its own lock. This
+ * reduces lock contention by allowing concurrent VMA insert/remove
+ * operations on different sibling trees.
+ */
+struct i_mmap_tree {
+	struct rb_root_cached root;
+	struct rw_semaphore rwsem;
+	atomic_t vma_count;
+};
+#endif
+
 /**
  * struct address_space - Contents of a cacheable, mappable object.
  * @host: Owner, either the inode or the block_device.
@@ -461,8 +480,13 @@ struct mapping_metadata_bhs {
  * @gfp_mask: Memory allocation flags to use for allocating pages.
  * @i_mmap_writable: Number of VM_SHARED, VM_MAYWRITE mappings.
  * @nr_thps: Number of THPs in the pagecache (non-shmem only).
- * @i_mmap: Tree of private and shared mappings.
- * @i_mmap_rwsem: Protects @i_mmap and @i_mmap_writable.
+ * @i_mmap: Tree of private and shared mappings. When CONFIG_SPLIT_I_MMAP
+ *	is enabled, this is an array of split_tree_num struct i_mmap_tree
+ *	pointers (plus a NULL terminator).
+ * @vma_count: Total number of VMAs across all sibling trees (only when + * CONFIG_SPLIT_I_MMAP is enabled). Used by mapping_mapped(). + * @i_mmap_rwsem: Protects @i_mmap and @i_mmap_writable (only when + * CONFIG_SPLIT_I_MMAP is disabled; otherwise per-tree rwsem is used). * @nrpages: Number of page entries, protected by the i_pages lock. * @writeback_index: Writeback starts here. * @a_ops: Methods. @@ -480,14 +504,19 @@ struct address_space { /* number of thp, only for non-shmem files */ atomic_t nr_thps; #endif +#ifdef CONFIG_SPLIT_I_MMAP + struct i_mmap_tree **i_mmap; + atomic_t vma_count; +#else struct rb_root_cached i_mmap; + struct rw_semaphore i_mmap_rwsem; +#endif unsigned long nrpages; pgoff_t writeback_index; const struct address_space_operations *a_ops; unsigned long flags; errseq_t wb_err; spinlock_t i_private_lock; - struct rw_semaphore i_mmap_rwsem; } __attribute__((aligned(sizeof(long)))) __randomize_layout; /* * On most architectures that alignment is already the case; but @@ -508,6 +537,133 @@ static inline bool mapping_tagged(const struct addres= s_space *mapping, xa_mark_t return xa_marked(&mapping->i_pages, tag); } =20 +#ifdef CONFIG_SPLIT_I_MMAP +static inline int mapping_mapped(const struct address_space *mapping) +{ + return atomic_read(&mapping->vma_count); +} + +static inline void inc_mapping_vma(struct address_space *mapping, + struct vm_area_struct *vma) +{ + struct i_mmap_tree *tree =3D mapping->i_mmap[vma->tree_idx]; + + atomic_inc(&tree->vma_count); + atomic_inc(&mapping->vma_count); +} + +static inline void dec_mapping_vma(struct address_space *mapping, + struct vm_area_struct *vma) +{ + struct i_mmap_tree *tree =3D mapping->i_mmap[vma->tree_idx]; + + atomic_dec(&tree->vma_count); + atomic_dec(&mapping->vma_count); +} + +static inline struct rb_root_cached *get_i_mmap_root(struct address_space = *mapping) +{ + return (struct rb_root_cached *)mapping->i_mmap; +} + +static inline void i_mmap_tree_lock_write(struct address_space 
*mapping, + struct vm_area_struct *vma) +{ + struct i_mmap_tree *tree =3D mapping->i_mmap[vma->tree_idx]; + + down_write(&tree->rwsem); +} + +static inline void i_mmap_tree_unlock_write(struct address_space *mapping, + struct vm_area_struct *vma) +{ + struct i_mmap_tree *tree =3D mapping->i_mmap[vma->tree_idx]; + + up_write(&tree->rwsem); +} + +#define i_mmap_lock_write_prepare(mapping) +#define i_mmap_unlock_write_complete(mapping) + +extern int split_tree_num; +static inline void i_mmap_lock_write(struct address_space *mapping) +{ + int i; + + for (i =3D 0; i < split_tree_num; i++) + down_write(&mapping->i_mmap[i]->rwsem); +} + +static inline int i_mmap_trylock_write(struct address_space *mapping) +{ + int i; + + for (i =3D 0; i < split_tree_num; i++) { + if (!down_write_trylock(&mapping->i_mmap[i]->rwsem)) { + while (i--) + up_write(&mapping->i_mmap[i]->rwsem); + return 0; + } + } + return 1; +} + +static inline void i_mmap_unlock_write(struct address_space *mapping) +{ + int i; + + for (i =3D 0; i < split_tree_num; i++) + up_write(&mapping->i_mmap[i]->rwsem); +} + +static inline int i_mmap_trylock_read(struct address_space *mapping) +{ + int i; + + for (i =3D 0; i < split_tree_num; i++) { + if (!down_read_trylock(&mapping->i_mmap[i]->rwsem)) { + while (i--) + up_read(&mapping->i_mmap[i]->rwsem); + return 0; + } + } + return 1; +} + +static inline void i_mmap_lock_read(struct address_space *mapping) +{ + int i; + + for (i =3D 0; i < split_tree_num; i++) + down_read(&mapping->i_mmap[i]->rwsem); +} + +static inline void i_mmap_unlock_read(struct address_space *mapping) +{ + int i; + + for (i =3D 0; i < split_tree_num; i++) + up_read(&mapping->i_mmap[i]->rwsem); +} + +static inline void i_mmap_assert_locked(struct address_space *mapping) +{ + int i; + + for (i =3D 0; i < split_tree_num; i++) + lockdep_assert_held(&mapping->i_mmap[i]->rwsem); +} + +static inline void i_mmap_assert_write_locked(struct address_space *mappin= g) +{ + int i; + + for (i =3D 0; i < 
split_tree_num; i++) + lockdep_assert_held_write(&mapping->i_mmap[i]->rwsem); +} + +#else + static inline void i_mmap_lock_write(struct address_space *mapping) { down_write(&mapping->i_mmap_rwsem); @@ -561,6 +717,18 @@ static inline struct rb_root_cached *get_i_mmap_root(s= truct address_space *mappi return &mapping->i_mmap; } =20 +static inline void inc_mapping_vma(struct address_space *mapping, + struct vm_area_struct *vma) { } +static inline void dec_mapping_vma(struct address_space *mapping, + struct vm_area_struct *vma) { } + +#define i_mmap_lock_write_prepare(mapping) i_mmap_lock_write(mapping) +#define i_mmap_unlock_write_complete(mapping) i_mmap_unlock_write(mapping) +#define i_mmap_tree_lock_write(mapping, vma) +#define i_mmap_tree_unlock_write(mapping, vma) + +#endif + /* * Might pages of this file have been modified in userspace? * Note that i_mmap_writable counts all VM_SHARED, VM_MAYWRITE vmas: do_mm= ap diff --git a/include/linux/mm.h b/include/linux/mm.h index 0a45c6a8b9f2..9aa8119fa9bf 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -4041,11 +4041,91 @@ struct vm_area_struct *vma_interval_tree_iter_first= (struct rb_root_cached *root, struct vm_area_struct *vma_interval_tree_iter_next(struct vm_area_struct *= node, unsigned long start, unsigned long last); =20 +#ifdef CONFIG_SPLIT_I_MMAP +extern int split_tree_num; + +static inline int smallest_tree_idx(struct file *file) +{ + struct address_space *mapping =3D file->f_mapping; + int tmp =3D INT_MAX, count; + int i, j =3D 0; + + /* + * Since a not 100% accurate value is still okay, + * we do not need any lock here. 
+ */ + for (i =3D 0; i < split_tree_num; i++) { + count =3D atomic_read(&mapping->i_mmap[i]->vma_count); + if (count < tmp) { + j =3D i; + tmp =3D count; + if (!tmp) + break; + } + } + return j; +} + +static inline void vma_set_tree_idx(struct vm_area_struct *vma) +{ +#ifdef CONFIG_NUMA + vma->tree_idx =3D numa_node_id(); +#else + vma->tree_idx =3D smallest_tree_idx(vma->vm_file); +#endif +} + +static inline struct rb_root_cached *get_rb_root(struct vm_area_struct *vm= a, + struct address_space *mapping) +{ + return &mapping->i_mmap[vma->tree_idx]->root; +} + +/* Find the first valid VMA in the sibling trees */ +static inline struct vm_area_struct *first_vma(struct i_mmap_tree ***__r, + unsigned long start, unsigned long last) +{ + struct vm_area_struct *vma =3D NULL; + struct i_mmap_tree **tree =3D *__r; + struct rb_root_cached *root; + + while (*tree) { + root =3D &(*tree)->root; + tree++; + vma =3D vma_interval_tree_iter_first(root, start, last); + if (vma) + break; + } + + /* Save for the next loop */ + *__r =3D tree; + return vma; +} + +/* + * Please use get_i_mmap_root() to get the @root. + * @_tmp is referenced to avoid unused variable warning. + */ +#define vma_interval_tree_foreach(vma, root, start, last) \ + for (struct i_mmap_tree **_r =3D (struct i_mmap_tree **)(root), \ + **_tmp =3D (vma =3D first_vma(&_r, start, last)) ? 
_r : NULL;\ + ((_tmp && vma) || (vma =3D first_vma(&_r, start, last))); \ + vma =3D vma_interval_tree_iter_next(vma, start, last)) +#else /* Please use get_i_mmap_root() to get the @root */ #define vma_interval_tree_foreach(vma, root, start, last) \ for (vma =3D vma_interval_tree_iter_first(root, start, last); \ vma; vma =3D vma_interval_tree_iter_next(vma, start, last)) =20 +static inline void vma_set_tree_idx(struct vm_area_struct *vma) { } + +static inline struct rb_root_cached *get_rb_root(struct vm_area_struct *vm= a, + struct address_space *mapping) +{ + return &mapping->i_mmap; +} +#endif + void anon_vma_interval_tree_insert(struct anon_vma_chain *node, struct rb_root_cached *root); void anon_vma_interval_tree_remove(struct anon_vma_chain *node, diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index a308e2c23b82..8d6aab3346ce 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -1072,6 +1072,9 @@ struct vm_area_struct { #ifdef __HAVE_PFNMAP_TRACKING struct pfnmap_track_ctx *pfnmap_track_ctx; #endif +#ifdef CONFIG_SPLIT_I_MMAP + int tree_idx; /* The sibling tree index for the VMA */ +#endif } __randomize_layout; =20 /* Clears all bits in the VMA flags bitmap, non-atomically. 
*/ diff --git a/mm/internal.h b/mm/internal.h index 5a2ddcf68e0b..2d35cacffd19 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1888,7 +1888,8 @@ static inline void maybe_rmap_unlock_action(struct vm= _area_struct *vma, =20 VM_WARN_ON_ONCE(vma_is_anonymous(vma)); file =3D vma->vm_file; - i_mmap_unlock_write(file->f_mapping); + i_mmap_tree_unlock_write(file->f_mapping, vma); + i_mmap_unlock_write_complete(file->f_mapping); action->hide_from_rmap_until_complete =3D false; } =20 diff --git a/mm/mmap.c b/mm/mmap.c index d714fdb357e5..70036ec9dcaa 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1825,15 +1825,20 @@ __latent_entropy int dup_mmap(struct mm_struct *mm,= struct mm_struct *oldmm) struct address_space *mapping =3D file->f_mapping; =20 get_file(file); - i_mmap_lock_write(mapping); + i_mmap_lock_write_prepare(mapping); + i_mmap_tree_lock_write(mapping, mpnt); + if (vma_is_shared_maywrite(tmp)) mapping_allow_writable(mapping); flush_dcache_mmap_lock(mapping); /* insert tmp into the share list, just after mpnt */ vma_interval_tree_insert_after(tmp, mpnt, - get_i_mmap_root(mapping)); + get_rb_root(mpnt, mapping)); + inc_mapping_vma(mapping, tmp); flush_dcache_mmap_unlock(mapping); - i_mmap_unlock_write(mapping); + + i_mmap_tree_unlock_write(mapping, mpnt); + i_mmap_unlock_write_complete(mapping); } =20 if (!(tmp->vm_flags & VM_WIPEONFORK)) diff --git a/mm/nommu.c b/mm/nommu.c index 0f18ffc658e9..1f2c60a220f6 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -567,11 +567,16 @@ static void setup_vma_to_mm(struct vm_area_struct *vm= a, struct mm_struct *mm) if (vma->vm_file) { struct address_space *mapping =3D vma->vm_file->f_mapping; =20 - i_mmap_lock_write(mapping); + i_mmap_lock_write_prepare(mapping); + i_mmap_tree_lock_write(mapping, vma); + flush_dcache_mmap_lock(mapping); - vma_interval_tree_insert(vma, get_i_mmap_root(mapping)); + vma_interval_tree_insert(vma, get_rb_root(vma, mapping)); + inc_mapping_vma(mapping, vma); flush_dcache_mmap_unlock(mapping); - 
i_mmap_unlock_write(mapping); + + i_mmap_tree_unlock_write(mapping, vma); + i_mmap_unlock_write_complete(mapping); } } =20 @@ -583,11 +588,16 @@ static void cleanup_vma_from_mm(struct vm_area_struct= *vma) struct address_space *mapping; mapping =3D vma->vm_file->f_mapping; =20 - i_mmap_lock_write(mapping); + i_mmap_lock_write_prepare(mapping); + i_mmap_tree_lock_write(mapping, vma); + flush_dcache_mmap_lock(mapping); - vma_interval_tree_remove(vma, get_i_mmap_root(mapping)); + vma_interval_tree_remove(vma, get_rb_root(vma, mapping)); + dec_mapping_vma(mapping, vma); flush_dcache_mmap_unlock(mapping); - i_mmap_unlock_write(mapping); + + i_mmap_tree_unlock_write(mapping, vma); + i_mmap_unlock_write_complete(mapping); } } =20 @@ -1063,6 +1073,7 @@ unsigned long do_mmap(struct file *file, if (file) { region->vm_file =3D get_file(file); vma->vm_file =3D get_file(file); + vma_set_tree_idx(vma); } =20 down_write(&nommu_region_sem); diff --git a/mm/pagewalk.c b/mm/pagewalk.c index 8df1b5077951..d5745519d95a 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -809,7 +809,7 @@ int walk_page_mapping(struct address_space *mapping, pg= off_t first_index, if (!check_ops_safe(ops)) return -EINVAL; =20 - lockdep_assert_held(&mapping->i_mmap_rwsem); + i_mmap_assert_locked(mapping); vma_interval_tree_foreach(vma, get_i_mmap_root(mapping), first_index, first_index + nr - 1) { /* Clip to the vma */ diff --git a/mm/vma.c b/mm/vma.c index 6159650c1b42..2055758064a9 100644 --- a/mm/vma.c +++ b/mm/vma.c @@ -234,22 +234,23 @@ static void __vma_link_file(struct vm_area_struct *vm= a, mapping_allow_writable(mapping); =20 flush_dcache_mmap_lock(mapping); - vma_interval_tree_insert(vma, get_i_mmap_root(mapping)); + vma_interval_tree_insert(vma, get_rb_root(vma, mapping)); + inc_mapping_vma(mapping, vma); flush_dcache_mmap_unlock(mapping); } =20 -/* - * Requires inode->i_mapping->i_mmap_rwsem - */ static void __remove_shared_vm_struct(struct vm_area_struct *vma, struct address_space *mapping) { + 
i_mmap_tree_lock_write(mapping, vma); if (vma_is_shared_maywrite(vma)) mapping_unmap_writable(mapping); =20 flush_dcache_mmap_lock(mapping); - vma_interval_tree_remove(vma, get_i_mmap_root(mapping)); + vma_interval_tree_remove(vma, get_rb_root(vma, mapping)); + dec_mapping_vma(mapping, vma); flush_dcache_mmap_unlock(mapping); + i_mmap_tree_unlock_write(mapping, vma); } =20 /* @@ -297,8 +298,9 @@ static void vma_prepare(struct vma_prepare *vp) uprobe_munmap(vp->adj_next, vp->adj_next->vm_start, vp->adj_next->vm_end); =20 - i_mmap_lock_write(vp->mapping); + i_mmap_lock_write_prepare(vp->mapping); if (vp->insert && vp->insert->vm_file) { + i_mmap_tree_lock_write(vp->mapping, vp->insert); /* * Put into interval tree now, so instantiated pages * are visible to arm/parisc __flush_dcache_page @@ -307,6 +309,7 @@ static void vma_prepare(struct vma_prepare *vp) */ __vma_link_file(vp->insert, vp->insert->vm_file->f_mapping); + i_mmap_tree_unlock_write(vp->mapping, vp->insert); } } =20 @@ -318,12 +321,17 @@ static void vma_prepare(struct vma_prepare *vp) } =20 if (vp->file) { + i_mmap_tree_lock_write(vp->mapping, vp->vma); flush_dcache_mmap_lock(vp->mapping); vma_interval_tree_remove(vp->vma, - get_i_mmap_root(vp->mapping)); - if (vp->adj_next) + get_rb_root(vp->vma, vp->mapping)); + dec_mapping_vma(vp->mapping, vp->vma); + if (vp->adj_next) { + i_mmap_tree_lock_write(vp->mapping, vp->adj_next); vma_interval_tree_remove(vp->adj_next, - get_i_mmap_root(vp->mapping)); + get_rb_root(vp->adj_next, vp->mapping)); + dec_mapping_vma(vp->mapping, vp->adj_next); + } } =20 } @@ -340,12 +348,17 @@ static void vma_complete(struct vma_prepare *vp, stru= ct vma_iterator *vmi, struct mm_struct *mm) { if (vp->file) { - if (vp->adj_next) + if (vp->adj_next) { vma_interval_tree_insert(vp->adj_next, - get_i_mmap_root(vp->mapping)); + get_rb_root(vp->adj_next, vp->mapping)); + inc_mapping_vma(vp->mapping, vp->adj_next); + i_mmap_tree_unlock_write(vp->mapping, vp->adj_next); + } 
vma_interval_tree_insert(vp->vma, - get_i_mmap_root(vp->mapping)); + get_rb_root(vp->vma, vp->mapping)); + inc_mapping_vma(vp->mapping, vp->vma); flush_dcache_mmap_unlock(vp->mapping); + i_mmap_tree_unlock_write(vp->mapping, vp->vma); } =20 if (vp->remove && vp->file) { @@ -370,7 +383,7 @@ static void vma_complete(struct vma_prepare *vp, struct= vma_iterator *vmi, } =20 if (vp->file) { - i_mmap_unlock_write(vp->mapping); + i_mmap_unlock_write_complete(vp->mapping); =20 if (!vp->skip_vma_uprobe) { uprobe_mmap(vp->vma); @@ -1799,12 +1812,12 @@ static void unlink_file_vma_batch_process(struct un= link_vma_file_batch *vb) int i; =20 mapping =3D vb->vmas[0]->vm_file->f_mapping; - i_mmap_lock_write(mapping); + i_mmap_lock_write_prepare(mapping); for (i =3D 0; i < vb->count; i++) { VM_WARN_ON_ONCE(vb->vmas[i]->vm_file->f_mapping !=3D mapping); __remove_shared_vm_struct(vb->vmas[i], mapping); } - i_mmap_unlock_write(mapping); + i_mmap_unlock_write_complete(mapping); =20 unlink_file_vma_batch_init(vb); } @@ -1836,10 +1849,13 @@ static void vma_link_file(struct vm_area_struct *vm= a, bool hold_rmap_lock) =20 if (file) { mapping =3D file->f_mapping; - i_mmap_lock_write(mapping); + i_mmap_lock_write_prepare(mapping); + i_mmap_tree_lock_write(mapping, vma); __vma_link_file(vma, mapping); - if (!hold_rmap_lock) - i_mmap_unlock_write(mapping); + if (!hold_rmap_lock) { + i_mmap_tree_unlock_write(mapping, vma); + i_mmap_unlock_write_complete(mapping); + } } } =20 @@ -2164,6 +2180,23 @@ static void vm_lock_anon_vma(struct mm_struct *mm, s= truct anon_vma *anon_vma) } } =20 +#ifdef CONFIG_SPLIT_I_MMAP +static inline void i_mmap_nest_lock(struct address_space *mapping, + struct rw_semaphore *lock) +{ + int i; + + for (i =3D 0; i < split_tree_num; i++) + down_write_nest_lock(&mapping->i_mmap[i]->rwsem, lock); +} +#else +static inline void i_mmap_nest_lock(struct address_space *mapping, + struct rw_semaphore *lock) +{ + down_write_nest_lock(&mapping->i_mmap_rwsem, lock); +} +#endif + 
static void vm_lock_mapping(struct mm_struct *mm, struct address_space *ma= pping) { if (!test_bit(AS_MM_ALL_LOCKS, &mapping->flags)) { @@ -2178,7 +2211,7 @@ static void vm_lock_mapping(struct mm_struct *mm, str= uct address_space *mapping) */ if (test_and_set_bit(AS_MM_ALL_LOCKS, &mapping->flags)) BUG(); - down_write_nest_lock(&mapping->i_mmap_rwsem, &mm->mmap_lock); + i_mmap_nest_lock(mapping, &mm->mmap_lock); } } =20 @@ -2489,6 +2522,7 @@ static int __mmap_new_file_vma(struct mmap_state *map, int error; =20 vma->vm_file =3D map->file; + vma_set_tree_idx(vma); if (!map->file_doesnt_need_get) get_file(map->file); =20 diff --git a/mm/vma_init.c b/mm/vma_init.c index 3c0b65950510..c115e33d4812 100644 --- a/mm/vma_init.c +++ b/mm/vma_init.c @@ -72,6 +72,9 @@ static void vm_area_init_from(const struct vm_area_struct= *src, #ifdef CONFIG_NUMA dest->vm_policy =3D src->vm_policy; #endif +#ifdef CONFIG_SPLIT_I_MMAP + dest->tree_idx =3D src->tree_idx; +#endif #ifdef __HAVE_PFNMAP_TRACKING dest->pfnmap_track_ctx =3D NULL; #endif --=20 2.53.0 From nobody Mon Jun 15 19:26:53 2026 Received: from mailgw1.hygon.cn (unknown [101.204.27.37]) by smtp.subspace.kernel.org (Postfix) with ESMTP id EA7E32BD58A; Thu, 11 Jun 2026 06:22:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=101.204.27.37 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781158941; cv=none; b=ucU+pqEWz54sbhNTY6rE559Jv6mydt2BYDqQ9Gglmg07DPOoe58YkxWuK9xPwbl/QdY9Y6YbhJbN4qmAnzag1MkTCWzMFQywrONRWXxDyJeOrAQvkrac2bguJcuZp9Q0L/Ik/94oDiVjkraLAWZYo+2aqT8lPZmGcVcOTsaMMPo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781158941; c=relaxed/simple; bh=HpKx7iNOoNn/W9ykpZyJTUq/2x/av0MA7ffeHEjPGko=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; 
b=RmKGS8aHTmNFo5kwQH0CKxTzHTB4hCEF5AMehAMO7cWI4sj+PPIo/AxM1q/9y/HDf0u5uFmntdAfcCbswJvaQxT3WhO0Leywo+2gxrBlAmaeXlLdeaAbrd4yyAENbKlF5CV913nhVBL0qjosaHKUiYXIi4LTrZX6vdhq7tw1sEo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=hygon.cn; spf=pass smtp.mailfrom=hygon.cn; arc=none smtp.client-ip=101.204.27.37 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=hygon.cn Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=hygon.cn Received: from maildlp2.hygon.cn (unknown [127.0.0.1]) by mailgw1.hygon.cn (Postfix) with ESMTP id 4gbXc774mMz1dd8x; Thu, 11 Jun 2026 14:21:59 +0800 (CST) Received: from maildlp2.hygon.cn (unknown [172.23.18.61]) by mailgw1.hygon.cn (Postfix) with ESMTP id 4gbXc66yN9z1dd8p; Thu, 11 Jun 2026 14:21:58 +0800 (CST) Received: from cncheex04.Hygon.cn (unknown [172.23.18.114]) by maildlp2.hygon.cn (Postfix) with ESMTPS id A4BFA30004DB; Thu, 11 Jun 2026 14:20:33 +0800 (CST) Received: from hsj-2U-Workstation.hygon.cn (172.19.20.61) by cncheex04.Hygon.cn (172.23.18.114) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.36; Thu, 11 Jun 2026 14:21:53 +0800 From: Huang Shijie To: , , , , , , CC: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , Huang Shijie Subject: [PATCH v2 4/4] docs/mm: update document for split i_mmap tree Date: Thu, 11 Jun 2026 14:19:00 +0800 Message-ID: <20260611061915.2354307-5-huangsj@hygon.cn> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260611061915.2354307-1-huangsj@hygon.cn> References: <20260611061915.2354307-1-huangsj@hygon.cn> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: cncheex05.Hygon.cn (172.23.18.115) To 
cncheex04.Hygon.cn (172.23.18.114)

Document the i_mmap locking changes introduced by the following patches:
  - Use mapping_mapped() to simplify the code
  - Use get_i_mmap_root() to access the file's i_mmap
  - Split the file's i_mmap tree (CONFIG_SPLIT_I_MMAP)

Add documentation for:
  - CONFIG_SPLIT_I_MMAP split i_mmap tree architecture with per-tree locks
  - New per-tree lock helpers: i_mmap_tree_lock_write/unlock_write
  - New vm_area_struct.tree_idx field for sibling tree selection
  - Updated i_mmap_lock_read/write semantics acquiring all per-tree locks
  - Updated lock ordering notes for split tree configuration
  - Updated page table freeing section for split tree scenario

Signed-off-by: Huang Shijie
---
 Documentation/mm/process_addrs.rst | 63 +++++++++++++++++++++++-------
 1 file changed, 49 insertions(+), 14 deletions(-)

diff --git a/Documentation/mm/process_addrs.rst b/Documentation/mm/process_addrs.rst
index 851680ead45f..4aed3100b249 100644
--- a/Documentation/mm/process_addrs.rst
+++ b/Documentation/mm/process_addrs.rst
@@ -60,6 +60,15 @@ Terminology
   :c:func:`!i_mmap_[try]lock_write` for file-backed memory. We refer to these
   locks as the reverse mapping locks, or 'rmap locks' for brevity.
 
+  When :c:macro:`!CONFIG_SPLIT_I_MMAP` is enabled, the file-backed i_mmap tree
+  is split into multiple sibling trees (one per NUMA node or a number based on
+  CPU count), each with its own :c:type:`!struct i_mmap_tree` containing a
+  red/black interval tree and a :c:type:`!struct rw_semaphore`. In this
+  configuration, :c:func:`!i_mmap_lock_read` and :c:func:`!i_mmap_lock_write`
+  acquire all per-tree locks, while VMA insert/remove operations use the
+  finer-grained :c:func:`!i_mmap_tree_lock_write` to lock only the
+  relevant sibling tree, significantly reducing lock contention.
+
 We discuss page table locks separately in the dedicated section below.
 
 The first thing **any** of these locks achieve is to **stabilise** the VMA
@@ -230,12 +239,16 @@ These are the core fields which describe the MM the VMA belongs to and its attri
                                                            Updated under mmap read lock by
                                                            :c:func:`!task_numa_work`.
    :c:member:`!vm_userfaultfd_ctx`   CONFIG_USERFAULTFD    Userfaultfd context wrapper object of    mmap write,
-                                                           type :c:type:`!vm_userfaultfd_ctx`,      VMA write.
-                                                           either of zero size if userfaultfd is
-                                                           disabled, or containing a pointer
-                                                           to an underlying
-                                                           :c:type:`!userfaultfd_ctx` object which
-                                                           describes userfaultfd metadata.
+                                                            type :c:type:`!vm_userfaultfd_ctx`,     VMA write.
+                                                            either of zero size if userfaultfd is
+                                                            disabled, or containing a pointer
+                                                            to an underlying
+                                                            :c:type:`!userfaultfd_ctx` object which
+                                                            describes userfaultfd metadata.
+   :c:member:`!tree_idx`             CONFIG_SPLIT_I_MMAP   The index of the sibling i_mmap tree     Written once on
+                                                            that this VMA belongs to, set at        initial map.
+                                                            VMA creation time based on the NUMA
+                                                            node or the smallest sibling tree.
    ================================= ===================== ======================================== ===============
 
 These fields are present or not depending on whether the relevant kernel
@@ -247,12 +260,18 @@ configuration option is set.
    Field                               Description                               Write lock
    =================================== ========================================= ============================
    :c:member:`!shared.rb`              A red/black tree node used, if the        mmap write, VMA write,
-                                       mapping is file-backed, to place the VMA  i_mmap write.
-                                       in the
-                                       :c:member:`!struct address_space->i_mmap`
-                                       red/black interval tree.
+                                        mapping is file-backed, to place the VMA i_mmap write (or per-tree
+                                        in the                                   i_mmap write when
+                                        :c:member:`!struct address_space->i_mmap` :c:macro:`!CONFIG_SPLIT_I_MMAP`
+                                        red/black interval tree (or one of the   is set).
+                                        sibling trees when
+                                        :c:macro:`!CONFIG_SPLIT_I_MMAP`
+                                        is enabled).
    :c:member:`!shared.rb_subtree_last` Metadata used for management of the       mmap write, VMA write,
-                                       interval tree if the VMA is file-backed.  i_mmap write.
+                                        interval tree if the VMA is file-backed. i_mmap write (or per-tree
+                                                                                 i_mmap write when
+                                                                                 :c:macro:`!CONFIG_SPLIT_I_MMAP`
+                                                                                 is set).
    :c:member:`!anon_vma_chain`         List of pointers to both forked/CoW’d     mmap read, anon_vma write.
                                        :c:type:`!anon_vma` objects and
                                        :c:member:`!vma->anon_vma` if it is
@@ -490,6 +509,16 @@ There is also a file-system specific lock ordering comment located at the top of
 Please check the current state of these comments which may have changed since
 the time of writing of this document.
 
+.. note:: When :c:macro:`!CONFIG_SPLIT_I_MMAP` is enabled, the single
+          ``mapping->i_mmap_rwsem`` is replaced by an array of per-tree locks
+          ``mapping->i_mmap[i]->rwsem``. The lock ordering positions of
+          ``mapping->i_mmap_rwsem`` above apply to each per-tree lock
+          equivalently. VMA insert/remove operations acquire only the relevant
+          per-tree lock via :c:func:`!i_mmap_tree_lock_write`, while operations
+          that require all trees to be locked (such as
+          :c:func:`!unmap_mapping_range`) acquire all per-tree locks via
+          :c:func:`!i_mmap_lock_write` or :c:func:`!i_mmap_lock_read`.
+
 ------------------------------
 Locking Implementation Details
 ------------------------------
@@ -704,11 +733,15 @@ traversed or referenced by concurrent tasks.
 
 It is insufficient to simply hold an mmap write lock and VMA lock (which will
 prevent racing faults, and rmap operations), as a file-backed mapping can be
-truncated under the :c:struct:`!struct address_space->i_mmap_rwsem` alone.
+truncated under the :c:struct:`!struct address_space->i_mmap_rwsem` alone
+(or, when :c:macro:`!CONFIG_SPLIT_I_MMAP` is enabled, under all per-tree
+``mapping->i_mmap[i]->rwsem`` locks acquired via
+:c:func:`!i_mmap_lock_write`).
 
 As a result, no VMA which can be accessed via the reverse mapping (either
 through the :c:struct:`!struct anon_vma->rb_root` or the :c:member:`!struct
-address_space->i_mmap` interval trees) can have its page tables torn down.
+address_space->i_mmap` interval trees, or the sibling trees when
+:c:macro:`!CONFIG_SPLIT_I_MMAP` is enabled) can have its page tables torn down.
 
 The operation is typically performed via :c:func:`!free_pgtables`, which assumes
 either the mmap write lock has been taken (as specified by its
@@ -729,7 +762,9 @@ cleared without page table locks (in the :c:func:`!pgd_clear`, :c:func:`!p4d_cle
 .. note:: It is possible for leaf page tables to be torn down independent of
           the page tables above it as is done by
           :c:func:`!retract_page_tables`, which is performed under the i_mmap
-          read lock, PMD, and PTE page table locks, without this level of care.
+          read lock (or all per-tree ``mapping->i_mmap[i]->rwsem`` locks in
+          read mode when :c:macro:`!CONFIG_SPLIT_I_MMAP` is enabled), PMD, and
+          PTE page table locks, without this level of care.
 
 Page table moving
 ^^^^^^^^^^^^^^^^^
-- 
2.53.0
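
As a reading aid for the series above, the split-tree bookkeeping can be modelled in a few lines of user-space C. This is only a sketch under loud assumptions: every `toy_*` name is hypothetical and is not a kernel API, the per-tree rwsem is reduced to a plain flag, the counters are plain ints rather than `atomic_t`, and the index choice follows the non-NUMA `smallest_tree_idx()` path from the patch (a NUMA build picks `numa_node_id()` instead). It shows which lock each path would take, not real concurrency.

```c
/*
 * User-space model of the split i_mmap scheme. Every toy_* name is
 * hypothetical (not a kernel API). Locks are plain flags and counters are
 * plain ints, so this shows the bookkeeping only, not real concurrency.
 */
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define TOY_TREE_NUM 4		/* stands in for split_tree_num */

struct toy_tree {
	bool write_locked;	/* stands in for i_mmap_tree.rwsem */
	int vma_count;		/* stands in for i_mmap_tree.vma_count */
};

struct toy_mapping {
	struct toy_tree trees[TOY_TREE_NUM];
	int vma_count;		/* total across all sibling trees */
};

/* Pick the least-populated sibling tree; stop early at an empty one. */
static int toy_smallest_tree_idx(const struct toy_mapping *m)
{
	int best = 0, min = INT_MAX;

	for (int i = 0; i < TOY_TREE_NUM; i++) {
		if (m->trees[i].vma_count < min) {
			min = m->trees[i].vma_count;
			best = i;
			if (!min)
				break;
		}
	}
	return best;
}

/* Insert path: lock one sibling tree only, then bump both counters. */
static int toy_insert(struct toy_mapping *m)
{
	int idx = toy_smallest_tree_idx(m);
	struct toy_tree *t = &m->trees[idx];

	assert(!t->write_locked);
	t->write_locked = true;		/* like i_mmap_tree_lock_write() */
	t->vma_count++;
	m->vma_count++;
	t->write_locked = false;	/* like i_mmap_tree_unlock_write() */
	return idx;			/* the caller keeps this as tree_idx */
}

/* Remove path: uses the index saved at insert time. */
static void toy_remove(struct toy_mapping *m, int idx)
{
	struct toy_tree *t = &m->trees[idx];

	assert(t->vma_count > 0);
	t->write_locked = true;
	t->vma_count--;
	m->vma_count--;
	t->write_locked = false;
}

/* Whole-mapping path: take every per-tree lock in ascending index order. */
static int toy_lock_all(struct toy_mapping *m)
{
	int taken = 0;

	for (int i = 0; i < TOY_TREE_NUM; i++) {
		m->trees[i].write_locked = true;
		taken++;
	}
	return taken;
}

/* Release every per-tree lock taken by toy_lock_all(). */
static void toy_unlock_all(struct toy_mapping *m)
{
	for (int i = 0; i < TOY_TREE_NUM; i++)
		m->trees[i].write_locked = false;
}

/* Like the new mapping_mapped(): non-zero once any tree holds a VMA. */
static int toy_mapping_mapped(const struct toy_mapping *m)
{
	return m->vma_count;
}
```

Two consecutive `toy_insert()` calls on an empty mapping land in trees 0 and 1, which is the spreading that lets two inserters avoid sharing a lock, while a walker that needs a stable view of every tree pays for `TOY_TREE_NUM` lock acquisitions in `toy_lock_all()`.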