From nobody Mon Jun 8 18:57:53 2026
From: tao
Subject: [PATCH 01/15] mm/rmap: introduce anon_rmap APIs for anonymous folios
Date: Wed, 27 May 2026 19:01:33 +0800
Message-ID: <20260527110147.17815-2-tao.wangtao@honor.com>
In-Reply-To: <20260527110147.17815-1-tao.wangtao@honor.com>
References: <20260527110147.17815-1-tao.wangtao@honor.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Add a set of anon_rmap APIs to operate on the reverse mappings of
anonymous folios. anon_rmap_t is an opaque handle that currently wraps a
struct anon_vma pointer.

Introduce anon_rmap_foreach_vma() as a replacement for
anon_vma_interval_tree_foreach(), built on
anon_vma_interval_tree_iter_first() and
anon_vma_interval_tree_iter_next(), so that callers no longer access the
interval tree directly.

This prepares the rmap code for upcoming ANON_VMA_LAZY support and
RCU-based lockless rmap traversal.

No functional change intended.

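Illustration only, not part of the patch: a minimal userspace sketch of the opaque-handle-plus-iterator pattern the commit message describes. Every demo_* name is a hypothetical stand-in, a singly linked list replaces the kernel's interval tree, and uintptr_t is used where the patch uses unsigned long.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins; none of these are the kernel's real structures. */
struct demo_vma { unsigned long start, last; };
struct demo_chain { struct demo_vma *vma; struct demo_chain *next; };
struct demo_anon_vma { struct demo_chain *head; };

/* Opaque handle with the same shape as anon_rmap_t in the patch. */
typedef struct demo_rmap { uintptr_t rmap; } demo_rmap_t;

#define DEMO_RMAP_NULL make_demo_rmap(NULL)

static inline demo_rmap_t make_demo_rmap(const void *p)
{
	return (demo_rmap_t){ .rmap = (uintptr_t)p };
}

static inline struct demo_anon_vma *demo_rmap_to_anon_vma(demo_rmap_t h)
{
	return (struct demo_anon_vma *)h.rmap;
}

/* Advance from c to the first chain entry overlapping [start, last]. */
static struct demo_vma *demo_seek(struct demo_chain *c, unsigned long start,
				  unsigned long last, struct demo_chain **cursor)
{
	assert(cursor != NULL);
	while (c && (c->vma->last < start || c->vma->start > last))
		c = c->next;
	*cursor = c;
	return c ? c->vma : NULL;
}

static struct demo_vma *demo_iter_first(demo_rmap_t h, unsigned long start,
					unsigned long last,
					struct demo_chain **cursor)
{
	return demo_seek(demo_rmap_to_anon_vma(h)->head, start, last, cursor);
}

static struct demo_vma *demo_iter_next(demo_rmap_t h, unsigned long start,
				       unsigned long last,
				       struct demo_chain **cursor)
{
	(void)h;	/* kept only so both helpers take the same arguments */
	if (!*cursor)
		return NULL;
	return demo_seek((*cursor)->next, start, last, cursor);
}

/* Callers iterate through the handle and never see the container. */
#define demo_rmap_foreach_vma(vma, cursor, h, start, last)		\
	for (vma = demo_iter_first(h, start, last, &cursor);		\
	     vma; vma = demo_iter_next(h, start, last, &cursor))

/* Count the VMAs overlapping [start, last]. */
static int demo_count_overlaps(demo_rmap_t h, unsigned long start,
			       unsigned long last)
{
	struct demo_chain *cursor;
	struct demo_vma *vma;
	int n = 0;

	demo_rmap_foreach_vma(vma, cursor, h, start, last)
		n++;
	return n;
}
```

Because the loop body only sees the handle and the returned VMA, the storage behind the handle could later change without touching callers, which appears to be the motivation stated for the real API.
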
Signed-off-by: tao
---
 include/linux/rmap.h | 68 +++++++++++++++++++++++++++++++++++++++++
 mm/rmap.c            | 73 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 141 insertions(+)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 8dc0871e5f00..c42314ea4362 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -937,6 +937,44 @@ int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff,
 void remove_migration_ptes(struct folio *src, struct folio *dst,
 		enum ttu_flags flags);
 
+/* Reverse mapping handle for anonymous folio rmap helpers. */
+typedef struct anon_rmap {
+	unsigned long rmap;
+} anon_rmap_t;
+
+#define ANON_RMAP_NULL make_anon_rmap(0)
+
+static inline anon_rmap_t make_anon_rmap(const void *anon_mapping)
+{
+	return (anon_rmap_t){ .rmap = (unsigned long)anon_mapping, };
+}
+
+static inline unsigned long anon_rmap_value(anon_rmap_t anon_rmap)
+{
+	return anon_rmap.rmap;
+}
+
+static inline anon_rmap_t anon_vma_to_anon_rmap(const struct anon_vma *anon_vma)
+{
+	return make_anon_rmap(anon_vma);
+}
+
+static inline struct anon_vma *anon_rmap_to_anon_vma(anon_rmap_t anon_rmap)
+{
+	unsigned long rmap = anon_rmap_value(anon_rmap);
+
+	return (struct anon_vma *)rmap;
+}
+
+anon_rmap_t vma_get_anon_rmap(struct vm_area_struct *vma);
+void put_anon_rmap(anon_rmap_t anon_rmap);
+void anon_rmap_lock_write(anon_rmap_t anon_rmap);
+int anon_rmap_trylock_write(anon_rmap_t anon_rmap);
+void anon_rmap_unlock_write(anon_rmap_t anon_rmap);
+void anon_rmap_lock_read(anon_rmap_t anon_rmap);
+int anon_rmap_trylock_read(anon_rmap_t anon_rmap);
+void anon_rmap_unlock_read(anon_rmap_t anon_rmap);
+
 /*
  * rmap_walk_control: To control rmap traversing for specific needs
  *
@@ -969,6 +1007,36 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc);
 struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
 					  struct rmap_walk_control *rwc);
 
+bool folio_maybe_same_anon_vma(const struct folio *folio,
+		const struct vm_area_struct *vma);
+anon_rmap_t folio_get_anon_rmap(const struct folio *folio);
+anon_rmap_t folio_lock_anon_rmap_read(const struct folio *folio,
+		struct rmap_walk_control *rwc);
+
+static inline struct vm_area_struct *anon_rmap_iter_first_vma(
+		anon_rmap_t anon_rmap, unsigned long start, unsigned long last,
+		struct anon_vma_chain **avc)
+{
+	struct anon_vma *anon_vma = anon_rmap_to_anon_vma(anon_rmap);
+
+	*avc = anon_vma_interval_tree_iter_first(&anon_vma->rb_root, start, last);
+	return *avc ? (*avc)->vma : NULL;
+}
+
+static inline struct vm_area_struct *anon_rmap_iter_next_vma(
+		anon_rmap_t anon_rmap, unsigned long start, unsigned long last,
+		struct anon_vma_chain **avc)
+{
+	if (!*avc)
+		return NULL;
+	*avc = anon_vma_interval_tree_iter_next(*avc, start, last);
+	return *avc ? (*avc)->vma : NULL;
+}
+
+#define anon_rmap_foreach_vma(vma, avc, anon_rmap, start, last) \
+	for (vma = anon_rmap_iter_first_vma(anon_rmap, start, last, &avc); \
+	     vma; vma = anon_rmap_iter_next_vma(anon_rmap, start, last, &avc))
+
 #else	/* !CONFIG_MMU */
 
 #define anon_vma_init()		do {} while (0)
diff --git a/mm/rmap.c b/mm/rmap.c
index 78b7fb5f367c..1b2dada71778 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -701,6 +701,79 @@ struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
 	return anon_vma;
 }
 
+anon_rmap_t vma_get_anon_rmap(struct vm_area_struct *vma)
+{
+	mmap_assert_locked(vma->vm_mm);
+	VM_BUG_ON(!vma->anon_vma);
+	get_anon_vma(vma->anon_vma);
+	return anon_vma_to_anon_rmap(vma->anon_vma);
+}
+
+void put_anon_rmap(anon_rmap_t anon_rmap)
+{
+	put_anon_vma(anon_rmap_to_anon_vma(anon_rmap));
+}
+
+void anon_rmap_lock_write(anon_rmap_t anon_rmap)
+{
+	anon_vma_lock_write(anon_rmap_to_anon_vma(anon_rmap));
+}
+
+int anon_rmap_trylock_write(anon_rmap_t anon_rmap)
+{
+	return anon_vma_trylock_write(anon_rmap_to_anon_vma(anon_rmap));
+}
+
+void anon_rmap_unlock_write(anon_rmap_t anon_rmap)
+{
+	anon_vma_unlock_write(anon_rmap_to_anon_vma(anon_rmap));
+}
+
+void anon_rmap_lock_read(anon_rmap_t anon_rmap)
+{
+	anon_vma_lock_read(anon_rmap_to_anon_vma(anon_rmap));
+}
+
+int anon_rmap_trylock_read(anon_rmap_t anon_rmap)
+{
+	return anon_vma_trylock_read(anon_rmap_to_anon_vma(anon_rmap));
+}
+
+void anon_rmap_unlock_read(anon_rmap_t anon_rmap)
+{
+	anon_vma_unlock_read(anon_rmap_to_anon_vma(anon_rmap));
+}
+
+bool folio_maybe_same_anon_vma(const struct folio *folio,
+		const struct vm_area_struct *vma)
+{
+	struct anon_vma *anon_vma;
+	struct anon_vma *tgt_anon_vma = vma->anon_vma;
+	bool same = false;
+
+	rcu_read_lock();
+	anon_vma = folio_anon_vma(folio);
+	if (anon_vma && tgt_anon_vma)
+		same = anon_vma->root == tgt_anon_vma->root;
+	rcu_read_unlock();
+	return same;
+}
+
+anon_rmap_t folio_get_anon_rmap(const struct folio *folio)
+{
+	struct anon_vma *anon_vma = folio_get_anon_vma(folio);
+
+	return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
+}
+
+anon_rmap_t folio_lock_anon_rmap_read(const struct folio *folio,
+		struct rmap_walk_control *rwc)
+{
+	struct anon_vma *anon_vma = folio_lock_anon_vma_read(folio, rwc);
+
+	return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
+}
+
 #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
 /*
  * Flush TLB entries for recently unmapped pages from remote CPUs. It is
-- 
2.17.1

From nobody Mon Jun 8 18:57:53 2026
From: tao
Subject: [PATCH 02/15] mm: convert anon_vma rmap APIs to anon_rmap
Date: Wed, 27 May 2026 19:01:34 +0800
Message-ID: <20260527110147.17815-3-tao.wangtao@honor.com>
In-Reply-To: <20260527110147.17815-1-tao.wangtao@honor.com>
References: <20260527110147.17815-1-tao.wangtao@honor.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Convert the rmap anon_vma interfaces to anon_rmap APIs to clarify the
semantics of anonymous rmap operations and prepare for upcoming
ANON_VMA_LAZY support and RCU-based lockless rmap traversal.

Replace folio_anon_vma(), folio_get_anon_vma(),
folio_lock_anon_vma_read(), anon_vma_trylock_read(),
anon_vma_lock_read(), anon_vma_unlock_read(), anon_vma_trylock_write(),
anon_vma_lock_write(), anon_vma_unlock_write(), and
anon_vma_interval_tree_foreach() with the anon_rmap APIs.

No functional change intended.

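Illustration only, not part of the patch: a minimal userspace sketch of how a handle can be packed together with a few state bits and recovered again, which is the shape of the converted __migrate_folio_record() and __migrate_folio_extract() helpers in mm/migrate.c. The demo_* names and the DEMO_* bit values are hypothetical, and the sketch assumes the object behind the handle is at least 4-byte aligned so its two low bits are free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque handle with the same shape as anon_rmap_t in the series. */
typedef struct demo_rmap { uintptr_t rmap; } demo_rmap_t;

/* Hypothetical state bits stored in the low bits of the packed word. */
enum {
	DEMO_WAS_MAPPED  = 1,
	DEMO_WAS_MLOCKED = 2,
	DEMO_OLD_STATES  = DEMO_WAS_MAPPED | DEMO_WAS_MLOCKED,
};

static inline demo_rmap_t make_demo_rmap(const void *p)
{
	return (demo_rmap_t){ .rmap = (uintptr_t)p };
}

static inline uintptr_t demo_rmap_value(demo_rmap_t h)
{
	return h.rmap;
}

/* Pack the handle and the state bits into a single word. */
static uintptr_t demo_record(demo_rmap_t h, int old_state)
{
	/* The low bits of the handle must be free for the state. */
	assert((demo_rmap_value(h) & DEMO_OLD_STATES) == 0);
	assert((old_state & ~DEMO_OLD_STATES) == 0);
	return demo_rmap_value(h) + (uintptr_t)old_state;
}

/* Split a packed word back into the handle and the state bits. */
static void demo_extract(uintptr_t packed, int *old_state, demo_rmap_t *h)
{
	*h = make_demo_rmap((void *)(packed & ~(uintptr_t)DEMO_OLD_STATES));
	*old_state = (int)(packed & DEMO_OLD_STATES);
}
```

A caller that recovers an empty handle (value 0) skips the put, matching the `if (anon_rmap_value(anon_rmap)) put_anon_rmap(anon_rmap);` checks in the diff.
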
Signed-off-by: tao
---
 include/linux/rmap.h  |  6 ++--
 mm/damon/ops-common.c |  4 +--
 mm/huge_memory.c      | 16 +++++------
 mm/ksm.c              | 43 ++++++++++++++---------------
 mm/memory-failure.c   | 11 ++++----
 mm/migrate.c          | 64 +++++++++++++++++++++----------------------
 mm/page_idle.c        |  2 +-
 mm/rmap.c             | 51 ++++++++++++++++++----------------
 8 files changed, 98 insertions(+), 99 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index c42314ea4362..9802bce92695 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -997,15 +997,13 @@ struct rmap_walk_control {
 	bool (*rmap_one)(struct folio *folio, struct vm_area_struct *vma,
 					unsigned long addr, void *arg);
 	int (*done)(struct folio *folio);
-	struct anon_vma *(*anon_lock)(const struct folio *folio,
-				      struct rmap_walk_control *rwc);
+	anon_rmap_t (*anon_lock)(const struct folio *folio,
+				 struct rmap_walk_control *rwc);
 	bool (*invalid_vma)(struct vm_area_struct *vma, void *arg);
 };
 
 void rmap_walk(struct folio *folio, struct rmap_walk_control *rwc);
 void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc);
-struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
-					  struct rmap_walk_control *rwc);
 
 bool folio_maybe_same_anon_vma(const struct folio *folio,
 		const struct vm_area_struct *vma);
diff --git a/mm/damon/ops-common.c b/mm/damon/ops-common.c
index 8c6d613425c1..5788410965b8 100644
--- a/mm/damon/ops-common.c
+++ b/mm/damon/ops-common.c
@@ -172,7 +172,7 @@ void damon_folio_mkold(struct folio *folio)
 {
 	struct rmap_walk_control rwc = {
 		.rmap_one = damon_folio_mkold_one,
-		.anon_lock = folio_lock_anon_vma_read,
+		.anon_lock = folio_lock_anon_rmap_read,
 	};
 
 	if (!folio_mapped(folio) || !folio_raw_mapping(folio)) {
@@ -236,7 +236,7 @@ bool damon_folio_young(struct folio *folio)
 	struct rmap_walk_control rwc = {
 		.arg = &accessed,
 		.rmap_one = damon_folio_young_one,
-		.anon_lock = folio_lock_anon_vma_read,
+		.anon_lock = folio_lock_anon_rmap_read,
 	};
 
 	if (!folio_mapped(folio) || !folio_raw_mapping(folio)) {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 970e077019b7..ab3c2397449a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4051,7 +4051,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 	struct folio *end_folio = folio_next(folio);
 	bool is_anon = folio_test_anon(folio);
 	struct address_space *mapping = NULL;
-	struct anon_vma *anon_vma = NULL;
+	anon_rmap_t anon_rmap = ANON_RMAP_NULL;
 	int old_order = folio_order(folio);
 	struct folio *new_folio, *next;
 	int nr_shmem_dropped = 0;
@@ -4087,12 +4087,12 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		 * is taken to serialise against parallel split or collapse
 		 * operations.
 		 */
-		anon_vma = folio_get_anon_vma(folio);
-		if (!anon_vma) {
+		anon_rmap = folio_get_anon_rmap(folio);
+		if (!anon_rmap_value(anon_rmap)) {
 			ret = -EBUSY;
 			goto out;
 		}
-		anon_vma_lock_write(anon_vma);
+		anon_rmap_lock_write(anon_rmap);
 		mapping = NULL;
 	} else {
 		unsigned int min_order;
@@ -4122,7 +4122,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 			}
 		}
 
-		anon_vma = NULL;
+		anon_rmap = ANON_RMAP_NULL;
 		i_mmap_lock_read(mapping);
 
 		/*
@@ -4200,9 +4200,9 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 	}
 
 out_unlock:
-	if (anon_vma) {
-		anon_vma_unlock_write(anon_vma);
-		put_anon_vma(anon_vma);
+	if (anon_rmap_value(anon_rmap)) {
+		anon_rmap_unlock_write(anon_rmap);
+		put_anon_rmap(anon_rmap);
 	}
 	if (mapping)
 		i_mmap_unlock_read(mapping);
diff --git a/mm/ksm.c b/mm/ksm.c
index 7d5b76478f0b..f4c204a8a379 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -187,7 +187,7 @@ struct ksm_stable_node {
 /**
  * struct ksm_rmap_item - reverse mapping item for virtual addresses
  * @rmap_list: next rmap_item in mm_slot's singly-linked rmap_list
- * @anon_vma: pointer to anon_vma for this mm,address, when in stable tree
+ * @anon_rmap: anonymous folio rmap for this mm,address, when in stable tree
  * @nid: NUMA node id of unstable tree in which linked (may not match page)
  * @mm: the memory structure this rmap_item is pointing into
  * @address: the virtual address this rmap_item tracks (+ flags in low bits)
@@ -201,7 +201,7 @@ struct ksm_stable_node {
 struct ksm_rmap_item {
 	struct ksm_rmap_item *rmap_list;
 	union {
-		struct anon_vma *anon_vma;	/* when stable */
+		anon_rmap_t anon_rmap;	/* when stable */
 #ifdef CONFIG_NUMA
 		int nid;		/* when node of unstable tree */
 #endif
@@ -786,7 +786,7 @@ static void break_cow(struct ksm_rmap_item *rmap_item)
 	 * It is not an accident that whenever we want to break COW
 	 * to undo, we also need to drop a reference to the anon_vma.
 	 */
-	put_anon_vma(rmap_item->anon_vma);
+	put_anon_rmap(rmap_item->anon_rmap);
 
 	mmap_read_lock(mm);
 	vma = find_mergeable_vma(mm, addr);
@@ -898,7 +898,7 @@ static void remove_node_from_stable_tree(struct ksm_stable_node *stable_node)
 
 		VM_BUG_ON(stable_node->rmap_hlist_len <= 0);
 		stable_node->rmap_hlist_len--;
-		put_anon_vma(rmap_item->anon_vma);
+		put_anon_rmap(rmap_item->anon_rmap);
 		rmap_item->address &= PAGE_MASK;
 		cond_resched();
 	}
@@ -1051,7 +1051,7 @@ static void remove_rmap_item_from_tree(struct ksm_rmap_item *rmap_item)
 		VM_BUG_ON(stable_node->rmap_hlist_len <= 0);
 		stable_node->rmap_hlist_len--;
 
-		put_anon_vma(rmap_item->anon_vma);
+		put_anon_rmap(rmap_item->anon_rmap);
 		rmap_item->head = NULL;
 		rmap_item->address &= PAGE_MASK;
 
@@ -1598,9 +1598,8 @@ static int try_to_merge_with_ksm_page(struct ksm_rmap_item *rmap_item,
 	/* Unstable nid is in union with stable anon_vma: remove first */
 	remove_rmap_item_from_tree(rmap_item);
 
-	/* Must get reference to anon_vma while still holding mmap_lock */
-	rmap_item->anon_vma = vma->anon_vma;
-	get_anon_vma(vma->anon_vma);
+	/* Must get reference to anon_rmap while still holding mmap_lock */
+	rmap_item->anon_rmap = vma_get_anon_rmap(vma);
 out:
 	mmap_read_unlock(mm);
 	trace_ksm_merge_with_ksm_page(kpage, page_to_pfn(kpage ? kpage : page),
@@ -3108,7 +3107,6 @@ struct folio *ksm_might_need_to_copy(struct folio *folio,
 			struct vm_area_struct *vma, unsigned long addr)
 {
 	struct page *page = folio_page(folio, 0);
-	struct anon_vma *anon_vma = folio_anon_vma(folio);
 	struct folio *new_folio;
 
 	if (folio_test_large(folio))
@@ -3118,10 +3116,10 @@ struct folio *ksm_might_need_to_copy(struct folio *folio,
 		if (folio_stable_node(folio) &&
 		    !(ksm_run & KSM_RUN_UNMERGE))
 			return folio;	/* no need to copy it */
-	} else if (!anon_vma) {
+	} else if (!folio_test_anon(folio)) {
 		return folio;		/* no need to copy it */
 	} else if (folio->index == linear_page_index(vma, addr) &&
-			anon_vma->root == vma->anon_vma->root) {
+			folio_maybe_same_anon_vma(folio, vma)) {
 		return folio;		/* still no need to copy it */
 	}
 	if (PageHWPoison(page))
@@ -3173,20 +3171,20 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
 	hlist_for_each_entry(rmap_item, &stable_node->hlist, hlist) {
 		/* Ignore the stable/unstable/sqnr flags */
 		const unsigned long addr = rmap_item->address & PAGE_MASK;
-		struct anon_vma *anon_vma = rmap_item->anon_vma;
+		anon_rmap_t anon_rmap = rmap_item->anon_rmap;
 		struct anon_vma_chain *vmac;
 		struct vm_area_struct *vma;
 
 		cond_resched();
-		if (!anon_vma_trylock_read(anon_vma)) {
+		if (!anon_rmap_trylock_read(anon_rmap)) {
 			if (rwc->try_lock) {
 				rwc->contended = true;
 				return;
 			}
-			anon_vma_lock_read(anon_vma);
+			anon_rmap_lock_read(anon_rmap);
 		}
 
-		anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
+		anon_rmap_foreach_vma(vma, vmac, anon_rmap,
 					       0, ULONG_MAX) {
 
 			cond_resched();
@@ -3207,15 +3205,15 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
 				continue;
 
 			if (!rwc->rmap_one(folio, vma, addr, rwc->arg)) {
-				anon_vma_unlock_read(anon_vma);
+				anon_rmap_unlock_read(anon_rmap);
 				return;
 			}
 			if (rwc->done && rwc->done(folio)) {
-				anon_vma_unlock_read(anon_vma);
+				anon_rmap_unlock_read(anon_rmap);
 				return;
 			}
 		}
-		anon_vma_unlock_read(anon_vma);
+		anon_rmap_unlock_read(anon_rmap);
 	}
 	if (!search_new_forks++)
 		goto again;
@@ -3237,9 +3235,9 @@ void collect_procs_ksm(const struct folio *folio, const struct page *page,
 	if (!stable_node)
 		return;
 	hlist_for_each_entry(rmap_item, &stable_node->hlist, hlist) {
-		struct anon_vma *av = rmap_item->anon_vma;
+		anon_rmap_t anon_rmap = rmap_item->anon_rmap;
 
-		anon_vma_lock_read(av);
+		anon_rmap_lock_read(anon_rmap);
 		rcu_read_lock();
 		for_each_process(tsk) {
 			struct anon_vma_chain *vmac;
@@ -3248,10 +3246,9 @@ void collect_procs_ksm(const struct folio *folio, const struct page *page,
 					task_early_kill(tsk, force_early);
 			if (!t)
 				continue;
-			anon_vma_interval_tree_foreach(vmac, &av->rb_root, 0,
+			anon_rmap_foreach_vma(vma, vmac, anon_rmap, 0,
 						       ULONG_MAX)
 			{
-				vma = vmac->vma;
 				if (vma->vm_mm == t->mm) {
 					addr = rmap_item->address & PAGE_MASK;
 					add_to_kill_ksm(t, page, vma, to_kill,
@@ -3260,7 +3257,7 @@ void collect_procs_ksm(const struct folio *folio, const struct page *page,
 				}
 			}
 		}
 		rcu_read_unlock();
-		anon_vma_unlock_read(av);
+		anon_rmap_unlock_read(anon_rmap);
 	}
 }
 #endif
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ee42d4361309..bc9abba75b5d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -547,11 +547,11 @@ static void collect_procs_anon(const struct folio *folio,
 		int force_early)
 {
 	struct task_struct *tsk;
-	struct anon_vma *av;
+	anon_rmap_t anon_rmap;
 	pgoff_t pgoff;
 
-	av = folio_lock_anon_vma_read(folio, NULL);
-	if (av == NULL)	/* Not actually mapped anymore */
+	anon_rmap = folio_lock_anon_rmap_read(folio, NULL);
+	if (!anon_rmap_value(anon_rmap))	/* Not actually mapped anymore */
 		return;
 
 	pgoff = page_pgoff(folio, page);
@@ -564,9 +564,8 @@ static void collect_procs_anon(const struct folio *folio,
 
 		if (!t)
 			continue;
-		anon_vma_interval_tree_foreach(vmac, &av->rb_root,
+		anon_rmap_foreach_vma(vma, vmac, anon_rmap,
 					       pgoff, pgoff) {
-			vma = vmac->vma;
 			if (vma->vm_mm != t->mm)
 				continue;
 			addr = page_mapped_in_vma(page, vma);
@@ -574,7 +573,7 @@ static void collect_procs_anon(const struct folio *folio,
 		}
 	}
 	rcu_read_unlock();
-	anon_vma_unlock_read(av);
+	anon_rmap_unlock_read(anon_rmap);
 }
 
 /*
diff --git a/mm/migrate.c b/mm/migrate.c
index 8a64291ab5b4..769983cf14e0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1142,18 +1142,18 @@ enum {
 
 static void __migrate_folio_record(struct folio *dst,
 				   int old_page_state,
-				   struct anon_vma *anon_vma)
+				   anon_rmap_t anon_rmap)
 {
-	dst->private = (void *)anon_vma + old_page_state;
+	dst->private = (void *)anon_rmap_to_anon_vma(anon_rmap) + old_page_state;
 }
 
 static void __migrate_folio_extract(struct folio *dst,
 				   int *old_page_state,
-				   struct anon_vma **anon_vmap)
+				   anon_rmap_t *anon_rmapp)
 {
 	unsigned long private = (unsigned long)dst->private;
 
-	*anon_vmap = (struct anon_vma *)(private & ~PAGE_OLD_STATES);
+	*anon_rmapp = anon_vma_to_anon_rmap((void *)(private & ~PAGE_OLD_STATES));
 	*old_page_state = private & PAGE_OLD_STATES;
 	dst->private = NULL;
 }
@@ -1161,15 +1161,15 @@ static void __migrate_folio_extract(struct folio *dst,
 /* Restore the source folio to the original state upon failure */
 static void migrate_folio_undo_src(struct folio *src,
 				   int page_was_mapped,
-				   struct anon_vma *anon_vma,
+				   anon_rmap_t anon_rmap,
 				   bool locked,
 				   struct list_head *ret)
 {
 	if (page_was_mapped)
 		remove_migration_ptes(src, src, 0);
-	/* Drop an anon_vma reference if we took one */
-	if (anon_vma)
-		put_anon_vma(anon_vma);
+	/* Drop an anon_rmap reference if we took one */
+	if (anon_rmap_value(anon_rmap))
+		put_anon_rmap(anon_rmap);
 	if (locked)
 		folio_unlock(src);
 	if (ret)
@@ -1210,7 +1210,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
 	struct folio *dst;
 	int rc = -EAGAIN;
 	int old_page_state = 0;
-	struct anon_vma *anon_vma = NULL;
+	anon_rmap_t anon_rmap = ANON_RMAP_NULL;
 	bool locked = false;
 	bool dst_locked = false;
 
@@ -1275,19 +1275,19 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
 	/*
 	 * By try_to_migrate(), src->mapcount goes down to 0 here. In this case,
 	 * we cannot notice that anon_vma is freed while we migrate a page.
-	 * This get_anon_vma() delays freeing anon_vma pointer until the end
+	 * This get_anon_rmap() delays freeing anon_rmap pointer until the end
 	 * of migration. File cache pages are no problem because of page_lock()
 	 * File Caches may use write_page() or lock_page() in migration, then,
 	 * just care Anon page here.
 	 *
-	 * Only folio_get_anon_vma() understands the subtleties of
-	 * getting a hold on an anon_vma from outside one of its mms.
-	 * But if we cannot get anon_vma, then we won't need it anyway,
+	 * Only folio_get_anon_rmap() understands the subtleties of
+	 * getting a hold on an anon_rmap from outside one of its mms.
+	 * But if we cannot get anon_rmap, then we won't need it anyway,
 	 * because that implies that the anon page is no longer mapped
 	 * (and cannot be remapped so long as we hold the page lock).
 	 */
 	if (folio_test_anon(src) && !folio_test_ksm(src))
-		anon_vma = folio_get_anon_vma(src);
+		anon_rmap = folio_get_anon_rmap(src);
 
 	/*
 	 * Block others from accessing the new page when we get around to
@@ -1302,7 +1302,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
 	dst_locked = true;
 
 	if (unlikely(page_has_movable_ops(&src->page))) {
-		__migrate_folio_record(dst, old_page_state, anon_vma);
+		__migrate_folio_record(dst, old_page_state, anon_rmap);
 		return 0;
 	}
 
@@ -1326,13 +1326,13 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
 	} else if (folio_mapped(src)) {
 		/* Establish migration ptes */
 		VM_BUG_ON_FOLIO(folio_test_anon(src) &&
-			       !folio_test_ksm(src) && !anon_vma, src);
+			       !folio_test_ksm(src) && !anon_rmap_value(anon_rmap), src);
 		try_to_migrate(src, mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0);
 		old_page_state |= PAGE_WAS_MAPPED;
 	}
 
 	if (!folio_mapped(src)) {
-		__migrate_folio_record(dst, old_page_state, anon_vma);
+		__migrate_folio_record(dst, old_page_state, anon_rmap);
 		return 0;
 	}
 
@@ -1345,7 +1345,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
 		ret = NULL;
 
 	migrate_folio_undo_src(src, old_page_state & PAGE_WAS_MAPPED,
-			       anon_vma, locked, ret);
+			       anon_rmap, locked, ret);
 	migrate_folio_undo_dst(dst, dst_locked, put_new_folio, private);
 
 	return rc;
@@ -1359,12 +1359,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 {
 	int rc;
 	int old_page_state = 0;
-	struct anon_vma *anon_vma = NULL;
+	anon_rmap_t anon_rmap = ANON_RMAP_NULL;
 	bool src_deferred_split = false;
 	bool src_partially_mapped = false;
 	struct list_head *prev;
 
-	__migrate_folio_extract(dst, &old_page_state, &anon_vma);
+	__migrate_folio_extract(dst, &old_page_state, &anon_rmap);
 	prev = dst->lru.prev;
 	list_del(&dst->lru);
 
@@ -1425,9 +1425,9 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 	 * and will be freed.
 	 */
 	list_del(&src->lru);
-	/* Drop an anon_vma reference if we took one */
-	if (anon_vma)
-		put_anon_vma(anon_vma);
+	/* Drop an anon_rmap reference if we took one */
+	if (anon_rmap_value(anon_rmap))
+		put_anon_rmap(anon_rmap);
 	folio_unlock(src);
 	migrate_folio_done(src, reason);
 
@@ -1439,12 +1439,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 	 */
 	if (rc == -EAGAIN) {
 		list_add(&dst->lru, prev);
-		__migrate_folio_record(dst, old_page_state, anon_vma);
+		__migrate_folio_record(dst, old_page_state, anon_rmap);
 		return rc;
 	}
 
 	migrate_folio_undo_src(src, old_page_state & PAGE_WAS_MAPPED,
-			       anon_vma, true, ret);
+			       anon_rmap, true, ret);
 	migrate_folio_undo_dst(dst, true, put_new_folio, private);
 
 	return rc;
@@ -1476,7 +1476,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 	struct folio *dst;
 	int rc = -EAGAIN;
 	int page_was_mapped = 0;
-	struct anon_vma *anon_vma = NULL;
+	anon_rmap_t anon_rmap = ANON_RMAP_NULL;
 	struct address_space *mapping = NULL;
 	enum ttu_flags ttu = 0;
 
@@ -1513,7 +1513,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 	}
 
 	if (folio_test_anon(src))
-		anon_vma = folio_get_anon_vma(src);
+		anon_rmap = folio_get_anon_rmap(src);
 
 	if (unlikely(!folio_trylock(dst)))
 		goto put_anon;
@@ -1550,8 +1550,8 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 	folio_unlock(dst);
 
 put_anon:
-	if (anon_vma)
-		put_anon_vma(anon_vma);
+	if (anon_rmap_value(anon_rmap))
+		put_anon_rmap(anon_rmap);
 
 	if (!rc) {
 		move_hugetlb_state(src, dst, reason);
@@ -1778,11 +1778,11 @@ static void migrate_folios_undo(struct list_head *src_folios,
 	dst2 = list_next_entry(dst, lru);
 	list_for_each_entry_safe(folio, folio2, src_folios, lru) {
 		int old_page_state = 0;
-		struct anon_vma *anon_vma = NULL;
+		anon_rmap_t anon_rmap = ANON_RMAP_NULL;
 
-		__migrate_folio_extract(dst, &old_page_state, &anon_vma);
+		__migrate_folio_extract(dst, &old_page_state, &anon_rmap);
 		migrate_folio_undo_src(folio, old_page_state & PAGE_WAS_MAPPED,
-				anon_vma, true, ret_folios);
+				anon_rmap, true, ret_folios);
 		list_del(&dst->lru);
 		migrate_folio_undo_dst(dst, true, put_new_folio, private);
 		dst = dst2;
diff --git a/mm/page_idle.c b/mm/page_idle.c
index 9c67cbac2965..d4103f20f526 100644
--- a/mm/page_idle.c
+++ b/mm/page_idle.c
@@ -102,7 +102,7 @@ static void page_idle_clear_pte_refs(struct folio *folio)
 	 */
 	static struct rmap_walk_control rwc = {
 		.rmap_one = page_idle_clear_pte_refs_one,
-		.anon_lock = folio_lock_anon_vma_read,
+		.anon_lock = folio_lock_anon_rmap_read,
 	};
 
 	if (!folio_mapped(folio) || !folio_raw_mapping(folio))
diff --git a/mm/rmap.c b/mm/rmap.c
index 1b2dada71778..41607168e00e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -630,8 +630,8 @@ struct anon_vma *folio_get_anon_vma(const struct folio *folio)
  * reference like with folio_get_anon_vma() and then block on the mutex
  * on !rwc->try_lock case.
  */
-struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
-					  struct rmap_walk_control *rwc)
+static struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio,
+					  struct rmap_walk_control *rwc)
 {
 	struct anon_vma *anon_vma = NULL;
 	struct anon_vma *root_anon_vma;
@@ -744,6 +744,14 @@ void anon_rmap_unlock_read(anon_rmap_t anon_rmap)
 	anon_vma_unlock_read(anon_rmap_to_anon_vma(anon_rmap));
 }
 
+static anon_rmap_t folio_anon_rmap(const struct folio *folio)
+{
+	struct anon_vma *anon_vma;
+
+	anon_vma = folio_anon_vma(folio);
+	return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
+}
+
 bool folio_maybe_same_anon_vma(const struct folio *folio,
 		const struct vm_area_struct *vma)
 {
@@ -930,13 +938,11 @@ unsigned long page_address_in_vma(const struct folio *folio,
 		const struct page *page, const struct vm_area_struct *vma)
 {
 	if (folio_test_anon(folio)) {
-		struct anon_vma *anon_vma = folio_anon_vma(folio);
 		/*
 		 * Note: swapoff's unuse_vma() is more efficient with this
 		 * check, and needs it to match anon_vma when KSM is active.
 		 */
-		if (!vma->anon_vma || !anon_vma ||
-		    vma->anon_vma->root != anon_vma->root)
+		if (!vma->anon_vma || !folio_maybe_same_anon_vma(folio, vma))
 			return -EFAULT;
 	} else if (!vma->vm_file) {
 		return -EFAULT;
@@ -944,7 +950,7 @@ unsigned long page_address_in_vma(const struct folio *folio,
 		return -EFAULT;
 	}
 
-	/* KSM folios don't reach here because of the !anon_vma check */
+	/* The !folio_maybe_same_anon_vma() above handles KSM folios */
 	return vma_address(vma, page_pgoff(folio, page), 1);
 }
 
@@ -1145,7 +1151,7 @@ int folio_referenced(struct folio *folio, int is_locked,
 	struct rmap_walk_control rwc = {
 		.rmap_one = folio_referenced_one,
 		.arg = (void *)&pra,
-		.anon_lock = folio_lock_anon_vma_read,
+		.anon_lock = folio_lock_anon_rmap_read,
 		.try_lock = true,
 		.invalid_vma = invalid_folio_referenced_vma,
 	};
@@ -1580,8 +1586,7 @@ static void __page_check_anon_rmap(const struct folio *folio,
 	 * are initially only visible via the pagetables, and the pte is locked
 	 * over the call to folio_add_new_anon_rmap.
*/ - VM_BUG_ON_FOLIO(folio_anon_vma(folio)->root !=3D vma->anon_vma->root, - folio); + VM_BUG_ON_FOLIO(!folio_maybe_same_anon_vma(folio, vma), folio); VM_BUG_ON_PAGE(page_pgoff(folio, page) !=3D linear_page_index(vma, addres= s), page); } @@ -2468,7 +2473,7 @@ void try_to_unmap(struct folio *folio, enum ttu_flags= flags) .rmap_one =3D try_to_unmap_one, .arg =3D (void *)flags, .done =3D folio_not_mapped, - .anon_lock =3D folio_lock_anon_vma_read, + .anon_lock =3D folio_lock_anon_rmap_read, }; =20 if (flags & TTU_RMAP_LOCKED) @@ -2813,7 +2818,7 @@ void try_to_migrate(struct folio *folio, enum ttu_fla= gs flags) .rmap_one =3D try_to_migrate_one, .arg =3D (void *)flags, .done =3D folio_not_mapped, - .anon_lock =3D folio_lock_anon_vma_read, + .anon_lock =3D folio_lock_anon_rmap_read, }; =20 /* @@ -2990,8 +2995,8 @@ void __put_anon_vma(struct anon_vma *anon_vma) anon_vma_free(root); } =20 -static struct anon_vma *rmap_walk_anon_lock(const struct folio *folio, - struct rmap_walk_control *rwc) +static anon_rmap_t rmap_walk_anon_lock(const struct folio *folio, + struct rmap_walk_control *rwc) { struct anon_vma *anon_vma; =20 @@ -3006,7 +3011,7 @@ static struct anon_vma *rmap_walk_anon_lock(const str= uct folio *folio, */ anon_vma =3D folio_anon_vma(folio); if (!anon_vma) - return NULL; + return ANON_RMAP_NULL; =20 if (anon_vma_trylock_read(anon_vma)) goto out; @@ -3019,7 +3024,7 @@ static struct anon_vma *rmap_walk_anon_lock(const str= uct folio *folio, =20 anon_vma_lock_read(anon_vma); out: - return anon_vma; + return anon_vma ? 
anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL; } =20 /* @@ -3035,9 +3040,10 @@ static struct anon_vma *rmap_walk_anon_lock(const st= ruct folio *folio, static void rmap_walk_anon(struct folio *folio, struct rmap_walk_control *rwc, bool locked) { - struct anon_vma *anon_vma; + anon_rmap_t anon_rmap; pgoff_t pgoff_start, pgoff_end; struct anon_vma_chain *avc; + struct vm_area_struct *vma; =20 /* * The folio lock ensures that folio->mapping can't be changed under us @@ -3046,20 +3052,19 @@ static void rmap_walk_anon(struct folio *folio, VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio); =20 if (locked) { - anon_vma =3D folio_anon_vma(folio); + anon_rmap =3D folio_anon_rmap(folio); /* anon_vma disappear under us? */ - VM_BUG_ON_FOLIO(!anon_vma, folio); + VM_BUG_ON_FOLIO(!anon_rmap_value(anon_rmap), folio); } else { - anon_vma =3D rmap_walk_anon_lock(folio, rwc); + anon_rmap =3D rmap_walk_anon_lock(folio, rwc); } - if (!anon_vma) + if (!anon_rmap_value(anon_rmap)) return; =20 pgoff_start =3D folio_pgoff(folio); pgoff_end =3D pgoff_start + folio_nr_pages(folio) - 1; - anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, + anon_rmap_foreach_vma(vma, avc, anon_rmap, pgoff_start, pgoff_end) { - struct vm_area_struct *vma =3D avc->vma; unsigned long address =3D vma_address(vma, pgoff_start, folio_nr_pages(folio)); =20 @@ -3076,7 +3081,7 @@ static void rmap_walk_anon(struct folio *folio, } =20 if (!locked) - anon_vma_unlock_read(anon_vma); + anon_rmap_unlock_read(anon_rmap); } =20 /** --=20 2.17.1 From nobody Mon Jun 8 18:57:53 2026 Received: from mta20.hihonor.com (mta20.honor.com [81.70.206.69]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1C4893EEAE3; Wed, 27 May 2026 11:26:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=81.70.206.69 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779881213; 
cv=none; b=FScyZzAFC3zpZhrrG02s0fENF27W/7l3MGHznvaF4cM25Hx2CS5ctJ+Lh6IgbcdtwMoPbtvkEMFsHPA1ONP5syBwPp9lVlL6IZTKPtrb0PETccjlhjOc+ggwcqXdsyOb+n4MR5bEiMpRFheSWT8HVG7AoHhP/TSqQvOL1dcnZB4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779881213; c=relaxed/simple; bh=sFTjpTy9jS1J6p+gZVpzAXiVY7b1vJQet9M2pL8BGtk=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=I2iicwBY6BOyqpvzfH9Rl5WRxB+ujV+F8aozjYJHu53xuLkr/oZYwN0n5HRB//1eANwP3UuF2+SZWQnk+Qbwa6g4PnrfjScGwz8Luha/73VFoPtDFDCfTRlTqqn0JnC6QtXFeQEzmvxi+5iWiHC1KijJzKbR2YijMRczNJd+6b4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=honor.com; spf=pass smtp.mailfrom=honor.com; arc=none smtp.client-ip=81.70.206.69 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=honor.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=honor.com Received: from TW002-1.hihonor.com (unknown [10.72.0.137]) by mta20.hihonor.com (SkyGuard) with ESMTPS id 4gQRdL5rLWzYkxQN; Wed, 27 May 2026 19:06:30 +0800 (CST) Received: from TA003.hihonor.com (10.72.0.43) by TW002-1.hihonor.com (10.72.0.137) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.37; Wed, 27 May 2026 19:07:57 +0800 Received: from localhost.localdomain (10.144.18.117) by TA003.hihonor.com (10.72.0.43) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.37; Wed, 27 May 2026 19:07:56 +0800 From: tao To: , , , , , , , , , , , , , , , , CC: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , <21cnbao@gmail.com>, , , , Subject: [PATCH 03/15] mm: introduce anon_vma_tree_t for multiple anon_vma topologies Date: Wed, 27 May 2026 19:01:35 +0800 Message-ID: <20260527110147.17815-4-tao.wangtao@honor.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: 
<20260527110147.17815-1-tao.wangtao@honor.com> References: <20260527110147.17815-1-tao.wangtao@honor.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: TW006-1.hihonor.com (10.77.215.153) To TA003.hihonor.com (10.72.0.43) Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Prepare for upcoming ANON_VMA_LAZY support and RCU-based lockless rmap traversal by separating anon_vma topology handling from the anon_rmap semantics. To support multiple anon_vma topologies, introduce lightweight abstractions used by the VMA and rmap code. Introduce anon_vma_tree_t as the type that vma->anon_vma will hold once the field is converted: typedef unsigned long anon_vma_tree_t; It is a tagged pointer that references the anon_vma topology. The low bits are reserved as type tags to distinguish different implementations (e.g. regular anon_vma and lazy anon_vma). This keeps the VMA representation compact while allowing the topology to evolve without changing the VMA layout. Signed-off-by: tao --- include/linux/mm_types.h | 3 +++ mm/internal.h | 54 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 57 insertions(+) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index a308e2c23b82..5f4961ea1572 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -917,6 +917,9 @@ struct vm_area_desc { struct mmap_action action; }; =20 +/* Tagged pointer stored in vma->anon_vma. Low bits encode anon_vma type. = */ +typedef unsigned long anon_vma_tree_t; + /* * This struct describes a virtual memory area. There is one of these * per VM-area/task.
A VM area is any part of the process virtual memory diff --git a/mm/internal.h b/mm/internal.h index 5a2ddcf68e0b..76544ad44ff0 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -246,6 +246,60 @@ static inline void anon_vma_unlock_read(struct anon_vm= a *anon_vma) up_read(&anon_vma->root->rwsem); } =20 +/* anon_vma_tree_t APIs */ + +static inline anon_vma_tree_t make_anon_vma_tree(struct anon_vma *anon_vma) +{ + return (anon_vma_tree_t)anon_vma; +} + +static inline struct anon_vma *anon_vma_tree_anon_vma(anon_vma_tree_t anon= _tree) +{ + return (struct anon_vma *)anon_tree; +} + +static inline void anon_vma_tree_lock_write(anon_vma_tree_t anon_tree) +{ + struct anon_vma *anon_vma =3D anon_vma_tree_anon_vma(anon_tree); + + anon_vma_lock_write(anon_vma); +} + +static inline int anon_vma_tree_trylock_write(anon_vma_tree_t anon_tree) +{ + struct anon_vma *anon_vma =3D anon_vma_tree_anon_vma(anon_tree); + + return anon_vma_trylock_write(anon_vma); +} + +static inline void anon_vma_tree_unlock_write(anon_vma_tree_t anon_tree) +{ + struct anon_vma *anon_vma =3D anon_vma_tree_anon_vma(anon_tree); + + anon_vma_unlock_write(anon_vma); +} + +static inline void anon_vma_tree_lock_read(anon_vma_tree_t anon_tree) +{ + struct anon_vma *anon_vma =3D anon_vma_tree_anon_vma(anon_tree); + + anon_vma_lock_read(anon_vma); +} + +static inline int anon_vma_tree_trylock_read(anon_vma_tree_t anon_tree) +{ + struct anon_vma *anon_vma =3D anon_vma_tree_anon_vma(anon_tree); + + return anon_vma_trylock_read(anon_vma); +} + +static inline void anon_vma_tree_unlock_read(anon_vma_tree_t anon_tree) +{ + struct anon_vma *anon_vma =3D anon_vma_tree_anon_vma(anon_tree); + + anon_vma_unlock_read(anon_vma); +} + struct anon_vma *folio_get_anon_vma(const struct folio *folio); =20 /* Operations which modify VMAs. 
*/ --=20 2.17.1 From nobody Mon Jun 8 18:57:53 2026 Received: from mta20.hihonor.com (mta20.honor.com [81.70.206.69]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6025A3F075B; Wed, 27 May 2026 11:26:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=81.70.206.69 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779881213; cv=none; b=YYfRsUZDnhYSTV/JlV1mJb0jxzSqbBhZ9yzA3ad+k8OdzH0TK2ffOpAuq2ddx81PPQpZgC6eX5nh6YlOJLDa/Q6HhfSxyiiy0Y09OCiuqFwXKz9qM8SImD1EI2z3vestczjm+qcRUxlxtXbMYOHAJrpxzwnLrOaklaoEObgNOnc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779881213; c=relaxed/simple; bh=l/LI32jA9KJ8M+Gj01cVsCx0yTnOPECYaO11LsMf2iU=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=qeThBB4KpVis9sjK3svgVFJcY3ieNA4FXC64oh7jl3mBZ3w3dVOXsOw7nYnzEFienGEpJCHbSHkQPm0vW7OQbg6pnvb9jZfu00SclpHIRJtdkhDHRIBvqwbxBgNYmXr6g6YSSsNDjvB0pea7GScazDiMu4tGT0sbt70KrFnCpU0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=honor.com; spf=pass smtp.mailfrom=honor.com; arc=none smtp.client-ip=81.70.206.69 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=honor.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=honor.com Received: from TW005.hihonor.com (unknown [10.72.0.123]) by mta20.hihonor.com (SkyGuard) with ESMTPS id 4gQRdM3RRSzYkyXX; Wed, 27 May 2026 19:06:31 +0800 (CST) Received: from TA003.hihonor.com (10.72.0.43) by TW005.hihonor.com (10.72.0.123) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.37; Wed, 27 May 2026 19:07:57 +0800 Received: from localhost.localdomain (10.144.18.117) by TA003.hihonor.com (10.72.0.43) with Microsoft SMTP Server (version=TLS1_2, 
cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.37; Wed, 27 May 2026 19:07:56 +0800 From: tao To: , , , , , , , , , , , , , , , , CC: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , <21cnbao@gmail.com>, , , , Subject: [PATCH 04/15] mm: switch to anon_vma_tree_t APIs in preparation for ANON_VMA_LAZY Date: Wed, 27 May 2026 19:01:36 +0800 Message-ID: <20260527110147.17815-5-tao.wangtao@honor.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20260527110147.17815-1-tao.wangtao@honor.com> References: <20260527110147.17815-1-tao.wangtao@honor.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: TW006-1.hihonor.com (10.77.215.153) To TA003.hihonor.com (10.72.0.43) Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Replace direct anon_vma usage with anon_vma_tree_t APIs. This prepares for ANON_VMA_LAZY and prevents external modules from accessing anon_vma directly. Signed-off-by: tao --- include/linux/mm_types.h | 2 +- mm/debug.c | 2 +- mm/internal.h | 16 +++++++++++ mm/khugepaged.c | 8 +++--- mm/memory.c | 2 +- mm/mmap.c | 2 +- mm/mremap.c | 4 +-- mm/rmap.c | 59 ++++++++++++++++++++++------------------ mm/vma.c | 26 +++++++++--------- mm/vma.h | 4 +-- 10 files changed, 73 insertions(+), 52 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 5f4961ea1572..e7f5debac98e 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -987,7 +987,7 @@ struct vm_area_struct { */ struct list_head anon_vma_chain; /* Serialized by mmap_lock & * page_table_lock */ - struct anon_vma *anon_vma; /* Serialized by page_table_lock */ + anon_vma_tree_t anon_vma; /* Serialized by page_table_lock */ =20 /* Function pointers to deal with this struct. 
*/ const struct vm_operations_struct *vm_ops; diff --git a/mm/debug.c b/mm/debug.c index 77fa8fe1d641..f64cf9c9abbb 100644 --- a/mm/debug.c +++ b/mm/debug.c @@ -163,7 +163,7 @@ void dump_vma(const struct vm_area_struct *vma) "flags: %#lx(%pGv)\n", vma, (void *)vma->vm_start, (void *)vma->vm_end, vma->vm_mm, (unsigned long)pgprot_val(vma->vm_page_prot), - vma->anon_vma, vma->vm_ops, vma->vm_pgoff, + (void *)vma->anon_vma, vma->vm_ops, vma->vm_pgoff, vma->vm_file, vma->vm_private_data, #ifdef CONFIG_PER_VMA_LOCK refcount_read(&vma->vm_refcnt), diff --git a/mm/internal.h b/mm/internal.h index 76544ad44ff0..3dbbd118a78c 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -258,6 +258,22 @@ static inline struct anon_vma *anon_vma_tree_anon_vma(= anon_vma_tree_t anon_tree) return (struct anon_vma *)anon_tree; } =20 +/* Store anon_vma in vma->anon_vma using a tagged pointer. */ +static inline void vma_set_anon_vma(struct vm_area_struct *vma, + struct anon_vma *anon_vma) +{ + vma->anon_vma =3D (anon_vma_tree_t)anon_vma; +} + +/* Return the VMA's anon_vma. */ +static inline struct anon_vma *vma_anon_vma(const struct vm_area_struct *v= ma) +{ + /* Use READ_ONCE() for reusable_anon_vma */ + anon_vma_tree_t anon_tree =3D READ_ONCE(vma->anon_vma); + + return anon_vma_tree_anon_vma(anon_tree); +} + static inline void anon_vma_tree_lock_write(anon_vma_tree_t anon_tree) { struct anon_vma *anon_vma =3D anon_vma_tree_anon_vma(anon_tree); diff --git a/mm/khugepaged.c b/mm/khugepaged.c index b8452dbdb043..747748eace91 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -761,7 +761,7 @@ static void __collapse_huge_page_copy_failed(pte_t *pte, * Re-establish the PMD to point to the original page table * entry. Restoring PMD needs to be done prior to releasing * pages. Since pages are still isolated and locked here, - * acquiring anon_vma_lock_write is unnecessary. + * acquiring anon_vma_tree_lock_write is unnecessary. 
*/ pmd_ptl =3D pmd_lock(vma->vm_mm, pmd); pmd_populate(vma->vm_mm, pmd, pmd_pgtable(orig_pmd)); @@ -1164,7 +1164,7 @@ static enum scan_result collapse_huge_page(struct mm_= struct *mm, unsigned long a if (result !=3D SCAN_SUCCEED) goto out_up_write; =20 - anon_vma_lock_write(vma->anon_vma); + anon_vma_tree_lock_write(vma->anon_vma); =20 mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, address, address + HPAGE_PMD_SIZE); @@ -1205,7 +1205,7 @@ static enum scan_result collapse_huge_page(struct mm_= struct *mm, unsigned long a */ pmd_populate(mm, pmd, pmd_pgtable(_pmd)); spin_unlock(pmd_ptl); - anon_vma_unlock_write(vma->anon_vma); + anon_vma_tree_unlock_write(vma->anon_vma); goto out_up_write; } =20 @@ -1213,7 +1213,7 @@ static enum scan_result collapse_huge_page(struct mm_= struct *mm, unsigned long a * All pages are isolated and locked so anon_vma rmap * can't run anymore. */ - anon_vma_unlock_write(vma->anon_vma); + anon_vma_tree_unlock_write(vma->anon_vma); =20 result =3D __collapse_huge_page_copy(pte, folio, pmd, _pmd, vma, address, pte_ptl, diff --git a/mm/memory.c b/mm/memory.c index 86a973119bd4..c13b79987b26 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -602,7 +602,7 @@ static void print_bad_page_map(struct vm_area_struct *v= ma, if (page) dump_page(page, "bad page map"); pr_alert("addr:%px vm_flags:%08lx anon_vma:%px mapping:%px index:%lx\n", - (void *)addr, vma->vm_flags, vma->anon_vma, mapping, index); + (void *)addr, vma->vm_flags, (void *)vma->anon_vma, mapping, index); pr_alert("file:%pD fault:%ps mmap:%ps mmap_prepare: %ps read_folio:%ps\n", vma->vm_file, vma->vm_ops ? vma->vm_ops->fault : NULL, diff --git a/mm/mmap.c b/mm/mmap.c index 5754d1c36462..eac1fb3823eb 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1799,7 +1799,7 @@ __latent_entropy int dup_mmap(struct mm_struct *mm, s= truct mm_struct *oldmm) * Don't prepare anon_vma until fault since we don't * copy page for current vma. 
*/ - tmp->anon_vma =3D NULL; + vma_set_anon_vma(tmp, NULL); } else if (anon_vma_fork(tmp, mpnt)) goto fail_nomem_anon_vma_fork; vm_flags_clear(tmp, VM_LOCKED_MASK); diff --git a/mm/mremap.c b/mm/mremap.c index e9c8b1d05832..6af41e58f79f 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -145,13 +145,13 @@ static void take_rmap_locks(struct vm_area_struct *vm= a) if (vma->vm_file) i_mmap_lock_write(vma->vm_file->f_mapping); if (vma->anon_vma) - anon_vma_lock_write(vma->anon_vma); + anon_vma_tree_lock_write(vma->anon_vma); } =20 static void drop_rmap_locks(struct vm_area_struct *vma) { if (vma->anon_vma) - anon_vma_unlock_write(vma->anon_vma); + anon_vma_tree_unlock_write(vma->anon_vma); if (vma->vm_file) i_mmap_unlock_write(vma->vm_file->f_mapping); } diff --git a/mm/rmap.c b/mm/rmap.c index 41607168e00e..5c4eb090c801 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -186,6 +186,7 @@ int __anon_vma_prepare(struct vm_area_struct *vma) { struct mm_struct *mm =3D vma->vm_mm; struct anon_vma *anon_vma, *allocated; + anon_vma_tree_t anon_tree; struct anon_vma_chain *avc; =20 mmap_assert_locked(mm); @@ -205,11 +206,12 @@ int __anon_vma_prepare(struct vm_area_struct *vma) allocated =3D anon_vma; } =20 - anon_vma_lock_write(anon_vma); + anon_tree =3D make_anon_vma_tree(anon_vma); + anon_vma_tree_lock_write(anon_tree); /* page_table_lock to protect against threads */ spin_lock(&mm->page_table_lock); if (likely(!vma->anon_vma)) { - vma->anon_vma =3D anon_vma; + vma->anon_vma =3D anon_tree; anon_vma_chain_assign(vma, avc, anon_vma); anon_vma_interval_tree_insert(avc, &anon_vma->rb_root); anon_vma->num_active_vmas++; @@ -217,7 +219,7 @@ int __anon_vma_prepare(struct vm_area_struct *vma) avc =3D NULL; } spin_unlock(&mm->page_table_lock); - anon_vma_unlock_write(anon_vma); + anon_vma_tree_unlock_write(anon_tree); =20 if (unlikely(allocated)) put_anon_vma(allocated); @@ -283,7 +285,7 @@ static void maybe_reuse_anon_vma(struct vm_area_struct = *dst, if (anon_vma->num_children > 1) return; =20 - 
dst->anon_vma =3D anon_vma; + vma_set_anon_vma(dst, anon_vma); anon_vma->num_active_vmas++; } =20 @@ -321,11 +323,11 @@ int anon_vma_clone(struct vm_area_struct *dst, struct= vm_area_struct *src, enum vma_operation operation) { struct anon_vma_chain *avc, *pavc; - struct anon_vma *active_anon_vma =3D src->anon_vma; + anon_vma_tree_t active_anon_tree =3D src->anon_vma; =20 check_anon_vma_clone(dst, src, operation); =20 - if (!active_anon_vma) + if (!active_anon_tree) return 0; =20 /* @@ -350,7 +352,7 @@ int anon_vma_clone(struct vm_area_struct *dst, struct v= m_area_struct *src, * Now link the anon_vma's back to the newly inserted AVCs. * Note that all anon_vma's share the same root. */ - anon_vma_lock_write(src->anon_vma); + anon_vma_tree_lock_write(active_anon_tree); list_for_each_entry_reverse(avc, &dst->anon_vma_chain, same_vma) { struct anon_vma *anon_vma =3D avc->anon_vma; =20 @@ -360,9 +362,9 @@ int anon_vma_clone(struct vm_area_struct *dst, struct v= m_area_struct *src, } =20 if (operation !=3D VMA_OP_FORK) - dst->anon_vma->num_active_vmas++; + vma_anon_vma(dst)->num_active_vmas++; =20 - anon_vma_unlock_write(active_anon_vma); + anon_vma_tree_unlock_write(active_anon_tree); return 0; =20 enomem_failure: @@ -379,6 +381,7 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm= _area_struct *pvma) { struct anon_vma_chain *avc; struct anon_vma *anon_vma; + anon_vma_tree_t anon_tree; int rc; =20 /* Don't bother if the parent process has no anon_vma here. */ @@ -386,7 +389,7 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm= _area_struct *pvma) return 0; =20 /* Drop inherited anon_vma, we'll reuse existing or allocate new. */ - vma->anon_vma =3D NULL; + vma_set_anon_vma(vma, NULL); =20 anon_vma =3D anon_vma_alloc(); if (!anon_vma) @@ -421,8 +424,8 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm= _area_struct *pvma) * The root anon_vma's rwsem is the lock actually used when we * lock any of the anon_vmas in this anon_vma tree. 
*/ - anon_vma->root =3D pvma->anon_vma->root; - anon_vma->parent =3D pvma->anon_vma; + anon_vma->parent =3D vma_anon_vma(pvma); + anon_vma->root =3D anon_vma->parent->root; /* * With refcounts, an anon_vma can stay around longer than the * process it belongs to. The root anon_vma needs to be pinned until @@ -430,13 +433,13 @@ int anon_vma_fork(struct vm_area_struct *vma, struct = vm_area_struct *pvma) */ get_anon_vma(anon_vma->root); /* Mark this anon_vma as the one where our new (COWed) pages go. */ - vma->anon_vma =3D anon_vma; + vma->anon_vma =3D anon_tree =3D make_anon_vma_tree(anon_vma); anon_vma_chain_assign(vma, avc, anon_vma); /* Now let rmap see it. */ - anon_vma_lock_write(anon_vma); + anon_vma_tree_lock_write(anon_tree); anon_vma_interval_tree_insert(avc, &anon_vma->rb_root); anon_vma->parent->num_children++; - anon_vma_unlock_write(anon_vma); + anon_vma_tree_unlock_write(anon_tree); =20 return 0; } @@ -463,7 +466,7 @@ static void cleanup_partial_anon_vmas(struct vm_area_st= ruct *vma) * able to correctly clone AVC state. Avoid inconsistent anon_vma tree * state by resetting. */ - vma->anon_vma =3D NULL; + vma_set_anon_vma(vma, NULL); } =20 /** @@ -479,18 +482,18 @@ static void cleanup_partial_anon_vmas(struct vm_area_= struct *vma) void unlink_anon_vmas(struct vm_area_struct *vma) { struct anon_vma_chain *avc, *next; - struct anon_vma *active_anon_vma =3D vma->anon_vma; + anon_vma_tree_t active_anon_tree =3D vma->anon_vma; =20 /* Always hold mmap lock, read-lock on unmap possibly. */ mmap_assert_locked(vma->vm_mm); =20 /* Unfaulted is a no-op. */ - if (!active_anon_vma) { + if (!active_anon_tree) { VM_WARN_ON_ONCE(!list_empty(&vma->anon_vma_chain)); return; } =20 - anon_vma_lock_write(active_anon_vma); + anon_vma_tree_lock_write(active_anon_tree); =20 /* * Unlink each anon_vma chained to the VMA. 
This list is ordered @@ -514,13 +517,13 @@ void unlink_anon_vmas(struct vm_area_struct *vma) anon_vma_chain_free(avc); } =20 - active_anon_vma->num_active_vmas--; + vma_anon_vma(vma)->num_active_vmas--; /* * vma would still be needed after unlink, and anon_vma will be prepared * when handle fault. */ - vma->anon_vma =3D NULL; - anon_vma_unlock_write(active_anon_vma); + vma_set_anon_vma(vma, NULL); + anon_vma_tree_unlock_write(active_anon_tree); =20 =20 /* @@ -703,10 +706,12 @@ static struct anon_vma *folio_lock_anon_vma_read(cons= t struct folio *folio, =20 anon_rmap_t vma_get_anon_rmap(struct vm_area_struct *vma) { + struct anon_vma *anon_vma =3D anon_vma_tree_anon_vma(vma->anon_vma); + mmap_assert_locked(vma->vm_mm); VM_BUG_ON(!vma->anon_vma); - get_anon_vma(vma->anon_vma); - return anon_vma_to_anon_rmap(vma->anon_vma); + get_anon_vma(anon_vma); + return anon_vma_to_anon_rmap(anon_vma); } =20 void put_anon_rmap(anon_rmap_t anon_rmap) @@ -756,7 +761,7 @@ bool folio_maybe_same_anon_vma(const struct folio *foli= o, const struct vm_area_struct *vma) { struct anon_vma *anon_vma; - struct anon_vma *tgt_anon_vma =3D vma->anon_vma; + struct anon_vma *tgt_anon_vma =3D vma_anon_vma(vma); bool same =3D false; =20 rcu_read_lock(); @@ -1518,7 +1523,7 @@ static __always_inline void __folio_add_rmap(struct f= olio *folio, */ void folio_move_anon_rmap(struct folio *folio, struct vm_area_struct *vma) { - void *anon_vma =3D vma->anon_vma; + void *anon_vma =3D vma_anon_vma(vma); =20 VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); VM_BUG_ON_VMA(!anon_vma, vma); @@ -1542,7 +1547,7 @@ void folio_move_anon_rmap(struct folio *folio, struct= vm_area_struct *vma) static void __folio_set_anon(struct folio *folio, struct vm_area_struct *v= ma, unsigned long address, bool exclusive) { - struct anon_vma *anon_vma =3D vma->anon_vma; + struct anon_vma *anon_vma =3D vma_anon_vma(vma); =20 BUG_ON(!anon_vma); =20 diff --git a/mm/vma.c b/mm/vma.c index d90791b00a7b..3501617085b0 100644 --- 
a/mm/vma.c +++ b/mm/vma.c @@ -107,8 +107,8 @@ static bool is_mergeable_anon_vma(struct vma_merge_stru= ct *vmg, bool merge_next) { struct vm_area_struct *tgt =3D merge_next ? vmg->next : vmg->prev; struct vm_area_struct *src =3D vmg->middle; /* existing merge case. */ - struct anon_vma *tgt_anon =3D tgt->anon_vma; - struct anon_vma *src_anon =3D vmg->anon_vma; + anon_vma_tree_t tgt_anon =3D tgt->anon_vma; + anon_vma_tree_t src_anon =3D vmg->anon_vma; =20 /* * We _can_ have !src, vmg->anon_vma via copy_vma(). In this instance we @@ -311,7 +311,7 @@ static void vma_prepare(struct vma_prepare *vp) } =20 if (vp->anon_vma) { - anon_vma_lock_write(vp->anon_vma); + anon_vma_tree_lock_write(vp->anon_vma); anon_vma_interval_tree_pre_update_vma(vp->vma); if (vp->adj_next) anon_vma_interval_tree_pre_update_vma(vp->adj_next); @@ -364,7 +364,7 @@ static void vma_complete(struct vma_prepare *vp, struct= vma_iterator *vmi, anon_vma_interval_tree_post_update_vma(vp->vma); if (vp->adj_next) anon_vma_interval_tree_post_update_vma(vp->adj_next); - anon_vma_unlock_write(vp->anon_vma); + anon_vma_tree_unlock_write(vp->anon_vma); } =20 if (vp->file) { @@ -652,7 +652,7 @@ void validate_mm(struct mm_struct *mm) mt_validate(&mm->mm_mt); for_each_vma(vmi, vma) { #ifdef CONFIG_DEBUG_VM_RB - struct anon_vma *anon_vma =3D vma->anon_vma; + anon_vma_tree_t anon_tree =3D vma->anon_vma; struct anon_vma_chain *avc; #endif unsigned long vmi_start, vmi_end; @@ -676,11 +676,11 @@ void validate_mm(struct mm_struct *mm) } =20 #ifdef CONFIG_DEBUG_VM_RB - if (anon_vma) { - anon_vma_lock_read(anon_vma); + if (anon_tree) { + anon_vma_tree_lock_read(anon_tree); list_for_each_entry(avc, &vma->anon_vma_chain, same_vma) anon_vma_interval_tree_verify(avc); - anon_vma_unlock_read(anon_vma); + anon_vma_tree_unlock_read(anon_tree); } #endif /* Check for a infinite loop */ @@ -2009,7 +2009,7 @@ static struct anon_vma *reusable_anon_vma(struct vm_a= rea_struct *old, struct vm_area_struct *b) { if 
(anon_vma_compatible(a, b)) { - struct anon_vma *anon_vma =3D READ_ONCE(old->anon_vma); + struct anon_vma *anon_vma =3D vma_anon_vma(old); =20 if (anon_vma && list_is_singular(&old->anon_vma_chain)) return anon_vma; @@ -3160,7 +3160,7 @@ int expand_upwards(struct vm_area_struct *vma, unsign= ed long address) /* Lock the VMA before expanding to prevent concurrent page faults */ vma_start_write(vma); /* We update the anon VMA tree. */ - anon_vma_lock_write(vma->anon_vma); + anon_vma_tree_lock_write(vma->anon_vma); =20 /* Somebody else might have raced and expanded it already */ if (address > vma->vm_end) { @@ -3186,7 +3186,7 @@ int expand_upwards(struct vm_area_struct *vma, unsign= ed long address) } } } - anon_vma_unlock_write(vma->anon_vma); + anon_vma_tree_unlock_write(vma->anon_vma); vma_iter_free(&vmi); validate_mm(mm); return error; @@ -3239,7 +3239,7 @@ int expand_downwards(struct vm_area_struct *vma, unsi= gned long address) /* Lock the VMA before expanding to prevent concurrent page faults */ vma_start_write(vma); /* We update the anon VMA tree. 
*/ - anon_vma_lock_write(vma->anon_vma); + anon_vma_tree_lock_write(vma->anon_vma); =20 /* Somebody else might have raced and expanded it already */ if (address < vma->vm_start) { @@ -3266,7 +3266,7 @@ int expand_downwards(struct vm_area_struct *vma, unsi= gned long address) } } } - anon_vma_unlock_write(vma->anon_vma); + anon_vma_tree_unlock_write(vma->anon_vma); vma_iter_free(&vmi); validate_mm(mm); return error; diff --git a/mm/vma.h b/mm/vma.h index 8e4b61a7304c..d3bd83299219 100644 --- a/mm/vma.h +++ b/mm/vma.h @@ -15,7 +15,7 @@ struct vma_prepare { struct vm_area_struct *adj_next; struct file *file; struct address_space *mapping; - struct anon_vma *anon_vma; + anon_vma_tree_t anon_vma; struct vm_area_struct *insert; struct vm_area_struct *remove; struct vm_area_struct *remove2; @@ -104,7 +104,7 @@ struct vma_merge_struct { vma_flags_t vma_flags; }; struct file *file; - struct anon_vma *anon_vma; + anon_vma_tree_t anon_vma; struct mempolicy *policy; struct vm_userfaultfd_ctx uffd_ctx; struct anon_vma_name *anon_name; --=20 2.17.1 From nobody Mon Jun 8 18:57:53 2026 Received: from mta20.hihonor.com (mta20.honor.com [81.70.206.69]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6015A3EF673; Wed, 27 May 2026 11:26:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=81.70.206.69 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779881211; cv=none; b=SLk1rsBzAZv22I4ye7VDVZyJX6sVg/wtTEP4DdEaJTBiiH94sTHHZ+SDJFBPnvQC7lXTnOLvAOScUjXuYlCoOcwX8Kx4WVjKgGs5Ks7eStJOp10QZw1Ccbg6CFo06KzBxcknQdl3hglxrpW5EvzJ3iiJL4cS2tpdohjOPP03nig= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779881211; c=relaxed/simple; bh=u3wyjBIt5jW+EGM9oNwMCqW+k4jPq+GElpnXGFcwr+A=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; 
b=K/y55PBdELnU9lA06ECKm8ChgZveSQBdaSBJJ9iR2gy7P8bqB96Hq9PIzXCNJd39VXmdBV6OJ98uKp+IsTgkrsWeoc4SjU7uzSJDAmE4eVVBIWcdcBtw4Qdsyy7iEuJXwuHJP1jPvHrYTi7B8hyMJNcaPeHhknbw6GjGQwK2hxE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=honor.com; spf=pass smtp.mailfrom=honor.com; arc=none smtp.client-ip=81.70.206.69 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=honor.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=honor.com Received: from TW003.hihonor.com (unknown [10.77.199.161]) by mta20.hihonor.com (SkyGuard) with ESMTPS id 4gQRdN0rt8zYl3cl; Wed, 27 May 2026 19:06:32 +0800 (CST) Received: from TA003.hihonor.com (10.72.0.43) by TW003.hihonor.com (10.77.199.161) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.37; Wed, 27 May 2026 19:07:55 +0800 Received: from localhost.localdomain (10.144.18.117) by TA003.hihonor.com (10.72.0.43) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.37; Wed, 27 May 2026 19:07:57 +0800 From: tao To: , , , , , , , , , , , , , , , , CC: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , <21cnbao@gmail.com>, , , , Subject: [PATCH 05/15] mm: add CONFIG_ANON_VMA_LAZY and folio helpers Date: Wed, 27 May 2026 19:01:37 +0800 Message-ID: <20260527110147.17815-6-tao.wangtao@honor.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20260527110147.17815-1-tao.wangtao@honor.com> References: <20260527110147.17815-1-tao.wangtao@honor.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: TW006-1.hihonor.com (10.77.215.153) To TA003.hihonor.com (10.72.0.43) Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add the ANON_VMA_LAZY optimization foundation: - CONFIG_ANON_VMA_LAZY Kconfig 
option - FOLIO_MAPPING_ANON_VMA_LAZY flag for folio->mapping - a runtime switch for ANON_VMA_LAZY (the anon_vma_lazy sysctl) This feature delays anon_vma allocation until fork(), reducing memory overhead for VMAs without children. Signed-off-by: tao --- include/linux/page-flags.h | 23 +++++++++++ mm/Kconfig | 14 +++++++ mm/internal.h | 16 ++++++++ mm/mmap.c | 9 ++++ mm/rmap.c | 84 ++++++++++++++++++++++++++++++++++++++ 5 files changed, 146 insertions(+) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 0e03d816e8b9..c0cc43118877 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -696,6 +696,12 @@ PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted) * the FOLIO_MAPPING_ANON_KSM bit may be set along with the FOLIO_MAPPING_= ANON * bit; and then folio->mapping points, not to an anon_vma, but to a priva= te * structure which KSM associates with that merged folio. See ksm.h. + * + * If CONFIG_ANON_VMA_LAZY is enabled, the FOLIO_MAPPING_ANON_KSM bit is u= sed + * for the ANON_VMA_LAZY optimization. In this case, folio->mapping points= to + * the ANON_VMA_LAZY root VMA instead of anon_vma. The folio_test_anon() + * check also needs to be updated accordingly.
+ * * Please note that, confusingly, "folio_mapping" refers to the inode * address_space which maps the folio from disk; whereas "folio_mapped" @@ -711,11 +717,16 @@ PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted) #define FOLIO_MAPPING_ANON 0x1 #define FOLIO_MAPPING_ANON_KSM 0x2 #define FOLIO_MAPPING_KSM (FOLIO_MAPPING_ANON | FOLIO_MAPPING_ANON_KSM) +#define FOLIO_MAPPING_ANON_VMA_LAZY FOLIO_MAPPING_ANON_KSM #define FOLIO_MAPPING_FLAGS (FOLIO_MAPPING_ANON | FOLIO_MAPPING_ANON_KSM) =20 static __always_inline bool folio_test_anon(const struct folio *folio) { +#ifdef CONFIG_ANON_VMA_LAZY + return ((unsigned long)folio->mapping & FOLIO_MAPPING_FLAGS) !=3D 0; +#else return ((unsigned long)folio->mapping & FOLIO_MAPPING_ANON) !=3D 0; +#endif } =20 static __always_inline bool folio_test_lazyfree(const struct folio *folio) @@ -734,6 +745,18 @@ static __always_inline bool PageAnon(const struct page= *page) { return folio_test_anon(page_folio(page)); } + +static inline bool folio_test_anon_vma_lazy(const struct folio *folio) +{ +#ifdef CONFIG_ANON_VMA_LAZY + unsigned long flags =3D (unsigned long)folio->mapping; + + return (flags & FOLIO_MAPPING_FLAGS) =3D=3D FOLIO_MAPPING_ANON_VMA_LAZY; +#else + return false; +#endif +} + #ifdef CONFIG_KSM /* * A KSM page is one of those write-protected "shared pages" or "merged pa= ges" diff --git a/mm/Kconfig b/mm/Kconfig index e8bf1e9e6ad9..c16b5d9b3ce9 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1412,6 +1412,20 @@ config LOCK_MM_AND_FIND_VMA bool depends on !STACK_GROWSUP =20 +config ARCH_SUPPORTS_ANON_VMA_LAZY + def_bool n + +config ANON_VMA_LAZY + bool "Lazy allocation of anon_vma" + def_bool y + depends on ARCH_SUPPORTS_ANON_VMA_LAZY && MMU + help + For anonymous VMAs without children, avoid allocating anon_vma + and anon_vma_chain to reduce memory overhead. + + Say Y to enable this optimization for anonymous VMAs without + children. 
+ config IOMMU_MM_DATA bool =20 diff --git a/mm/internal.h b/mm/internal.h index 3dbbd118a78c..639f9c287f4c 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -248,6 +248,22 @@ static inline void anon_vma_unlock_read(struct anon_vm= a *anon_vma) =20 /* anon_vma_tree_t APIs */ =20 +/* Encoded anon_vma tree type. Must fit within ANON_VMA_TREE_BITS. */ +#define ANON_VMA_TREE_REGULAR 0 /* regular anon_vma */ +#define ANON_VMA_TREE_VMA 1 +#define ANON_VMA_TREE_PARENT 2 +#define ANON_VMA_TREE_INVALID 3 /* reserved */ + +#define ANON_VMA_TREE_BITS 2 +#define ANON_VMA_TREE_MASK ((1UL << ANON_VMA_TREE_BITS) - 1) + +#ifdef CONFIG_ANON_VMA_LAZY +extern bool anon_vma_lazy_enable; +static inline bool anon_vma_lazy_enabled(void) { return anon_vma_lazy_enab= le; } +#else +static inline bool anon_vma_lazy_enabled(void) { return false; } +#endif + static inline anon_vma_tree_t make_anon_vma_tree(struct anon_vma *anon_vma) { return (anon_vma_tree_t)anon_vma; diff --git a/mm/mmap.c b/mm/mmap.c index eac1fb3823eb..2ae733eb39f0 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1558,6 +1558,15 @@ static const struct ctl_table mmap_table[] =3D { .extra2 =3D (void *)&mmap_rnd_compat_bits_max, }, #endif +#ifdef CONFIG_ANON_VMA_LAZY + { + .procname =3D "anon_vma_lazy", + .data =3D &anon_vma_lazy_enable, + .maxlen =3D sizeof(anon_vma_lazy_enable), + .mode =3D 0600, + .proc_handler =3D proc_dobool, + }, +#endif }; #endif /* CONFIG_SYSCTL */ =20 diff --git a/mm/rmap.c b/mm/rmap.c index 5c4eb090c801..48c4463d8b2c 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -87,6 +87,90 @@ static struct kmem_cache *anon_vma_cachep; static struct kmem_cache *anon_vma_chain_cachep; =20 +#ifdef CONFIG_ANON_VMA_LAZY +/* + * ANON_VMA_LAZY: defer anon_vma allocation until fork(). + * + * anon_vma and anon_vma_chain exist mainly to support reverse mapping + * across multiple processes. For VMAs that belong to a single process, + * eagerly creating anon_vma introduces unnecessary memory and setup + * overhead. 
+ * + * This optimization delays anon_vma creation until fork(). Before that + * the VMA stays in a lazy state and no anon_vma or anon_vma_chain + * topology is created. + * + * vma->anon_vma encodes the anonymous VMA state. Low bits of the pointer + * distinguish lazy states: + * + * NULL + * VMA has no anonymous or CoW pages. + * + * regular anon_vma + * Standard anon_vma with anon_vma_chain topology. + * + * anon_vma_lazy_root | ANON_VMA_TREE_VMA + * Lazy root for the VMA that first faults anonymous pages. + * No anon_vma or anon_vma_chain topology exists. + * + * parent_anon_vma | ANON_VMA_TREE_PARENT + * Lazy state for VMAs created during fork(). The lazy parent_anon_= vma + * refers to the anon_vma of the parent VMA. + * + * Anonymous folios extend folio->mapping with FOLIO_MAPPING_ANON_VMA_LAZY: + * + * anon_vma | FOLIO_MAPPING_ANON + * regular anonymous mapping + * + * anon_vma_lazy_root | FOLIO_MAPPING_ANON_VMA_LAZY + * lazy anonymous mapping + * + * In typical workloads most VMAs remain in ANON_VMA_TREE_VMA state. + * These VMAs have no anon_vma, no anon_vma_chain and only a single VMA. + * Reverse mapping can therefore be performed without anon_vma locking, + * providing a faster rmap path for the common case. + * + * During fork(), VMAs in ANON_VMA_TREE_VMA are upgraded to regular + * anon_vma in the parent to establish sharing topology. Child VMAs are + * created as ANON_VMA_TREE_PARENT and do not allocate anon_vma, + * avoiding additional fork overhead. + * + * Folio mapping rules: + * + * Lazy anonymous folios store the lazy root in folio->mapping using + * FOLIO_MAPPING_ANON_VMA_LAZY. This allows rmap walkers to resolve the + * owning VMA without requiring anon_vma topology. + * + * folio->mapping may be updated during fork() when lazy VMAs are + * upgraded to regular anon_vma. dup_anon_rmap() in copy_page_range() + * performs the upgrade and installs the new anon_vma mapping. 
+ * + * folio_move_anon_rmap() updates folio->mapping when anonymous folios + * move between VMAs. + * + * As with regular anonymous memory, __folio_remove_rmap() does not + * clear folio->mapping. Rmap walkers validate mappings using + * folio_mapped(). + * + * VMA split keeps vma->anon_vma unchanged. The lazy root holds an extra + * reference so folio->mapping remains valid without scanning folios. + * + * Internal helpers: + * + * anon_vma_link_t + * The value encodes a reference to anon_vma topology. Low bits + * are used as type tags to distinguish different anon_vma + * implementations (e.g. regular anon_vma or anon_vma_lazy). + * + * anon_rmap_t + * anon_rmap_t wraps the tagged pointer used by the rmap code and + * provides a type-safe interface for reverse mapping operations, + * covering both regular anon_vma and lazy anon_vma mappings. + */ + +bool anon_vma_lazy_enable; +#endif + static inline struct anon_vma *anon_vma_alloc(void) { struct anon_vma *anon_vma; --=20 2.17.1 From nobody Mon Jun 8 18:57:53 2026 Received: from mta20.hihonor.com (mta20.honor.com [81.70.206.69]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3F0C03F412B; Wed, 27 May 2026 11:26:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=81.70.206.69 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779881212; cv=none; b=s/hXpHoTOnHzQ95oZ90do/gA6nwgMuTWfWBtpcui2h7+lOGPoIeK7fqkNkXHRRD2ANM4HDciaqgKHagbgaJoZtYpbIn0dnFKKqlSZnTb1aMoL58kId1QCQoKZAvCN59OR45vNoomD7GhvWL0CzkbFJkT4hSPuP3YVEJyA6I3Qyc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779881212; c=relaxed/simple; bh=FlV4d46OIV6ymP4a4GLqgly2A0PlFo5ynAuJy/tx9c0=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; 
b=e8sNTIP1S44ZNzpa6Ls1IihNpWzPkEtL/H7hsVjVcXG9WP2IjA4LcbYdlwBr/rHU1HTdBQASzvX469GieLSb4tV2BnX/+8fayzC6b5JeKUDGxNBAfQXKcZHh5u/OFJvdLmlBR3+IjYLfKeCDpGw2IDRFlH/PO8SgUcRlhsAeVjg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=honor.com; spf=pass smtp.mailfrom=honor.com; arc=none smtp.client-ip=81.70.206.69 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=honor.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=honor.com Received: from TW001.hihonor.com (unknown [10.77.229.151]) by mta20.hihonor.com (SkyGuard) with ESMTPS id 4gQRdN5KTvzYl4n8; Wed, 27 May 2026 19:06:32 +0800 (CST) Received: from TA003.hihonor.com (10.72.0.43) by TW001.hihonor.com (10.77.229.151) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.37; Wed, 27 May 2026 19:07:59 +0800 Received: from localhost.localdomain (10.144.18.117) by TA003.hihonor.com (10.72.0.43) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.37; Wed, 27 May 2026 19:07:58 +0800 From: tao To: , , , , , , , , , , , , , , , , CC: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , <21cnbao@gmail.com>, , , , Subject: [PATCH 06/15] mm: add CONFIG_VMA_REF and VMA helpers Date: Wed, 27 May 2026 19:01:38 +0800 Message-ID: <20260527110147.17815-7-tao.wangtao@honor.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20260527110147.17815-1-tao.wangtao@honor.com> References: <20260527110147.17815-1-tao.wangtao@honor.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: TW006-1.hihonor.com (10.77.215.153) To TA003.hihonor.com (10.72.0.43) Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" rcuref only manages the lifetime of a VMA and does not track its state. 
Add CONFIG_VMA_REF, which gives each VMA an rcuref (vm_rcuref) together with the vma_rcuref_init(), vma_get() and vma_put() helpers, and convert the vm_area_free() call sites touched below to vma_put(). This prepares for the upcoming ANON_VMA_LAZY support. Signed-off-by: tao --- include/linux/mm.h | 38 ++++++++++++++++++++++++++++++++++++++ include/linux/mm_types.h | 4 ++++ mm/Kconfig | 8 ++++++++ mm/debug_vm_pgtable.c | 2 +- mm/mmap.c | 4 ++-- mm/vma.c | 12 ++++++------ mm/vma_exec.c | 2 +- mm/vma_init.c | 1 + 8 files changed, 61 insertions(+), 10 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index af23453e9dbd..e98bdb414e43 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -918,6 +918,43 @@ static inline void assert_fault_locked(const struct vm= _fault *vmf) } #endif /* CONFIG_PER_VMA_LOCK */ =20 +#ifdef CONFIG_VMA_REF +static inline void vma_rcuref_init(struct vm_area_struct *vma) +{ + rcuref_init(&vma->vm_rcuref, 1); +} + +static inline struct vm_area_struct *vma_get(struct vm_area_struct *vma) +{ + if (rcuref_get(&vma->vm_rcuref)) + return vma; + return NULL; +} + +static inline bool vma_put(struct vm_area_struct *vma) +{ + bool release =3D rcuref_put(&vma->vm_rcuref); + + if (unlikely(release)) + vm_area_free(vma); + return release; +} +#else +static inline void vma_rcuref_init(struct vm_area_struct *vma) {} + +static inline struct vm_area_struct *vma_get(struct vm_area_struct *vma) +{ + VM_WARN_ON_ONCE(true); /* not allowed */ + return NULL; +} + +static inline bool vma_put(struct vm_area_struct *vma) +{ + vm_area_free(vma); + return true; +} +#endif /* CONFIG_VMA_REF */ + static inline bool mm_flags_test(int flag, const struct mm_struct *mm) { return test_bit(flag, ACCESS_PRIVATE(&mm->flags, __mm_flags)); @@ -957,6 +994,7 @@ static inline void vma_init(struct vm_area_struct *vma,= struct mm_struct *mm) vma->vm_ops =3D &vma_dummy_vm_ops; INIT_LIST_HEAD(&vma->anon_vma_chain); vma_lock_init(vma, false); + vma_rcuref_init(vma); } =20 /* Use when VMA is not part of the VMA tree and needs no locking */ diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index e7f5debac98e..a2bf17a42b55 100644 --- a/include/linux/mm_types.h 
+++ b/include/linux/mm_types.h @@ -6,6 +6,7 @@ =20 #include #include +#include #include #include #include @@ -978,6 +979,9 @@ struct vm_area_struct { * slowpath. */ unsigned int vm_lock_seq; +#endif +#ifdef CONFIG_VMA_REF + rcuref_t vm_rcuref; /* Ensures the VMA stays valid. */ #endif /* * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma diff --git a/mm/Kconfig b/mm/Kconfig index c16b5d9b3ce9..c039ce583924 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1419,13 +1419,21 @@ config ANON_VMA_LAZY bool "Lazy allocation of anon_vma" def_bool y depends on ARCH_SUPPORTS_ANON_VMA_LAZY && MMU + select VMA_REF help For anonymous VMAs without children, avoid allocating anon_vma and anon_vma_chain to reduce memory overhead. =20 + ANON_VMA_LAZY records the VMA in folio->mapping, while VMA_REF + ensures that the recorded VMA remains valid. + Say Y to enable this optimization for anonymous VMAs without children. =20 +config VMA_REF + def_bool n + depends on MMU + config IOMMU_MM_DATA bool =20 diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c index 23dc3ee09561..cab8a4e71243 100644 --- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -1036,7 +1036,7 @@ static void __init destroy_args(struct pgtable_debug_= args *args) =20 /* Free vma and mm struct */ if (args->vma) - vm_area_free(args->vma); + vma_put(args->vma); =20 if (args->mm) mmput(args->mm); diff --git a/mm/mmap.c b/mm/mmap.c index 2ae733eb39f0..ccedebc87cd5 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1481,7 +1481,7 @@ static struct vm_area_struct *__install_special_mappi= ng( return vma; =20 out: - vm_area_free(vma); + vma_put(vma); return ERR_PTR(ret); } =20 @@ -1922,7 +1922,7 @@ __latent_entropy int dup_mmap(struct mm_struct *mm, s= truct mm_struct *oldmm) fail_nomem_anon_vma_fork: mpol_put(vma_policy(tmp)); fail_nomem_policy: - vm_area_free(tmp); + vma_put(tmp); fail_nomem: retval =3D -ENOMEM; vm_unacct_memory(charge); diff --git a/mm/vma.c b/mm/vma.c index 3501617085b0..ed15968a5891 
100644 --- a/mm/vma.c +++ b/mm/vma.c @@ -392,7 +392,7 @@ static void vma_complete(struct vma_prepare *vp, struct= vma_iterator *vmi, mpol_put(vma_policy(vp->remove)); if (!vp->remove2) WARN_ON_ONCE(vp->vma->vm_end < vp->remove->vm_end); - vm_area_free(vp->remove); + vma_put(vp->remove); =20 /* * In mprotect's case 6 (see comments on vma_merge), @@ -470,7 +470,7 @@ void remove_vma(struct vm_area_struct *vma) if (vma->vm_file) fput(vma->vm_file); mpol_put(vma_policy(vma)); - vm_area_free(vma); + vma_put(vma); } =20 /* @@ -582,7 +582,7 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_st= ruct *vma, out_free_vmi: vma_iter_free(vmi); out_free_vma: - vm_area_free(new); + vma_put(new); return err; } =20 @@ -1950,7 +1950,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct= **vmap, out_free_mempol: mpol_put(vma_policy(new_vma)); out_free_vma: - vm_area_free(new_vma); + vma_put(new_vma); out: return NULL; } @@ -2596,7 +2596,7 @@ static int __mmap_new_vma(struct mmap_state *map, str= uct vm_area_struct **vmap, free_iter_vma: vma_iter_free(vmi); free_vma: - vm_area_free(vma); + vma_put(vma); return error; } =20 @@ -2946,7 +2946,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_= area_struct *vma, return 0; =20 mas_store_fail: - vm_area_free(vma); + vma_put(vma); unacct_fail: vm_unacct_memory(len >> PAGE_SHIFT); return -ENOMEM; diff --git a/mm/vma_exec.c b/mm/vma_exec.c index 5cee8b7efa0f..e7f388010488 100644 --- a/mm/vma_exec.c +++ b/mm/vma_exec.c @@ -160,6 +160,6 @@ int create_init_stack_vma(struct mm_struct *mm, struct = vm_area_struct **vmap, mmap_write_unlock(mm); err_free: *vmap =3D NULL; - vm_area_free(vma); + vma_put(vma); return err; } diff --git a/mm/vma_init.c b/mm/vma_init.c index 3c0b65950510..1300d813d61b 100644 --- a/mm/vma_init.c +++ b/mm/vma_init.c @@ -137,6 +137,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struc= t *orig) INIT_LIST_HEAD(&new->anon_vma_chain); vma_numab_state_init(new); dup_anon_vma_name(orig, new); + 
vma_rcuref_init(new); =20 return new; } --=20 2.17.1 From nobody Mon Jun 8 18:57:53 2026 Received: from mta20.hihonor.com (mta20.honor.com [81.70.206.69]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7A7A83F9A18; Wed, 27 May 2026 11:08:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=81.70.206.69 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779880092; cv=none; b=jNrdIknRYGMXeYkmFSd4Vpl3TZXokIcbxoap0dhwSjMO7q7B5hCL+3OHvAK4f8BmUJ5dlLnozCp82epzfZbhaMO7ocfDXcuZmN0rgGF1RzOfqmzFGwifATsA93uHFRBlPZCSbUeUYOjQ5Y/KAwDj9kpdnef0hoL3GaWGQEpxcwE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779880092; c=relaxed/simple; bh=Dk7SM3To25Cme62PVbHQX7dIZzrQQBtRDWTHoV7EWI8=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=SBGUx1I076MJaaXdIL1FXwtSJahJlIUCK3rSr9hQmr5oaiK+HJwXOZf11OF9OxefN1qdWKBMSDGjOyhsH8/kdSq4eKHHnkgY4veofg+5muYrPF1O1G1M3ECnxrLOJWXHbqelmnkWBg7sl/eCf20L2mRf89AlGqCBGRqqSDdVQ80= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=honor.com; spf=pass smtp.mailfrom=honor.com; arc=none smtp.client-ip=81.70.206.69 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=honor.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=honor.com Received: from TW006-1.hihonor.com (unknown [10.77.215.153]) by mta20.hihonor.com (SkyGuard) with ESMTPS id 4gQRdP2F6tzYl7vQ; Wed, 27 May 2026 19:06:33 +0800 (CST) Received: from TA003.hihonor.com (10.72.0.43) by TW006-1.hihonor.com (10.77.215.153) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.37; Wed, 27 May 2026 19:07:59 +0800 Received: from localhost.localdomain (10.144.18.117) by TA003.hihonor.com (10.72.0.43) with Microsoft 
SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.37; Wed, 27 May 2026 19:07:58 +0800 From: tao To: , , , , , , , , , , , , , , , , CC: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , <21cnbao@gmail.com>, , , , Subject: [PATCH 07/15] mm: replace direct FOLIO_MAPPING_ANON usage with helpers Date: Wed, 27 May 2026 19:01:39 +0800 Message-ID: <20260527110147.17815-8-tao.wangtao@honor.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20260527110147.17815-1-tao.wangtao@honor.com> References: <20260527110147.17815-1-tao.wangtao@honor.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: TW006-1.hihonor.com (10.77.215.153) To TA003.hihonor.com (10.72.0.43) Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Replace direct uses of FOLIO_MAPPING_ANON in external modules with helper functions in preparation for ANON_VMA_LAZY. 
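To illustrate why open-coded FOLIO_MAPPING_ANON tests need to go through a helper, here is a small userspace model of the low-bit encoding. It is only a sketch: the flag values mirror include/linux/page-flags.h, but the model_* names and the 'lazy' parameter (standing in for CONFIG_ANON_VMA_LAZY) are assumptions made for this example and are not part of the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in include/linux/page-flags.h. */
#define FOLIO_MAPPING_ANON          0x1UL
#define FOLIO_MAPPING_ANON_KSM      0x2UL
#define FOLIO_MAPPING_ANON_VMA_LAZY FOLIO_MAPPING_ANON_KSM
#define FOLIO_MAPPING_FLAGS         (FOLIO_MAPPING_ANON | FOLIO_MAPPING_ANON_KSM)

/* Hypothetical: the kind of open-coded test this patch removes from callers. */
static bool model_open_coded_is_anon(unsigned long mapping)
{
	return (mapping & FOLIO_MAPPING_ANON) != 0;
}

/* Hypothetical model of mapping_is_anon(); 'lazy' stands in for CONFIG_ANON_VMA_LAZY. */
static bool model_mapping_is_anon(unsigned long mapping, bool lazy)
{
	unsigned long mask = lazy ? FOLIO_MAPPING_FLAGS : FOLIO_MAPPING_ANON;

	return (mapping & mask) != 0;
}

/* A lazy mapping carries only the 0x2 tag, so the open-coded test misses it. */
static void model_self_check(void)
{
	unsigned long lazy_mapping = 0x1000UL | FOLIO_MAPPING_ANON_VMA_LAZY;

	assert(!model_open_coded_is_anon(lazy_mapping));
	assert(model_mapping_is_anon(lazy_mapping, true));
}
```

With the helper in place, only mapping_is_anon() has to know which tag bits mean "anonymous" under a given configuration.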
Signed-off-by: tao --- fs/proc/page.c | 6 ++---- include/linux/page-flags.h | 15 ++++++++++++--- include/linux/pagemap.h | 2 +- mm/gup.c | 6 ++---- 4 files changed, 17 insertions(+), 12 deletions(-) diff --git a/fs/proc/page.c b/fs/proc/page.c index f9b2c2c906cd..93ddfda9fa1d 100644 --- a/fs/proc/page.c +++ b/fs/proc/page.c @@ -148,7 +148,6 @@ u64 stable_page_flags(const struct page *page) const struct folio *folio; struct page_snapshot ps; unsigned long k; - unsigned long mapping; bool is_anon; u64 u =3D 0; =20 @@ -163,8 +162,7 @@ u64 stable_page_flags(const struct page *page) folio =3D &ps.folio_snapshot; =20 k =3D folio->flags.f; - mapping =3D (unsigned long)folio->mapping; - is_anon =3D mapping & FOLIO_MAPPING_ANON; + is_anon =3D folio_test_anon(folio); =20 /* * pseudo flags for the well known (anonymous) memory mapped pages @@ -173,7 +171,7 @@ u64 stable_page_flags(const struct page *page) u |=3D 1 << KPF_MMAP; if (is_anon) { u |=3D 1 << KPF_ANON; - if (mapping & FOLIO_MAPPING_KSM) + if (!PageAnonNotKsm(page)) u |=3D 1 << KPF_KSM; } =20 diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index c0cc43118877..50c80a1e2c7c 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -720,15 +720,20 @@ PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted) #define FOLIO_MAPPING_ANON_VMA_LAZY FOLIO_MAPPING_ANON_KSM #define FOLIO_MAPPING_FLAGS (FOLIO_MAPPING_ANON | FOLIO_MAPPING_ANON_KSM) =20 -static __always_inline bool folio_test_anon(const struct folio *folio) +static __always_inline bool mapping_is_anon(unsigned long mapping) { #ifdef CONFIG_ANON_VMA_LAZY - return ((unsigned long)folio->mapping & FOLIO_MAPPING_FLAGS) !=3D 0; + return (mapping & FOLIO_MAPPING_FLAGS) !=3D 0; #else - return ((unsigned long)folio->mapping & FOLIO_MAPPING_ANON) !=3D 0; + return (mapping & FOLIO_MAPPING_ANON) !=3D 0; #endif } =20 +static __always_inline bool folio_test_anon(const struct folio *folio) +{ + return mapping_is_anon((unsigned 
long)folio->mapping); +} + static __always_inline bool folio_test_lazyfree(const struct folio *folio) { return folio_test_anon(folio) && !folio_test_swapbacked(folio); @@ -738,7 +743,11 @@ static __always_inline bool PageAnonNotKsm(const struc= t page *page) { unsigned long flags =3D (unsigned long)page_folio(page)->mapping; =20 +#ifdef CONFIG_ANON_VMA_LAZY + return (flags & FOLIO_MAPPING_FLAGS) !=3D FOLIO_MAPPING_KSM; +#else return (flags & FOLIO_MAPPING_FLAGS) =3D=3D FOLIO_MAPPING_ANON; +#endif } =20 static __always_inline bool PageAnon(const struct page *page) diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 31a848485ad9..746939872ac4 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -507,7 +507,7 @@ static inline pgoff_t mapping_align_index(const struct = address_space *mapping, static inline bool mapping_large_folio_support(const struct address_space = *mapping) { /* AS_FOLIO_ORDER is only reasonable for pagecache folios */ - VM_WARN_ONCE((unsigned long)mapping & FOLIO_MAPPING_ANON, + VM_WARN_ONCE(mapping_is_anon((unsigned long)mapping), "Anonymous mapping always supports large folio"); =20 return mapping_max_folio_order(mapping) > 0; diff --git a/mm/gup.c b/mm/gup.c index ad9ded39609c..69dda325b082 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -2740,7 +2740,6 @@ static bool gup_fast_folio_allowed(struct folio *foli= o, unsigned int flags) bool reject_file_backed =3D false; struct address_space *mapping; bool check_secretmem =3D false; - unsigned long mapping_flags; =20 /* * If we aren't pinning then no problematic write can occur. A long term @@ -2792,9 +2791,8 @@ static bool gup_fast_folio_allowed(struct folio *foli= o, unsigned int flags) return false; =20 /* Anonymous folios pose no problem. 
*/ - mapping_flags =3D (unsigned long)mapping & FOLIO_MAPPING_FLAGS; - if (mapping_flags) - return mapping_flags & FOLIO_MAPPING_ANON; + if (mapping_is_anon((unsigned long)mapping)) + return true; =20 /* * At this point, we know the mapping is non-null and points to an --=20 2.17.1 From nobody Mon Jun 8 18:57:53 2026 Received: from mta20.hihonor.com (mta20.honor.com [81.70.206.69]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7069C3F9A13; Wed, 27 May 2026 11:08:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=81.70.206.69 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779880091; cv=none; b=AcovIEg0UV1XePa1+Vf62Cx2IkxCu4RbyYNPojAcknRwh0dIWNNtSmUw8GzoLQ6HbnlsjhW/Z6K89T7YgRtQmvuKZunwQPWxQ8Odg3x0vYRNgeFWzvQd1PqIxjPnUKUw4DbREZCoQEf9jdqN6efyg4O1Uj6XeUZLtd9oclEFa2E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779880091; c=relaxed/simple; bh=5RoN2zk62hdf+TzEyIdl1VbOtdl1FBSsnOkeGJTu7is=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=FOrYI2axa/q1zTgt1VGAMtua97OEtpK1PozeVcv8F/naC0Xnmwfkp6FFoYp5k9oDXqRvq4d4IjqUJF+zGqfNeXQPzL2jZYYvPizyzDvDItUYMj2ce/3RUheBbhIyOhvH2jEYnaQ/Ygz+QwPZfkXzm6RnTf8OEnEK+nfzTBjwsLY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=honor.com; spf=pass smtp.mailfrom=honor.com; arc=none smtp.client-ip=81.70.206.69 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=honor.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=honor.com Received: from TW004-1.hihonor.com (unknown [10.77.232.85]) by mta20.hihonor.com (SkyGuard) with ESMTPS id 4gQRdQ01LHzYl88Q; Wed, 27 May 2026 19:06:34 +0800 (CST) Received: from TA003.hihonor.com (10.72.0.43) by TW004-1.hihonor.com (10.77.232.85) with Microsoft 
SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.37; Wed, 27 May 2026 19:08:00 +0800 Received: from localhost.localdomain (10.144.18.117) by TA003.hihonor.com (10.72.0.43) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.37; Wed, 27 May 2026 19:07:59 +0800 From: tao To: , , , , , , , , , , , , , , , , CC: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , <21cnbao@gmail.com>, , , , Subject: [PATCH 08/15] mm: prepare rmap infrastructure for ANON_VMA_LAZY Date: Wed, 27 May 2026 19:01:40 +0800 Message-ID: <20260527110147.17815-9-tao.wangtao@honor.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20260527110147.17815-1-tao.wangtao@honor.com> References: <20260527110147.17815-1-tao.wangtao@honor.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: TW006-1.hihonor.com (10.77.215.153) To TA003.hihonor.com (10.72.0.43) Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Introduce ANON_VMA_LAZY helpers and prepare the anon_rmap and anon_vma_tree infrastructure for the upcoming ANON_VMA_LAZY feature. Implement the core ANON_VMA_LAZY rmap semantics by updating anon_rmap_trylock_read(), anon_rmap_lock_read(), anon_rmap_unlock_read(), and anon_rmap_for_each_vma(). Also update __migrate_folio_record(): instead of storing both old_page_state and anon_vma in dst->private, store old_page_state in dst->private and use dst->mapping to hold anon_rmap. Split folio_lock_anon_rmap_read() and related functions into the next patch to keep this change small and easier to review. 
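The anon_rmap_t handle is a tagged pointer: the low bit says whether it refers to an anon_vma or directly to a lazy VMA. The following userspace sketch models that encoding under stated assumptions. The enum and mask mirror the definitions added to include/linux/rmap.h below; the model_* types and functions are hypothetical stand-ins for illustration, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Type tags and mask as added to include/linux/rmap.h by this patch. */
enum anon_rmap_type {
	ANON_RMAP_ANON_VMA = 0,
	ANON_RMAP_ANON_VMA_LAZY = 1,
};
#define ANON_RMAP_TYPE_BITS 1
#define ANON_RMAP_TYPE_MASK ((1UL << ANON_RMAP_TYPE_BITS) - 1)

/* Hypothetical stand-ins for struct anon_vma and struct vm_area_struct. */
struct model_anon_vma { long refcount; };
struct model_vma { unsigned long vm_pgoff; unsigned long nr_pages; };

typedef struct { uintptr_t rmap; } model_anon_rmap_t;

static model_anon_rmap_t model_make_anon_rmap(const void *ptr,
					      enum anon_rmap_type type)
{
	/* The pointer must be aligned so that the tag bit is free. */
	assert(((uintptr_t)ptr & ANON_RMAP_TYPE_MASK) == 0);
	return (model_anon_rmap_t){ .rmap = (uintptr_t)ptr + type };
}

static bool model_anon_rmap_is_anon_vma(model_anon_rmap_t r)
{
	return (r.rmap & ANON_RMAP_TYPE_MASK) == ANON_RMAP_ANON_VMA;
}

/* Decode a lazy handle back to the VMA it was built from. */
static struct model_vma *model_anon_rmap_to_vma(model_anon_rmap_t r)
{
	assert(!model_anon_rmap_is_anon_vma(r));
	return (struct model_vma *)(r.rmap - ANON_RMAP_ANON_VMA_LAZY);
}

/* Decode a regular handle back to its anon_vma. */
static struct model_anon_vma *model_anon_rmap_to_anon_vma(model_anon_rmap_t r)
{
	assert(model_anon_rmap_is_anon_vma(r));
	return (struct model_anon_vma *)(r.rmap - ANON_RMAP_ANON_VMA);
}
```

Because the tag travels with the pointer, the lock and iteration helpers can branch on the handle type without the caller knowing which kind of mapping it holds.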
Signed-off-by: tao --- include/linux/rmap.h | 53 +++++++++++++++++++++--- mm/internal.h | 99 +++++++++++++++++++++++++++++++++++++------- mm/migrate.c | 11 ++++- mm/rmap.c | 42 +++++++++++++++++++ 4 files changed, 183 insertions(+), 22 deletions(-) diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 9802bce92695..ebe9f3f61170 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -938,15 +938,23 @@ void remove_migration_ptes(struct folio *src, struct = folio *dst, enum ttu_flags flags); =20 /* Reverse mapping handle for anonymous folio rmap helpers. */ +enum anon_rmap_type { + ANON_RMAP_ANON_VMA =3D 0, + ANON_RMAP_ANON_VMA_LAZY =3D 1, +}; +#define ANON_RMAP_TYPE_BITS 1 +#define ANON_RMAP_TYPE_MASK ((1UL << ANON_RMAP_TYPE_BITS) - 1) + typedef struct anon_rmap { unsigned long rmap; } anon_rmap_t; =20 -#define ANON_RMAP_NULL make_anon_rmap(0) +#define ANON_RMAP_NULL (make_anon_rmap(0, ANON_RMAP_ANON_VMA)) =20 -static inline anon_rmap_t make_anon_rmap(const void *anon_mapping) +static inline anon_rmap_t make_anon_rmap(const void *anon_mapping, + enum anon_rmap_type type) { - return (anon_rmap_t){ .rmap =3D (unsigned long)anon_mapping, }; + return (anon_rmap_t){ .rmap =3D (unsigned long)anon_mapping + type, }; } =20 static inline unsigned long anon_rmap_value(anon_rmap_t anon_rmap) @@ -956,14 +964,38 @@ static inline unsigned long anon_rmap_value(anon_rmap= _t anon_rmap) =20 static inline anon_rmap_t anon_vma_to_anon_rmap(const struct anon_vma *ano= n_vma) { - return make_anon_rmap(anon_vma); + return make_anon_rmap(anon_vma, ANON_RMAP_ANON_VMA); } =20 static inline struct anon_vma *anon_rmap_to_anon_vma(anon_rmap_t anon_rmap) { unsigned long rmap =3D anon_rmap_value(anon_rmap); =20 - return (struct anon_vma *)rmap; + return (struct anon_vma *)(rmap - ANON_RMAP_ANON_VMA); +} + +static inline anon_rmap_t vma_to_anon_rmap(const struct vm_area_struct *vm= a) +{ + return make_anon_rmap(vma, ANON_RMAP_ANON_VMA_LAZY); +} + +static inline struct 
vm_area_struct *anon_rmap_to_vma(anon_rmap_t anon_rma= p) +{ + unsigned long rmap =3D anon_rmap_value(anon_rmap); + + VM_BUG_ON((rmap & ANON_RMAP_TYPE_MASK) !=3D ANON_RMAP_ANON_VMA_LAZY); + return (struct vm_area_struct *)(rmap - ANON_RMAP_ANON_VMA_LAZY); +} + +static inline bool anon_rmap_is_anon_vma(anon_rmap_t anon_rmap) +{ +#ifdef CONFIG_ANON_VMA_LAZY + unsigned long rmap =3D anon_rmap_value(anon_rmap); + + return (rmap & ANON_RMAP_TYPE_MASK) =3D=3D ANON_RMAP_ANON_VMA; +#else + return true; +#endif } =20 anon_rmap_t vma_get_anon_rmap(struct vm_area_struct *vma); @@ -1015,8 +1047,17 @@ static inline struct vm_area_struct *anon_rmap_iter_= first_vma( anon_rmap_t anon_rmap, unsigned long start, unsigned long last, struct anon_vma_chain **avc) { - struct anon_vma *anon_vma =3D anon_rmap_to_anon_vma(anon_rmap); + struct anon_vma *anon_vma; + + *avc =3D NULL; + if (!anon_rmap_is_anon_vma(anon_rmap)) { + struct vm_area_struct *vma =3D anon_rmap_to_vma(anon_rmap); =20 + if (vma->vm_pgoff + vma_pages(vma) < start || vma->vm_pgoff > last) + return NULL; /* No overlap in the VMA range. */ + return vma; + } else + anon_vma =3D anon_rmap_to_anon_vma(anon_rmap); *avc =3D anon_vma_interval_tree_iter_first(&anon_vma->rb_root, start, las= t); return *avc ? 
(*avc)->vma : NULL; } diff --git a/mm/internal.h b/mm/internal.h index 639f9c287f4c..6b703646f66d 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -260,76 +260,147 @@ static inline void anon_vma_unlock_read(struct anon_= vma *anon_vma) #ifdef CONFIG_ANON_VMA_LAZY extern bool anon_vma_lazy_enable; static inline bool anon_vma_lazy_enabled(void) { return anon_vma_lazy_enab= le; } -#else -static inline bool anon_vma_lazy_enabled(void) { return false; } -#endif =20 -static inline anon_vma_tree_t make_anon_vma_tree(struct anon_vma *anon_vma) +static inline int anon_vma_tree_type(anon_vma_tree_t anon_tree) { - return (anon_vma_tree_t)anon_vma; + VM_WARN_ON(((unsigned long)anon_tree & ANON_VMA_TREE_MASK) =3D=3D + ANON_VMA_TREE_INVALID); + return (unsigned long)anon_tree & ANON_VMA_TREE_MASK; +} + +static inline bool anon_vma_tree_is_vma(anon_vma_tree_t anon_tree) +{ + return anon_vma_tree_type(anon_tree) =3D=3D ANON_VMA_TREE_VMA; +} + +static inline bool anon_vma_tree_is_parent(anon_vma_tree_t anon_tree) +{ + return anon_vma_tree_type(anon_tree) =3D=3D ANON_VMA_TREE_PARENT; +} + +static inline struct vm_area_struct *anon_vma_tree_vma(anon_vma_tree_t ano= n_tree) +{ + BUILD_BUG_ON(__alignof__(struct vm_area_struct) <=3D ANON_VMA_TREE_MASK); + if (!anon_vma_tree_is_vma(anon_tree)) + return NULL; + return (struct vm_area_struct *)( + (unsigned long)anon_tree & ~ANON_VMA_TREE_MASK); } =20 static inline struct anon_vma *anon_vma_tree_anon_vma(anon_vma_tree_t anon= _tree) { - return (struct anon_vma *)anon_tree; + BUILD_BUG_ON(__alignof__(struct anon_vma) <=3D ANON_VMA_TREE_MASK); + if (anon_vma_tree_is_vma(anon_tree)) + return NULL; + return (struct anon_vma *)((unsigned long)anon_tree & ~ANON_VMA_TREE_MASK= ); +} + +#else +static inline bool anon_vma_lazy_enabled(void) { return false; } +static inline int anon_vma_tree_type(anon_vma_tree_t anon_tree) { return 0= ; } +static inline bool anon_vma_tree_is_vma(anon_vma_tree_t anon_tree) { retur= n false; } +static inline bool 
anon_vma_tree_is_parent( + anon_vma_tree_t anon_tree) { return false; } +static inline struct vm_area_struct *anon_vma_tree_vma( + anon_vma_tree_t anon_tree) { return NULL; } +static inline struct anon_vma *anon_vma_tree_anon_vma( + anon_vma_tree_t anon_tree) { return (struct anon_vma *)anon_tree; } +#endif + +static inline anon_vma_tree_t make_anon_vma_tree(const struct anon_vma *an= on_vma) +{ + return (anon_vma_tree_t)anon_vma; } =20 /* Store anon_vma in vma->anon_vma using a tagged pointer. */ static inline void vma_set_anon_vma(struct vm_area_struct *vma, - struct anon_vma *anon_vma) + const struct anon_vma *anon_vma) { vma->anon_vma =3D (anon_vma_tree_t)anon_vma; } =20 -/* Return the VMA's anon_vma. */ +/* Return the VMA's anon_vma, or NULL if it is marked lazy. */ static inline struct anon_vma *vma_anon_vma(const struct vm_area_struct *v= ma) { /* Use READ_ONCE() for reusable_anon_vma */ anon_vma_tree_t anon_tree =3D READ_ONCE(vma->anon_vma); =20 + if (anon_vma_tree_type(anon_tree) !=3D ANON_VMA_TREE_REGULAR) + return NULL; return anon_vma_tree_anon_vma(anon_tree); } =20 +static inline bool vma_is_anon_vma_lazy(const struct vm_area_struct *vma) +{ + return anon_vma_tree_type((anon_vma_tree_t)vma->anon_vma); +} + +static inline const struct vm_area_struct *vma_anon_vma_lazy_root( + const struct vm_area_struct *vma) +{ + anon_vma_tree_t anon_tree =3D (anon_vma_tree_t)vma->anon_vma; + int lazy_type =3D anon_vma_tree_type(anon_tree); + + if (!lazy_type) + return NULL; + if (anon_vma_tree_is_parent(anon_tree)) + return vma; + return anon_vma_tree_vma(anon_tree); +} + +static inline bool vma_is_anon_vma_lazy_root(const struct vm_area_struct *= vma) +{ + return vma =3D=3D vma_anon_vma_lazy_root(vma); +} + +/* + * ANON_VMA_TREE_VMA is just a VMA, without anon_vma or anon_vma_chain, + * so no protection is needed. 
+ */ static inline void anon_vma_tree_lock_write(anon_vma_tree_t anon_tree) { struct anon_vma *anon_vma =3D anon_vma_tree_anon_vma(anon_tree); =20 - anon_vma_lock_write(anon_vma); + if (anon_vma) + anon_vma_lock_write(anon_vma); } =20 static inline int anon_vma_tree_trylock_write(anon_vma_tree_t anon_tree) { struct anon_vma *anon_vma =3D anon_vma_tree_anon_vma(anon_tree); =20 - return anon_vma_trylock_write(anon_vma); + return anon_vma ? anon_vma_trylock_write(anon_vma) : 1; } =20 static inline void anon_vma_tree_unlock_write(anon_vma_tree_t anon_tree) { struct anon_vma *anon_vma =3D anon_vma_tree_anon_vma(anon_tree); =20 - anon_vma_unlock_write(anon_vma); + if (anon_vma) + anon_vma_unlock_write(anon_vma); } =20 static inline void anon_vma_tree_lock_read(anon_vma_tree_t anon_tree) { struct anon_vma *anon_vma =3D anon_vma_tree_anon_vma(anon_tree); =20 - anon_vma_lock_read(anon_vma); + if (anon_vma) + anon_vma_lock_read(anon_vma); } =20 static inline int anon_vma_tree_trylock_read(anon_vma_tree_t anon_tree) { struct anon_vma *anon_vma =3D anon_vma_tree_anon_vma(anon_tree); =20 - return anon_vma_trylock_read(anon_vma); + return anon_vma ? 
anon_vma_trylock_read(anon_vma) : 1; } =20 static inline void anon_vma_tree_unlock_read(anon_vma_tree_t anon_tree) { struct anon_vma *anon_vma =3D anon_vma_tree_anon_vma(anon_tree); =20 - anon_vma_unlock_read(anon_vma); + if (anon_vma) + anon_vma_unlock_read(anon_vma); } =20 struct anon_vma *folio_get_anon_vma(const struct folio *folio); diff --git a/mm/migrate.c b/mm/migrate.c index 769983cf14e0..b397cdeab09a 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1144,7 +1144,10 @@ static void __migrate_folio_record(struct folio *dst, int old_page_state, anon_rmap_t anon_rmap) { - dst->private =3D (void *)anon_rmap_to_anon_vma(anon_rmap) + old_page_stat= e; + unsigned long rmap =3D anon_rmap_value(anon_rmap); + + dst->private =3D (void *)(rmap & ~PAGE_OLD_STATES) + old_page_state; + dst->mapping =3D (struct address_space *)rmap; } =20 static void __migrate_folio_extract(struct folio *dst, @@ -1152,8 +1155,12 @@ static void __migrate_folio_extract(struct folio *ds= t, anon_rmap_t *anon_rmapp) { unsigned long private =3D (unsigned long)dst->private; + unsigned long mapping =3D (unsigned long)dst->mapping; =20 - *anon_rmapp =3D anon_vma_to_anon_rmap((void *)(private & ~PAGE_OLD_STATES= )); + VM_BUG_ON((private & ~PAGE_OLD_STATES) !=3D (mapping & ~ANON_RMAP_TYPE_MA= SK)); + *anon_rmapp =3D make_anon_rmap((void *)(mapping & ~ANON_RMAP_TYPE_MASK), + mapping & ANON_RMAP_TYPE_MASK); + dst->mapping =3D NULL; *old_page_state =3D private & PAGE_OLD_STATES; dst->private =3D NULL; } diff --git a/mm/rmap.c b/mm/rmap.c index 48c4463d8b2c..001c44570df8 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -794,42 +794,84 @@ anon_rmap_t vma_get_anon_rmap(struct vm_area_struct *= vma) =20 mmap_assert_locked(vma->vm_mm); VM_BUG_ON(!vma->anon_vma); + if (!anon_vma) { + vma_get(vma); + return vma_to_anon_rmap(vma); + } get_anon_vma(anon_vma); return anon_vma_to_anon_rmap(anon_vma); } =20 void put_anon_rmap(anon_rmap_t anon_rmap) { + if (!anon_rmap_is_anon_vma(anon_rmap)) { + 
vma_put(anon_rmap_to_vma(anon_rmap)); + return; + } put_anon_vma(anon_rmap_to_anon_vma(anon_rmap)); } =20 +/* + * Rmap for anonymous pages normally only needs read protection. + * However, huge page splitting in huge_memory requires the rmap + * write lock to prevent concurrency, achieved by upgrading to a + * regular anon_vma. + */ void anon_rmap_lock_write(anon_rmap_t anon_rmap) { + VM_BUG_ON(!anon_rmap_is_anon_vma(anon_rmap)); anon_vma_lock_write(anon_rmap_to_anon_vma(anon_rmap)); } =20 int anon_rmap_trylock_write(anon_rmap_t anon_rmap) { + VM_BUG_ON(!anon_rmap_is_anon_vma(anon_rmap)); return anon_vma_trylock_write(anon_rmap_to_anon_vma(anon_rmap)); } =20 void anon_rmap_unlock_write(anon_rmap_t anon_rmap) { + VM_BUG_ON(!anon_rmap_is_anon_vma(anon_rmap)); anon_vma_unlock_write(anon_rmap_to_anon_vma(anon_rmap)); } =20 +static void anon_vma_lazy_lock_read(struct vm_area_struct *vma) +{ + vma_get(vma); +} + +static bool anon_vma_lazy_trylock_read(struct vm_area_struct *vma) +{ + return (bool)vma_get(vma); +} + +static void anon_vma_lazy_unlock_read(struct vm_area_struct *vma) +{ + vma_put(vma); +} + void anon_rmap_lock_read(anon_rmap_t anon_rmap) { + if (!anon_rmap_is_anon_vma(anon_rmap)) { + anon_vma_lazy_lock_read(anon_rmap_to_vma(anon_rmap)); + return; + } anon_vma_lock_read(anon_rmap_to_anon_vma(anon_rmap)); } =20 int anon_rmap_trylock_read(anon_rmap_t anon_rmap) { + if (!anon_rmap_is_anon_vma(anon_rmap)) + return anon_vma_lazy_trylock_read(anon_rmap_to_vma(anon_rmap)); return anon_vma_trylock_read(anon_rmap_to_anon_vma(anon_rmap)); } =20 void anon_rmap_unlock_read(anon_rmap_t anon_rmap) { + if (!anon_rmap_is_anon_vma(anon_rmap)) { + anon_vma_lazy_unlock_read(anon_rmap_to_vma(anon_rmap)); + return; + } anon_vma_unlock_read(anon_rmap_to_anon_vma(anon_rmap)); } =20 --=20 2.17.1 From nobody Mon Jun 8 18:57:53 2026 Received: from mta22.hihonor.com (mta22.hihonor.com [81.70.192.198]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client 
certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1BA4D3F58E7; Wed, 27 May 2026 11:08:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=81.70.192.198 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779880087; cv=none; b=ivD2Eg/vXvgmOB7RHmZn9QVF7dSW557YCNfX49OxJAXYK9RUZiwedkshaadmGg2A101xpjxWi3F8qAxxbxdxP2k0C1vdfvuKTiV2occKeihGnj4oGtt15SQhtHC/D+EYHSP+dxiKNyn0wxkfgPDIWgr9g7XIjwXpNNUJiFDKFUc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779880087; c=relaxed/simple; bh=MTPRlp31QWJHOoL4nsovOJNdGaUO1zyIt68nyuomHmg=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=boIQ6turq49KKaoZ1I0amG8Q2ybgAVGDspsGHznwLHxDXvym3GN/wH5rg0hU0D7W3VmfXQu8ETKDuL252YxaSP6jRALscccL2QTJqSVpNpKVY2Fv2tgK962zk7JeOKF8w4Wwj9DiTads6qMLenKf6auDSkV7S3rGezknlV8WHJI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=honor.com; spf=pass smtp.mailfrom=honor.com; dkim=pass (1024-bit key) header.d=honor.com header.i=@honor.com header.b=FMgslGyV; arc=none smtp.client-ip=81.70.192.198 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=honor.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=honor.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=honor.com header.i=@honor.com header.b="FMgslGyV" dkim-signature: v=1; a=rsa-sha256; d=honor.com; s=dkim; c=relaxed/relaxed; q=dns/txt; h=To:From; bh=H0GLFq8C9nPPiYhXgw5zAUxPFjuLn1gqka+weNMFags=; b=FMgslGyVg0RRYZZQlKDK6VfD2gw8hIqaisjkqVL+vKFJb23PxpvExqtX3zbNjKbOwPDDGV62E TyTVyMMIq2R7BuO+Y3t7cBIxf7zuJCin764atMWjJ2AwlV96KxgUhdIiU29bHfzaWpqmZ+ltkjE 1bG2dGP2IaEG/Rif6aUkHaQ= Received: from TW002-1.hihonor.com (unknown [10.72.0.137]) by mta22.hihonor.com (SkyGuard) with ESMTPS id 4gQRdd4vMfzYl1Gk; Wed, 27 May 2026 19:06:45 +0800 (CST) Received: from 
TA003.hihonor.com (10.72.0.43) by TW002-1.hihonor.com (10.72.0.137) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.37; Wed, 27 May 2026 19:08:00 +0800 Received: from localhost.localdomain (10.144.18.117) by TA003.hihonor.com (10.72.0.43) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.37; Wed, 27 May 2026 19:08:00 +0800 From: tao To: , , , , , , , , , , , , , , , , CC: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , <21cnbao@gmail.com>, , , , Subject: [PATCH 09/15] mm: implement ANON_VMA_LAZY rmap semantics Date: Wed, 27 May 2026 19:01:41 +0800 Message-ID: <20260527110147.17815-10-tao.wangtao@honor.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20260527110147.17815-1-tao.wangtao@honor.com> References: <20260527110147.17815-1-tao.wangtao@honor.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: TW006-1.hihonor.com (10.77.215.153) To TA003.hihonor.com (10.72.0.43) Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Implement ANON_VMA_LAZY anon_rmap semantics by updating folio_anon_rmap(), folio_maybe_same_anon_vma(), folio_get_anon_rmap(), and folio_lock_anon_rmap_read(). ANON_VMA_LAZY VMAs resolve the target VMA via root_vma. As this path does not involve anon_vma topology, vma_get() is sufficient to ensure that the VMA still exists. 
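The claim that vma_get() alone keeps the VMA alive can be modelled in userspace. The following is a minimal sketch, assuming vma_get() behaves like an "increment unless zero" reference grab. That assumption fits the way this patch checks its return value, but the real helper is defined elsewhere in the series. The names demo_obj, demo_obj_tryget() and demo_obj_put() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted VMA; not the kernel's struct. */
struct demo_obj {
	atomic_int refcnt;	/* 0 means the object is already being torn down */
};

/*
 * Assumed vma_get()-like behaviour: take a reference only if the count
 * has not already dropped to zero ("increment unless zero").
 */
static bool demo_obj_tryget(struct demo_obj *obj)
{
	int old = atomic_load(&obj->refcnt);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current count. */
		if (atomic_compare_exchange_weak(&obj->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if the caller released the last one. */
static bool demo_obj_put(struct demo_obj *obj)
{
	return atomic_fetch_sub(&obj->refcnt, 1) == 1;
}
```

A lookup that races with teardown sees demo_obj_tryget() fail and treats the object as gone. This mirrors how folio_resolve_anon_vma_lazy() returns NULL when vma_get() fails.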
Signed-off-by: tao --- mm/rmap.c | 126 +++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 120 insertions(+), 6 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 001c44570df8..f70e3cb9812e 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -875,9 +875,97 @@ void anon_rmap_unlock_read(anon_rmap_t anon_rmap) anon_vma_unlock_read(anon_rmap_to_anon_vma(anon_rmap)); } =20 +static inline bool test_folio_unmapped(const struct folio *folio, bool tes= t) +{ + return test && !folio_mapped(folio); +} + +/* + * Must be called under rcu_read_lock(). + * + * For FOLIO_MAPPING_ANON_VMA_LAZY, first obtain the VMA recorded in the + * lazy mapping and take a reference with vma_get() so its fields can be + * safely accessed. If the folio is no longer mapped in that VMA, resolve + * and look up the actual VMA covering the folio. + */ +static struct vm_area_struct *folio_resolve_anon_vma_lazy( + const struct folio *folio, bool tryget, bool test_map) +{ + struct vm_area_struct *vma, *anon_lazy_root; + struct mm_struct *mm; + unsigned long anon_mapping; + pgoff_t pgoff; + unsigned long addr; + + anon_mapping =3D (unsigned long)READ_ONCE(folio->mapping); + if ((anon_mapping & FOLIO_MAPPING_FLAGS) !=3D FOLIO_MAPPING_ANON_VMA_LAZY) + return NULL; + if (test_folio_unmapped(folio, test_map)) + return NULL; + + anon_lazy_root =3D vma =3D (struct vm_area_struct *)(anon_mapping - + FOLIO_MAPPING_ANON_VMA_LAZY); + mm =3D vma->vm_mm; + if (!mm || !vma->anon_vma || !vma_get(anon_lazy_root)) + return NULL; + pgoff =3D folio->index; + if (vma_address(vma, pgoff, folio_nr_pages(folio)) =3D=3D -EFAULT) { + addr =3D vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT); + vma =3D vma_lookup(mm, addr); + if (vma && tryget && !vma_get(vma)) + vma =3D NULL; + } + if (!tryget || anon_lazy_root !=3D vma) + vma_put(anon_lazy_root); + if (test_folio_unmapped(folio, test_map) && vma) { + vma_put(vma); + vma =3D NULL; + } + return vma; +} + +/* Like folio_get_anon_vma(), but for ANON_VMA_LAZY 
VMAs. */ +static struct vm_area_struct *folio_get_anon_vma_lazy(const struct folio *= folio) +{ + struct vm_area_struct *vma =3D NULL; + + rcu_read_lock(); + vma =3D folio_resolve_anon_vma_lazy(folio, true, true); + rcu_read_unlock(); + return vma; +} + +/* + * For ANON_VMA_LAZY VMAs, similar to folio_get_anon_vma_lazy(). + * + * These VMAs do not have an anon_vma or anon_vma_chain and correspond + * to only a single VMA. Therefore, reverse mapping can be performed + * without taking the anon_vma lock, providing a faster rmap path for + * this common case. + */ +static struct vm_area_struct *folio_lock_anon_vma_lazy_read( + const struct folio *folio, struct rmap_walk_control *rwc, bool test_map) +{ + struct vm_area_struct *vma =3D NULL; + + rcu_read_lock(); + vma =3D folio_resolve_anon_vma_lazy(folio, true, test_map); + rcu_read_unlock(); + return vma; +} + static anon_rmap_t folio_anon_rmap(const struct folio *folio) { struct anon_vma *anon_vma; + struct vm_area_struct *vma; + + if (folio_test_anon_vma_lazy(folio)) { + rcu_read_lock(); + vma =3D folio_resolve_anon_vma_lazy(folio, false, false); + rcu_read_unlock(); + if (vma) + return vma_to_anon_rmap(vma); + } =20 anon_vma =3D folio_anon_vma(folio); return anon_vma ? 
anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL; @@ -887,29 +975,49 @@ bool folio_maybe_same_anon_vma(const struct folio *fo= lio, const struct vm_area_struct *vma) { struct anon_vma *anon_vma; - struct anon_vma *tgt_anon_vma =3D vma_anon_vma(vma); + struct anon_vma *tgt_anon_vma =3D anon_vma_tree_anon_vma(vma->anon_vma); bool same =3D false; =20 rcu_read_lock(); - anon_vma =3D folio_anon_vma(folio); - if (anon_vma && tgt_anon_vma) - same =3D anon_vma->root =3D=3D tgt_anon_vma->root; + if (folio_test_anon_vma_lazy(folio)) { + same =3D vma =3D=3D folio_resolve_anon_vma_lazy(folio, false, false); + } else { + anon_vma =3D folio_anon_vma(folio); + if (anon_vma && tgt_anon_vma) + same =3D anon_vma->root =3D=3D tgt_anon_vma->root; + } rcu_read_unlock(); return same; } =20 anon_rmap_t folio_get_anon_rmap(const struct folio *folio) { - struct anon_vma *anon_vma =3D folio_get_anon_vma(folio); + struct anon_vma *anon_vma; + struct vm_area_struct *vma; + + if (folio_test_anon_vma_lazy(folio)) { + vma =3D folio_get_anon_vma_lazy(folio); + if (vma) + return vma_to_anon_rmap(vma); + } =20 + anon_vma =3D folio_get_anon_vma(folio); return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL; } =20 anon_rmap_t folio_lock_anon_rmap_read(const struct folio *folio, struct rmap_walk_control *rwc) { - struct anon_vma *anon_vma =3D folio_lock_anon_vma_read(folio, rwc); + struct anon_vma *anon_vma; + struct vm_area_struct *vma; + + if (folio_test_anon_vma_lazy(folio)) { + vma =3D folio_lock_anon_vma_lazy_read(folio, rwc, true); + if (vma) + return vma_to_anon_rmap(vma); + } =20 + anon_vma =3D folio_lock_anon_vma_read(folio, rwc); return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL; } =20 @@ -3140,6 +3248,12 @@ static anon_rmap_t rmap_walk_anon_lock(const struct = folio *folio, * are holding mmap_lock. 
Users without mmap_lock are required to * take a reference count to prevent the anon_vma disappearing */ + if (folio_test_anon_vma_lazy(folio)) { + struct vm_area_struct *vma; + + vma =3D folio_lock_anon_vma_lazy_read(folio, rwc, false); + return vma ? vma_to_anon_rmap(vma) : ANON_RMAP_NULL; + } anon_vma =3D folio_anon_vma(folio); if (!anon_vma) return ANON_RMAP_NULL; --=20 2.17.1 From nobody Mon Jun 8 18:57:53 2026 Received: from mta22.hihonor.com (mta22.hihonor.com [81.70.192.198]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1D4673F58DB; Wed, 27 May 2026 11:08:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=81.70.192.198 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779880086; cv=none; b=mnQ/tex5UcK3dc+QkZxsZJvyrI0J4FUkr26TcledI+5W9v4sIsIoY5fd/YQng92hawua7yRJiOsNwqVVMfPIfFIWLyJlrcdwixJikcUPCAho9VqbCFXafc4Uu2pGSLeraxpt2KlXu0bRs4pELryZiz9zyQg2krc3hMxjWpxLdYw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779880086; c=relaxed/simple; bh=gVRySYRoucQ/mthILEIU0ZMqnprK+pSK9PPAHPCDgJ8=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Tu9Uz5subVyxRXChm+rfTD5nE3dB6o7/VZ/H000jz0qtikx6R6syNYcL04GCKxXuggRHhxX1N1F/SLqwZ0QXurVQ/t28hNcmcj/zoKy4VvN/9zKdXLewQfch+nSu3vB/4dc+j/+OK1pILP8TiSFFuBHN/7b6BJqqi7OfMILfVfM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=honor.com; spf=pass smtp.mailfrom=honor.com; dkim=pass (1024-bit key) header.d=honor.com header.i=@honor.com header.b=VYbIP/wV; arc=none smtp.client-ip=81.70.192.198 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=honor.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=honor.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit 
key) header.d=honor.com header.i=@honor.com header.b="VYbIP/wV" dkim-signature: v=1; a=rsa-sha256; d=honor.com; s=dkim; c=relaxed/relaxed; q=dns/txt; h=To:From; bh=vxRnvipTE+LZ6U2mPwIuR9jTHTQx1xEgCwosOb/RHrs=; b=VYbIP/wV5EBteY3Rxja6Xhj0vKZkB1CsZQ8IRrMSRY53u3FJU+6bQcDpDahwaCSpRoZGHC1VM d+UvCvJ2uVyNm+Z2cnU4Q9/y4eYVlWYzcowSTealOF213nmpJF+S7iZUpqi3/8YDt0AF/24W0kG +NWs1yRRtlHFZLit0ffbPPo= Received: from TW005.hihonor.com (unknown [10.72.0.123]) by mta22.hihonor.com (SkyGuard) with ESMTPS id 4gQRdf3mPFzYl1Gk; Wed, 27 May 2026 19:06:46 +0800 (CST) Received: from TA003.hihonor.com (10.72.0.43) by TW005.hihonor.com (10.72.0.123) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.37; Wed, 27 May 2026 19:08:01 +0800 Received: from localhost.localdomain (10.144.18.117) by TA003.hihonor.com (10.72.0.43) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.37; Wed, 27 May 2026 19:08:00 +0800 From: tao To: , , , , , , , , , , , , , , , , CC: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , <21cnbao@gmail.com>, , , , Subject: [PATCH 10/15] mm: defer anon_vma creation with ANON_VMA_LAZY Date: Wed, 27 May 2026 19:01:42 +0800 Message-ID: <20260527110147.17815-11-tao.wangtao@honor.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20260527110147.17815-1-tao.wangtao@honor.com> References: <20260527110147.17815-1-tao.wangtao@honor.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: TW006-1.hihonor.com (10.77.215.153) To TA003.hihonor.com (10.72.0.43) Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Mark VMAs as ANON_VMA_LAZY and defer anon_vma creation until fork, avoiding early allocation when it may not be needed and reducing overhead. 
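The lazy marking described above relies on a tagged pointer stored in vma->anon_vma. A minimal userspace sketch of that encoding follows. It is not the kernel code: the tag values, the demo_* names and the two stand-in structs are assumptions made for illustration, and the series defines its own ANON_VMA_TREE_* constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tag values are assumptions for this sketch; the series defines its own. */
enum {
	DEMO_TREE_REGULAR = 0,	/* plain anon_vma pointer */
	DEMO_TREE_VMA     = 1,	/* lazy: slot points at a root VMA */
	DEMO_TREE_PARENT  = 2,	/* lazy child: slot points at the parent's anon_vma */
	DEMO_TREE_MASK    = 3,
};

/* Hypothetical stand-ins for struct anon_vma and struct vm_area_struct. */
struct demo_anon_vma { long unused; };
struct demo_vma { uintptr_t anon_tree; };

/* The low tag bits are only free if both types are sufficiently aligned. */
_Static_assert(_Alignof(struct demo_vma) > DEMO_TREE_MASK, "tag bits overlap");
_Static_assert(_Alignof(struct demo_anon_vma) > DEMO_TREE_MASK, "tag bits overlap");

static int demo_tree_type(uintptr_t tree)
{
	return (int)(tree & DEMO_TREE_MASK);
}

/* First anon fault: mark the VMA lazy by storing a tagged self-pointer. */
static void demo_mark_lazy(struct demo_vma *vma)
{
	if (!vma->anon_tree)
		vma->anon_tree = (uintptr_t)vma | DEMO_TREE_VMA;
}

/* Decode the root VMA, or NULL if the slot does not carry the VMA tag. */
static struct demo_vma *demo_tree_vma(uintptr_t tree)
{
	if (demo_tree_type(tree) != DEMO_TREE_VMA)
		return NULL;
	return (struct demo_vma *)(tree & ~(uintptr_t)DEMO_TREE_MASK);
}

/* Fork-time upgrade: replace the lazy tag with a regular anon_vma pointer. */
static void demo_upgrade(struct demo_vma *vma, struct demo_anon_vma *av)
{
	vma->anon_tree = (uintptr_t)av | DEMO_TREE_REGULAR;
}
```

Because the slot is non-zero once it is marked lazy, the existing likely(vma->anon_vma) fast path keeps working and no anon_vma has to be allocated until an upgrade is needed.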
During fork(), ANON_VMA_LAZY VMAs are first upgraded to a regular anon_vma in the parent to establish the sharing topology. Child VMAs are created as ANON_VMA_TREE_PARENT and do not allocate anon_vma, avoiding additional fork overhead. Signed-off-by: tao --- mm/internal.h | 9 +++ mm/memory.c | 4 + mm/rmap.c | 209 ++++++++++++++++++++++++++++++++++++++++++++++++-- mm/vma.c | 9 ++- 4 files changed, 222 insertions(+), 9 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index 6b703646f66d..0a36eba3f63c 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -417,6 +417,8 @@ int anon_vma_clone(struct vm_area_struct *dst, struct v= m_area_struct *src, enum vma_operation operation); int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma); int __anon_vma_prepare(struct vm_area_struct *vma); +/* Called on first anon fault or from anon_vma_prepare(). */ +void vma_prepare_anon_vma_lazy(struct vm_area_struct *vma); void unlink_anon_vmas(struct vm_area_struct *vma); =20 static inline int anon_vma_prepare(struct vm_area_struct *vma) @@ -424,6 +426,13 @@ static inline int anon_vma_prepare(struct vm_area_stru= ct *vma) if (likely(vma->anon_vma)) return 0; =20 +#ifdef CONFIG_ANON_VMA_LAZY + if (anon_vma_lazy_enabled()) { + vma_prepare_anon_vma_lazy(vma); + return 0; + } +#endif + return __anon_vma_prepare(vma); } =20 diff --git a/mm/memory.c b/mm/memory.c index c13b79987b26..8fd3877f69fb 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3822,6 +3822,10 @@ vm_fault_t __vmf_anon_prepare(struct vm_fault *vmf) =20 if (likely(vma->anon_vma)) return 0; + if (anon_vma_lazy_enabled()) { + vma_prepare_anon_vma_lazy(vma); + return 0; + } if (vmf->flags & FAULT_FLAG_VMA_LOCK) { if (!mmap_read_trylock(vma->vm_mm)) return VM_FAULT_RETRY; diff --git a/mm/rmap.c b/mm/rmap.c index f70e3cb9812e..d9424f4eb6d0 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -240,9 +240,118 @@ static void anon_vma_chain_assign(struct vm_area_stru= ct *vma, list_add(&avc->same_vma, &vma->anon_vma_chain); } 
=20 +#ifdef CONFIG_ANON_VMA_LAZY +/* Called on first anon fault or from anon_vma_prepare(). */ +void vma_prepare_anon_vma_lazy(struct vm_area_struct *vma) +{ + struct mm_struct *mm =3D vma->vm_mm; + + spin_lock(&mm->page_table_lock); + if (!vma->anon_vma) { + vma_get(vma); + vma->anon_vma =3D (anon_vma_tree_t)( + (unsigned long)vma + ANON_VMA_TREE_VMA); + } + spin_unlock(&mm->page_table_lock); +} + +/* + * Link VMA to its root ANON_VMA_TREE_VMA. Root holds reference to prevent + * premature freeing while folios reference it via folio->mapping. + */ +static bool vma_link_anon_vma_lazy_root(struct vm_area_struct *vma, + struct vm_area_struct *src) +{ + struct mm_struct *mm =3D src->vm_mm; + struct vm_area_struct *root_vma; + bool ret =3D false; + + VM_BUG_ON_VMA(vma->vm_mm !=3D src->vm_mm, vma); + /* src may be upgraded concurrently */ + spin_lock(&mm->page_table_lock); + root_vma =3D anon_vma_tree_vma(src->anon_vma); + if (root_vma) { + vma_get(root_vma); + vma->anon_vma =3D src->anon_vma; + ret =3D true; + } else { + vma_set_anon_vma(vma, NULL); + } + spin_unlock(&mm->page_table_lock); + return ret; +} + +/* Link VMA to its ANON_VMA_TREE_PARENT .*/ +static void vma_link_anon_vma_lazy_parent(struct vm_area_struct *vma, + struct vm_area_struct *src) +{ + struct anon_vma *parent_anon_vma =3D vma_anon_vma(src); + + vma_assert_write_locked(src); + VM_BUG_ON_VMA(vma->anon_vma, vma); + VM_BUG_ON_VMA(!parent_anon_vma, src); + + get_anon_vma(parent_anon_vma); + vma->anon_vma =3D (anon_vma_tree_t)( + (unsigned long)parent_anon_vma + ANON_VMA_TREE_PARENT); +} + +/* Unlink VMA from anon_vma, dropping root/parent reference. 
*/ +static bool vma_unlink_anon_vma_lazy(struct vm_area_struct *vma, + anon_vma_tree_t new_anon_vma_tree) +{ + struct mm_struct *mm =3D vma->vm_mm; + anon_vma_tree_t anon_tree_mutable =3D READ_ONCE(vma->anon_vma); + anon_vma_tree_t anon_tree; + bool is_lazy =3D true; + struct vm_area_struct *root_vma =3D NULL; + struct anon_vma *parent_anon_vma =3D NULL; + + VM_BUG_ON_VMA(anon_vma_tree_type(new_anon_vma_tree), vma); + + anon_vma_tree_lock_write(anon_tree_mutable); + spin_lock(&mm->page_table_lock); + anon_tree =3D vma->anon_vma; + if (anon_vma_tree_is_vma(anon_tree)) { + root_vma =3D anon_vma_tree_vma(anon_tree); + vma->anon_vma =3D new_anon_vma_tree; + } else if (anon_vma_tree_is_parent(anon_tree)) { + parent_anon_vma =3D anon_vma_tree_anon_vma(anon_tree); + vma->anon_vma =3D new_anon_vma_tree; + } else { + is_lazy =3D false; + } + spin_unlock(&mm->page_table_lock); + anon_vma_tree_unlock_write(anon_tree_mutable); + if (!is_lazy) + return false; + + /* drop reference after unlock */ + VM_BUG_ON_VMA(!parent_anon_vma && !root_vma, vma); + if (parent_anon_vma) { + /* There must be nodes; it cannot be the last reference. */ + VM_BUG_ON(RB_EMPTY_ROOT(&parent_anon_vma->rb_root.rb_root)); + put_anon_vma(parent_anon_vma); + } + if (root_vma) + vma_put(root_vma); + return is_lazy; +} +#else +static inline bool vma_link_anon_vma_lazy_root(struct vm_area_struct *vma, + struct vm_area_struct *src) { return false; } +static void vma_link_anon_vma_lazy_parent(struct vm_area_struct *vma, + struct vm_area_struct *src) {} +static inline bool vma_unlink_anon_vma_lazy(struct vm_area_struct *vma, + anon_vma_tree_t new_anon_vma_tree) { return false; } +#endif + /** - * __anon_vma_prepare - attach an anon_vma to a memory region + * vma_prepare_anon_vma - attach an anon_vma to a memory region * @vma: the memory region in question + * @upgrade_lazy: true when upgrading a lazy VMA to a regular anon_vma. 
+ * @parent_anon_vma: non-NULL if the VMA is inherited from its parent, + * otherwise NULL. * * This makes sure the memory mapping described by 'vma' has * an 'anon_vma' attached to it, so that we can associate the @@ -266,12 +375,14 @@ static void anon_vma_chain_assign(struct vm_area_stru= ct *vma, * to do any locking for the common case of already having * an anon_vma. */ -int __anon_vma_prepare(struct vm_area_struct *vma) +static int vma_prepare_anon_vma(struct vm_area_struct *vma, bool upgrade_l= azy, + struct anon_vma *parent_anon_vma) { struct mm_struct *mm =3D vma->vm_mm; struct anon_vma *anon_vma, *allocated; anon_vma_tree_t anon_tree; struct anon_vma_chain *avc; + bool is_lazy =3D false; =20 mmap_assert_locked(mm); might_sleep(); @@ -282,19 +393,30 @@ int __anon_vma_prepare(struct vm_area_struct *vma) =20 anon_vma =3D find_mergeable_anon_vma(vma); allocated =3D NULL; - if (!anon_vma) { + /* If parent_anon_vma exists, mergeable anon_vma root must match it. */ + if (!anon_vma || + (parent_anon_vma && anon_vma->root !=3D parent_anon_vma->root)) { anon_vma =3D anon_vma_alloc(); if (unlikely(!anon_vma)) goto out_enomem_free_avc; - anon_vma->num_children++; /* self-parent link for new root */ allocated =3D anon_vma; + if (parent_anon_vma) { + anon_vma->root =3D parent_anon_vma->root; + anon_vma->parent =3D parent_anon_vma; + } } =20 anon_tree =3D make_anon_vma_tree(anon_vma); + if (upgrade_lazy) + is_lazy =3D vma_unlink_anon_vma_lazy(vma, anon_tree); anon_vma_tree_lock_write(anon_tree); /* page_table_lock to protect against threads */ spin_lock(&mm->page_table_lock); - if (likely(!vma->anon_vma)) { + if (likely(!vma->anon_vma || is_lazy)) { + if (anon_vma->root !=3D anon_vma) + get_anon_vma(anon_vma->root); + if (allocated) + anon_vma->parent->num_children++; vma->anon_vma =3D anon_tree; anon_vma_chain_assign(vma, avc, anon_vma); anon_vma_interval_tree_insert(avc, &anon_vma->rb_root); @@ -318,6 +440,28 @@ int __anon_vma_prepare(struct vm_area_struct *vma) return 
-ENOMEM; } =20 +/** + * __anon_vma_prepare - attach an anon_vma to a memory region + * @vma: the memory region in question + * + * Wrapper around vma_prepare_anon_vma() for the non-lazy case. + * Called when ANON_VMA_LAZY is disabled. + */ +int __anon_vma_prepare(struct vm_area_struct *vma) +{ + return vma_prepare_anon_vma(vma, false, NULL); +} + +static int vma_upgrade_anon_vma_lazy(struct vm_area_struct *vma) +{ + anon_vma_tree_t vma_tree =3D vma->anon_vma; + struct anon_vma *parent_anon_vma =3D NULL; + + if (anon_vma_tree_is_parent(vma_tree)) + parent_anon_vma =3D anon_vma_tree_anon_vma(vma_tree); + return vma_prepare_anon_vma(vma, true, parent_anon_vma); +} + static void check_anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src, enum vma_operation operation) @@ -414,6 +558,20 @@ int anon_vma_clone(struct vm_area_struct *dst, struct = vm_area_struct *src, if (!active_anon_tree) return 0; =20 + /* Check ANON_VMA_LAZY first. */ + if (anon_vma_tree_is_vma(active_anon_tree)) { + if (vma_link_anon_vma_lazy_root(dst, src)) + return 0; + } else if (anon_vma_tree_is_parent(active_anon_tree)) { + /* split from tree_parent is rare; promote to regular. */ + int err =3D vma_upgrade_anon_vma_lazy(src); + + if (err) + return err; + VM_BUG_ON_VMA(vma_is_anon_vma_lazy(src), src); + dst->anon_vma =3D src->anon_vma; + } + /* * Allocate AVCs. 
We don't need an anon_vma lock for this as we * are not updating the anon_vma rbtree nor are we changing @@ -445,7 +603,7 @@ int anon_vma_clone(struct vm_area_struct *dst, struct v= m_area_struct *src, maybe_reuse_anon_vma(dst, anon_vma); } =20 - if (operation !=3D VMA_OP_FORK) + if (operation !=3D VMA_OP_FORK && vma_anon_vma(dst)) vma_anon_vma(dst)->num_active_vmas++; =20 anon_vma_tree_unlock_write(active_anon_tree); @@ -456,9 +614,38 @@ int anon_vma_clone(struct vm_area_struct *dst, struct = vm_area_struct *src, return -ENOMEM; } =20 +static int vma_fork_anon_vma_lazy(struct vm_area_struct *vma, + struct vm_area_struct *pvma) +{ + int error; + + if (vma_is_anon_vma_lazy(pvma)) { + error =3D vma_upgrade_anon_vma_lazy(pvma); + if (error) + return error; + VM_BUG_ON_VMA(vma_is_anon_vma_lazy(pvma), pvma); + } + + vma_set_anon_vma(vma, NULL); + error =3D anon_vma_clone(vma, pvma, VMA_OP_FORK); + if (error) + return error; + + if (vma->anon_vma) + return 0; + /* Lazily allocate the child anon_vma. */ + vma_link_anon_vma_lazy_parent(vma, pvma); + return 0; +} + /* * Attach vma to its own anon_vma, as well as to the anon_vmas that * the corresponding VMA in the parent process is attached to. + * + * For ANON_VMA_LAZY: if the parent VMA is lazy, upgrade it to a regular + * anon_vma before cloning. The child VMA may also be marked lazy when + * ANON_VMA_LAZY is enabled, deferring anon_vma allocation. + * * Returns 0 on success, non-zero on failure. */ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma) @@ -472,6 +659,9 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm= _area_struct *pvma) if (!pvma->anon_vma) return 0; =20 + if (anon_vma_lazy_enabled()) + return vma_fork_anon_vma_lazy(vma, pvma); + /* Drop inherited anon_vma, we'll reuse existing or allocate new. */ vma_set_anon_vma(vma, NULL); =20 @@ -577,6 +767,10 @@ void unlink_anon_vmas(struct vm_area_struct *vma) return; } =20 + /* Unlink ANON_VMA_LAZY first, then ancestor anon_vma. 
*/ + if (vma_is_anon_vma_lazy(vma)) + vma_unlink_anon_vma_lazy(vma, (anon_vma_tree_t)NULL); + anon_vma_tree_lock_write(active_anon_tree); =20 /* @@ -601,7 +795,8 @@ void unlink_anon_vmas(struct vm_area_struct *vma) anon_vma_chain_free(avc); } =20 - vma_anon_vma(vma)->num_active_vmas--; + if (vma_anon_vma(vma)) + vma_anon_vma(vma)->num_active_vmas--; /* * vma would still be needed after unlink, and anon_vma will be prepared * when handle fault. diff --git a/mm/vma.c b/mm/vma.c index ed15968a5891..0a31ef82a90c 100644 --- a/mm/vma.c +++ b/mm/vma.c @@ -1995,6 +1995,8 @@ static int anon_vma_compatible(struct vm_area_struct = *a, struct vm_area_struct * * acceptable for merging, so we can do all of this optimistically. But * we do that READ_ONCE() to make sure that we never re-load the pointer. * + * For upgrading ANON_VMA_LAZY VMAs, follow the same reuse rules as splitt= ing. + * * IOW: that the "list_is_singular()" test on the anon_vma_chain only * matters for the 'stable anon_vma' case (ie the thing we want to avoid * is to return an anon_vma that is "complex" due to having gone through @@ -2005,12 +2007,15 @@ static int anon_vma_compatible(struct vm_area_struc= t *a, struct vm_area_struct * * a read lock on the mmap_lock. */ static struct anon_vma *reusable_anon_vma(struct vm_area_struct *old, + struct vm_area_struct *vma, struct vm_area_struct *a, struct vm_area_struct *b) { if (anon_vma_compatible(a, b)) { struct anon_vma *anon_vma =3D vma_anon_vma(old); =20 + if (anon_vma && vma_is_anon_vma_lazy(vma)) + return anon_vma; if (anon_vma && list_is_singular(&old->anon_vma_chain)) return anon_vma; } @@ -2034,7 +2039,7 @@ struct anon_vma *find_mergeable_anon_vma(struct vm_ar= ea_struct *vma) /* Try next first. 
*/ next =3D vma_iter_load(&vmi); if (next) { - anon_vma =3D reusable_anon_vma(next, vma, next); + anon_vma =3D reusable_anon_vma(next, vma, vma, next); if (anon_vma) return anon_vma; } @@ -2044,7 +2049,7 @@ struct anon_vma *find_mergeable_anon_vma(struct vm_ar= ea_struct *vma) prev =3D vma_prev(&vmi); /* Try prev next. */ if (prev) - anon_vma =3D reusable_anon_vma(prev, prev, vma); + anon_vma =3D reusable_anon_vma(prev, vma, prev, vma); =20 /* * We might reach here with anon_vma =3D=3D NULL if we can't find --=20 2.17.1 From nobody Mon Jun 8 18:57:53 2026 Received: from mta22.hihonor.com (mta22.hihonor.com [81.70.192.198]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4F4C23F660F; Wed, 27 May 2026 11:08:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=81.70.192.198 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779880087; cv=none; b=p2S/aWp+GuyvzelQC4DrcQ47k7fLTwianu1ieHjWr0Hntj35T5KN6223wAJRDlcIhwEd3zsPZOaTt0dm0UiqqtTibGMt0NAMUe+5i3BfXR9b2zGPPcwzswuL4NGsNePC9EhgEpxq3KR3RAhCvDlG82UeC68EsiCmWoxvjn1fOJo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779880087; c=relaxed/simple; bh=3EcCp4Gw030E4P+Z/BQF5V4KmCi7yOeuuHKsvzHSxOU=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=J94vQL+b9H7ohTnQQs5jHjKspEEeGr6iM+XAsAfRCBiYdd9KKWGUurV2nnfWpCAJfzhkaZWYJw4mSo5HJmM7nHxwQ7K5pnHKNT65cFLspD9SONsCKKwcak8A4OL9ezfPgAUYgdwQgSGsihYsv+195bYH1T8XRvZgnc5j/L5HZ0k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=honor.com; spf=pass smtp.mailfrom=honor.com; dkim=pass (1024-bit key) header.d=honor.com header.i=@honor.com header.b=j6PuxmT0; arc=none smtp.client-ip=81.70.192.198 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=honor.com 
From: tao
Subject: [PATCH 11/15] mm: handle ANON_VMA_LAZY in huge page operations
Date: Wed, 27 May 2026 19:01:43 +0800
Message-ID: <20260527110147.17815-12-tao.wangtao@honor.com>
In-Reply-To: <20260527110147.17815-1-tao.wangtao@honor.com>
References: <20260527110147.17815-1-tao.wangtao@honor.com>

When splitting a huge page, the folio needs to be converted into multiple
subpages. Holding only folio_lock(folio) cannot guarantee that the split
operation completes atomically. Check and upgrade anon_vma during huge page
allocation and collapse to ensure the anon_vma is properly protected.

Signed-off-by: tao
---
 mm/internal.h   |  5 +++++
 mm/khugepaged.c |  5 +++++
 mm/memory.c     | 17 +++++++++++++----
 mm/rmap.c       | 15 +++++++++++----
 4 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 0a36eba3f63c..a746f5272aa6 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -419,6 +419,11 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma);
 int __anon_vma_prepare(struct vm_area_struct *vma);
 /* Called on first anon fault or from anon_vma_prepare(). */
 void vma_prepare_anon_vma_lazy(struct vm_area_struct *vma);
+/*
+ * Upgrade VMA ANON_VMA_LAZY to a regular anon_vma during fork, or when
+ * cloning ANON_VMA_TREE_PARENT or a hugepage VMA.
+ */
+int vma_upgrade_anon_vma_lazy(struct vm_area_struct *vma);
 void unlink_anon_vmas(struct vm_area_struct *vma);
 
 static inline int anon_vma_prepare(struct vm_area_struct *vma)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 747748eace91..a33cda026be7 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1164,6 +1164,11 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
 	if (result != SCAN_SUCCEED)
 		goto out_up_write;
 
+	/* Upgrade anon_vma_lazy to protect the anon_vma. */
+	if (vma_upgrade_anon_vma_lazy(vma)) {
+		result = SCAN_FAIL;
+		goto out_up_write;
+	}
 	anon_vma_tree_lock_write(vma->anon_vma);
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, address,
diff --git a/mm/memory.c b/mm/memory.c
index 8fd3877f69fb..26d116b3393c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3819,19 +3819,28 @@ vm_fault_t __vmf_anon_prepare(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	vm_fault_t ret = 0;
+	bool maybe_huge = pmd_none(*vmf->pmd);
 
-	if (likely(vma->anon_vma))
-		return 0;
-	if (anon_vma_lazy_enabled()) {
+	if (likely(vma->anon_vma)) {
+		if (!vma_is_anon_vma_lazy(vma) || !maybe_huge)
+			return 0;
+	}
+#ifdef CONFIG_ANON_VMA_LAZY
+	if (anon_vma_lazy_enabled() && !maybe_huge) {
 		vma_prepare_anon_vma_lazy(vma);
 		return 0;
 	}
+#endif
 	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
 		if (!mmap_read_trylock(vma->vm_mm))
 			return VM_FAULT_RETRY;
 	}
-	if (__anon_vma_prepare(vma))
+	if (!vma->anon_vma && __anon_vma_prepare(vma))
+		ret = VM_FAULT_OOM;
+#ifdef CONFIG_ANON_VMA_LAZY
+	if (vma->anon_vma && maybe_huge && vma_upgrade_anon_vma_lazy(vma))
 		ret = VM_FAULT_OOM;
+#endif
 	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
 		mmap_read_unlock(vma->vm_mm);
 	return ret;
diff --git a/mm/rmap.c b/mm/rmap.c
index d9424f4eb6d0..57cd85efc50a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -452,13 +452,20 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
 	return vma_prepare_anon_vma(vma, false, NULL);
 }
 
-static int vma_upgrade_anon_vma_lazy(struct vm_area_struct *vma)
+/**
+ * vma_upgrade_anon_vma_lazy - upgrade a VMA's lazy anon_vma to a regular one
+ * @vma: the VMA whose anon_vma_lazy is being upgraded
+ */
+int vma_upgrade_anon_vma_lazy(struct vm_area_struct *vma)
 {
-	anon_vma_tree_t vma_tree = vma->anon_vma;
+	anon_vma_tree_t anon_tree = READ_ONCE(vma->anon_vma);
 	struct anon_vma *parent_anon_vma = NULL;
 
-	if (anon_vma_tree_is_parent(vma_tree))
-		parent_anon_vma = anon_vma_tree_anon_vma(vma_tree);
+	VM_BUG_ON_VMA(!anon_tree, vma);
+	if (!anon_vma_tree_type(anon_tree))
+		return 0;
+	if (anon_vma_tree_is_parent(anon_tree))
+		parent_anon_vma = anon_vma_tree_anon_vma(anon_tree);
 	return vma_prepare_anon_vma(vma, true, parent_anon_vma);
 }
 
-- 
2.17.1

From nobody Mon Jun 8 18:57:53 2026
From: tao
Subject: [PATCH 12/15] mm: handle ANON_VMA_LAZY during migration
Date: Wed, 27 May 2026 19:01:44 +0800
Message-ID: <20260527110147.17815-13-tao.wangtao@honor.com>
In-Reply-To: <20260527110147.17815-1-tao.wangtao@honor.com>
References: <20260527110147.17815-1-tao.wangtao@honor.com>

To ensure the atomicity of folio migration, introduce
folio_trylock_get_anon_rmap(). This helper guarantees that the migration
operation is mutually exclusive with free_pgtables(). For ANON_VMA_LAZY,
it uses vma_start_read() to prevent the VMA from being modified or removed
during migration.
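
As a rough illustration of the get-then-trylock pairing described above,
the following user-space sketch models the pattern with a plain counter
and a flag. It is single-threaded and is not kernel code: struct rmap_obj,
obj_trylock_get() and obj_unlock_put() are hypothetical names that only
mirror the shape of folio_trylock_get_anon_rmap() and
anon_rmap_unlock_put().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an anon_vma or a lazy VMA: a refcount plus a lock. */
struct rmap_obj {
	int refcount;
	bool locked;
};

/* Try to take the lock without blocking; report whether it was taken. */
static bool obj_trylock(struct rmap_obj *obj)
{
	if (obj->locked)
		return false;
	obj->locked = true;
	return true;
}

/*
 * Take a reference first, then try the lock. If the lock cannot be taken,
 * drop the reference again so the caller never holds one without the other.
 */
static struct rmap_obj *obj_trylock_get(struct rmap_obj *obj)
{
	if (!obj)
		return NULL;
	obj->refcount++;
	if (!obj_trylock(obj)) {
		obj->refcount--;
		return NULL;
	}
	return obj;
}

/* Release in the reverse order: unlock, then drop the reference. */
static void obj_unlock_put(struct rmap_obj *obj)
{
	obj->locked = false;
	obj->refcount--;
}
```

In this model a caller treats a NULL return as "could not lock, skip the
operation", which corresponds to the goto out paths in the migration code.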

Signed-off-by: tao
---
 include/linux/rmap.h | 12 ++++++++
 mm/migrate.c         | 71 +++++++++++++++++++++++++-------------------
 mm/rmap.c            | 40 +++++++++++++++++++++++++
 3 files changed, 92 insertions(+), 31 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index ebe9f3f61170..59244481a8c1 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -1042,6 +1042,18 @@ bool folio_maybe_same_anon_vma(const struct folio *folio,
 anon_rmap_t folio_get_anon_rmap(const struct folio *folio);
 anon_rmap_t folio_lock_anon_rmap_read(const struct folio *folio,
 				      struct rmap_walk_control *rwc);
+/*
+ * folio_trylock_get_anon_rmap ensures that the migration operation
+ * completes atomically and is mutually exclusive with free_pgtables().
+ *
+ * Note: for ANON_VMA_LAZY, this is not equivalent to
+ * anon_rmap_trylock_read() + folio_get_anon_rmap(), because
+ * anon_rmap_trylock_read() only increments the VMA reference count,
+ * while this helper uses vma_start_read() to prevent the VMA from
+ * being modified or removed.
+ */
+anon_rmap_t folio_trylock_get_anon_rmap(const struct folio *folio);
+void anon_rmap_unlock_put(anon_rmap_t anon_rmap);
 
 static inline struct vm_area_struct *anon_rmap_iter_first_vma(
 		anon_rmap_t anon_rmap, unsigned long start, unsigned long last,
diff --git a/mm/migrate.c b/mm/migrate.c
index b397cdeab09a..4abbfd1faea2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1173,10 +1173,11 @@ static void migrate_folio_undo_src(struct folio *src,
 				   struct list_head *ret)
 {
 	if (page_was_mapped)
-		remove_migration_ptes(src, src, 0);
+		remove_migration_ptes(src, src,
+				anon_rmap_value(anon_rmap) ? TTU_RMAP_LOCKED : 0);
 	/* Drop an anon_rmap reference if we took one */
 	if (anon_rmap_value(anon_rmap))
-		put_anon_rmap(anon_rmap);
+		anon_rmap_unlock_put(anon_rmap);
 	if (locked)
 		folio_unlock(src);
 	if (ret)
@@ -1279,6 +1280,18 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
 		folio_wait_writeback(src);
 	}
 
+	/*
+	 * Block others from accessing the new page when we get around to
+	 * establishing additional references. We are usually the only one
+	 * holding a reference to dst at this point. We used to have a BUG
+	 * here if folio_trylock(dst) fails, but would like to allow for
+	 * cases where there might be a race with the previous use of dst.
+	 * This is much like races on refcount of oldpage: just don't BUG().
+	 */
+	if (unlikely(!folio_trylock(dst)))
+		goto out;
+	dst_locked = true;
+
 	/*
 	 * By try_to_migrate(), src->mapcount goes down to 0 here. In this case,
 	 * we cannot notice that anon_vma is freed while we migrate a page.
@@ -1287,26 +1300,17 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
 	 * File Caches may use write_page() or lock_page() in migration, then,
 	 * just care Anon page here.
 	 *
-	 * Only folio_get_anon_rmap() understands the subtleties of
-	 * getting a hold on an anon_rmap from outside one of its mms.
+	 * Only folio_trylock_get_anon_rmap() understands the subtleties of
+	 * getting and locking an anon_rmap from outside one of its mms.
 	 * But if we cannot get anon_rmap, then we won't need it anyway,
 	 * because that implies that the anon page is no longer mapped
 	 * (and cannot be remapped so long as we hold the page lock).
 	 */
-	if (folio_test_anon(src) && !folio_test_ksm(src))
-		anon_rmap = folio_get_anon_rmap(src);
-
-	/*
-	 * Block others from accessing the new page when we get around to
-	 * establishing additional references. We are usually the only one
-	 * holding a reference to dst at this point. We used to have a BUG
-	 * here if folio_trylock(dst) fails, but would like to allow for
-	 * cases where there might be a race with the previous use of dst.
-	 * This is much like races on refcount of oldpage: just don't BUG().
-	 */
-	if (unlikely(!folio_trylock(dst)))
-		goto out;
-	dst_locked = true;
+	if (folio_test_anon(src) && !folio_test_ksm(src)) {
+		anon_rmap = folio_trylock_get_anon_rmap(src);
+		if (!anon_rmap_value(anon_rmap))
+			goto out;
+	}
 
 	if (unlikely(page_has_movable_ops(&src->page))) {
 		__migrate_folio_record(dst, old_page_state, anon_rmap);
@@ -1331,10 +1335,14 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
 			goto out;
 		}
 	} else if (folio_mapped(src)) {
+		enum ttu_flags ttu = mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0;
+
+		if (anon_rmap_value(anon_rmap))
+			ttu |= TTU_RMAP_LOCKED;
 		/* Establish migration ptes */
 		VM_BUG_ON_FOLIO(folio_test_anon(src) &&
 			       !folio_test_ksm(src) && !anon_rmap_value(anon_rmap), src);
-		try_to_migrate(src, mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0);
+		try_to_migrate(src, ttu);
 		old_page_state |= PAGE_WAS_MAPPED;
 	}
 
@@ -1415,7 +1423,8 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 	lru_add_drain();
 
 	if (old_page_state & PAGE_WAS_MAPPED)
-		remove_migration_ptes(src, dst, 0);
+		remove_migration_ptes(src, dst,
+				anon_rmap_value(anon_rmap) ? TTU_RMAP_LOCKED : 0);
 
 out_unlock_both:
 	folio_unlock(dst);
@@ -1434,7 +1443,7 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 	list_del(&src->lru);
 	/* Drop an anon_rmap reference if we took one */
 	if (anon_rmap_value(anon_rmap))
-		put_anon_rmap(anon_rmap);
+		anon_rmap_unlock_put(anon_rmap);
 	folio_unlock(src);
 	migrate_folio_done(src, reason);
 
@@ -1485,7 +1494,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 	int page_was_mapped = 0;
 	anon_rmap_t anon_rmap = ANON_RMAP_NULL;
 	struct address_space *mapping = NULL;
-	enum ttu_flags ttu = 0;
+	enum ttu_flags ttu = TTU_RMAP_LOCKED;
 
 	if (folio_ref_count(src) == 1) {
 		/* page was freed from under us. So we are done. */
@@ -1519,11 +1528,14 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 		goto out_unlock;
 	}
 
-	if (folio_test_anon(src))
-		anon_rmap = folio_get_anon_rmap(src);
-
 	if (unlikely(!folio_trylock(dst)))
-		goto put_anon;
+		goto out_unlock;
+
+	if (folio_test_anon(src)) {
+		anon_rmap = folio_trylock_get_anon_rmap(src);
+		if (!anon_rmap_value(anon_rmap))
+			goto unlock_put_anon;
+	}
 
 	if (folio_mapped(src)) {
 		if (!folio_test_anon(src)) {
@@ -1536,8 +1548,6 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 			mapping = hugetlb_folio_mapping_lock_write(src);
 			if (unlikely(!mapping))
 				goto unlock_put_anon;
-
-			ttu = TTU_RMAP_LOCKED;
 		}
 
 		try_to_migrate(src, ttu);
@@ -1550,15 +1560,14 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 	if (page_was_mapped)
 		remove_migration_ptes(src, !rc ? dst : src, ttu);
 
-	if (ttu & TTU_RMAP_LOCKED)
+	if (page_was_mapped && !folio_test_anon(src))
 		i_mmap_unlock_write(mapping);
 
 unlock_put_anon:
 	folio_unlock(dst);
 
-put_anon:
 	if (anon_rmap_value(anon_rmap))
-		put_anon_rmap(anon_rmap);
+		anon_rmap_unlock_put(anon_rmap);
 
 	if (!rc) {
 		move_hugetlb_state(src, dst, reason);
diff --git a/mm/rmap.c b/mm/rmap.c
index 57cd85efc50a..46876b3dbfbc 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1223,6 +1223,46 @@ anon_rmap_t folio_lock_anon_rmap_read(const struct folio *folio,
 	return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
 }
 
+anon_rmap_t folio_trylock_get_anon_rmap(const struct folio *folio)
+{
+	struct anon_vma *anon_vma;
+	struct vm_area_struct *vma;
+
+	if (folio_test_anon_vma_lazy(folio)) {
+		vma = folio_get_anon_vma_lazy(folio);
+		if (vma && !lock_vma_under_rcu(vma->vm_mm, vma->vm_start)) {
+			vma_put(vma);
+			vma = NULL;
+		}
+		if (vma)
+			return vma_to_anon_rmap(vma);
+	}
+
+	anon_vma = folio_get_anon_vma(folio);
+	if (anon_vma && !anon_vma_trylock_read(anon_vma)) {
+		put_anon_vma(anon_vma);
+		anon_vma = NULL;
+	}
+	return anon_vma ? anon_vma_to_anon_rmap(anon_vma) : ANON_RMAP_NULL;
+}
+
+void anon_rmap_unlock_put(anon_rmap_t anon_rmap)
+{
+	struct anon_vma *anon_vma;
+
+	if (!anon_rmap_is_anon_vma(anon_rmap)) {
+		struct vm_area_struct *vma = anon_rmap_to_vma(anon_rmap);
+
+		vma_end_read(vma);
+		vma_put(vma);
+		return;
+	}
+
+	anon_vma = anon_rmap_to_anon_vma(anon_rmap);
+	anon_vma_unlock_read(anon_vma);
+	put_anon_vma(anon_vma);
+}
+
 #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
 /*
  * Flush TLB entries for recently unmapped pages from remote CPUs. It is
-- 
2.17.1

From nobody Mon Jun 8 18:57:53 2026
From: tao
Subject: [PATCH 13/15] mm: support setup and upgrade of ANON_VMA_LAZY folios
Date: Wed, 27 May 2026 19:01:45 +0800
Message-ID: <20260527110147.17815-14-tao.wangtao@honor.com>
In-Reply-To: <20260527110147.17815-1-tao.wangtao@honor.com>
References: <20260527110147.17815-1-tao.wangtao@honor.com>

new_anon_rmap() and move_anon_rmap() decide whether to set
FOLIO_MAPPING_ANON_VMA_LAZY. try_dup_anon_rmap() upgrades the folio to
FOLIO_MAPPING_ANON during fork() when required.

rmap_walk_anon() detects ANON_VMA_LAZY upgrades and retries the walk to
ensure the mapping is handled correctly. remove_rmap() needs no special
handling since folio_mapped() is checked before use.
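
The folio side of this can be pictured as a tagged pointer stored in
folio->mapping, where the low bits say whether the pointer refers to a
lazy root VMA or to a regular anon_vma. The user-space sketch below only
illustrates that encoding and the one-way upgrade. The tag values
TAG_ANON and TAG_ANON_LAZY and all mapping_*() helpers are placeholders
chosen for the example; they are not the kernel's FOLIO_MAPPING_* values
or APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder tag values for illustration only. */
#define TAG_ANON	0x1UL
#define TAG_ANON_LAZY	0x2UL
#define TAG_MASK	0x3UL

/* Store a tag in the low bits of a suitably aligned object pointer. */
static uintptr_t mapping_encode(const void *obj, uintptr_t tag)
{
	assert(((uintptr_t)obj & TAG_MASK) == 0);
	return (uintptr_t)obj + tag;
}

/* Extract the tag bits from an encoded mapping value. */
static uintptr_t mapping_tag(uintptr_t mapping)
{
	return mapping & TAG_MASK;
}

/* Recover the object pointer by clearing the tag bits. */
static void *mapping_ptr(uintptr_t mapping)
{
	return (void *)(mapping & ~TAG_MASK);
}

/*
 * One-way upgrade: a lazy mapping is rewritten to point at the regular
 * anon_vma. A mapping that is already regular is left untouched.
 */
static void mapping_upgrade(uintptr_t *mapping, const void *anon_vma)
{
	if (mapping_tag(*mapping) == TAG_ANON_LAZY)
		*mapping = mapping_encode(anon_vma, TAG_ANON);
}
```

A reader of the mapping word first checks the tag and only then decides
how to interpret the pointer, which is why the upgrade can be published
with a single store.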

Signed-off-by: tao
---
 include/linux/rmap.h | 38 ++++++++++++++++++++++++++++++++++++++
 mm/rmap.c            | 21 ++++++++++++++++++++-
 2 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 59244481a8c1..9b1970698204 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -392,6 +392,14 @@ static __always_inline void __folio_rmap_sanity_checks(const struct folio *folio
 		unsigned long mapping = (unsigned long)folio->mapping;
 		struct anon_vma *anon_vma;
 
+		if (folio_test_anon_vma_lazy(folio)) {
+			struct vm_area_struct *root_vma =
+				(void *)(mapping - FOLIO_MAPPING_ANON_VMA_LAZY);
+
+			VM_WARN_ON_FOLIO(!rcuref_read(&root_vma->vm_rcuref), folio);
+			return;
+		}
+
 		anon_vma = (void *)(mapping - FOLIO_MAPPING_ANON);
 		VM_WARN_ON_FOLIO(atomic_read(&anon_vma->refcount) == 0, folio);
 	}
@@ -431,6 +439,31 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
 void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
 		unsigned long address);
 
+/**
+ * folio_upgrade_anon_vma_lazy - upgrade folio->mapping from ANON_VMA_LAZY to
+ * an anon_vma
+ * @folio: The folio to upgrade
+ * @vma: The VMA the folio currently belongs to
+ *
+ * Upgrade folio->mapping from ANON_VMA_LAZY to an anon_vma.
+ * This transition is strictly one-way and never reverts back to a lazy
+ * mapping.
+ *
+ * Called during fork() while holding the mmap lock and the VMA write lock,
+ * but without taking the folio lock. Concurrent readers may briefly observe
+ * the old lazy mapping. Migration relies on folio_trylock_get_anon_rmap()
+ * to ensure atomicity, while other rmap operations remain unaffected.
+ */
+static inline void folio_upgrade_anon_vma_lazy(struct folio *folio,
+		struct vm_area_struct *vma)
+{
+	unsigned long anon_tree = (unsigned long)vma->anon_vma;
+
+	VM_BUG_ON_VMA(!anon_tree || !IS_ALIGNED(anon_tree, sizeof(long)), vma);
+	anon_tree = anon_tree + FOLIO_MAPPING_ANON;
+	WRITE_ONCE(folio->mapping, (struct address_space *)anon_tree);
+}
+
 /* See folio_try_dup_anon_rmap_*() */
 static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
 		struct vm_area_struct *vma)
@@ -438,6 +471,9 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
 	VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
 	VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
 
+	if (folio_test_anon_vma_lazy(folio))
+		folio_upgrade_anon_vma_lazy(folio, vma);
+
 	if (PageAnonExclusive(&folio->page)) {
 		if (unlikely(folio_needs_cow_for_dma(vma, folio)))
 			return -EBUSY;
@@ -573,6 +609,8 @@ static __always_inline int __folio_try_dup_anon_rmap(struct folio *folio,
 	int i;
 
 	VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+	if (folio_test_anon_vma_lazy(folio))
+		folio_upgrade_anon_vma_lazy(folio, src_vma);
 	__folio_rmap_sanity_checks(folio, page, nr_pages, level);
 
 	/*
diff --git a/mm/rmap.c b/mm/rmap.c
index 46876b3dbfbc..e14509b47412 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2002,6 +2002,16 @@ void folio_move_anon_rmap(struct folio *folio, struct vm_area_struct *vma)
 	void *anon_vma = vma_anon_vma(vma);
 
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
+
+	if (!anon_vma) {
+		const struct vm_area_struct *root_vma = vma_anon_vma_lazy_root(vma);
+
+		VM_BUG_ON_VMA(!root_vma, vma);
+		root_vma = (void *)root_vma + FOLIO_MAPPING_ANON_VMA_LAZY;
+		WRITE_ONCE(folio->mapping, (struct address_space *)root_vma);
+		return;
+	}
+
 	VM_BUG_ON_VMA(!anon_vma, vma);
 
 	anon_vma += FOLIO_MAPPING_ANON;
@@ -2023,7 +2033,16 @@ void folio_move_anon_rmap(struct folio *folio, struct vm_area_struct *vma)
 static void __folio_set_anon(struct folio *folio, struct vm_area_struct *vma,
 			     unsigned long address, bool exclusive)
 {
-	struct anon_vma *anon_vma = vma_anon_vma(vma);
+	anon_vma_tree_t anon_tree = vma->anon_vma;
+	const struct vm_area_struct *root_vma = vma_anon_vma_lazy_root(vma);
+	struct anon_vma *anon_vma = anon_vma_tree_anon_vma(anon_tree);
+
+	if (root_vma && (anon_vma_tree_is_vma(anon_tree) || exclusive)) {
+		root_vma = (void *)root_vma + FOLIO_MAPPING_ANON_VMA_LAZY;
+		WRITE_ONCE(folio->mapping, (struct address_space *)root_vma);
+		folio->index = linear_page_index(vma, address);
+		return;
+	}
 
 	BUG_ON(!anon_vma);
 
-- 
2.17.1

From nobody Mon Jun 8 18:57:53 2026
From: tao
Subject: [PATCH 14/15] mm: support merging of ANON_VMA_LAZY VMAs
Date: Wed, 27 May 2026 19:01:46 +0800
Message-ID: <20260527110147.17815-15-tao.wangtao@honor.com>
In-Reply-To: <20260527110147.17815-1-tao.wangtao@honor.com>
References: <20260527110147.17815-1-tao.wangtao@honor.com>

Allow ANON_VMA_LAZY VMAs to
merge if they share the same root or if one side has no root.

For ANON_VMA_LAZY merges, do not delete the lazy root VMA. The lazy root
VMA may still be referenced by folio->mapping.

Signed-off-by: tao
---
 mm/vma.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/mm/vma.c b/mm/vma.c
index 0a31ef82a90c..ae1047dcfbc2 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -76,9 +76,10 @@ static bool vma_is_fork_child(struct vm_area_struct *vma)
 	/*
 	 * The list_is_singular() test is to avoid merging VMA cloned from
 	 * parents. This can improve scalability caused by the anon_vma root
-	 * lock.
+	 * lock. ANON_VMA_TREE_VMA has no anon_vma_chain.
 	 */
-	return vma && vma->anon_vma && !list_is_singular(&vma->anon_vma_chain);
+	return vma && vma->anon_vma && !anon_vma_tree_is_vma(vma->anon_vma) &&
+		!list_is_singular(&vma->anon_vma_chain);
 }
 
 static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_next)
@@ -776,6 +777,17 @@ static bool can_merge_remove_vma(struct vm_area_struct *vma)
 	return !vma->vm_ops || !vma->vm_ops->close;
 }
 
+/*
+ * The ANON_VMA_LAZY root VMA may still be referenced by folio->mapping.
+ * Keeping the root avoids allocating an extra VMA.
+ */
+#define SWAP_VMG_TARGET_IF_DELETE_ANON_VMA_LAZY_ROOT(vmg, delete_vma) do { \
+	if (anon_vma_lazy_enabled()) { \
+		if (delete_vma && vma_is_anon_vma_lazy_root(delete_vma)) \
+			swap(vmg->target, delete_vma); \
+	} \
+} while (0)
+
 /*
  * vma_merge_existing_range - Attempt to merge VMAs based on a VMA having its
  * attributes modified.
@@ -933,12 +945,15 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
 		vmg->end = next->vm_end;
 		vmg->pgoff = prev->vm_pgoff;
 
+		SWAP_VMG_TARGET_IF_DELETE_ANON_VMA_LAZY_ROOT(vmg, middle);
+		SWAP_VMG_TARGET_IF_DELETE_ANON_VMA_LAZY_ROOT(vmg, next);
+
 		/*
 		 * We already ensured anon_vma compatibility above, so now it's
 		 * simply a case of, if prev has no anon_vma object, which of
 		 * next or middle contains the anon_vma we must duplicate.
 		 */
-		err = dup_anon_vma(prev, next->anon_vma ? next : middle,
+		err = dup_anon_vma(vmg->target, next->anon_vma ? next : middle,
 				   &anon_dup);
 	} else if (merge_left) {
 		/*
@@ -954,8 +969,10 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
 
 		if (!vmg->__remove_middle)
 			vmg->__adjust_middle_start = true;
+		else
+			SWAP_VMG_TARGET_IF_DELETE_ANON_VMA_LAZY_ROOT(vmg, middle);
 
-		err = dup_anon_vma(prev, middle, &anon_dup);
+		err = dup_anon_vma(vmg->target, middle, &anon_dup);
 	} else { /* merge_right */
 		/*
 		 * |<------------->| OR
@@ -974,6 +991,7 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
 		if (vmg->__remove_middle) {
 			vmg->end = next->vm_end;
 			vmg->pgoff = next->vm_pgoff - pglen;
+			SWAP_VMG_TARGET_IF_DELETE_ANON_VMA_LAZY_ROOT(vmg, middle);
 		} else {
 			/* We shrink middle and expand next. */
 			vmg->__adjust_next_start = true;
@@ -982,7 +1000,7 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
 			vmg->pgoff = middle->vm_pgoff;
 		}
 
-		err = dup_anon_vma(next, middle, &anon_dup);
+		err = dup_anon_vma(vmg->target, middle, &anon_dup);
 	}
 
 	if (err || commit_merge(vmg))
@@ -1212,6 +1230,7 @@ int vma_expand(struct vma_merge_struct *vmg)
 
 		vma_start_write(next);
 		vmg->__remove_next = true;
+		SWAP_VMG_TARGET_IF_DELETE_ANON_VMA_LAZY_ROOT(vmg, next);
 
 		next_sticky = vma_flags_and_mask(&next->flags, VMA_STICKY_FLAGS);
 		vma_flags_set_mask(&sticky_flags, next_sticky);
-- 
2.17.1

From nobody Mon Jun 8 18:57:53 2026
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=honor.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=honor.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=honor.com header.i=@honor.com header.b="d/GWnyKh" dkim-signature: v=1; a=rsa-sha256; d=honor.com; s=dkim; c=relaxed/relaxed; q=dns/txt; h=To:From; bh=q/70JID062V814TY2TOyw8EA5k2YWRj4Olo6OZJNrik=; b=d/GWnyKhP5OdN/FwoZeN58jllQS4j1OQ9Opo/3nlNKj09J/EjFx9D5jk0CvFygD3nYmcdbfbp rCI6J2aiagXzUJlZApf+pFo0AMnJgoLif70IBWoHv3Wl5EA1H/AHwuhSuappH6oOZxlRXMsTTnK kMH6ECTvISsLQzWBCPTWAFo= Received: from TW002-1.hihonor.com (unknown [10.72.0.137]) by mta22.hihonor.com (SkyGuard) with ESMTPS id 4gQRdj3h0vzYl1Gk; Wed, 27 May 2026 19:06:49 +0800 (CST) Received: from TA003.hihonor.com (10.72.0.43) by TW002-1.hihonor.com (10.72.0.137) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.37; Wed, 27 May 2026 19:08:04 +0800 Received: from localhost.localdomain (10.144.18.117) by TA003.hihonor.com (10.72.0.43) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.37; Wed, 27 May 2026 19:08:03 +0800 From: tao To: , , , , , , , , , , , , , , , , CC: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , <21cnbao@gmail.com>, , , , Subject: [PATCH 15/15] mm: enable CONFIG_ANON_VMA_LAZY on arm64 and x86_64 Date: Wed, 27 May 2026 19:01:47 +0800 Message-ID: <20260527110147.17815-16-tao.wangtao@honor.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20260527110147.17815-1-tao.wangtao@honor.com> References: <20260527110147.17815-1-tao.wangtao@honor.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: TW006-1.hihonor.com (10.77.215.153) To TA003.hihonor.com (10.72.0.43) Content-Transfer-Encoding: quoted-printable Content-Type: 
 text/plain; charset="utf-8"

All prerequisites are in place, so enable CONFIG_ANON_VMA_LAZY for arm64
and x86_64.

Signed-off-by: tao
---
 arch/arm64/Kconfig | 1 +
 arch/x86/Kconfig   | 1 +
 mm/rmap.c          | 2 +-
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fe60738e5943..9517883f0aaf 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -81,6 +81,7 @@ config ARM64
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select ARCH_SUPPORTS_PAGE_TABLE_CHECK
 	select ARCH_SUPPORTS_PER_VMA_LOCK
+	select ARCH_SUPPORTS_ANON_VMA_LAZY
 	select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
 	select ARCH_SUPPORTS_RT
 	select ARCH_SUPPORTS_SCHED_SMT
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f3f7cb01d69d..cc3430eaa7b4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -28,6 +28,7 @@ config X86_64
 	select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
 	select ARCH_SUPPORTS_PER_VMA_LOCK
+	select ARCH_SUPPORTS_ANON_VMA_LAZY
 	select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
 	select HAVE_ARCH_SOFT_DIRTY
 	select MODULES_USE_ELF_RELA
diff --git a/mm/rmap.c b/mm/rmap.c
index e14509b47412..77e2ab95671a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -168,7 +168,7 @@ static struct kmem_cache *anon_vma_chain_cachep;
  * covering both regular anon_vma and lazy anon_vma mappings.
  */
 
-bool anon_vma_lazy_enable;
+bool anon_vma_lazy_enable = true;
 #endif
 
 static inline struct anon_vma *anon_vma_alloc(void)
-- 
2.17.1