[PATCH] maple_tree: Add dead node check in mas_dup_alloc()

Boudewijn van der Heide posted 1 patch 1 month ago
lib/maple_tree.c | 5 +++++
1 file changed, 5 insertions(+)
[PATCH] maple_tree: Add dead node check in mas_dup_alloc()
Posted by Boudewijn van der Heide 1 month ago
The __mt_dup() function is exported and can be called without internal
locking, relying on the caller to provide appropriate synchronization.
If a caller fails to hold proper locks, the source tree may be modified
concurrently, potentially resulting in dead nodes during traversal.

The call stack is:
  __mt_dup()
    → mas_dup_build()
      → mas_dup_alloc()  [accesses node->slot[]]

The mas_dup_alloc() function may access node slots without first
verifying that the node is still alive. If a dead node is encountered,
its memory layout may have been switched to the RCU union member, making
slot array access undefined behavior as we would be reading from the
rcu_head structure instead.

Add an explicit dead node check to detect concurrent modification during
duplication. When a dead node is detected, return -EBUSY to indicate that
the tree is undergoing concurrent modification.

Signed-off-by: Boudewijn van der Heide <boudewijn@delta-utec.com>

---

Build-tested and boot-tested with QEMU with Buildroot on x86_64. The
kernel booted and basic commandline operations work correctly. The race
condition this patch addresses is difficult to reproduce in testing, as
it requires concurrent tree modifications without proper locking.
---
 lib/maple_tree.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 5aa4c9500018..f623a7aabd53 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -6251,6 +6251,11 @@ static inline void mas_dup_alloc(struct ma_state *mas, struct ma_state *new_mas,
 	/* Allocate memory for child nodes. */
 	type = mte_node_type(mas->node);
 	new_slots = ma_slots(new_node, type);
+	if (unlikely(ma_dead_node(node))) {
+		mas_set_err(mas, -EBUSY);
+		return;
+	}
+
 	count = mas->node_request = mas_data_end(mas) + 1;
 	mas_alloc_nodes(mas, gfp);
 	if (unlikely(mas_is_err(mas)))
-- 
2.47.3

Re: [PATCH] maple_tree: Add dead node check in mas_dup_alloc()
Posted by Andrew Morton 1 month ago
On Sat,  3 Jan 2026 17:57:58 +0100 Boudewijn van der Heide <boudewijn@delta-utec.com> wrote:

> The __mt_dup() function is exported and can be called without internal
> locking, relying on the caller to provide appropriate synchronization.
> If a caller fails to hold proper locks, the source tree may be modified
> concurrently, potentially resulting in dead nodes during traversal.
> 
> The call stack is:
>   __mt_dup()
>     → mas_dup_build()
>       → mas_dup_alloc()  [accesses node->slot[]]
> 
> The mas_dup_alloc() function may access node slots without first
> verifying that the node is still alive. If a dead node is encountered,
> its memory layout may have been switched to the RCU union member, making
> slot array access undefined behavior as we would be reading from the
> rcu_head structure instead.
> 
> Add an explicit dead node check to detect concurrent modification during
> duplication. When a dead node is detected, return -EBUSY to indicate that
> the tree is undergoing concurrent modification.
> 
> Signed-off-by: Boudewijn van der Heide <boudewijn@delta-utec.com>
> 
> ---
> 
> Build-tested and boot-tested with QEMU with Buildroot on x86_64. The
> kernel booted and basic commandline operations work correctly. The race
> condition this patch addresses is difficult to reproduce in testing, as
> it requires concurrent tree modifications without proper locking.

Thanks.

What are the worst-case userspace-visible runtime effects when this
happens?

If they're bad then presumably we'll want to backport this fix into
earlier kernels with a Cc: <stable@vger.kernel.org> and, very
preferably a Fixes: line.
Re: [PATCH] maple_tree: Add dead node check in mas_dup_alloc()
Posted by Boudewijn van der Heide 1 month ago
On Sat, 3 Jan 2026, Andrew Morton <akpm@linux-foundation.org> wrote:
> What are the worst-case userspace-visible runtime effects when this
> happens?

Worst case: if __mt_dup() is invoked without the required external
locking and the source tree is concurrently modified, a node can
transition to the “dead” RCU layout while mas_dup_alloc() is still
traversing it. In that case the code may interpret the rcu_head contents
as slot pointers.

Practically, this could lead to invalid pointer dereferences (kernel
oops) or corruption of the duplicated tree. Depending on how that
duplicated tree is later used (e.g. in mm/VMA paths), the effects could
be userspace-visible, such as fork() failures, process crashes, or
broader system instability.

My understanding is that current in-tree users hold the appropriate
locks and should not hit this, as triggering it requires violating the
__mt_dup() synchronization contract. The risk primarily comes from the
fact that __mt_dup() is exported (EXPORT_SYMBOL), making it reachable by
out-of-tree modules or future callers which may not follow the locking
rules.

> If they're bad then presumably we'll want to backport this fix into
> earlier kernels with a Cc: <stable@vger.kernel.org> and, very
> preferably a Fixes: line.

The function was introduced without the check here:

  Fixes: fd32e4e9b764 ("maple_tree: introduce interfaces __mt_dup() and mtree_dup()")

If you think this warrants stable backporting as a safety fix, 
I’m happy to send a v2 with the Fixes: tag and Cc: stable@vger.kernel.org added.

Thanks,
Boudewijn
Re: [PATCH] maple_tree: Add dead node check in mas_dup_alloc()
Posted by Andrew Morton 1 month ago
On Sat,  3 Jan 2026 21:06:31 +0100 Boudewijn van der Heide <boudewijn@delta-utec.com> wrote:

> On Sat, 3 Jan 2026, Andrew Morton <akpm@linux-foundation.org> wrote:
> > What are the worst-case userspace-visible runtime effects when this
> > happens?
> 
> Worst case: if __mt_dup() is invoked without the required external
> locking and the source tree is concurrently modified, a node can
> transition to the “dead” RCU layout while mas_dup_alloc() is still
> traversing it. In that case the code may interpret the rcu_head contents
> as slot pointers.
> 
> Practically, this could lead to invalid pointer dereferences (kernel
> oops) or corruption of the duplicated tree. Depending on how that
> duplicated tree is later used (e.g. in mm/VMA paths), the effects could
> be userspace-visible, such as fork() failures, process crashes, or
> broader system instability.
> 
> My understanding is that current in-tree users hold the appropriate
> locks and should not hit this, as triggering it requires violating the
> __mt_dup() synchronization contract. The risk primarily comes from the
> fact that __mt_dup() is exported (EXPORT_SYMBOL), making it reachable by
> out-of-tree modules or future callers which may not follow the locking
> rules.
> 
> > If they're bad then presumably we'll want to backport this fix into
> > earlier kernels with a Cc: <stable@vger.kernel.org> and, very
> > preferably a Fixes: line.
> 
> The function was introduced without the check here:
> 
>   Fixes: fd32e4e9b764 ("maple_tree: introduce interfaces __mt_dup() and mtree_dup()")
> 

Great, thanks, I added all that to the changelog and queued this in
mm.git as a hotfix, for runtime testing and pending reviewer input.
Re: [PATCH] maple_tree: Add dead node check in mas_dup_alloc()
Posted by Matthew Wilcox 1 month ago
On Mon, Jan 05, 2026 at 05:33:50PM -0800, Andrew Morton wrote:
> Great, thanks, I added all that to the changelog and queued this in
> mm.git as a hotfix, for runtime testing and pending reviewer input.

Surely this should just be a lockdep assertion that the appropriate
locks are held?
Re: [PATCH] maple_tree: Add dead node check in mas_dup_alloc()
Posted by Boudewijn van der Heide 1 month ago
> Surely this should just be a lockdep assertion that the appropriate
> locks are held?

Just to confirm: do you want me to remove the original runtime check entirely
and replace it with a lockdep_assert(), or do you want both?
If it's only the assertion,
that would mean that production builds won't enforce the check, right?

For v2, should I add a Fixes: line and Cc: stable,
or should i leave it out?

Also, do you want me to include a Suggested-by tag
for your lockdep_assert suggestion?

Thanks,
Boudewijn
Re: [PATCH] maple_tree: Add dead node check in mas_dup_alloc()
Posted by Liam R. Howlett 1 month ago
* Boudewijn van der Heide <boudewijn@delta-utec.com> [260106 08:24]:
> > Surely this should just be a lockdep assertion that the appropriate
> > locks are held?
> 
> Just to confirm: do you want me to remove the original runtime check entirely
> and replace it with a lockdep_assert(), or do you want both?
> If it's only the assertion,

Please do not include any runtime checks in this change - Just the
lockdep_assert().

> that would mean that production builds won't enforce the check, right?
> 
> For v2, should I add a Fixes: line and Cc: stable,
> or should i leave it out?

This does not need to be backported and does not fix anything.  It's an
attempt to protect the user from shooting their own foot off by using
the interface incorrectly.

The fact that no one in the tree uses it incorrectly means that any
backport would be for the benefit of out-of-tree drivers, which we do
not support.

Thanks,
Liam
Re: [PATCH] maple_tree: Add dead node check in mas_dup_alloc()
Posted by Matthew Wilcox 1 month ago
On Tue, Jan 06, 2026 at 11:01:27AM -0500, Liam R. Howlett wrote:
> The fact that no one in the tree uses it incorrectly means that any
> backport would be for the benefit of out-of-tree drivers, which we do
> not support.

Should this interface be EXPORT_SYMBOL_GPL to make that clear?
Re: [PATCH] maple_tree: Add dead node check in mas_dup_alloc()
Posted by Liam R. Howlett 1 month ago
* Matthew Wilcox <willy@infradead.org> [260106 13:29]:
> On Tue, Jan 06, 2026 at 11:01:27AM -0500, Liam R. Howlett wrote:
> > The fact that no one in the tree uses it incorrectly means that any
> > backport would be for the benefit of out-of-tree drivers, which we do
> > not support.
> 
> Should this interface be EXPORT_SYMBOL_GPL to make that clear?

I don't think so.

If someone decides to take the technical debt of not upstreaming, then
they should very much read the documentation more carefully.

The locking rules are stated above the function definition.

Thanks,
Liam
Re: [PATCH] maple_tree: Add dead node check in mas_dup_alloc()
Posted by Boudewijn van der Heide 1 month ago
> > > Surely this should just be a lockdep assertion that the appropriate
> > > locks are held?
> > >
> > Just to confirm: do you want me to remove the original runtime check entirely
> > and replace it with a lockdep_assert(), or do you want both?
> > If it's only the assertion,

> Please do not include any runtime checks in this change - Just the
> lockdep_assert().

> > that would mean that production builds won't enforce the check, right?
> >
> > For v2, should I add a Fixes: line and Cc: stable,
> > or should i leave it out?

> This does not need to be backported and does not fix anything.  It's an
> attempt to protect the user from shooting their own foot off by using
> the interface incorrectly.

> The fact that no one in the tree uses it incorrectly means that any
> backport would be for the benefit of out-of-tree drivers, which we do
> not support.

Thanks for the help and clarification! I will send v2 with just the assertion,
as suggested.

Thanks,
Boudewijn
Re: [PATCH] maple_tree: Add dead node check in mas_dup_alloc()
Posted by Andrew Morton 1 month ago
On Tue, 6 Jan 2026 02:49:34 +0000 Matthew Wilcox <willy@infradead.org> wrote:

> On Mon, Jan 05, 2026 at 05:33:50PM -0800, Andrew Morton wrote:
> > Great, thanks, I added all that to the changelog and queued this in
> > mm.git as a hotfix, for runtime testing and pending reviewer input.
> 
> Surely this should just be a lockdep assertion that the appropriate
> locks are held?

yup ;)
[PATCH v2] maple_tree: Add lockdep assertion in mas_dup_alloc()
Posted by Boudewijn van der Heide 1 month ago
The __mt_dup() function requires callers to hold the appropriate write
lock when duplicating a maple tree. Without proper locking, concurrent
modifications during duplication could access invalid node slots.

Add a lockdep assertion to catch such API misuse during development.
This is API hardening rather than a bug fix - all in-tree callers
already follow the proper locking rules as documented above __mt_dup().

Signed-off-by: Boudewijn van der Heide <boudewijn@delta-utec.com>
---
Changes in v2:
    - Replaced runtime deadnode check with a lockdep assertion
v1:
    https://lore.kernel.org/lkml/20260103165758.74094-1-boudewijn@delta-utec.com/

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 5aa4c9500018..3b4357f16352 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -6248,6 +6248,8 @@ static inline void mas_dup_alloc(struct ma_state *mas, struct ma_state *new_mas,
 	void __rcu **new_slots;
 	unsigned long val;
 
+	lockdep_assert(mt_write_locked(mas->tree));
+
 	/* Allocate memory for child nodes. */
 	type = mte_node_type(mas->node);
 	new_slots = ma_slots(new_node, type);
--
2.47.3
Re: [PATCH v2] maple_tree: Add lockdep assertion in mas_dup_alloc()
Posted by Liam R. Howlett 2 weeks, 4 days ago
* Boudewijn van der Heide <boudewijn@delta-utec.com> [260106 16:08]:
> The __mt_dup() function requires callers to hold the appropriate write
> lock when duplicating a maple tree. Without proper locking, concurrent
> modifications during duplication could access invalid node slots.
> 
> Add a lockdep assertion to catch such API misuse during development.
> This is API hardening rather than a bug fix - all in-tree callers
> already follow the proper locking rules as documented above __mt_dup().
> 
> Signed-off-by: Boudewijn van der Heide <boudewijn@delta-utec.com>
> ---
> Changes in v2:
>     - Replaced runtime deadnode check with a lockdep assertion
> v1:
>     https://lore.kernel.org/lkml/20260103165758.74094-1-boudewijn@delta-utec.com/
> 
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index 5aa4c9500018..3b4357f16352 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -6248,6 +6248,8 @@ static inline void mas_dup_alloc(struct ma_state *mas, struct ma_state *new_mas,
>  	void __rcu **new_slots;
>  	unsigned long val;
>  
> +	lockdep_assert(mt_write_locked(mas->tree));
> +

This is still the wrong place.  You are validating the lock is held in a
function that is called in a loop without any unlocking.

mas_dup_build() is the only caller of mas_dup_alloc(), and that is only
called from two functions: __mt_dup() and mtree_dup().

This would be better served in __mt_dup() since the other caller,
mtree_dup(), already does the locking so there's no way this will
trigger from that call path.

That way, we don't spend a lot of cycles checking lockdep when forking
for no reason.


>  	/* Allocate memory for child nodes. */
>  	type = mte_node_type(mas->node);
>  	new_slots = ma_slots(new_node, type);
> --
> 2.47.3
> 
>
Re: [PATCH v2] maple_tree: Add lockdep assertion in mas_dup_alloc()
Posted by Boudewijn van der Heide 2 weeks, 3 days ago
Thanks for the review! That makes sense, I overlooked this detail in v2.
We should absolutely just check in __mt_dup(),
like mtree_dup() does for actual locks.
I took a closer look at __mt_dup() and mtree_dup(),
and I saw that mtree_dup() locks both the new and old maple_tree.
So I think it also makes sense for us to assert both trees in __mt_dup(),
like this:

@@ -6379,6 +6379,9 @@ int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp)
 	MA_STATE(mas, mt, 0, 0);
 	MA_STATE(new_mas, new, 0, 0);
 
+	lockdep_assert(mt_write_locked(new));
+	lockdep_assert(mt_write_locked(mt));
+
 	mas_dup_build(&mas, &new_mas, gfp);
 	if (unlikely(mas_is_err(&mas))) {
 		ret = xa_err(mas.node);

Does that look good to you for v3?

> >  	/* Allocate memory for child nodes. */
> >  	type = mte_node_type(mas->node);
> >  	new_slots = ma_slots(new_node, type);
> > --
> > 2.47.3
> > 
> >
> 

Thanks,
Boudewijn