md-cluster: fix NULL pointer dereference in process_metadata_update

[PATCH] md-cluster: fix NULL pointer dereference in process_metadata_update

Posted by Jiasheng Jiang 3 weeks ago

The function process_metadata_update() blindly dereferences the 'thread'
pointer (acquired via rcu_dereference_protected) within the wait_event()
macro.

While the code comment states "daemon thread must exist", there is a valid
race condition window during the MD array startup sequence (md_run):

1. bitmap_load() is called, which invokes md_cluster_ops->join().
2. join() starts the "cluster_recv" thread (recv_daemon).
3. At this point, recv_daemon is active and processing messages.
4. However, mddev->thread (the main MD thread) is not initialized until
   later in md_run().

If a METADATA_UPDATED message is received from a remote node during this
specific window, process_metadata_update() will be called while
mddev->thread is still NULL, leading to a kernel panic.

To fix this, we must validate the 'thread' pointer. If it is NULL, we
release the held lock (no_new_dev_lockres) and return early, safely
ignoring the update request as the array is not yet fully ready to
process it.

Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
---
 drivers/md/md-cluster.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 11f1e91d387d..896279988dfd 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -549,8 +549,13 @@ static void process_metadata_update(struct mddev *mddev, struct cluster_msg *msg
 
 	dlm_lock_sync(cinfo->no_new_dev_lockres, DLM_LOCK_CR);
 
-	/* daemaon thread must exist */
 	thread = rcu_dereference_protected(mddev->thread, true);
+	if (!thread) {
+		pr_warn("md-cluster: Received metadata update but MD thread is not ready\n");
+		dlm_unlock_sync(cinfo->no_new_dev_lockres);
+		return;
+	}
+
 	wait_event(thread->wqueue,
 		   (got_lock = mddev_trylock(mddev)) ||
 		    test_bit(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, &cinfo->state));
-- 
2.25.1
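
For readers who want to see the control flow outside the kernel, here is a minimal user-space sketch of the guard this patch adds. Everything in it is a hypothetical stand-in and not a kernel API: `fake_mddev`, `fake_thread`, `fake_lock()`, `fake_unlock()`, and `handle_metadata_update()` are illustration names only. A plain flag models no_new_dev_lockres, and a counter bump replaces the wait_event() on thread->wqueue. The sketch also assumes the normal path drops the lock before returning. It only shows that a NULL thread leads to an early return with the lock released.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the MD daemon thread; not a kernel type. */
struct fake_thread {
	int wakeups;	/* counts how often the handler "waited" on this thread */
};

/* Hypothetical stand-in for struct mddev plus the cluster lock state. */
struct fake_mddev {
	struct fake_thread *thread;	/* NULL during the startup window */
	int lock_held;			/* models no_new_dev_lockres being held */
	int updates_applied;
};

static void fake_lock(struct fake_mddev *m)
{
	assert(!m->lock_held);
	m->lock_held = 1;
}

static void fake_unlock(struct fake_mddev *m)
{
	assert(m->lock_held);
	m->lock_held = 0;
}

/*
 * Models the patched control flow: take the lock, read the thread
 * pointer, and bail out (releasing the lock) if the thread does not
 * exist yet. Returns 1 if the update was processed, 0 if it was ignored.
 */
static int handle_metadata_update(struct fake_mddev *m)
{
	struct fake_thread *thread;

	fake_lock(m);

	thread = m->thread;
	if (!thread) {
		fprintf(stderr, "sketch: metadata update received before thread is ready\n");
		fake_unlock(m);	/* the early return must not leak the lock */
		return 0;
	}

	thread->wakeups++;	/* stands in for wait_event(thread->wqueue, ...) */
	m->updates_applied++;

	fake_unlock(m);
	return 1;
}
```

Calling `handle_metadata_update()` on a `fake_mddev` whose `thread` is still NULL returns 0 and leaves `lock_held` at 0. Once `thread` points at a `fake_thread`, the same call returns 1 and counts one applied update.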

Re: [PATCH] md-cluster: fix NULL pointer dereference in process_metadata_update

Posted by Yu Kuai 1 week, 5 days ago

On 2026/1/17 22:59, Jiasheng Jiang wrote:

> The function process_metadata_update() blindly dereferences the 'thread'
> pointer (acquired via rcu_dereference_protected) within the wait_event()
> macro.
>
> While the code comment states "daemon thread must exist", there is a valid
> race condition window during the MD array startup sequence (md_run):
>
> 1. bitmap_load() is called, which invokes md_cluster_ops->join().
> 2. join() starts the "cluster_recv" thread (recv_daemon).
> 3. At this point, recv_daemon is active and processing messages.
> 4. However, mddev->thread (the main MD thread) is not initialized until
>     later in md_run().
>
> If a METADATA_UPDATED message is received from a remote node during this
> specific window, process_metadata_update() will be called while
> mddev->thread is still NULL, leading to a kernel panic.
>
> To fix this, we must validate the 'thread' pointer. If it is NULL, we
> release the held lock (no_new_dev_lockres) and return early, safely
> ignoring the update request as the array is not yet fully ready to
> process it.
>
> Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
> ---
>   drivers/md/md-cluster.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)

Applied to md-7.0

-- 
Thanks,
Kuai