[PATCH v2] hwrng: core - use RCU for current_rng to fix race condition
Posted by Lianjie Wang 1 week, 6 days ago
Currently, hwrng_fill is not cleared until the hwrng_fillfn() thread
exits. Since hwrng_unregister() reads hwrng_fill outside the rng_mutex
lock, a concurrent hwrng_unregister() may call kthread_stop() again on
the same task.

Additionally, if hwrng_unregister() is called immediately after a
preceding hwrng_register(), the stopped thread may never have run, and
thus hwrng_fill remains stale even after the hwrng_unregister() call
returns. In this case, further calls to hwrng_register() may not start
new threads, and hwrng_unregister() will also call kthread_stop() on the
same task, causing a use-after-free and sometimes lockups:

refcount_t: addition on 0; use-after-free.
WARNING: ... at lib/refcount.c:25 refcount_warn_saturate+0xec/0x1c0
Call Trace:
 kthread_stop+0x181/0x360
 hwrng_unregister+0x288/0x380
 virtrng_remove+0xe3/0x200

This patch fixes the race by protecting the global hwrng_fill pointer
with the rng_mutex lock, so that the hwrng_fillfn() thread is stopped
only once, and calls to kthread_create() and kthread_stop() are
serialized with the lock held.

To avoid a deadlock in hwrng_fillfn() while it is being stopped,
get_current_rng() and put_rng() no longer take the rng_mutex lock.
Instead, we convert current_rng to RCU.

With hwrng_fill protected by the rng_mutex lock, hwrng_fillfn() can no
longer clear hwrng_fill itself. Therefore, the kthread_stop() call is
moved from hwrng_unregister() to drop_current_rng(), where the lock is
already held. This ensures the task is joined via kthread_stop() on all
possible paths (whether kthread_should_stop() is set, or
get_current_rng() starts returning NULL).

Since get_current_rng() no longer returns ERR_PTR values, the IS_ERR()
checks are removed from its callers. The NULL check is also moved from
put_rng() to its caller rng_current_show(), since all the other callers
of put_rng() already check for NULL.

Fixes: be4000bc4644 ("hwrng: create filler thread")
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Lianjie Wang <karin0.zst@gmail.com>
---
v2:
 - Convert the lock for get_current_rng() to RCU to break the deadlock, as
   suggested by Herbert Xu.
 - Remove rng_mutex from put_rng() and move NULL check to rng_current_show().
 - Move kthread_stop() to drop_current_rng() inside the lock to join the task
   on all paths, avoiding modifying hwrng_fill inside hwrng_fillfn().
 - Revert changes to rng_fillbuf.

v1: https://lore.kernel.org/linux-crypto/20251221122448.246531-1-karin0.zst@gmail.com/

 drivers/char/hw_random/core.c | 145 +++++++++++++++++++---------------
 1 file changed, 81 insertions(+), 64 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 96d7fe41b373..749678589d9a 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -20,6 +20,7 @@
 #include <linux/miscdevice.h>
 #include <linux/module.h>
 #include <linux/random.h>
+#include <linux/rcupdate.h>
 #include <linux/sched.h>
 #include <linux/sched/signal.h>
 #include <linux/slab.h>
@@ -30,13 +31,13 @@

 #define RNG_BUFFER_SIZE (SMP_CACHE_BYTES < 32 ? 32 : SMP_CACHE_BYTES)

-static struct hwrng *current_rng;
+static struct hwrng __rcu *current_rng;
 /* the current rng has been explicitly chosen by user via sysfs */
 static int cur_rng_set_by_user;
 static struct task_struct *hwrng_fill;
 /* list of registered rngs */
 static LIST_HEAD(rng_list);
-/* Protects rng_list and current_rng */
+/* Protects rng_list, hwrng_fill and updates to current_rng */
 static DEFINE_MUTEX(rng_mutex);
 /* Protects rng read functions, data_avail, rng_buffer and rng_fillbuf */
 static DEFINE_MUTEX(reading_mutex);
@@ -76,6 +77,7 @@ static inline void cleanup_rng(struct kref *kref)

 static int set_current_rng(struct hwrng *rng)
 {
+	struct hwrng *old_rng;
 	int err;

 	BUG_ON(!mutex_is_locked(&rng_mutex));
@@ -84,8 +86,14 @@ static int set_current_rng(struct hwrng *rng)
 	if (err)
 		return err;

-	drop_current_rng();
-	current_rng = rng;
+	old_rng = rcu_dereference_protected(current_rng,
+					    lockdep_is_held(&rng_mutex));
+	rcu_assign_pointer(current_rng, rng);
+
+	if (old_rng) {
+		synchronize_rcu();
+		kref_put(&old_rng->ref, cleanup_rng);
+	}

 	/* if necessary, start hwrng thread */
 	if (!hwrng_fill) {
@@ -101,47 +109,55 @@ static int set_current_rng(struct hwrng *rng)

 static void drop_current_rng(void)
 {
-	BUG_ON(!mutex_is_locked(&rng_mutex));
-	if (!current_rng)
+	struct hwrng *rng;
+
+	rng = rcu_dereference_protected(current_rng,
+					lockdep_is_held(&rng_mutex));
+	if (!rng)
 		return;

+	RCU_INIT_POINTER(current_rng, NULL);
+	synchronize_rcu();
+
+	if (hwrng_fill) {
+		kthread_stop(hwrng_fill);
+		hwrng_fill = NULL;
+	}
+
 	/* decrease last reference for triggering the cleanup */
-	kref_put(&current_rng->ref, cleanup_rng);
-	current_rng = NULL;
+	kref_put(&rng->ref, cleanup_rng);
 }

-/* Returns ERR_PTR(), NULL or refcounted hwrng */
+/* Returns NULL or refcounted hwrng */
 static struct hwrng *get_current_rng_nolock(void)
 {
-	if (current_rng)
-		kref_get(&current_rng->ref);
+	struct hwrng *rng;

-	return current_rng;
+	rng = rcu_dereference_protected(current_rng,
+					lockdep_is_held(&rng_mutex));
+	if (rng)
+		kref_get(&rng->ref);
+
+	return rng;
 }

 static struct hwrng *get_current_rng(void)
 {
 	struct hwrng *rng;

-	if (mutex_lock_interruptible(&rng_mutex))
-		return ERR_PTR(-ERESTARTSYS);
+	rcu_read_lock();
+	rng = rcu_dereference(current_rng);
+	if (rng && !kref_get_unless_zero(&rng->ref))
+		rng = NULL;

-	rng = get_current_rng_nolock();
+	rcu_read_unlock();

-	mutex_unlock(&rng_mutex);
 	return rng;
 }

 static void put_rng(struct hwrng *rng)
 {
-	/*
-	 * Hold rng_mutex here so we serialize in case they set_current_rng
-	 * on rng again immediately.
-	 */
-	mutex_lock(&rng_mutex);
-	if (rng)
-		kref_put(&rng->ref, cleanup_rng);
-	mutex_unlock(&rng_mutex);
+	kref_put(&rng->ref, cleanup_rng);
 }

 static int hwrng_init(struct hwrng *rng)
@@ -213,10 +229,6 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,

 	while (size) {
 		rng = get_current_rng();
-		if (IS_ERR(rng)) {
-			err = PTR_ERR(rng);
-			goto out;
-		}
 		if (!rng) {
 			err = -ENODEV;
 			goto out;
@@ -303,7 +315,7 @@ static struct miscdevice rng_miscdev = {

 static int enable_best_rng(void)
 {
-	struct hwrng *rng, *new_rng = NULL;
+	struct hwrng *rng, *cur_rng, *new_rng = NULL;
 	int ret = -ENODEV;

 	BUG_ON(!mutex_is_locked(&rng_mutex));
@@ -321,7 +333,9 @@ static int enable_best_rng(void)
 			new_rng = rng;
 	}

-	ret = ((new_rng == current_rng) ? 0 : set_current_rng(new_rng));
+	cur_rng = rcu_dereference_protected(current_rng,
+					    lockdep_is_held(&rng_mutex));
+	ret = ((new_rng == cur_rng) ? 0 : set_current_rng(new_rng));
 	if (!ret)
 		cur_rng_set_by_user = 0;

@@ -371,11 +385,10 @@ static ssize_t rng_current_show(struct device *dev,
 	struct hwrng *rng;

 	rng = get_current_rng();
-	if (IS_ERR(rng))
-		return PTR_ERR(rng);

 	ret = sysfs_emit(buf, "%s\n", rng ? rng->name : "none");
-	put_rng(rng);
+	if (rng)
+		put_rng(rng);

 	return ret;
 }
@@ -416,8 +429,6 @@ static ssize_t rng_quality_show(struct device *dev,
 	struct hwrng *rng;

 	rng = get_current_rng();
-	if (IS_ERR(rng))
-		return PTR_ERR(rng);

 	if (!rng) /* no need to put_rng */
 		return -ENODEV;
@@ -432,6 +443,7 @@ static ssize_t rng_quality_store(struct device *dev,
 				 struct device_attribute *attr,
 				 const char *buf, size_t len)
 {
+	struct hwrng *rng;
 	u16 quality;
 	int ret = -EINVAL;

@@ -448,12 +460,13 @@ static ssize_t rng_quality_store(struct device *dev,
 		goto out;
 	}

-	if (!current_rng) {
+	rng = rcu_dereference_protected(current_rng, lockdep_is_held(&rng_mutex));
+	if (!rng) {
 		ret = -ENODEV;
 		goto out;
 	}

-	current_rng->quality = quality;
+	rng->quality = quality;
 	current_quality = quality; /* obsolete */

 	/* the best available RNG may have changed */
@@ -489,8 +502,17 @@ static int hwrng_fillfn(void *unused)
 		struct hwrng *rng;

 		rng = get_current_rng();
-		if (IS_ERR(rng) || !rng)
+		if (!rng) {
+			/* This is only possible within drop_current_rng(),
+			 * so just wait until we are stopped.
+			 */
+			while (!kthread_should_stop()) {
+				set_current_state(TASK_INTERRUPTIBLE);
+				schedule();
+			}
 			break;
+		}
+
 		mutex_lock(&reading_mutex);
 		rc = rng_get_data(rng, rng_fillbuf,
 				  rng_buffer_size(), 1);
@@ -518,14 +540,13 @@ static int hwrng_fillfn(void *unused)
 		add_hwgenerator_randomness((void *)rng_fillbuf, rc,
 					   entropy >> 10, true);
 	}
-	hwrng_fill = NULL;
 	return 0;
 }

 int hwrng_register(struct hwrng *rng)
 {
 	int err = -EINVAL;
-	struct hwrng *tmp;
+	struct hwrng *cur_rng, *tmp;

 	if (!rng->name || (!rng->data_read && !rng->read))
 		goto out;
@@ -547,16 +568,19 @@ int hwrng_register(struct hwrng *rng)
 	/* Adjust quality field to always have a proper value */
 	rng->quality = min3(default_quality, 1024, rng->quality ?: 1024);

-	if (!cur_rng_set_by_user &&
-	    (!current_rng || rng->quality > current_rng->quality)) {
-		/*
-		 * Set new rng as current as the new rng source
-		 * provides better entropy quality and was not
-		 * chosen by userspace.
-		 */
-		err = set_current_rng(rng);
-		if (err)
-			goto out_unlock;
+	if (!cur_rng_set_by_user) {
+		cur_rng = rcu_dereference_protected(current_rng,
+						    lockdep_is_held(&rng_mutex));
+		if (!cur_rng || rng->quality > cur_rng->quality) {
+			/*
+			 * Set new rng as current as the new rng source
+			 * provides better entropy quality and was not
+			 * chosen by userspace.
+			 */
+			err = set_current_rng(rng);
+			if (err)
+				goto out_unlock;
+		}
 	}
 	mutex_unlock(&rng_mutex);
 	return 0;
@@ -569,14 +593,17 @@ EXPORT_SYMBOL_GPL(hwrng_register);

 void hwrng_unregister(struct hwrng *rng)
 {
-	struct hwrng *new_rng;
+	struct hwrng *cur_rng;
 	int err;

 	mutex_lock(&rng_mutex);

 	list_del(&rng->list);
 	complete_all(&rng->dying);
-	if (current_rng == rng) {
+
+	cur_rng = rcu_dereference_protected(current_rng,
+					    lockdep_is_held(&rng_mutex));
+	if (cur_rng == rng) {
 		err = enable_best_rng();
 		if (err) {
 			drop_current_rng();
@@ -584,17 +611,7 @@ void hwrng_unregister(struct hwrng *rng)
 		}
 	}

-	new_rng = get_current_rng_nolock();
-	if (list_empty(&rng_list)) {
-		mutex_unlock(&rng_mutex);
-		if (hwrng_fill)
-			kthread_stop(hwrng_fill);
-	} else
-		mutex_unlock(&rng_mutex);
-
-	if (new_rng)
-		put_rng(new_rng);
-
+	mutex_unlock(&rng_mutex);
 	wait_for_completion(&rng->cleanup_done);
 }
 EXPORT_SYMBOL_GPL(hwrng_unregister);
@@ -682,7 +699,7 @@ static int __init hwrng_modinit(void)
 static void __exit hwrng_modexit(void)
 {
 	mutex_lock(&rng_mutex);
-	BUG_ON(current_rng);
+	WARN_ON(rcu_access_pointer(current_rng));
 	kfree(rng_buffer);
 	kfree(rng_fillbuf);
 	mutex_unlock(&rng_mutex);
--
2.52.0
Re: [PATCH v2] hwrng: core - use RCU for current_rng to fix race condition
Posted by Herbert Xu 1 week, 5 days ago
On Sun, Jan 25, 2026 at 04:55:55AM +0900, Lianjie Wang wrote:
> Currently, hwrng_fill is not cleared until the hwrng_fillfn() thread
> exits. Since hwrng_unregister() reads hwrng_fill outside the rng_mutex
> lock, a concurrent hwrng_unregister() may call kthread_stop() again on
> the same task.
> 
> Additionally, if hwrng_unregister() is called immediately after a
> preceding hwrng_register(), the stopped thread may never have run, and
> thus hwrng_fill remains stale even after the hwrng_unregister() call
> returns. In this case, further calls to hwrng_register() may not start
> new threads, and hwrng_unregister() will also call kthread_stop() on the
> same task, causing a use-after-free and sometimes lockups:
> 
> refcount_t: addition on 0; use-after-free.
> WARNING: ... at lib/refcount.c:25 refcount_warn_saturate+0xec/0x1c0
> Call Trace:
>  kthread_stop+0x181/0x360
>  hwrng_unregister+0x288/0x380
>  virtrng_remove+0xe3/0x200
> 
> This patch fixes the race by protecting the global hwrng_fill pointer
> with the rng_mutex lock, so that the hwrng_fillfn() thread is stopped
> only once, and calls to kthread_create() and kthread_stop() are
> serialized with the lock held.
> 
> To avoid a deadlock in hwrng_fillfn() while it is being stopped,
> get_current_rng() and put_rng() no longer take the rng_mutex lock.
> Instead, we convert current_rng to RCU.
> 
> With hwrng_fill protected by the rng_mutex lock, hwrng_fillfn() can no
> longer clear hwrng_fill itself. Therefore, the kthread_stop() call is
> moved from hwrng_unregister() to drop_current_rng(), where the lock is
> already held. This ensures the task is joined via kthread_stop() on all
> possible paths (whether kthread_should_stop() is set, or
> get_current_rng() starts returning NULL).
> 
> Since get_current_rng() no longer returns ERR_PTR values, the IS_ERR()
> checks are removed from its callers. The NULL check is also moved from
> put_rng() to its caller rng_current_show(), since all the other callers
> of put_rng() already check for NULL.
> 
> Fixes: be4000bc4644 ("hwrng: create filler thread")
> Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
> Signed-off-by: Lianjie Wang <karin0.zst@gmail.com>
> ---
> v2:
>  - Convert the lock for get_current_rng() to RCU to break the deadlock, as
>    suggested by Herbert Xu.
>  - Remove rng_mutex from put_rng() and move NULL check to rng_current_show().
>  - Move kthread_stop() to drop_current_rng() inside the lock to join the task
>    on all paths, avoiding modifying hwrng_fill inside hwrng_fillfn().
>  - Revert changes to rng_fillbuf.
> 
> v1: https://lore.kernel.org/linux-crypto/20251221122448.246531-1-karin0.zst@gmail.com/
> 
>  drivers/char/hw_random/core.c | 145 +++++++++++++++++++---------------
>  1 file changed, 81 insertions(+), 64 deletions(-)

Thanks, this looks pretty good!

>  static struct hwrng *get_current_rng(void)
>  {
>  	struct hwrng *rng;
> 
> -	if (mutex_lock_interruptible(&rng_mutex))
> -		return ERR_PTR(-ERESTARTSYS);
> +	rcu_read_lock();
> +	rng = rcu_dereference(current_rng);
> +	if (rng && !kref_get_unless_zero(&rng->ref))
> +		rng = NULL;

rng->ref should never be zero here as the final kref_put is delayed
by RCU.  So this should be a plain kref_get.
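
IOW, something like this (untested):

static struct hwrng *get_current_rng(void)
{
	struct hwrng *rng;

	rcu_read_lock();
	rng = rcu_dereference(current_rng);
	if (rng)
		kref_get(&rng->ref);
	rcu_read_unlock();

	return rng;
}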

>  static void put_rng(struct hwrng *rng)
>  {
> -	/*
> -	 * Hold rng_mutex here so we serialize in case they set_current_rng
> -	 * on rng again immediately.
> -	 */
> -	mutex_lock(&rng_mutex);
> -	if (rng)
> -		kref_put(&rng->ref, cleanup_rng);
> -	mutex_unlock(&rng_mutex);
> +	kref_put(&rng->ref, cleanup_rng);
>  }

I think the mutex needs to be kept here as otherwise there is
a risk of a slow cleanup_rng racing against a subsequent hwrng_init
on the same RNG.

> @@ -371,11 +385,10 @@ static ssize_t rng_current_show(struct device *dev,
>  	struct hwrng *rng;
> 
>  	rng = get_current_rng();
> -	if (IS_ERR(rng))
> -		return PTR_ERR(rng);
> 
>  	ret = sysfs_emit(buf, "%s\n", rng ? rng->name : "none");
> -	put_rng(rng);
> +	if (rng)
> +		put_rng(rng);

I don't think this NULL check is necessary as put_rng can handle
rng == NULL.

> @@ -489,8 +502,17 @@ static int hwrng_fillfn(void *unused)
>  		struct hwrng *rng;
> 
>  		rng = get_current_rng();
> -		if (IS_ERR(rng) || !rng)
> +		if (!rng) {
> +			/* This is only possible within drop_current_rng(),
> +			 * so just wait until we are stopped.
> +			 */
> +			while (!kthread_should_stop()) {
> +				set_current_state(TASK_INTERRUPTIBLE);
> +				schedule();
> +			}
>  			break;
> +		}
> +

Is the schedule necessary? Shouldn't the break just work as it
did before?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH v2] hwrng: core - use RCU for current_rng to fix race condition
Posted by Lianjie Wang 1 week, 3 days ago
On Mon, Jan 26, 2026 at 12:40:19PM +0800, Herbert Xu wrote:
> >  static struct hwrng *get_current_rng(void)
> >  {
> >  	struct hwrng *rng;
> > 
> > -	if (mutex_lock_interruptible(&rng_mutex))
> > -		return ERR_PTR(-ERESTARTSYS);
> > +	rcu_read_lock();
> > +	rng = rcu_dereference(current_rng);
> > +	if (rng && !kref_get_unless_zero(&rng->ref))
> > +		rng = NULL;
> 
> rng->ref should never be zero here as the final kref_put is delayed
> by RCU.  So this should be a plain kref_get.

Thanks for the review! I will change this to kref_get() in v3.

> >  static void put_rng(struct hwrng *rng)
> >  {
> > -	/*
> > -	 * Hold rng_mutex here so we serialize in case they set_current_rng
> > -	 * on rng again immediately.
> > -	 */
> > -	mutex_lock(&rng_mutex);
> > -	if (rng)
> > -		kref_put(&rng->ref, cleanup_rng);
> > -	mutex_unlock(&rng_mutex);
> > +	kref_put(&rng->ref, cleanup_rng);
> >  }
> 
> I think the mutex needs to be kept here as otherwise there is
> a risk of a slow cleanup_rng racing against a subsequent hwrng_init
> on the same RNG.

It is true that cleanup_rng() can race with hwrng_init(). However,
currently put_rng() is also called in hwrng_fillfn(), where we want to
avoid holding rng_mutex to prevent deadlock in hwrng_unregister().

To solve this race, should we introduce a separate lock (e.g.,
init_mutex) to serialize only hwrng_init() and cleanup_rng()?

Alternatively, I think we could also stop the thread in
set_current_rng() before switching current_rng, so that each lifetime of
the hwrng_fillfn() thread strictly holds a single RNG instance, avoiding
the need to call get_current_rng() or put_rng() inside hwrng_fillfn().

> > @@ -371,11 +385,10 @@ static ssize_t rng_current_show(struct device *dev,
> >  	struct hwrng *rng;
> > 
> >  	rng = get_current_rng();
> > -	if (IS_ERR(rng))
> > -		return PTR_ERR(rng);
> > 
> >  	ret = sysfs_emit(buf, "%s\n", rng ? rng->name : "none");
> > -	put_rng(rng);
> > +	if (rng)
> > +		put_rng(rng);
> 
> I don't think this NULL check is necessary as put_rng can handle
> rng == NULL.

I removed the NULL check in put_rng() and moved it to rng_current_show()
in v2, since all the other callers of put_rng() already check for NULL
before calling put_rng(). I can restore the NULL check in put_rng() in
v3 if preferred.

> > @@ -489,8 +502,17 @@ static int hwrng_fillfn(void *unused)
> >  		struct hwrng *rng;
> > 
> >  		rng = get_current_rng();
> > -		if (IS_ERR(rng) || !rng)
> > +		if (!rng) {
> > +			/* This is only possible within drop_current_rng(),
> > +			 * so just wait until we are stopped.
> > +			 */
> > +			while (!kthread_should_stop()) {
> > +				set_current_state(TASK_INTERRUPTIBLE);
> > +				schedule();
> > +			}
> >  			break;
> > +		}
> > +
> 
> Is the schedule necessary? Shouldn't the break just work as it
> did before?

With the break alone, the task_struct might get freed before
kthread_stop() is called, which can still sometimes cause a
use-after-free:

refcount_t: addition on 0; use-after-free.
WARNING: lib/refcount.c:25 at refcount_warn_saturate+0x103/0x130
...
WARNING: kernel/fork.c:778 at __put_task_struct+0x2ea/0x510
...
Call Trace:
 <TASK>
 kthread_stop+0x347/0x3b0
 hwrng_unregister+0x2d4/0x360
 remove_common+0x1d3/0x230
 virtio_dev_remove+0xb6/0x200
 ...
 </TASK>

As in the description of kthread_create_on_node() from kernel/kthread.c,
it seems we cannot return directly if we plan to call kthread_stop():

 *  ... @threadfn() can either return directly if it is a
 * standalone thread for which no one will call kthread_stop(), or
 * return when 'kthread_should_stop()' is true (which means
 * kthread_stop() has been called). ...

And in kthread_stop():

 * If threadfn() may call kthread_exit() itself, the caller must ensure
 * task_struct can't go away.

So I intended to wait until kthread_should_stop() is set. Otherwise, we
might need to call get_task_struct() and put_task_struct() manually to
hold the task_struct and ensure kthread_stop() works.
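
If we go the manual route, I imagine pinning the task right after it is
created (untested sketch; error handling kept as in the current code):

	/* in set_current_rng(), with rng_mutex held */
	hwrng_fill = kthread_create(hwrng_fillfn, NULL, "hwrng");
	if (IS_ERR(hwrng_fill)) {
		pr_err("hwrng_fill thread creation failed\n");
		hwrng_fill = NULL;
	} else {
		/* pin the task_struct so kthread_stop() stays safe */
		get_task_struct(hwrng_fill);
		wake_up_process(hwrng_fill);
	}

	/* in drop_current_rng(), with rng_mutex held */
	if (hwrng_fill) {
		kthread_stop(hwrng_fill);
		put_task_struct(hwrng_fill);
		hwrng_fill = NULL;
	}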

Alternatively, we could stop the thread *before* clearing current_rng in
drop_current_rng(), so that get_current_rng() will never return NULL
inside hwrng_fillfn(), roughly (untested):
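
static void drop_current_rng(void)
{
	struct hwrng *rng;

	rng = rcu_dereference_protected(current_rng,
					lockdep_is_held(&rng_mutex));
	if (!rng)
		return;

	/* Stop the thread first so it never sees current_rng == NULL. */
	if (hwrng_fill) {
		kthread_stop(hwrng_fill);
		hwrng_fill = NULL;
	}

	RCU_INIT_POINTER(current_rng, NULL);
	synchronize_rcu();

	/* decrease last reference for triggering the cleanup */
	kref_put(&rng->ref, cleanup_rng);
}

What do you think?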

Best regards,
-- 
Lianjie Wang
Re: [PATCH v2] hwrng: core - use RCU for current_rng to fix race condition
Posted by Herbert Xu 1 week, 3 days ago
On Tue, Jan 27, 2026 at 10:32:30PM +0900, Lianjie Wang wrote:
>
> It is true that cleanup_rng() can race with hwrng_init(). However,
> currently put_rng() is also called in hwrng_fillfn(), where we want to
> avoid holding rng_mutex to prevent deadlock in hwrng_unregister().
> 
> To solve this race, should we introduce a separate lock (e.g.,
> init_mutex) to serialize only hwrng_init() and cleanup_rng()?
> 
> Alternatively, I think we could stop the thread also in
> set_current_rng() before switching current_rng, so that each lifetime of
> hwrng_fillfn() thread strictly holds a single RNG instance, avoiding the
> need to call get_current_rng() or put_rng() inside hwrng_fillfn().

Yes, I missed the deadlock.

I suggest that we defer the work in cleanup_rng to a work_struct,
IOW cleanup_rng will simply schedule a work_struct to perform the
actual cleanup.

Then the work_struct can take the mutex safely, immediately check
the reference count on the rng, and if it is non-zero it should return,
because the rng has been re-used in the meantime.

If it is zero then it can proceed with the clean-up while holding the
mutex, thus preventing any re-use from occurring.
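
Something like this (completely untested; cleanup_work here is a new
member that would need to be added to struct hwrng and initialised with
INIT_WORK in hwrng_register):

static void hwrng_cleanup_workfn(struct work_struct *work)
{
	/* cleanup_work is the hypothetical new struct hwrng member */
	struct hwrng *rng = container_of(work, struct hwrng, cleanup_work);

	mutex_lock(&rng_mutex);

	/* The rng may have been re-used since its refcount hit zero. */
	if (kref_read(&rng->ref)) {
		mutex_unlock(&rng_mutex);
		return;
	}

	if (rng->cleanup)
		rng->cleanup(rng);
	complete(&rng->cleanup_done);

	mutex_unlock(&rng_mutex);
}

static void cleanup_rng(struct kref *kref)
{
	struct hwrng *rng = container_of(kref, struct hwrng, ref);

	schedule_work(&rng->cleanup_work);
}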

> I removed the NULL check in put_rng() and moved it to rng_current_show()
> in v2, since all the other callers of put_rng() already check for NULL
> before calling put_rng(). I can restore the NULL check in put_rng() in
> v3 if preferred.

Let's keep this patch simpler by keeping things as they are.  If
it turns out that moving the NULL check makes the code more readable,
you can do that in a follow-up patch.

> > > @@ -489,8 +502,17 @@ static int hwrng_fillfn(void *unused)
> > >  		struct hwrng *rng;
> > > 
> > >  		rng = get_current_rng();
> > > -		if (IS_ERR(rng) || !rng)
> > > +		if (!rng) {
> > > +			/* This is only possible within drop_current_rng(),
> > > +			 * so just wait until we are stopped.
> > > +			 */
> > > +			while (!kthread_should_stop()) {
> > > +				set_current_state(TASK_INTERRUPTIBLE);
> > > +				schedule();
> > > +			}
> > >  			break;
> > > +		}
> > > +
> > 
> > Is the schedule necessary? Shouldn't the break just work as it
> > did before?
> 
> With the break alone, the task_struct might get freed before
> kthread_stop() is called, which can still cause use-after-free
> sometimes:

Please change the comment above to say that this loop is needed
to keep the task_struct alive for kthread_stop as it isn't obvious.

I wonder why nobody has added a helper for this since this seems
to be a common pattern in other kthread users? For example,
fs/ext4/mmp.c also does this, and they also set the task state
to TASK_RUNNING, perhaps we should do that too?
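
I.e. something like this (untested):

	/*
	 * Set the state before testing the flag so that the wakeup
	 * from kthread_stop() cannot be lost.
	 */
	set_current_state(TASK_INTERRUPTIBLE);
	while (!kthread_should_stop()) {
		schedule();
		set_current_state(TASK_INTERRUPTIBLE);
	}
	__set_current_state(TASK_RUNNING);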

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt