[PATCH v2] ath9k: sleep for less time when unregistering hwrng

Jason A. Donenfeld posted 1 patch 3 years, 10 months ago
There is a newer version of this series
drivers/char/hw_random/core.c        | 10 ++++++++--
drivers/net/wireless/ath/ath9k/rng.c | 19 ++-----------------
2 files changed, 10 insertions(+), 19 deletions(-)
[PATCH v2] ath9k: sleep for less time when unregistering hwrng
Posted by Jason A. Donenfeld 3 years, 10 months ago
Even though hwrng provides a `wait` parameter, it doesn't work very well
when waiting for a long time. There are numerous deadlocks that emerge
related to shutdown. Work around this API limitation by waiting for a
shorter amount of time and erroring more frequently. This commit also
prevents hwrng from splatting messages to dmesg when there's a timeout
and prevents calling msleep_interruptible() for tons of time when a
thread is supposed to be shutting down, since msleep_interruptible()
isn't actually interrupted by kthread_stop().

Reported-by: Gregory Erwin <gregerwin256@gmail.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: stable@vger.kernel.org
Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
Link: https://bugs.archlinux.org/task/75138
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
I do not have an ath9k and therefore I can't test this myself. The
analysis above was done completely statically, with no dynamic tracing
and just a bug report of symptoms from Gregory. So it might be totally
wrong. Thus, this patch very much requires Gregory's testing. Please
don't apply it until we have his `Tested-by` line.

 drivers/char/hw_random/core.c        | 10 ++++++++--
 drivers/net/wireless/ath/ath9k/rng.c | 19 ++-----------------
 2 files changed, 10 insertions(+), 19 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 16f227b995e8..af1c1905bb7e 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -513,8 +513,13 @@ static int hwrng_fillfn(void *unused)
 			break;
 
 		if (rc <= 0) {
-			pr_warn("hwrng: no data available\n");
-			msleep_interruptible(10000);
+			int i;
+
+			for (i = 0; i < 100; ++i) {
+				if (kthread_should_stop() ||
+				    msleep_interruptible(10000 / 100))
+					goto out;
+			}
 			continue;
 		}
 
@@ -529,6 +534,7 @@ static int hwrng_fillfn(void *unused)
 		add_hwgenerator_randomness((void *)rng_fillbuf, rc,
 					   entropy >> 10);
 	}
+out:
 	hwrng_fill = NULL;
 	return 0;
 }
diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..883110c66e5e 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size)
 	return j << 2;
 }
 
-static u32 ath9k_rng_delay_get(u32 fail_stats)
-{
-	u32 delay;
-
-	if (fail_stats < 100)
-		delay = 10;
-	else if (fail_stats < 105)
-		delay = 1000;
-	else
-		delay = 10000;
-
-	return delay;
-}
-
 static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 {
 	struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops);
@@ -80,10 +66,9 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 			bytes_read += max & 3UL;
 			memzero_explicit(&word, sizeof(word));
 		}
-		if (!wait || !max || likely(bytes_read) || fail_stats > 110)
+		if (!wait || !max || likely(bytes_read) ||
+		    ++fail_stats >= 100 || msleep_interruptible(5))
 			break;
-
-		msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
 	}
 
 	if (wait && !bytes_read && max)
-- 
2.35.1

Re: [PATCH v2] ath9k: sleep for less time when unregistering hwrng
Posted by Herbert Xu 3 years, 10 months ago
On Fri, Jun 24, 2022 at 10:44:33PM +0200, Jason A. Donenfeld wrote:
.
> diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> index 16f227b995e8..af1c1905bb7e 100644
> --- a/drivers/char/hw_random/core.c
> +++ b/drivers/char/hw_random/core.c
> @@ -513,8 +513,13 @@ static int hwrng_fillfn(void *unused)
>  			break;
>  
>  		if (rc <= 0) {
> -			pr_warn("hwrng: no data available\n");
> -			msleep_interruptible(10000);
> +			int i;
> +
> +			for (i = 0; i < 100; ++i) {
> +				if (kthread_should_stop() ||
> +				    msleep_interruptible(10000 / 100))
> +					goto out;
> +			}

Please use schedule_timeout_interruptible.  But if you're going
to make it interruptible it should probably at least try to do
something about signals rather than just ignore them.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[PATCH v3] ath9k: sleep for less time when unregistering hwrng
Posted by Jason A. Donenfeld 3 years, 10 months ago
Even though hwrng provides a `wait` parameter, it doesn't work very well
when waiting for a long time. There are numerous deadlocks that emerge
related to shutdown. Work around this API limitation by waiting for a
shorter amount of time and erroring more frequently. This commit also
prevents hwrng from splatting messages to dmesg when there's a timeout
and prevents calling msleep_interruptible() for tons of time when a
thread is supposed to be shutting down, since msleep_interruptible()
isn't actually interrupted by kthread_stop().

Reported-by: Gregory Erwin <gregerwin256@gmail.com>
Tested-by: Gregory Erwin <gregerwin256@gmail.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: stable@vger.kernel.org
Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
Link: https://bugs.archlinux.org/task/75138
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/char/hw_random/core.c        | 10 ++++++++--
 drivers/net/wireless/ath/ath9k/rng.c | 20 +++-----------------
 2 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 16f227b995e8..a15273271d87 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -513,8 +513,13 @@ static int hwrng_fillfn(void *unused)
 			break;
 
 		if (rc <= 0) {
-			pr_warn("hwrng: no data available\n");
-			msleep_interruptible(10000);
+			int i;
+
+			for (i = 0; i < 100; ++i) {
+				if (kthread_should_stop() ||
+				    schedule_timeout_interruptible(HZ / 20))
+					goto out;
+			}
 			continue;
 		}
 
@@ -529,6 +534,7 @@ static int hwrng_fillfn(void *unused)
 		add_hwgenerator_randomness((void *)rng_fillbuf, rc,
 					   entropy >> 10);
 	}
+out:
 	hwrng_fill = NULL;
 	return 0;
 }
diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..39195f89ea85 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size)
 	return j << 2;
 }
 
-static u32 ath9k_rng_delay_get(u32 fail_stats)
-{
-	u32 delay;
-
-	if (fail_stats < 100)
-		delay = 10;
-	else if (fail_stats < 105)
-		delay = 1000;
-	else
-		delay = 10000;
-
-	return delay;
-}
-
 static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 {
 	struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops);
@@ -80,10 +66,10 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 			bytes_read += max & 3UL;
 			memzero_explicit(&word, sizeof(word));
 		}
-		if (!wait || !max || likely(bytes_read) || fail_stats > 110)
+		if (!wait || !max || likely(bytes_read) || ++fail_stats >= 100 ||
+		    schedule_timeout_interruptible(HZ / 20) ||
+		    ((current->flags & PF_KTHREAD) && kthread_should_stop()))
 			break;
-
-		msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
 	}
 
 	if (wait && !bytes_read && max)
-- 
2.35.1

[PATCH v4] ath9k: sleep for less time when unregistering hwrng
Posted by Jason A. Donenfeld 3 years, 10 months ago
Even though hwrng provides a `wait` parameter, it doesn't work very well
when waiting for a long time. There are numerous deadlocks that emerge
related to shutdown. Work around this API limitation by waiting for a
shorter amount of time and erroring more frequently. This commit also
prevents hwrng from splatting messages to dmesg when there's a timeout
and prevents calling msleep_interruptible() for tons of time when a
thread is supposed to be shutting down, since msleep_interruptible()
isn't actually interrupted by kthread_stop().

Reported-by: Gregory Erwin <gregerwin256@gmail.com>
Tested-by: Gregory Erwin <gregerwin256@gmail.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: stable@vger.kernel.org
Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
Link: https://bugs.archlinux.org/task/75138
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/char/hw_random/core.c        | 10 ++++++++--
 drivers/net/wireless/ath/ath9k/rng.c | 20 +++-----------------
 2 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 16f227b995e8..a15273271d87 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -513,8 +513,13 @@ static int hwrng_fillfn(void *unused)
 			break;
 
 		if (rc <= 0) {
-			pr_warn("hwrng: no data available\n");
-			msleep_interruptible(10000);
+			int i;
+
+			for (i = 0; i < 100; ++i) {
+				if (kthread_should_stop() ||
+				    schedule_timeout_interruptible(HZ / 20))
+					goto out;
+			}
 			continue;
 		}
 
@@ -529,6 +534,7 @@ static int hwrng_fillfn(void *unused)
 		add_hwgenerator_randomness((void *)rng_fillbuf, rc,
 					   entropy >> 10);
 	}
+out:
 	hwrng_fill = NULL;
 	return 0;
 }
diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..757603d1949d 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size)
 	return j << 2;
 }
 
-static u32 ath9k_rng_delay_get(u32 fail_stats)
-{
-	u32 delay;
-
-	if (fail_stats < 100)
-		delay = 10;
-	else if (fail_stats < 105)
-		delay = 1000;
-	else
-		delay = 10000;
-
-	return delay;
-}
-
 static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 {
 	struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops);
@@ -80,10 +66,10 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 			bytes_read += max & 3UL;
 			memzero_explicit(&word, sizeof(word));
 		}
-		if (!wait || !max || likely(bytes_read) || fail_stats > 110)
+		if (!wait || !max || likely(bytes_read) || ++fail_stats >= 100 ||
+		    ((current->flags & PF_KTHREAD) && kthread_should_stop()) ||
+		    schedule_timeout_interruptible(HZ / 20))
 			break;
-
-		msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
 	}
 
 	if (wait && !bytes_read && max)
-- 
2.35.1

[PATCH v5] ath9k: sleep for less time when unregistering hwrng
Posted by Jason A. Donenfeld 3 years, 10 months ago
Even though hwrng provides a `wait` parameter, it doesn't work very well
when waiting for a long time. There are numerous deadlocks that emerge
related to shutdown. Work around this API limitation by waiting for a
shorter amount of time and erroring more frequently. This commit also
prevents hwrng from splatting messages to dmesg when there's a timeout
and switches to using schedule_timeout_interruptible(), so that the
kthread can be stopped.

Reported-by: Gregory Erwin <gregerwin256@gmail.com>
Tested-by: Gregory Erwin <gregerwin256@gmail.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: stable@vger.kernel.org
Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
Link: https://bugs.archlinux.org/task/75138
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Sorry for all the churn here in sending a v4 and v5 so soon. The
semantics of schedule_timeout_interruptible vs msleep_interruptible with
respect to kthreads is kind of confusing. I'll send a follow up patch
for that elsewhere. For now I think this should suffice for fixing the
bug.

 drivers/char/hw_random/core.c        |  3 +--
 drivers/net/wireless/ath/ath9k/rng.c | 20 +++-----------------
 2 files changed, 4 insertions(+), 19 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 16f227b995e8..5309fab98631 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -513,8 +513,7 @@ static int hwrng_fillfn(void *unused)
 			break;
 
 		if (rc <= 0) {
-			pr_warn("hwrng: no data available\n");
-			msleep_interruptible(10000);
+			schedule_timeout_interruptible(HZ * 10);
 			continue;
 		}
 
diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..757603d1949d 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size)
 	return j << 2;
 }
 
-static u32 ath9k_rng_delay_get(u32 fail_stats)
-{
-	u32 delay;
-
-	if (fail_stats < 100)
-		delay = 10;
-	else if (fail_stats < 105)
-		delay = 1000;
-	else
-		delay = 10000;
-
-	return delay;
-}
-
 static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 {
 	struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops);
@@ -80,10 +66,10 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 			bytes_read += max & 3UL;
 			memzero_explicit(&word, sizeof(word));
 		}
-		if (!wait || !max || likely(bytes_read) || fail_stats > 110)
+		if (!wait || !max || likely(bytes_read) || ++fail_stats >= 100 ||
+		    ((current->flags & PF_KTHREAD) && kthread_should_stop()) ||
+		    schedule_timeout_interruptible(HZ / 20))
 			break;
-
-		msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
 	}
 
 	if (wait && !bytes_read && max)
-- 
2.35.1

[PATCH v6] ath9k: sleep for less time when unregistering hwrng
Posted by Jason A. Donenfeld 3 years, 10 months ago
Even though hwrng provides a `wait` parameter, it doesn't work very well
when waiting for a long time. There are numerous deadlocks that emerge
related to shutdown. Work around this API limitation by waiting for a
shorter amount of time and erroring more frequently. This commit also
prevents hwrng from splatting messages to dmesg when there's a timeout
and switches to using schedule_timeout_interruptible(), so that the
kthread can be stopped.

Reported-by: Gregory Erwin <gregerwin256@gmail.com>
Tested-by: Gregory Erwin <gregerwin256@gmail.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: stable@vger.kernel.org
Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
Link: https://bugs.archlinux.org/task/75138
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/char/hw_random/core.c        |  5 +++--
 drivers/net/wireless/ath/ath9k/rng.c | 20 +++-----------------
 2 files changed, 6 insertions(+), 19 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 16f227b995e8..9987f0642285 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -513,8 +513,9 @@ static int hwrng_fillfn(void *unused)
 			break;
 
 		if (rc <= 0) {
-			pr_warn("hwrng: no data available\n");
-			msleep_interruptible(10000);
+			if (kthread_should_stop())
+				break;
+			schedule_timeout_interruptible(HZ * 10);
 			continue;
 		}
 
diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..757603d1949d 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size)
 	return j << 2;
 }
 
-static u32 ath9k_rng_delay_get(u32 fail_stats)
-{
-	u32 delay;
-
-	if (fail_stats < 100)
-		delay = 10;
-	else if (fail_stats < 105)
-		delay = 1000;
-	else
-		delay = 10000;
-
-	return delay;
-}
-
 static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 {
 	struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops);
@@ -80,10 +66,10 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 			bytes_read += max & 3UL;
 			memzero_explicit(&word, sizeof(word));
 		}
-		if (!wait || !max || likely(bytes_read) || fail_stats > 110)
+		if (!wait || !max || likely(bytes_read) || ++fail_stats >= 100 ||
+		    ((current->flags & PF_KTHREAD) && kthread_should_stop()) ||
+		    schedule_timeout_interruptible(HZ / 20))
 			break;
-
-		msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
 	}
 
 	if (wait && !bytes_read && max)
-- 
2.35.1

Re: [PATCH v6] ath9k: sleep for less time when unregistering hwrng
Posted by Toke Høiland-Jørgensen 3 years, 10 months ago
"Jason A. Donenfeld" <Jason@zx2c4.com> writes:

> Even though hwrng provides a `wait` parameter, it doesn't work very well
> when waiting for a long time. There are numerous deadlocks that emerge
> related to shutdown. Work around this API limitation by waiting for a
> shorter amount of time and erroring more frequently. This commit also
> prevents hwrng from splatting messages to dmesg when there's a timeout
> and switches to using schedule_timeout_interruptible(), so that the
> kthread can be stopped.
>
> Reported-by: Gregory Erwin <gregerwin256@gmail.com>
> Tested-by: Gregory Erwin <gregerwin256@gmail.com>
> Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> Cc: Kalle Valo <kvalo@kernel.org>
> Cc: Rui Salvaterra <rsalvaterra@gmail.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: stable@vger.kernel.org
> Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
> Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
> Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
> Link: https://bugs.archlinux.org/task/75138
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Re: [PATCH v6] ath9k: sleep for less time when unregistering hwrng
Posted by Herbert Xu 3 years, 10 months ago
On Mon, Jun 27, 2022 at 02:07:35PM +0200, Jason A. Donenfeld wrote:
> Even though hwrng provides a `wait` parameter, it doesn't work very well
> when waiting for a long time. There are numerous deadlocks that emerge
> related to shutdown. Work around this API limitation by waiting for a
> shorter amount of time and erroring more frequently. This commit also
> prevents hwrng from splatting messages to dmesg when there's a timeout
> and switches to using schedule_timeout_interruptible(), so that the
> kthread can be stopped.
> 
> Reported-by: Gregory Erwin <gregerwin256@gmail.com>
> Tested-by: Gregory Erwin <gregerwin256@gmail.com>
> Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> Cc: Kalle Valo <kvalo@kernel.org>
> Cc: Rui Salvaterra <rsalvaterra@gmail.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: stable@vger.kernel.org
> Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
> Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
> Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
> Link: https://bugs.archlinux.org/task/75138
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
>  drivers/char/hw_random/core.c        |  5 +++--
>  drivers/net/wireless/ath/ath9k/rng.c | 20 +++-----------------
>  2 files changed, 6 insertions(+), 19 deletions(-)

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH v6] ath9k: sleep for less time when unregistering hwrng
Posted by Toke Høiland-Jørgensen 3 years, 10 months ago
"Jason A. Donenfeld" <Jason@zx2c4.com> writes:

> Even though hwrng provides a `wait` parameter, it doesn't work very well
> when waiting for a long time. There are numerous deadlocks that emerge
> related to shutdown. Work around this API limitation by waiting for a
> shorter amount of time and erroring more frequently. This commit also
> prevents hwrng from splatting messages to dmesg when there's a timeout
> and switches to using schedule_timeout_interruptible(), so that the
> kthread can be stopped.
>
> Reported-by: Gregory Erwin <gregerwin256@gmail.com>
> Tested-by: Gregory Erwin <gregerwin256@gmail.com>
> Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> Cc: Kalle Valo <kvalo@kernel.org>
> Cc: Rui Salvaterra <rsalvaterra@gmail.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: stable@vger.kernel.org
> Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
> Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
> Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
> Link: https://bugs.archlinux.org/task/75138
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

Gregory, care to take this version for a spin as well to double-check
that it still resolves the issue? :)

-Toke
Re: [PATCH v6] ath9k: sleep for less time when unregistering hwrng
Posted by Gregory Erwin 3 years, 10 months ago
On Mon, Jun 27, 2022 at 5:18 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> "Jason A. Donenfeld" <Jason@zx2c4.com> writes:
>
> > Even though hwrng provides a `wait` parameter, it doesn't work very well
> > when waiting for a long time. There are numerous deadlocks that emerge
> > related to shutdown. Work around this API limitation by waiting for a
> > shorter amount of time and erroring more frequently. This commit also
> > prevents hwrng from splatting messages to dmesg when there's a timeout
> > and switches to using schedule_timeout_interruptible(), so that the
> > kthread can be stopped.
> >
> > Reported-by: Gregory Erwin <gregerwin256@gmail.com>
> > Tested-by: Gregory Erwin <gregerwin256@gmail.com>
> > Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> > Cc: Kalle Valo <kvalo@kernel.org>
> > Cc: Rui Salvaterra <rsalvaterra@gmail.com>
> > Cc: Herbert Xu <herbert@gondor.apana.org.au>
> > Cc: stable@vger.kernel.org
> > Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
> > Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
> > Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
> > Link: https://bugs.archlinux.org/task/75138
> > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
>
> Gregory, care to take this version for a spin as well to double-check
> that it still resolves the issue? :)
>
> -Toke
>

With patch v6, reads from /dev/hwrng block for 5-6s, during which 'ip link set
wlan0 down' will also block. Otherwise, 'ip link set wlan0 down' returns
immediately. Similarly, wiphy_suspend() consistently returns in under 10ms.

Tested-by: Gregory Erwin <gregerwin256@gmail.com>
Re: [PATCH v6] ath9k: sleep for less time when unregistering hwrng
Posted by Jason A. Donenfeld 3 years, 10 months ago
Hi Gregory,

On Tue, Jun 28, 2022 at 3:39 AM Gregory Erwin <gregerwin256@gmail.com> wrote:
>
> On Mon, Jun 27, 2022 at 5:18 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > "Jason A. Donenfeld" <Jason@zx2c4.com> writes:
> >
> > > Even though hwrng provides a `wait` parameter, it doesn't work very well
> > > when waiting for a long time. There are numerous deadlocks that emerge
> > > related to shutdown. Work around this API limitation by waiting for a
> > > shorter amount of time and erroring more frequently. This commit also
> > > prevents hwrng from splatting messages to dmesg when there's a timeout
> > > and switches to using schedule_timeout_interruptible(), so that the
> > > kthread can be stopped.
> > >
> > > Reported-by: Gregory Erwin <gregerwin256@gmail.com>
> > > Tested-by: Gregory Erwin <gregerwin256@gmail.com>
> > > Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> > > Cc: Kalle Valo <kvalo@kernel.org>
> > > Cc: Rui Salvaterra <rsalvaterra@gmail.com>
> > > Cc: Herbert Xu <herbert@gondor.apana.org.au>
> > > Cc: stable@vger.kernel.org
> > > Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
> > > Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
> > > Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
> > > Link: https://bugs.archlinux.org/task/75138
> > > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> >
> > Gregory, care to take this version for a spin as well to double-check
> > that it still resolves the issue? :)
> >
> > -Toke
> >
>
> With patch v6, reads from /dev/hwrng block for 5-6s, during which 'ip link set
> wlan0 down' will also block. Otherwise, 'ip link set wlan0 down' returns
> immediately. Similarly, wiphy_suspend() consistently returns in under 10ms.
>
> Tested-by: Gregory Erwin <gregerwin256@gmail.com>

Oh 5-6s... so it's actually worse. Yikes. Sounds like v4 might have
been the better patch, then?

Jason
Re: [PATCH v6] ath9k: sleep for less time when unregistering hwrng
Posted by Jason A. Donenfeld 3 years, 10 months ago
On Tue, Jun 28, 2022 at 12:46 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hi Gregory,
>
> On Tue, Jun 28, 2022 at 3:39 AM Gregory Erwin <gregerwin256@gmail.com> wrote:
> >
> > On Mon, Jun 27, 2022 at 5:18 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > >
> > > "Jason A. Donenfeld" <Jason@zx2c4.com> writes:
> > >
> > > > Even though hwrng provides a `wait` parameter, it doesn't work very well
> > > > when waiting for a long time. There are numerous deadlocks that emerge
> > > > related to shutdown. Work around this API limitation by waiting for a
> > > > shorter amount of time and erroring more frequently. This commit also
> > > > prevents hwrng from splatting messages to dmesg when there's a timeout
> > > > and switches to using schedule_timeout_interruptible(), so that the
> > > > kthread can be stopped.
> > > >
> > > > Reported-by: Gregory Erwin <gregerwin256@gmail.com>
> > > > Tested-by: Gregory Erwin <gregerwin256@gmail.com>
> > > > Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> > > > Cc: Kalle Valo <kvalo@kernel.org>
> > > > Cc: Rui Salvaterra <rsalvaterra@gmail.com>
> > > > Cc: Herbert Xu <herbert@gondor.apana.org.au>
> > > > Cc: stable@vger.kernel.org
> > > > Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
> > > > Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
> > > > Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
> > > > Link: https://bugs.archlinux.org/task/75138
> > > > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> > >
> > > Gregory, care to take this version for a spin as well to double-check
> > > that it still resolves the issue? :)
> > >
> > > -Toke
> > >
> >
> > With patch v6, reads from /dev/hwrng block for 5-6s, during which 'ip link set
> > wlan0 down' will also block. Otherwise, 'ip link set wlan0 down' returns
> > immediately. Similarly, wiphy_suspend() consistently returns in under 10ms.
> >
> > Tested-by: Gregory Erwin <gregerwin256@gmail.com>
>
> Oh 5-6s... so it's actually worse. Yikes. Sounds like v4 might have
> been the better patch, then?

$ curl https://lore.kernel.org/lkml/20220627104955.534013-1-Jason@zx2c4.com/raw
| git am

That one, if you want to give it a spin and see if that 5-6s is back
to ~1s or less.

Jason
Re: [PATCH v6] ath9k: sleep for less time when unregistering hwrng
Posted by Herbert Xu 3 years, 10 months ago
On Tue, Jun 28, 2022 at 12:48:50PM +0200, Jason A. Donenfeld wrote:
>
> $ curl https://lore.kernel.org/lkml/20220627104955.534013-1-Jason@zx2c4.com/raw
> | git am
> 
> That one, if you want to give it a spin and see if that 5-6s is back
> to ~1s or less.

Whatever caused kthread_should_stop to return true should have
woken the thread and caused schedule_timeout to return.

If it's not waking the thread up then we should find out why.

Oh wait you're checking kthread_should_stop before the schedule
call instead of afterwards, that would do it.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH v6] ath9k: sleep for less time when unregistering hwrng
Posted by Jason A. Donenfeld 3 years, 10 months ago
Hi Herbert,

On Tue, Jun 28, 2022 at 12:52 PM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Tue, Jun 28, 2022 at 12:48:50PM +0200, Jason A. Donenfeld wrote:
> >
> > $ curl https://lore.kernel.org/lkml/20220627104955.534013-1-Jason@zx2c4.com/raw
> > | git am
> >
> > That one, if you want to give it a spin and see if that 5-6s is back
> > to ~1s or less.
>
> Whatever caused kthread_should_stop to return true should have
> woken the thread and caused schedule_timeout to return.
>
> If it's not waking the thread up then we should find out why.
>
> Oh wait you're checking kthread_should_stop before the schedule
> call instead of afterwards, that would do it.

Oh, that's a really good observation, thank you! I'll send a patch
that does it right. I have to check kthread_should_stop() before,
because the wake up might have been consumed by a sleep inside of the
hwrng ->read_data() function, causing it to return early. So we have
to check it before going to sleep again. And after, I need to be
checking the return value of schedule_timeout_interruptible(), which
I'm not in this latest. So v+1 coming up.

By the way, this thread might interest you:
https://lore.kernel.org/lkml/20220627145716.641185-1-Jason@zx2c4.com/

Jason
Re: [PATCH v6] ath9k: sleep for less time when unregistering hwrng
Posted by Jason A. Donenfeld 3 years, 10 months ago
Hi again,

On Tue, Jun 28, 2022 at 12:55:57PM +0200, Jason A. Donenfeld wrote:
> > Oh wait you're checking kthread_should_stop before the schedule
> > call instead of afterwards, that would do it.
> 
> Oh, that's a really good observation, thank you! 

Wait, no. I already am kthread_should_stop() it afterwards. That
"continue" goes to the top of the loop, which checks it in the while()
statement. So something else is amiss. I guess I'll investigate.

Jason
Re: [PATCH v6] ath9k: sleep for less time when unregistering hwrng
Posted by Jason A. Donenfeld 3 years, 10 months ago
Hi Herbert,

On Tue, Jun 28, 2022 at 02:05:34PM +0200, Jason A. Donenfeld wrote:
> Hi again,
> 
> On Tue, Jun 28, 2022 at 12:55:57PM +0200, Jason A. Donenfeld wrote:
> > > Oh wait you're checking kthread_should_stop before the schedule
> > > call instead of afterwards, that would do it.
> > 
> > Oh, that's a really good observation, thank you! 
> 
> Wait, no. I already am kthread_should_stop() it afterwards. That
> "continue" goes to the top of the loop, which checks it in the while()
> statement. So something else is amiss. I guess I'll investigate.

I worked out a solution for the "larger problem" now. It's a bit
involved, but not too bad. I'll post a patch shortly.

Jason
Re: [PATCH v2] ath9k: sleep for less time when unregistering hwrng
Posted by Gregory Erwin 3 years, 10 months ago
On Fri, Jun 24, 2022 at 1:44 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Even though hwrng provides a `wait` parameter, it doesn't work very well
> when waiting for a long time. There are numerous deadlocks that emerge
> related to shutdown. Work around this API limitation by waiting for a
> shorter amount of time and erroring more frequently. This commit also
> prevents hwrng from splatting messages to dmesg when there's a timeout
> and prevents calling msleep_interruptible() for tons of time when a
> thread is supposed to be shutting down, since msleep_interruptible()
> isn't actually interrupted by kthread_stop().
>
> Reported-by: Gregory Erwin <gregerwin256@gmail.com>
> Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> Cc: Kalle Valo <kvalo@kernel.org>
> Cc: Rui Salvaterra <rsalvaterra@gmail.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: stable@vger.kernel.org
> Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
> Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
> Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
> Link: https://bugs.archlinux.org/task/75138
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
> I do not have an ath9k and therefore I can't test this myself. The
> analysis above was done completely statically, with no dynamic tracing
> and just a bug report of symptoms from Gregory. So it might be totally
> wrong. Thus, this patch very much requires Gregory's testing. Please
> don't apply it until we have his `Tested-by` line.
>
>  drivers/char/hw_random/core.c        | 10 ++++++++--
>  drivers/net/wireless/ath/ath9k/rng.c | 19 ++-----------------
>  2 files changed, 10 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> index 16f227b995e8..af1c1905bb7e 100644
> --- a/drivers/char/hw_random/core.c
> +++ b/drivers/char/hw_random/core.c
> @@ -513,8 +513,13 @@ static int hwrng_fillfn(void *unused)
>                         break;
>
>                 if (rc <= 0) {
> -                       pr_warn("hwrng: no data available\n");
> -                       msleep_interruptible(10000);
> +                       int i;
> +
> +                       for (i = 0; i < 100; ++i) {
> +                               if (kthread_should_stop() ||
> +                                   msleep_interruptible(10000 / 100))
> +                                       goto out;
> +                       }
>                         continue;
>                 }
>
> @@ -529,6 +534,7 @@ static int hwrng_fillfn(void *unused)
>                 add_hwgenerator_randomness((void *)rng_fillbuf, rc,
>                                            entropy >> 10);
>         }
> +out:
>         hwrng_fill = NULL;
>         return 0;
>  }
> diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
> index cb5414265a9b..883110c66e5e 100644
> --- a/drivers/net/wireless/ath/ath9k/rng.c
> +++ b/drivers/net/wireless/ath/ath9k/rng.c
> @@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size)
>         return j << 2;
>  }
>
> -static u32 ath9k_rng_delay_get(u32 fail_stats)
> -{
> -       u32 delay;
> -
> -       if (fail_stats < 100)
> -               delay = 10;
> -       else if (fail_stats < 105)
> -               delay = 1000;
> -       else
> -               delay = 10000;
> -
> -       return delay;
> -}
> -
>  static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
>  {
>         struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops);
> @@ -80,10 +66,9 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
>                         bytes_read += max & 3UL;
>                         memzero_explicit(&word, sizeof(word));
>                 }
> -               if (!wait || !max || likely(bytes_read) || fail_stats > 110)
> +               if (!wait || !max || likely(bytes_read) ||
> +                   ++fail_stats >= 100 || msleep_interruptible(5))
>                         break;
> -
> -               msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
>         }
>
>         if (wait && !bytes_read && max)
> --
> 2.35.1
>

Jason,

This patch is working as you described. Trying to read from /dev/hwrng
consistently blocks for only 1.3s before returning an IO error. The longest
that I observed 'ip link set wlan0 down' to block was also about 1.3s,
and that was immediately after 'cat /dev/hwrng'. Additionally, the longest
duration that I observed for wiphy_suspend() to return was just under 100ms.

Tested-by: Gregory Erwin <gregerwin256@gmail.com>
Re: [PATCH v2] ath9k: sleep for less time when unregistering hwrng
Posted by Jason A. Donenfeld 3 years, 10 months ago
Hi Gregory,

On Sat, Jun 25, 2022 at 2:13 AM Gregory Erwin <gregerwin256@gmail.com> wrote:
> This patch is working as you described. Trying to read from /dev/hwrng
> consistently blocks for only 1.3s before returning an IO error. The longest
> that I observed 'ip link set wlan0 down' to block was also about 1.3s,
> and that was immediately after 'cat /dev/hwrng'. Additionally, the longest
> duration that I observed for wiphy_suspend() to return was just under 100ms.
>
> Tested-by: Gregory Erwin <gregerwin256@gmail.com>

Great, thanks for testing. I think that barring more invasive changes
to the hwrng subsystem, a heuristic approach like this is the best
we're going to do inside the ath9k driver itself.

So Toke/Kalle - can you queue this up?

Jason