Even though hwrng provides a `wait` parameter, it doesn't work very well
when waiting for a long time. There are numerous deadlocks that emerge
related to shutdown. Work around this API limitation by waiting for a
shorter amount of time and erroring more frequently. This commit also
prevents hwrng from splatting messages to dmesg when there's a timeout
and prevents calling msleep_interruptible() for tons of time when a
thread is supposed to be shutting down, since msleep_interruptible()
isn't actually interrupted by kthread_stop().
Reported-by: Gregory Erwin <gregerwin256@gmail.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: stable@vger.kernel.org
Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
Link: https://bugs.archlinux.org/task/75138
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
I do not have an ath9k and therefore I can't test this myself. The
analysis above was done completely statically, with no dynamic tracing
and just a bug report of symptoms from Gregory. So it might be totally
wrong. Thus, this patch very much requires Gregory's testing. Please
don't apply it until we have his `Tested-by` line.
drivers/char/hw_random/core.c | 10 ++++++++--
drivers/net/wireless/ath/ath9k/rng.c | 19 ++-----------------
2 files changed, 10 insertions(+), 19 deletions(-)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 16f227b995e8..af1c1905bb7e 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -513,8 +513,13 @@ static int hwrng_fillfn(void *unused)
 			break;

 		if (rc <= 0) {
-			pr_warn("hwrng: no data available\n");
-			msleep_interruptible(10000);
+			int i;
+
+			for (i = 0; i < 100; ++i) {
+				if (kthread_should_stop() ||
+				    msleep_interruptible(10000 / 100))
+					goto out;
+			}
 			continue;
 		}

@@ -529,6 +534,7 @@ static int hwrng_fillfn(void *unused)
 		add_hwgenerator_randomness((void *)rng_fillbuf, rc,
 					   entropy >> 10);
 	}
+out:
 	hwrng_fill = NULL;
 	return 0;
 }
diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..883110c66e5e 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size)
 	return j << 2;
 }

-static u32 ath9k_rng_delay_get(u32 fail_stats)
-{
-	u32 delay;
-
-	if (fail_stats < 100)
-		delay = 10;
-	else if (fail_stats < 105)
-		delay = 1000;
-	else
-		delay = 10000;
-
-	return delay;
-}
-
 static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 {
 	struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops);
@@ -80,10 +66,9 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 			bytes_read += max & 3UL;
 			memzero_explicit(&word, sizeof(word));
 		}
-		if (!wait || !max || likely(bytes_read) || fail_stats > 110)
+		if (!wait || !max || likely(bytes_read) ||
+		    ++fail_stats >= 100 || msleep_interruptible(5))
 			break;
-
-		msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
 	}

 	if (wait && !bytes_read && max)
--
2.35.1
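For a sense of scale, the nominal worst-case blocking time of the ath9k retry loop before and after this patch can be worked out with a small userspace sketch. It is not kernel code: it only adds up the requested sleep durations, assumes the hardware never returns data, and ignores timer rounding and scheduling latency. The function names are hypothetical.

```c
#include <stdint.h>

/* Delay schedule removed by the patch, mirroring ath9k_rng_delay_get(). */
static uint32_t old_delay_ms(uint32_t fail_stats)
{
	if (fail_stats < 100)
		return 10;
	else if (fail_stats < 105)
		return 1000;
	return 10000;
}

/* Sum of requested sleeps in the old loop when no data ever arrives. */
static uint64_t old_worst_case_ms(void)
{
	uint32_t fail_stats = 0;
	uint64_t total = 0;

	for (;;) {
		if (fail_stats > 110)
			break;
		total += old_delay_ms(++fail_stats);
	}
	return total;
}

/* Sum of requested sleeps in the patched loop: give up on the 100th
 * failure, otherwise sleep 5 ms and retry. */
static uint64_t new_worst_case_ms(void)
{
	uint32_t fail_stats = 0;
	uint64_t total = 0;

	for (;;) {
		if (++fail_stats >= 100)
			break;
		total += 5;
	}
	return total;
}
```

Under these assumptions the old schedule requests 75990 ms of sleep in total (99 sleeps of 10 ms, 5 of 1 s, 7 of 10 s), while the patched loop requests 495 ms (99 sleeps of 5 ms). Actual wall-clock time will be somewhat higher than the nominal sum.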
On Fri, Jun 24, 2022 at 10:44:33PM +0200, Jason A. Donenfeld wrote:
.
> diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> index 16f227b995e8..af1c1905bb7e 100644
> --- a/drivers/char/hw_random/core.c
> +++ b/drivers/char/hw_random/core.c
> @@ -513,8 +513,13 @@ static int hwrng_fillfn(void *unused)
>  			break;
>
>  		if (rc <= 0) {
> -			pr_warn("hwrng: no data available\n");
> -			msleep_interruptible(10000);
> +			int i;
> +
> +			for (i = 0; i < 100; ++i) {
> +				if (kthread_should_stop() ||
> +				    msleep_interruptible(10000 / 100))
> +					goto out;
> +			}
Please use schedule_timeout_interruptible. But if you're going
to make it interruptible it should probably at least try to do
something about signals rather than just ignore them.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
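The motivation for slicing the 10 s wait in the quoted hunk can be illustrated with a simplified timing model. This is a userspace sketch, not kernel code. It assumes, as the commit message states for msleep_interruptible(), that a stop request does not wake a sleep already in progress, so the flag is only noticed between slices. All names are hypothetical stand-ins.

```c
#include <stdbool.h>

/* Hypothetical stand-in for kthread_should_stop(): true once the model
 * clock reaches the time at which a stop was requested. */
static bool stop_requested(unsigned int now_ms, unsigned int stop_at_ms)
{
	return now_ms >= stop_at_ms;
}

/* One long sleep: nothing looks at the stop flag until it finishes. */
static unsigned int exit_time_single_sleep(unsigned int total_ms,
					   unsigned int stop_at_ms)
{
	(void)stop_at_ms;
	return total_ms;
}

/* Sliced wait: poll the stop flag before each slice, as the quoted
 * for-loop does, and leave as soon as it is set. */
static unsigned int exit_time_sliced(unsigned int total_ms,
				     unsigned int slices,
				     unsigned int stop_at_ms)
{
	unsigned int slice_ms = total_ms / slices;
	unsigned int now_ms = 0;
	unsigned int i;

	for (i = 0; i < slices; ++i) {
		if (stop_requested(now_ms, stop_at_ms))
			break;
		now_ms += slice_ms;
	}
	return now_ms;
}
```

In this model, a stop requested 250 ms into a 10 s wait is noticed at 300 ms with 100 slices, versus 10000 ms with a single sleep. The stop latency is bounded by one slice length rather than by the whole wait.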
Even though hwrng provides a `wait` parameter, it doesn't work very well
when waiting for a long time. There are numerous deadlocks that emerge
related to shutdown. Work around this API limitation by waiting for a
shorter amount of time and erroring more frequently. This commit also
prevents hwrng from splatting messages to dmesg when there's a timeout
and prevents calling msleep_interruptible() for tons of time when a
thread is supposed to be shutting down, since msleep_interruptible()
isn't actually interrupted by kthread_stop().
Reported-by: Gregory Erwin <gregerwin256@gmail.com>
Tested-by: Gregory Erwin <gregerwin256@gmail.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: stable@vger.kernel.org
Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
Link: https://bugs.archlinux.org/task/75138
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
drivers/char/hw_random/core.c | 10 ++++++++--
drivers/net/wireless/ath/ath9k/rng.c | 20 +++-----------------
2 files changed, 11 insertions(+), 19 deletions(-)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 16f227b995e8..a15273271d87 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -513,8 +513,13 @@ static int hwrng_fillfn(void *unused)
 			break;

 		if (rc <= 0) {
-			pr_warn("hwrng: no data available\n");
-			msleep_interruptible(10000);
+			int i;
+
+			for (i = 0; i < 100; ++i) {
+				if (kthread_should_stop() ||
+				    schedule_timeout_interruptible(HZ / 20))
+					goto out;
+			}
 			continue;
 		}

@@ -529,6 +534,7 @@ static int hwrng_fillfn(void *unused)
 		add_hwgenerator_randomness((void *)rng_fillbuf, rc,
 					   entropy >> 10);
 	}
+out:
 	hwrng_fill = NULL;
 	return 0;
 }
diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..39195f89ea85 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size)
 	return j << 2;
 }

-static u32 ath9k_rng_delay_get(u32 fail_stats)
-{
-	u32 delay;
-
-	if (fail_stats < 100)
-		delay = 10;
-	else if (fail_stats < 105)
-		delay = 1000;
-	else
-		delay = 10000;
-
-	return delay;
-}
-
 static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 {
 	struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops);
@@ -80,10 +66,10 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 			bytes_read += max & 3UL;
 			memzero_explicit(&word, sizeof(word));
 		}
-		if (!wait || !max || likely(bytes_read) || fail_stats > 110)
+		if (!wait || !max || likely(bytes_read) || ++fail_stats >= 100 ||
+		    schedule_timeout_interruptible(HZ / 20) ||
+		    ((current->flags & PF_KTHREAD) && kthread_should_stop()))
 			break;
-
-		msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
 	}

 	if (wait && !bytes_read && max)
--
2.35.1
Even though hwrng provides a `wait` parameter, it doesn't work very well
when waiting for a long time. There are numerous deadlocks that emerge
related to shutdown. Work around this API limitation by waiting for a
shorter amount of time and erroring more frequently. This commit also
prevents hwrng from splatting messages to dmesg when there's a timeout
and prevents calling msleep_interruptible() for tons of time when a
thread is supposed to be shutting down, since msleep_interruptible()
isn't actually interrupted by kthread_stop().
Reported-by: Gregory Erwin <gregerwin256@gmail.com>
Tested-by: Gregory Erwin <gregerwin256@gmail.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: stable@vger.kernel.org
Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
Link: https://bugs.archlinux.org/task/75138
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
drivers/char/hw_random/core.c | 10 ++++++++--
drivers/net/wireless/ath/ath9k/rng.c | 20 +++-----------------
2 files changed, 11 insertions(+), 19 deletions(-)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 16f227b995e8..a15273271d87 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -513,8 +513,13 @@ static int hwrng_fillfn(void *unused)
 			break;

 		if (rc <= 0) {
-			pr_warn("hwrng: no data available\n");
-			msleep_interruptible(10000);
+			int i;
+
+			for (i = 0; i < 100; ++i) {
+				if (kthread_should_stop() ||
+				    schedule_timeout_interruptible(HZ / 20))
+					goto out;
+			}
 			continue;
 		}

@@ -529,6 +534,7 @@ static int hwrng_fillfn(void *unused)
 		add_hwgenerator_randomness((void *)rng_fillbuf, rc,
 					   entropy >> 10);
 	}
+out:
 	hwrng_fill = NULL;
 	return 0;
 }
diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..757603d1949d 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size)
 	return j << 2;
 }

-static u32 ath9k_rng_delay_get(u32 fail_stats)
-{
-	u32 delay;
-
-	if (fail_stats < 100)
-		delay = 10;
-	else if (fail_stats < 105)
-		delay = 1000;
-	else
-		delay = 10000;
-
-	return delay;
-}
-
 static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 {
 	struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops);
@@ -80,10 +66,10 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 			bytes_read += max & 3UL;
 			memzero_explicit(&word, sizeof(word));
 		}
-		if (!wait || !max || likely(bytes_read) || fail_stats > 110)
+		if (!wait || !max || likely(bytes_read) || ++fail_stats >= 100 ||
+		    ((current->flags & PF_KTHREAD) && kthread_should_stop()) ||
+		    schedule_timeout_interruptible(HZ / 20))
 			break;
-
-		msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
 	}

 	if (wait && !bytes_read && max)
--
2.35.1
Even though hwrng provides a `wait` parameter, it doesn't work very well
when waiting for a long time. There are numerous deadlocks that emerge
related to shutdown. Work around this API limitation by waiting for a
shorter amount of time and erroring more frequently. This commit also
prevents hwrng from splatting messages to dmesg when there's a timeout
and switches to using schedule_timeout_interruptible(), so that the
kthread can be stopped.
Reported-by: Gregory Erwin <gregerwin256@gmail.com>
Tested-by: Gregory Erwin <gregerwin256@gmail.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: stable@vger.kernel.org
Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
Link: https://bugs.archlinux.org/task/75138
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Sorry for all the churn here in sending a v4 and v5 so soon. The
semantics of schedule_timeout_interruptible vs msleep_interruptible with
respect to kthreads is kind of confusing. I'll send a follow up patch
for that elsewhere. For now I think this should suffice for fixing the
bug.
drivers/char/hw_random/core.c | 3 +--
drivers/net/wireless/ath/ath9k/rng.c | 20 +++-----------------
2 files changed, 4 insertions(+), 19 deletions(-)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 16f227b995e8..5309fab98631 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -513,8 +513,7 @@ static int hwrng_fillfn(void *unused)
 			break;

 		if (rc <= 0) {
-			pr_warn("hwrng: no data available\n");
-			msleep_interruptible(10000);
+			schedule_timeout_interruptible(HZ * 10);
 			continue;
 		}

diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..757603d1949d 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size)
 	return j << 2;
 }

-static u32 ath9k_rng_delay_get(u32 fail_stats)
-{
-	u32 delay;
-
-	if (fail_stats < 100)
-		delay = 10;
-	else if (fail_stats < 105)
-		delay = 1000;
-	else
-		delay = 10000;
-
-	return delay;
-}
-
 static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 {
 	struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops);
@@ -80,10 +66,10 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 			bytes_read += max & 3UL;
 			memzero_explicit(&word, sizeof(word));
 		}
-		if (!wait || !max || likely(bytes_read) || fail_stats > 110)
+		if (!wait || !max || likely(bytes_read) || ++fail_stats >= 100 ||
+		    ((current->flags & PF_KTHREAD) && kthread_should_stop()) ||
+		    schedule_timeout_interruptible(HZ / 20))
 			break;
-
-		msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
 	}

 	if (wait && !bytes_read && max)
--
2.35.1
Even though hwrng provides a `wait` parameter, it doesn't work very well
when waiting for a long time. There are numerous deadlocks that emerge
related to shutdown. Work around this API limitation by waiting for a
shorter amount of time and erroring more frequently. This commit also
prevents hwrng from splatting messages to dmesg when there's a timeout
and switches to using schedule_timeout_interruptible(), so that the
kthread can be stopped.
Reported-by: Gregory Erwin <gregerwin256@gmail.com>
Tested-by: Gregory Erwin <gregerwin256@gmail.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: stable@vger.kernel.org
Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
Link: https://bugs.archlinux.org/task/75138
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
drivers/char/hw_random/core.c | 5 +++--
drivers/net/wireless/ath/ath9k/rng.c | 20 +++-----------------
2 files changed, 6 insertions(+), 19 deletions(-)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 16f227b995e8..9987f0642285 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -513,8 +513,9 @@ static int hwrng_fillfn(void *unused)
 			break;

 		if (rc <= 0) {
-			pr_warn("hwrng: no data available\n");
-			msleep_interruptible(10000);
+			if (kthread_should_stop())
+				break;
+			schedule_timeout_interruptible(HZ * 10);
 			continue;
 		}

diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
index cb5414265a9b..757603d1949d 100644
--- a/drivers/net/wireless/ath/ath9k/rng.c
+++ b/drivers/net/wireless/ath/ath9k/rng.c
@@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size)
 	return j << 2;
 }

-static u32 ath9k_rng_delay_get(u32 fail_stats)
-{
-	u32 delay;
-
-	if (fail_stats < 100)
-		delay = 10;
-	else if (fail_stats < 105)
-		delay = 1000;
-	else
-		delay = 10000;
-
-	return delay;
-}
-
 static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 {
 	struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops);
@@ -80,10 +66,10 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 			bytes_read += max & 3UL;
 			memzero_explicit(&word, sizeof(word));
 		}
-		if (!wait || !max || likely(bytes_read) || fail_stats > 110)
+		if (!wait || !max || likely(bytes_read) || ++fail_stats >= 100 ||
+		    ((current->flags & PF_KTHREAD) && kthread_should_stop()) ||
+		    schedule_timeout_interruptible(HZ / 20))
 			break;
-
-		msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
 	}

 	if (wait && !bytes_read && max)
--
2.35.1
"Jason A. Donenfeld" <Jason@zx2c4.com> writes:
> Even though hwrng provides a `wait` parameter, it doesn't work very well
> when waiting for a long time. There are numerous deadlocks that emerge
> related to shutdown. Work around this API limitation by waiting for a
> shorter amount of time and erroring more frequently. This commit also
> prevents hwrng from splatting messages to dmesg when there's a timeout
> and switches to using schedule_timeout_interruptible(), so that the
> kthread can be stopped.
>
> Reported-by: Gregory Erwin <gregerwin256@gmail.com>
> Tested-by: Gregory Erwin <gregerwin256@gmail.com>
> Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> Cc: Kalle Valo <kvalo@kernel.org>
> Cc: Rui Salvaterra <rsalvaterra@gmail.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: stable@vger.kernel.org
> Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
> Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
> Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
> Link: https://bugs.archlinux.org/task/75138
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
On Mon, Jun 27, 2022 at 02:07:35PM +0200, Jason A. Donenfeld wrote:
> Even though hwrng provides a `wait` parameter, it doesn't work very well
> when waiting for a long time. There are numerous deadlocks that emerge
> related to shutdown. Work around this API limitation by waiting for a
> shorter amount of time and erroring more frequently. This commit also
> prevents hwrng from splatting messages to dmesg when there's a timeout
> and switches to using schedule_timeout_interruptible(), so that the
> kthread can be stopped.
>
> Reported-by: Gregory Erwin <gregerwin256@gmail.com>
> Tested-by: Gregory Erwin <gregerwin256@gmail.com>
> Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> Cc: Kalle Valo <kvalo@kernel.org>
> Cc: Rui Salvaterra <rsalvaterra@gmail.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: stable@vger.kernel.org
> Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
> Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
> Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
> Link: https://bugs.archlinux.org/task/75138
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
> drivers/char/hw_random/core.c | 5 +++--
> drivers/net/wireless/ath/ath9k/rng.c | 20 +++-----------------
> 2 files changed, 6 insertions(+), 19 deletions(-)
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
"Jason A. Donenfeld" <Jason@zx2c4.com> writes:
> Even though hwrng provides a `wait` parameter, it doesn't work very well
> when waiting for a long time. There are numerous deadlocks that emerge
> related to shutdown. Work around this API limitation by waiting for a
> shorter amount of time and erroring more frequently. This commit also
> prevents hwrng from splatting messages to dmesg when there's a timeout
> and switches to using schedule_timeout_interruptible(), so that the
> kthread can be stopped.
>
> Reported-by: Gregory Erwin <gregerwin256@gmail.com>
> Tested-by: Gregory Erwin <gregerwin256@gmail.com>
> Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> Cc: Kalle Valo <kvalo@kernel.org>
> Cc: Rui Salvaterra <rsalvaterra@gmail.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: stable@vger.kernel.org
> Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
> Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
> Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
> Link: https://bugs.archlinux.org/task/75138
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Gregory, care to take this version for a spin as well to double-check
that it still resolves the issue? :)
-Toke
On Mon, Jun 27, 2022 at 5:18 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> "Jason A. Donenfeld" <Jason@zx2c4.com> writes:
>
> > Even though hwrng provides a `wait` parameter, it doesn't work very well
> > when waiting for a long time. There are numerous deadlocks that emerge
> > related to shutdown. Work around this API limitation by waiting for a
> > shorter amount of time and erroring more frequently. This commit also
> > prevents hwrng from splatting messages to dmesg when there's a timeout
> > and switches to using schedule_timeout_interruptible(), so that the
> > kthread can be stopped.
> >
> > Reported-by: Gregory Erwin <gregerwin256@gmail.com>
> > Tested-by: Gregory Erwin <gregerwin256@gmail.com>
> > Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> > Cc: Kalle Valo <kvalo@kernel.org>
> > Cc: Rui Salvaterra <rsalvaterra@gmail.com>
> > Cc: Herbert Xu <herbert@gondor.apana.org.au>
> > Cc: stable@vger.kernel.org
> > Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
> > Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
> > Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
> > Link: https://bugs.archlinux.org/task/75138
> > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
>
> Gregory, care to take this version for a spin as well to double-check
> that it still resolves the issue? :)
>
> -Toke
>
With patch v6, reads from /dev/hwrng block for 5-6s, during which 'ip link set
wlan0 down' will also block. Otherwise, 'ip link set wlan0 down' returns
immediately. Similarly, wiphy_suspend() consistently returns in under 10ms.
Tested-by: Gregory Erwin <gregerwin256@gmail.com>
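As a rough cross-check on the 5-6s figure, the nominal worst case of the v6 ath9k loop can be estimated from the patch itself: `++fail_stats >= 100` breaks out on the 100th failure, so at most 99 sleeps of HZ / 20 jiffies are requested. The sketch below is userspace arithmetic only, with a hypothetical function name. It assumes every sleep runs to completion and ignores scheduling latency, so it does not by itself establish where the observed time goes.

```c
#include <stdint.h>

/*
 * Nominal worst-case blocking time, in milliseconds, of a retry loop that
 * sleeps `hz / divisor` jiffies up to `max_sleeps` times. `hz` stands for
 * the kernel tick rate (CONFIG_HZ).
 */
static uint64_t retry_loop_worst_case_ms(uint32_t hz, uint32_t divisor,
					 uint32_t max_sleeps)
{
	/* Integer division, matching how HZ / 20 is evaluated in C. */
	uint64_t jiffies_per_sleep = hz / divisor;
	uint64_t ms_per_sleep = jiffies_per_sleep * 1000 / hz;

	return ms_per_sleep * max_sleeps;
}
```

With HZ=1000 or HZ=100 this gives 4950 ms, and with HZ=250 it gives 4752 ms because 250 / 20 truncates to 12 jiffies. Either value is in the neighborhood of the blocking time reported above, and about ten times the 495 ms nominal budget of the earlier msleep_interruptible(5) version.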
Hi Gregory,
On Tue, Jun 28, 2022 at 3:39 AM Gregory Erwin <gregerwin256@gmail.com> wrote:
>
> On Mon, Jun 27, 2022 at 5:18 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > "Jason A. Donenfeld" <Jason@zx2c4.com> writes:
> >
> > > Even though hwrng provides a `wait` parameter, it doesn't work very well
> > > when waiting for a long time. There are numerous deadlocks that emerge
> > > related to shutdown. Work around this API limitation by waiting for a
> > > shorter amount of time and erroring more frequently. This commit also
> > > prevents hwrng from splatting messages to dmesg when there's a timeout
> > > and switches to using schedule_timeout_interruptible(), so that the
> > > kthread can be stopped.
> > >
> > > Reported-by: Gregory Erwin <gregerwin256@gmail.com>
> > > Tested-by: Gregory Erwin <gregerwin256@gmail.com>
> > > Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> > > Cc: Kalle Valo <kvalo@kernel.org>
> > > Cc: Rui Salvaterra <rsalvaterra@gmail.com>
> > > Cc: Herbert Xu <herbert@gondor.apana.org.au>
> > > Cc: stable@vger.kernel.org
> > > Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
> > > Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
> > > Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
> > > Link: https://bugs.archlinux.org/task/75138
> > > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> >
> > Gregory, care to take this version for a spin as well to double-check
> > that it still resolves the issue? :)
> >
> > -Toke
> >
>
> With patch v6, reads from /dev/hwrng block for 5-6s, during which 'ip link set
> wlan0 down' will also block. Otherwise, 'ip link set wlan0 down' returns
> immediately. Similarly, wiphy_suspend() consistently returns in under 10ms.
>
> Tested-by: Gregory Erwin <gregerwin256@gmail.com>
Oh 5-6s... so it's actually worse. Yikes. Sounds like v4 might have
been the better patch, then?
Jason
On Tue, Jun 28, 2022 at 12:46 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hi Gregory,
>
> On Tue, Jun 28, 2022 at 3:39 AM Gregory Erwin <gregerwin256@gmail.com> wrote:
> >
> > On Mon, Jun 27, 2022 at 5:18 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > >
> > > "Jason A. Donenfeld" <Jason@zx2c4.com> writes:
> > >
> > > > Even though hwrng provides a `wait` parameter, it doesn't work very well
> > > > when waiting for a long time. There are numerous deadlocks that emerge
> > > > related to shutdown. Work around this API limitation by waiting for a
> > > > shorter amount of time and erroring more frequently. This commit also
> > > > prevents hwrng from splatting messages to dmesg when there's a timeout
> > > > and switches to using schedule_timeout_interruptible(), so that the
> > > > kthread can be stopped.
> > > >
> > > > Reported-by: Gregory Erwin <gregerwin256@gmail.com>
> > > > Tested-by: Gregory Erwin <gregerwin256@gmail.com>
> > > > Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> > > > Cc: Kalle Valo <kvalo@kernel.org>
> > > > Cc: Rui Salvaterra <rsalvaterra@gmail.com>
> > > > Cc: Herbert Xu <herbert@gondor.apana.org.au>
> > > > Cc: stable@vger.kernel.org
> > > > Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
> > > > Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
> > > > Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
> > > > Link: https://bugs.archlinux.org/task/75138
> > > > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> > >
> > > Gregory, care to take this version for a spin as well to double-check
> > > that it still resolves the issue? :)
> > >
> > > -Toke
> > >
> >
> > With patch v6, reads from /dev/hwrng block for 5-6s, during which 'ip link set
> > wlan0 down' will also block. Otherwise, 'ip link set wlan0 down' returns
> > immediately. Similarly, wiphy_suspend() consistently returns in under 10ms.
> >
> > Tested-by: Gregory Erwin <gregerwin256@gmail.com>
>
> Oh 5-6s... so it's actually worse. Yikes. Sounds like v4 might have
> been the better patch, then?
$ curl https://lore.kernel.org/lkml/20220627104955.534013-1-Jason@zx2c4.com/raw
| git am
That one, if you want to give it a spin and see if that 5-6s is back
to ~1s or less.
Jason
On Tue, Jun 28, 2022 at 12:48:50PM +0200, Jason A. Donenfeld wrote:
>
> $ curl https://lore.kernel.org/lkml/20220627104955.534013-1-Jason@zx2c4.com/raw
> | git am
>
> That one, if you want to give it a spin and see if that 5-6s is back
> to ~1s or less.

Whatever caused kthread_should_stop to return true should have
woken the thread and caused schedule_timeout to return.

If it's not waking the thread up then we should find out why.

Oh wait you're checking kthread_should_stop before the schedule
call instead of afterwards, that would do it.

Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Hi Herbert,

On Tue, Jun 28, 2022 at 12:52 PM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Tue, Jun 28, 2022 at 12:48:50PM +0200, Jason A. Donenfeld wrote:
> >
> > $ curl https://lore.kernel.org/lkml/20220627104955.534013-1-Jason@zx2c4.com/raw
> > | git am
> >
> > That one, if you want to give it a spin and see if that 5-6s is back
> > to ~1s or less.
>
> Whatever caused kthread_should_stop to return true should have
> woken the thread and caused schedule_timeout to return.
>
> If it's not waking the thread up then we should find out why.
>
> Oh wait you're checking kthread_should_stop before the schedule
> call instead of afterwards, that would do it.

Oh, that's a really good observation, thank you! I'll send a patch that
does it right.

I have to check kthread_should_stop() before, because the wake up might
have been consumed by a sleep inside of the hwrng ->read_data()
function, causing it to return early. So we have to check it before
going to sleep again. And after, I need to be checking the return value
of schedule_timeout_interruptible(), which I'm not in this latest. So
v+1 coming up.

By the way, this thread might interest you:
https://lore.kernel.org/lkml/20220627145716.641185-1-Jason@zx2c4.com/

Jason
Hi again,

On Tue, Jun 28, 2022 at 12:55:57PM +0200, Jason A. Donenfeld wrote:
> > Oh wait you're checking kthread_should_stop before the schedule
> > call instead of afterwards, that would do it.
>
> Oh, that's a really good observation, thank you!

Wait, no. I am already checking kthread_should_stop() afterwards. That
"continue" goes to the top of the loop, which checks it in the while()
statement. So something else is amiss. I guess I'll investigate.

Jason
Hi Herbert,

On Tue, Jun 28, 2022 at 02:05:34PM +0200, Jason A. Donenfeld wrote:
> Hi again,
>
> On Tue, Jun 28, 2022 at 12:55:57PM +0200, Jason A. Donenfeld wrote:
> > > Oh wait you're checking kthread_should_stop before the schedule
> > > call instead of afterwards, that would do it.
> >
> > Oh, that's a really good observation, thank you!
>
> Wait, no. I am already checking kthread_should_stop() afterwards. That
> "continue" goes to the top of the loop, which checks it in the while()
> statement. So something else is amiss. I guess I'll investigate.

I worked out a solution for the "larger problem" now. It's a bit
involved, but not too bad. I'll post a patch shortly.

Jason
On Fri, Jun 24, 2022 at 1:44 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Even though hwrng provides a `wait` parameter, it doesn't work very well
> when waiting for a long time. There are numerous deadlocks that emerge
> related to shutdown. Work around this API limitation by waiting for a
> shorter amount of time and erroring more frequently. This commit also
> prevents hwrng from splatting messages to dmesg when there's a timeout
> and prevents calling msleep_interruptible() for tons of time when a
> thread is supposed to be shutting down, since msleep_interruptible()
> isn't actually interrupted by kthread_stop().
>
> Reported-by: Gregory Erwin <gregerwin256@gmail.com>
> Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> Cc: Kalle Valo <kvalo@kernel.org>
> Cc: Rui Salvaterra <rsalvaterra@gmail.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: stable@vger.kernel.org
> Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
> Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/
> Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/
> Link: https://bugs.archlinux.org/task/75138
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
> I do not have an ath9k and therefore I can't test this myself. The
> analysis above was done completely statically, with no dynamic tracing
> and just a bug report of symptoms from Gregory. So it might be totally
> wrong. Thus, this patch very much requires Gregory's testing. Please
> don't apply it until we have his `Tested-by` line.
>
> drivers/char/hw_random/core.c | 10 ++++++++--
> drivers/net/wireless/ath/ath9k/rng.c | 19 ++-----------------
> 2 files changed, 10 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> index 16f227b995e8..af1c1905bb7e 100644
> --- a/drivers/char/hw_random/core.c
> +++ b/drivers/char/hw_random/core.c
> @@ -513,8 +513,13 @@ static int hwrng_fillfn(void *unused)
>  			break;
>
>  		if (rc <= 0) {
> -			pr_warn("hwrng: no data available\n");
> -			msleep_interruptible(10000);
> +			int i;
> +
> +			for (i = 0; i < 100; ++i) {
> +				if (kthread_should_stop() ||
> +				    msleep_interruptible(10000 / 100))
> +					goto out;
> +			}
>  			continue;
>  		}
>
> @@ -529,6 +534,7 @@ static int hwrng_fillfn(void *unused)
>  		add_hwgenerator_randomness((void *)rng_fillbuf, rc,
>  					   entropy >> 10);
>  	}
> +out:
>  	hwrng_fill = NULL;
>  	return 0;
>  }
> diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
> index cb5414265a9b..883110c66e5e 100644
> --- a/drivers/net/wireless/ath/ath9k/rng.c
> +++ b/drivers/net/wireless/ath/ath9k/rng.c
> @@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size)
>  	return j << 2;
>  }
>
> -static u32 ath9k_rng_delay_get(u32 fail_stats)
> -{
> -	u32 delay;
> -
> -	if (fail_stats < 100)
> -		delay = 10;
> -	else if (fail_stats < 105)
> -		delay = 1000;
> -	else
> -		delay = 10000;
> -
> -	return delay;
> -}
> -
>  static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
>  {
>  	struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops);
> @@ -80,10 +66,9 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
>  			bytes_read += max & 3UL;
>  			memzero_explicit(&word, sizeof(word));
>  		}
> -		if (!wait || !max || likely(bytes_read) || fail_stats > 110)
> +		if (!wait || !max || likely(bytes_read) ||
> +		    ++fail_stats >= 100 || msleep_interruptible(5))
>  			break;
> -
> -		msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
>  	}
>
>  	if (wait && !bytes_read && max)
> --
> 2.35.1
>
Jason,
This patch is working as you described. Trying to read from /dev/hwrng
consistently blocks for only 1.3s before returning an IO error. The longest
that I observed 'ip link set wlan0 down' to block was also about 1.3s,
and that was immediately after 'cat /dev/hwrng'. Additionally, the longest
duration that I observed for wiphy_suspend() to return was just under 100ms.
Tested-by: Gregory Erwin <gregerwin256@gmail.com>
Hi Gregory,

On Sat, Jun 25, 2022 at 2:13 AM Gregory Erwin <gregerwin256@gmail.com> wrote:
> This patch is working as you described. Trying to read from /dev/hwrng
> consistently blocks for only 1.3s before returning an IO error. The longest
> that I observed 'ip link set wlan0 down' to block was also about 1.3s,
> and that was immediately after 'cat /dev/hwrng'. Additionally, the longest
> duration that I observed for wiphy_suspend() to return was just under 100ms.
>
> Tested-by: Gregory Erwin <gregerwin256@gmail.com>

Great, thanks for testing.

I think that barring more invasive changes to the hwrng subsystem, a
heuristic approach like this is the best we're going to do inside the
ath9k driver itself. So Toke/Kalle - can you queue this up?

Jason