[patch] Re: [PowerPC][Linux-next][6.11.0-rc4-next-20240820] OOPs while running LTP FS Stress

Posted by Mike Galbraith 2 months, 1 week ago
On Mon, 2024-09-16 at 12:00 +0530, Venkat Rao Bagalkote wrote:
> Greetings!!!

Greetings,

> I am seeing below kernel crash from 6.11.0-rc4-next-20240820.
>
>
> Tried to do git bisect, but it didn't point to the right patch. Attached is
> the bisect log.
>
> Any help in fixing this is much appreciated.

I met this, as well as other ways the wheels can fall off that turned
out to have the same root.  I gave Peter a heads up with diag offline,
but having now convinced myself that all is well, I'll go ahead and
post a patchlet.

At the very least it's worth putting out for wider testing.. and should
anyone have something prettier in mind, yeah, do that instead.

sched: Fix sched_delayed vs cfs_bandwidth

Meeting an unfinished DELAY_DEQUEUE treated entity in unthrottle_cfs_rq()
leads to a couple terminal scenarios.  Finish it first, so ENQUEUE_WAKEUP
can proceed as it would have sans DELAY_DEQUEUE treatment.

Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue")
Signed-off-by: Mike Galbraith <efault@gmx.de>
---
 kernel/sched/fair.c |   10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6049,10 +6049,14 @@ void unthrottle_cfs_rq(struct cfs_rq *cf
 	for_each_sched_entity(se) {
 		struct cfs_rq *qcfs_rq = cfs_rq_of(se);

-		if (se->on_rq) {
-			SCHED_WARN_ON(se->sched_delayed);
-			break;
+		/* Handle any unfinished DELAY_DEQUEUE business first. */
+		if (unlikely(se->on_rq && se->sched_delayed)) {
+			int flags = DEQUEUE_SLEEP | DEQUEUE_SPECIAL;
+
+			dequeue_entity(qcfs_rq, se, flags | DEQUEUE_DELAYED);
 		}
+		if (se->on_rq)
+			break;
 		enqueue_entity(qcfs_rq, se, ENQUEUE_WAKEUP);

 		if (cfs_rq_is_idle(group_cfs_rq(se)))
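
For anyone wanting to poke at the control flow outside the kernel, the
patched loop body can be sketched as a user-space toy.  Everything
prefixed toy_ is hypothetical and is not kernel code.  In particular the
sketch assumes that finishing a delayed dequeue clears both on_rq and
sched_delayed, and that an enqueue does nothing more than set on_rq.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct sched_entity. */
struct toy_se {
	bool on_rq;
	bool sched_delayed;
	struct toy_se *parent;	/* next entity up the hierarchy, NULL at the top */
	int enqueues;		/* how often toy_enqueue() ran on this entity */
	int dequeues;		/* how often toy_finish_delayed() ran on it */
};

/*
 * Toy model of dequeue_entity(..., DEQUEUE_SLEEP | DEQUEUE_DELAYED).
 * ASSUMPTION: finishing a delayed dequeue clears both flags.
 */
static void toy_finish_delayed(struct toy_se *se)
{
	assert(se->sched_delayed);	/* only meaningful for a delayed entity */
	se->sched_delayed = false;
	se->on_rq = false;
	se->dequeues++;
}

/* Toy model of enqueue_entity(..., ENQUEUE_WAKEUP): just mark it queued. */
static void toy_enqueue(struct toy_se *se)
{
	se->on_rq = true;
	se->enqueues++;
}

/*
 * Walk from se toward the top the way the patched loop does.
 * Returns the entity the walk stopped at, or NULL if it ran off the top.
 */
static struct toy_se *toy_unthrottle_walk(struct toy_se *se)
{
	for (; se; se = se->parent) {
		/* Handle any unfinished delayed dequeue first. */
		if (se->on_rq && se->sched_delayed)
			toy_finish_delayed(se);
		/* A plainly queued entity ends the walk. */
		if (se->on_rq)
			break;
		toy_enqueue(se);
	}
	return se;
}
```

With a chain leaf -> mid -> root, where mid is queued but delayed and root
is plainly queued, the toy walk enqueues leaf, finishes and re-enqueues
mid, and stops at root.  The check being removed by the hunk would instead
have warned and stopped at mid.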
Re: [patch] Re: [PowerPC][Linux-next][6.11.0-rc4-next-20240820] OOPs while running LTP FS Stress
Posted by Venkat Rao Bagalkote 2 months, 1 week ago
Hello Mike,

Thanks for the patch. I applied it, re-verified, and can confirm that it
fixes the issue.

Please add the tags below.

Reported-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>

Regards,
Venkat.

Re: [patch] Re: [PowerPC][Linux-next][6.11.0-rc4-next-20240820] OOPs while running LTP FS Stress
Posted by Mike Galbraith 2 months, 1 week ago
On Thu, 2024-09-19 at 20:09 +0530, Venkat Rao Bagalkote wrote:
>
> Please add the below tags.
>
>
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
>
> Tested-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>

Sure, and while at it I can brush patchlet's rather scruffy fur.

1. on_rq being implied by sched_delayed, the redundant check can go.
2. no tasks anywhere in sight, so DEQUEUE_SPECIAL can go.
3. use of unlikely() in an unlikely path can also go.

No functional change.  Too bad everything around there fits in 80
characters, or a couple useless diffstat plus signs could go too.
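
The "no functional change" claim can be sanity checked with a small
user-space sketch that compares the branch structure of the first posting
with the brushed one.  All toy_ names are hypothetical; DEQUEUE_SPECIAL and
unlikely() are not modeled at all, and the sketch assumes that finishing a
delayed dequeue clears on_rq.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two sched_entity fields the patch tests. */
struct toy_se {
	bool on_rq;
	bool sched_delayed;
};

/* What one pass through the loop body ends up doing. */
enum toy_action { TOY_BREAK, TOY_ENQUEUE, TOY_FINISH_THEN_ENQUEUE };

/* Branch structure of the first posting. */
static enum toy_action toy_v1(struct toy_se se)
{
	bool finished = false;

	if (se.on_rq && se.sched_delayed) {
		/* ASSUMPTION: the delayed dequeue clears both flags. */
		se.on_rq = false;
		se.sched_delayed = false;
		finished = true;
	}
	if (se.on_rq)
		return TOY_BREAK;
	return finished ? TOY_FINISH_THEN_ENQUEUE : TOY_ENQUEUE;
}

/* Branch structure of the brushed version. */
static enum toy_action toy_v2(struct toy_se se)
{
	if (se.sched_delayed)
		return TOY_FINISH_THEN_ENQUEUE;
	else if (se.on_rq)
		return TOY_BREAK;
	return TOY_ENQUEUE;
}

/* Both versions must agree on every state where sched_delayed implies on_rq. */
static void toy_check_equivalence(void)
{
	for (int on_rq = 0; on_rq <= 1; on_rq++) {
		for (int delayed = 0; delayed <= 1; delayed++) {
			struct toy_se se = { .on_rq = on_rq, .sched_delayed = delayed };

			if (delayed && !on_rq)
				continue;	/* ruled out by point 1 above */
			assert(toy_v1(se) == toy_v2(se));
		}
	}
}
```

The two toy versions only part ways on the state the loop skips, a delayed
entity that is not on_rq, which is exactly the case point 1 says cannot
happen.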

sched: Fix sched_delayed vs cfs_bandwidth

Meeting an unfinished DELAY_DEQUEUE treated entity in unthrottle_cfs_rq()
leads to a couple terminal scenarios.  Finish it first, so ENQUEUE_WAKEUP
can proceed as it would have sans DELAY_DEQUEUE treatment.

Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
---
 kernel/sched/fair.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6049,10 +6049,13 @@ void unthrottle_cfs_rq(struct cfs_rq *cf
 	for_each_sched_entity(se) {
 		struct cfs_rq *qcfs_rq = cfs_rq_of(se);

-		if (se->on_rq) {
-			SCHED_WARN_ON(se->sched_delayed);
+		/* Handle any unfinished DELAY_DEQUEUE business first. */
+		if (se->sched_delayed) {
+			int flags = DEQUEUE_SLEEP | DEQUEUE_DELAYED;
+
+			dequeue_entity(qcfs_rq, se, flags);
+		} else if (se->on_rq)
 			break;
-		}
 		enqueue_entity(qcfs_rq, se, ENQUEUE_WAKEUP);

 		if (cfs_rq_is_idle(group_cfs_rq(se)))
[patch] sched: Fix sched_delayed vs cfs_bandwidth
Posted by Mike Galbraith 1 month, 4 weeks ago

Meeting an unfinished DELAY_DEQUEUE treated entity in unthrottle_cfs_rq()
leads to a couple terminal scenarios.  Finish it first, so ENQUEUE_WAKEUP
can proceed as it would have sans DELAY_DEQUEUE treatment.

Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
---
 kernel/sched/fair.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6058,10 +6058,13 @@ void unthrottle_cfs_rq(struct cfs_rq *cf
 	for_each_sched_entity(se) {
 		struct cfs_rq *qcfs_rq = cfs_rq_of(se);

-		if (se->on_rq) {
-			SCHED_WARN_ON(se->sched_delayed);
+		/* Handle any unfinished DELAY_DEQUEUE business first. */
+		if (se->sched_delayed) {
+			int flags = DEQUEUE_SLEEP | DEQUEUE_DELAYED;
+
+			dequeue_entity(qcfs_rq, se, flags);
+		} else if (se->on_rq)
 			break;
-		}
 		enqueue_entity(qcfs_rq, se, ENQUEUE_WAKEUP);

 		if (cfs_rq_is_idle(group_cfs_rq(se)))
[tip: sched/urgent] sched: Fix sched_delayed vs cfs_bandwidth
Posted by tip-bot2 for Mike Galbraith 1 month, 3 weeks ago
The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     9b5ce1a37e904fac32d560668134965f4e937f6c
Gitweb:        https://git.kernel.org/tip/9b5ce1a37e904fac32d560668134965f4e937f6c
Author:        Mike Galbraith <efault@gmx.de>
AuthorDate:    Tue, 01 Oct 2024 03:34:01 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 02 Oct 2024 11:27:54 +02:00

sched: Fix sched_delayed vs cfs_bandwidth

Meeting an unfinished DELAY_DEQUEUE treated entity in unthrottle_cfs_rq()
leads to a couple terminal scenarios.  Finish it first, so ENQUEUE_WAKEUP
can proceed as it would have sans DELAY_DEQUEUE treatment.

Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/7515d2e64c989b9e3b828a9e21bcd959b99df06a.camel@gmx.de
---
 kernel/sched/fair.c |  9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 225b31a..b63a7ac 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6058,10 +6058,13 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	for_each_sched_entity(se) {
 		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
 
-		if (se->on_rq) {
-			SCHED_WARN_ON(se->sched_delayed);
+		/* Handle any unfinished DELAY_DEQUEUE business first. */
+		if (se->sched_delayed) {
+			int flags = DEQUEUE_SLEEP | DEQUEUE_DELAYED;
+
+			dequeue_entity(qcfs_rq, se, flags);
+		} else if (se->on_rq)
 			break;
-		}
 		enqueue_entity(qcfs_rq, se, ENQUEUE_WAKEUP);
 
 		if (cfs_rq_is_idle(group_cfs_rq(se)))