[PATCH] sched/fair: Warn against unbalanced util_est during dequeue

Hongyan Xia posted 1 patch 1 year, 5 months ago
kernel/sched/fair.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
[PATCH] sched/fair: Warn against unbalanced util_est during dequeue
Posted by Hongyan Xia 1 year, 5 months ago
The latest tip/sched/core seems to have an unbalanced util_est issue. As
I am debugging it, I found the logic util_est_dequeue() doesn't make
much sense. It guards against util_est underflow like this:

enqueued -= min_t(unsigned int, enqueued, _task_util_est(p));

However, this is misleading because when the number of tasks drops to 0,
util_est should be exactly 0, no more, no less, because
util_est_update() is done after util_est_dequeue() and there is no
change of util_est between util_est_enqueue() and dequeue(). Even if the
current logic guards against underflow, it doesn't guard against
overflow and may cause problems to go un-noticed.

Do not guard against underflow and add a warning to check that util_est
should be exactly 0 when queue is empty.

Signed-off-by: Hongyan Xia <hongyan.xia2@arm.com>
---
 kernel/sched/fair.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fea057b311f6..574ef19df64b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4950,7 +4950,7 @@ static inline void util_est_dequeue(struct cfs_rq *cfs_rq,
 
 	/* Update root cfs_rq's estimated utilization */
 	enqueued  = cfs_rq->avg.util_est;
-	enqueued -= min_t(unsigned int, enqueued, _task_util_est(p));
+	enqueued -= _task_util_est(p);
 	WRITE_ONCE(cfs_rq->avg.util_est, enqueued);
 
 	trace_sched_util_est_cfs_tp(cfs_rq);
@@ -7176,10 +7176,14 @@ static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	util_est_dequeue(&rq->cfs, p);
 
 	if (dequeue_entities(rq, &p->se, flags) < 0) {
+		if (!rq->cfs.h_nr_running)
+			SCHED_WARN_ON(rq->cfs.avg.util_est);
 		util_est_update(&rq->cfs, p, DEQUEUE_SLEEP);
 		return false;
 	}
 
+	if (!rq->cfs.h_nr_running)
+		SCHED_WARN_ON(rq->cfs.avg.util_est);
 	util_est_update(&rq->cfs, p, flags & DEQUEUE_SLEEP);
 	hrtick_update(rq);
 	return true;
-- 
2.34.1
Re: [PATCH] sched/fair: Warn against unbalanced util_est during dequeue
Posted by kernel test robot 1 year, 5 months ago
Hi Hongyan,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on tip/master peterz-queue/sched/core next-20240821]
[cannot apply to tip/auto-latest linus/master v6.11-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Hongyan-Xia/sched-fair-Warn-against-unbalanced-util_est-during-dequeue/20240821-210618
base:   tip/sched/core
patch link:    https://lore.kernel.org/r/752ae417c02b9277ca3ec18893747c54dd5f277f.1724245193.git.hongyan.xia2%40arm.com
patch subject: [PATCH] sched/fair: Warn against unbalanced util_est during dequeue
config: arm-allnoconfig (https://download.01.org/0day-ci/archive/20240822/202408221018.b3g39QVN-lkp@intel.com/config)
compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project 26670e7fa4f032a019d23d56c6a02926e854e8af)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240822/202408221018.b3g39QVN-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408221018.b3g39QVN-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from kernel/sched/fair.c:27:
   In file included from include/linux/mm_api.h:1:
   In file included from include/linux/mm.h:2228:
   include/linux/vmstat.h:514:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     514 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
>> kernel/sched/fair.c:7180:26: error: no member named 'avg' in 'struct cfs_rq'
    7180 |                         SCHED_WARN_ON(rq->cfs.avg.util_est);
         |                                       ~~~~~~~ ^
   kernel/sched/sched.h:97:42: note: expanded from macro 'SCHED_WARN_ON'
      97 | # define SCHED_WARN_ON(x)      ({ (void)(x), 0; })
         |                                          ^
   kernel/sched/fair.c:7186:25: error: no member named 'avg' in 'struct cfs_rq'
    7186 |                 SCHED_WARN_ON(rq->cfs.avg.util_est);
         |                               ~~~~~~~ ^
   kernel/sched/sched.h:97:42: note: expanded from macro 'SCHED_WARN_ON'
      97 | # define SCHED_WARN_ON(x)      ({ (void)(x), 0; })
         |                                          ^
   1 warning and 2 errors generated.


vim +7180 kernel/sched/fair.c

  7168	
  7169	/*
  7170	 * The dequeue_task method is called before nr_running is
  7171	 * decreased. We remove the task from the rbtree and
  7172	 * update the fair scheduling stats:
  7173	 */
  7174	static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
  7175	{
  7176		util_est_dequeue(&rq->cfs, p);
  7177	
  7178		if (dequeue_entities(rq, &p->se, flags) < 0) {
  7179			if (!rq->cfs.h_nr_running)
> 7180				SCHED_WARN_ON(rq->cfs.avg.util_est);
  7181			util_est_update(&rq->cfs, p, DEQUEUE_SLEEP);
  7182			return false;
  7183		}
  7184	
  7185		if (!rq->cfs.h_nr_running)
  7186			SCHED_WARN_ON(rq->cfs.avg.util_est);
  7187		util_est_update(&rq->cfs, p, flags & DEQUEUE_SLEEP);
  7188		hrtick_update(rq);
  7189		return true;
  7190	}
  7191	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH] sched/fair: Warn against unbalanced util_est during dequeue
Posted by kernel test robot 1 year, 5 months ago
Hi Hongyan,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on tip/master peterz-queue/sched/core next-20240821]
[cannot apply to tip/auto-latest linus/master v6.11-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Hongyan-Xia/sched-fair-Warn-against-unbalanced-util_est-during-dequeue/20240821-210618
base:   tip/sched/core
patch link:    https://lore.kernel.org/r/752ae417c02b9277ca3ec18893747c54dd5f277f.1724245193.git.hongyan.xia2%40arm.com
patch subject: [PATCH] sched/fair: Warn against unbalanced util_est during dequeue
config: alpha-allnoconfig (https://download.01.org/0day-ci/archive/20240822/202408221024.pQc774Ya-lkp@intel.com/config)
compiler: alpha-linux-gcc (GCC) 13.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240822/202408221024.pQc774Ya-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408221024.pQc774Ya-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from kernel/sched/fair.c:54:
   kernel/sched/fair.c: In function 'dequeue_task_fair':
>> kernel/sched/fair.c:7180:46: error: 'struct cfs_rq' has no member named 'avg'
    7180 |                         SCHED_WARN_ON(rq->cfs.avg.util_est);
         |                                              ^
   kernel/sched/sched.h:97:42: note: in definition of macro 'SCHED_WARN_ON'
      97 | # define SCHED_WARN_ON(x)      ({ (void)(x), 0; })
         |                                          ^
   kernel/sched/sched.h:97:44: warning: left-hand operand of comma expression has no effect [-Wunused-value]
      97 | # define SCHED_WARN_ON(x)      ({ (void)(x), 0; })
         |                                            ^
   kernel/sched/fair.c:7180:25: note: in expansion of macro 'SCHED_WARN_ON'
    7180 |                         SCHED_WARN_ON(rq->cfs.avg.util_est);
         |                         ^~~~~~~~~~~~~
   kernel/sched/fair.c:7186:38: error: 'struct cfs_rq' has no member named 'avg'
    7186 |                 SCHED_WARN_ON(rq->cfs.avg.util_est);
         |                                      ^
   kernel/sched/sched.h:97:42: note: in definition of macro 'SCHED_WARN_ON'
      97 | # define SCHED_WARN_ON(x)      ({ (void)(x), 0; })
         |                                          ^
   kernel/sched/sched.h:97:44: warning: left-hand operand of comma expression has no effect [-Wunused-value]
      97 | # define SCHED_WARN_ON(x)      ({ (void)(x), 0; })
         |                                            ^
   kernel/sched/fair.c:7186:17: note: in expansion of macro 'SCHED_WARN_ON'
    7186 |                 SCHED_WARN_ON(rq->cfs.avg.util_est);
         |                 ^~~~~~~~~~~~~


vim +7180 kernel/sched/fair.c

  7168	
  7169	/*
  7170	 * The dequeue_task method is called before nr_running is
  7171	 * decreased. We remove the task from the rbtree and
  7172	 * update the fair scheduling stats:
  7173	 */
  7174	static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
  7175	{
  7176		util_est_dequeue(&rq->cfs, p);
  7177	
  7178		if (dequeue_entities(rq, &p->se, flags) < 0) {
  7179			if (!rq->cfs.h_nr_running)
> 7180				SCHED_WARN_ON(rq->cfs.avg.util_est);
  7181			util_est_update(&rq->cfs, p, DEQUEUE_SLEEP);
  7182			return false;
  7183		}
  7184	
  7185		if (!rq->cfs.h_nr_running)
  7186			SCHED_WARN_ON(rq->cfs.avg.util_est);
  7187		util_est_update(&rq->cfs, p, flags & DEQUEUE_SLEEP);
  7188		hrtick_update(rq);
  7189		return true;
  7190	}
  7191	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki