[PATCH] rcu: Use system_unbound_wq to avoid disturbing isolated CPUs
Posted by Waiman Long 1 month, 2 weeks ago
It was discovered that isolated CPUs could sometimes be disturbed by
kworkers processing kfree_rcu() work items, causing higher than expected
latency. This is because the RCU core uses "system_wq", which doesn't
have the WQ_UNBOUND flag, to handle all its work items. Fix this
violation of latency limits by using "system_unbound_wq" in the RCU core
instead. This will ensure that those work items will not be run on CPUs
marked as isolated.

Besides the WQ_UNBOUND flag, the other major difference between system_wq
and system_unbound_wq is their max_active count. The system_unbound_wq
has a max_active of WQ_MAX_ACTIVE (512), while system_wq's max_active
is WQ_DFL_ACTIVE (256), which is half of WQ_MAX_ACTIVE.
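
For reference, this is roughly how the two system workqueues are set up
during early boot (paraphrased from workqueue_init_early() in
kernel/workqueue.c; exact details vary by kernel version):

        /* kernel/workqueue.c, workqueue_init_early(), simplified */
        system_wq = alloc_workqueue("events", 0, 0);    /* per-CPU pools, default max_active */
        system_unbound_wq = alloc_workqueue("events_unbound",
                                            WQ_UNBOUND, WQ_MAX_ACTIVE);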

Reported-by: Vratislav Bendel <vbendel@redhat.com>
Closes: https://issues.redhat.com/browse/RHEL-50220
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/rcu/tasks.h | 4 ++--
 kernel/rcu/tree.c  | 8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e641cc681901..494aa9513d0b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3539,10 +3539,10 @@ schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
 	if (delayed_work_pending(&krcp->monitor_work)) {
 		delay_left = krcp->monitor_work.timer.expires - jiffies;
 		if (delay < delay_left)
-			mod_delayed_work(system_wq, &krcp->monitor_work, delay);
+			mod_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
 		return;
 	}
-	queue_delayed_work(system_wq, &krcp->monitor_work, delay);
+	queue_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
 }
 
 static void
@@ -3634,7 +3634,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
 			// be that the work is in the pending state when
 			// channels have been detached following by each
 			// other.
-			queue_rcu_work(system_wq, &krwp->rcu_work);
+			queue_rcu_work(system_unbound_wq, &krwp->rcu_work);
 		}
 	}
 
@@ -3704,7 +3704,7 @@ run_page_cache_worker(struct kfree_rcu_cpu *krcp)
 	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
 			!atomic_xchg(&krcp->work_in_progress, 1)) {
 		if (atomic_read(&krcp->backoff_page_cache_fill)) {
-			queue_delayed_work(system_wq,
+			queue_delayed_work(system_unbound_wq,
 				&krcp->page_cache_work,
 					msecs_to_jiffies(rcu_delay_page_cache_fill_msec));
 		} else {
-- 
2.43.5
Re: [PATCH] rcu: Use system_unbound_wq to avoid disturbing isolated CPUs
Posted by Neeraj Upadhyay 1 month, 2 weeks ago
On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
> It was discovered that isolated CPUs could sometimes be disturbed by
> kworkers processing kfree_rcu() works causing higher than expected
> latency. It is because the RCU core uses "system_wq" which doesn't have
> the WQ_UNBOUND flag to handle all its work items. Fix this violation of
> latency limits by using "system_unbound_wq" in the RCU core instead.
> This will ensure that those work items will not be run on CPUs marked
> as isolated.
> 

An alternative approach here, in case we want to keep per-CPU worker
pools, could be to define a wq with the WQ_CPU_INTENSIVE flag. Are there
cases where a WQ_CPU_INTENSIVE wq won't be sufficient for the problem
this patch is fixing?
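
To make the suggestion concrete, a minimal sketch of what I have in mind
(the workqueue name and call sites below are made up for illustration,
not existing kernel code):

        /* Dedicated queue: workers stay per-CPU, but its work items are
         * excluded from concurrency management, so long-running callbacks
         * don't stall other per-CPU work items.
         */
        static struct workqueue_struct *kfree_rcu_wq;   /* hypothetical */

        /* at init time */
        kfree_rcu_wq = alloc_workqueue("kfree_rcu_wq", WQ_CPU_INTENSIVE, 0);

        /* then, e.g., in schedule_delayed_monitor_work() */
        queue_delayed_work(kfree_rcu_wq, &krcp->monitor_work, delay);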


- Neeraj

> Beside the WQ_UNBOUND flag, the other major difference between system_wq
> and system_unbound_wq is their max_active count. The system_unbound_wq
> has a max_active of WQ_MAX_ACTIVE (512) while system_wq's max_active
> is WQ_DFL_ACTIVE (256) which is half of WQ_MAX_ACTIVE.
> 
> Reported-by: Vratislav Bendel <vbendel@redhat.com>
> Closes: https://issues.redhat.com/browse/RHEL-50220
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/rcu/tasks.h | 4 ++--
>  kernel/rcu/tree.c  | 8 ++++----
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index e641cc681901..494aa9513d0b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3539,10 +3539,10 @@ schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
>  	if (delayed_work_pending(&krcp->monitor_work)) {
>  		delay_left = krcp->monitor_work.timer.expires - jiffies;
>  		if (delay < delay_left)
> -			mod_delayed_work(system_wq, &krcp->monitor_work, delay);
> +			mod_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
>  		return;
>  	}
> -	queue_delayed_work(system_wq, &krcp->monitor_work, delay);
> +	queue_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
>  }
>  
>  static void
> @@ -3634,7 +3634,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
>  			// be that the work is in the pending state when
>  			// channels have been detached following by each
>  			// other.
> -			queue_rcu_work(system_wq, &krwp->rcu_work);
> +			queue_rcu_work(system_unbound_wq, &krwp->rcu_work);
>  		}
>  	}
>  
> @@ -3704,7 +3704,7 @@ run_page_cache_worker(struct kfree_rcu_cpu *krcp)
>  	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
>  			!atomic_xchg(&krcp->work_in_progress, 1)) {
>  		if (atomic_read(&krcp->backoff_page_cache_fill)) {
> -			queue_delayed_work(system_wq,
> +			queue_delayed_work(system_unbound_wq,
>  				&krcp->page_cache_work,
>  					msecs_to_jiffies(rcu_delay_page_cache_fill_msec));
>  		} else {
> -- 
> 2.43.5
>
Re: [PATCH] rcu: Use system_unbound_wq to avoid disturbing isolated CPUs
Posted by Waiman Long 1 month, 2 weeks ago
On 7/25/24 11:35, Neeraj Upadhyay wrote:
> On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
>> It was discovered that isolated CPUs could sometimes be disturbed by
>> kworkers processing kfree_rcu() works causing higher than expected
>> latency. It is because the RCU core uses "system_wq" which doesn't have
>> the WQ_UNBOUND flag to handle all its work items. Fix this violation of
>> latency limits by using "system_unbound_wq" in the RCU core instead.
>> This will ensure that those work items will not be run on CPUs marked
>> as isolated.
>>
> Alternative approach here could be, in case we want to keep per CPU worker
> pools, define a wq with WQ_CPU_INTENSIVE flag. Are there cases where
> WQ_CPU_INTENSIVE wq won't be sufficient for the problem this patch
> is fixing?

What exactly will we gain by defining a WQ_CPU_INTENSIVE workqueue? Or 
what will we lose by using system_unbound_wq? All the calls that are 
modified to use system_unbound_wq pass WORK_CPU_UNBOUND as their CPU. 
IOW, they don't care which CPUs are used to run the work items. The 
only downside I can see is the possible loss of some cache locality.
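
For context, queue_delayed_work() and friends without an explicit CPU
are thin wrappers that pass WORK_CPU_UNBOUND, roughly as in
include/linux/workqueue.h (trimmed):

        static inline bool queue_delayed_work(struct workqueue_struct *wq,
                                              struct delayed_work *dwork,
                                              unsigned long delay)
        {
                return queue_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay);
        }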

In fact, WQ_CPU_INTENSIVE can be considered a subset of WQ_UNBOUND. A 
WQ_UNBOUND workqueue will avoid using isolated CPUs, but a 
WQ_CPU_INTENSIVE workqueue will not.
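
The isolated-CPU exclusion comes from the unbound cpumask, which is
derived from the housekeeping masks at init time, roughly like this on
recent kernels (simplified; details differ across versions):

        /* kernel/workqueue.c, workqueue_init_early() */
        cpumask_copy(wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_WQ));
        cpumask_and(wq_unbound_cpumask, wq_unbound_cpumask,
                    housekeeping_cpumask(HK_TYPE_DOMAIN));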

Cheers,
Longman
Re: [PATCH] rcu: Use system_unbound_wq to avoid disturbing isolated CPUs
Posted by Neeraj Upadhyay 1 month, 2 weeks ago
On Thu, Jul 25, 2024 at 01:02:01PM -0400, Waiman Long wrote:
> On 7/25/24 11:35, Neeraj Upadhyay wrote:
> > On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
> > > It was discovered that isolated CPUs could sometimes be disturbed by
> > > kworkers processing kfree_rcu() works causing higher than expected
> > > latency. It is because the RCU core uses "system_wq" which doesn't have
> > > the WQ_UNBOUND flag to handle all its work items. Fix this violation of
> > > latency limits by using "system_unbound_wq" in the RCU core instead.
> > > This will ensure that those work items will not be run on CPUs marked
> > > as isolated.
> > > 
> > Alternative approach here could be, in case we want to keep per CPU worker
> > pools, define a wq with WQ_CPU_INTENSIVE flag. Are there cases where
> > WQ_CPU_INTENSIVE wq won't be sufficient for the problem this patch
> > is fixing?
> 
> What exactly will we gain by defining a WQ_CPU_INTENSIVE workqueue? Or what
> will we lose by using system_unbound_wq? All the calls that are modified to
> use system_unbound_wq are using WORK_CPU_UNBOUND as their cpu. IOW, they
> doesn't care which CPUs are used to run the work items. The only downside I
> can see is the possible loss of some cache locality.
> 

For the nohz_full case, where unbound pool workers run only on the
housekeeping CPU (cpu0), if multiple other CPUs are queueing work, the
execution of those work items could get delayed. However, this should
not generally happen, as the other CPUs would mostly be running in user
mode.
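
(The setup I have in mind is something like booting an 8-CPU machine
with, say,

        nohz_full=1-7 isolcpus=nohz,domain,managed_irq,1-7

which leaves CPU 0 as the only housekeeping CPU. This is just an
illustrative example, not taken from the report.)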


> In fact, WQ_CPU_INTENSIVE can be considered a subset of WQ_UNBOUND. An
> WQ_UNBOUND workqueue will avoid using isolated CPUs, but not a
> WQ_CPU_INTENSIVE workqueue.

Got it, thanks!

I have picked the patch for further review and testing [1]


[1] https://git.kernel.org/pub/scm/linux/kernel/git/neeraj.upadhyay/linux-rcu.git/log/?h=next


- Neeraj

> 
> Cheers,
> Longman
> 
>
Re: [PATCH] rcu: Use system_unbound_wq to avoid disturbing isolated CPUs
Posted by Waiman Long 1 month, 2 weeks ago
On 7/25/24 15:33, Neeraj Upadhyay wrote:
> On Thu, Jul 25, 2024 at 01:02:01PM -0400, Waiman Long wrote:
>> On 7/25/24 11:35, Neeraj Upadhyay wrote:
>>> On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
>>>> It was discovered that isolated CPUs could sometimes be disturbed by
>>>> kworkers processing kfree_rcu() works causing higher than expected
>>>> latency. It is because the RCU core uses "system_wq" which doesn't have
>>>> the WQ_UNBOUND flag to handle all its work items. Fix this violation of
>>>> latency limits by using "system_unbound_wq" in the RCU core instead.
>>>> This will ensure that those work items will not be run on CPUs marked
>>>> as isolated.
>>>>
>>> Alternative approach here could be, in case we want to keep per CPU worker
>>> pools, define a wq with WQ_CPU_INTENSIVE flag. Are there cases where
>>> WQ_CPU_INTENSIVE wq won't be sufficient for the problem this patch
>>> is fixing?
>> What exactly will we gain by defining a WQ_CPU_INTENSIVE workqueue? Or what
>> will we lose by using system_unbound_wq? All the calls that are modified to
>> use system_unbound_wq are using WORK_CPU_UNBOUND as their cpu. IOW, they
>> doesn't care which CPUs are used to run the work items. The only downside I
>> can see is the possible loss of some cache locality.
>>
> For the nohz_full case, where unbounded pool workers run only on housekeeping CPU
> (cpu0), if multiple other CPUs are queuing work, the execution of those
> works could get delayed. However, this should not generally happen as
> other CPUs would be mostly running in user mode.
Well, if there is only one housekeeping CPU, a lot of background kernel 
tasks will be slowed down. Users should be careful to keep a proper 
balance between the number of housekeeping and nohz_full CPUs.
>
>
>> In fact, WQ_CPU_INTENSIVE can be considered a subset of WQ_UNBOUND. An
>> WQ_UNBOUND workqueue will avoid using isolated CPUs, but not a
>> WQ_CPU_INTENSIVE workqueue.
> Got it, thanks!
>
> I have picked the patch for further review and testing [1]
>
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/neeraj.upadhyay/linux-rcu.git/log/?h=next

Thanks, let me know if you see any problem.

Cheers,
Longman
Re: [PATCH] rcu: Use system_unbound_wq to avoid disturbing isolated CPUs
Posted by Breno Leitao 1 month, 2 weeks ago
On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
> It was discovered that isolated CPUs could sometimes be disturbed by
> kworkers processing kfree_rcu() works causing higher than expected
> latency. It is because the RCU core uses "system_wq" which doesn't have
> the WQ_UNBOUND flag to handle all its work items. Fix this violation of
> latency limits by using "system_unbound_wq" in the RCU core instead.
> This will ensure that those work items will not be run on CPUs marked
> as isolated.
> 
> Beside the WQ_UNBOUND flag, the other major difference between system_wq
> and system_unbound_wq is their max_active count. The system_unbound_wq
> has a max_active of WQ_MAX_ACTIVE (512) while system_wq's max_active
> is WQ_DFL_ACTIVE (256) which is half of WQ_MAX_ACTIVE.
> 
> Reported-by: Vratislav Bendel <vbendel@redhat.com>

I saw this problem a while ago and reported it to the list:

	https://lore.kernel.org/all/Zp906X7VJGNKl5fW@gmail.com/

I've just applied this patch and run my workload for 2 hours without
hitting this issue. Thanks for solving it.

Tested-by: Breno Leitao <leitao@debian.org>
Re: [PATCH] rcu: Use system_unbound_wq to avoid disturbing isolated CPUs
Posted by Waiman Long 1 month, 1 week ago
On 7/24/24 09:30, Breno Leitao wrote:
> On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
>> It was discovered that isolated CPUs could sometimes be disturbed by
>> kworkers processing kfree_rcu() works causing higher than expected
>> latency. It is because the RCU core uses "system_wq" which doesn't have
>> the WQ_UNBOUND flag to handle all its work items. Fix this violation of
>> latency limits by using "system_unbound_wq" in the RCU core instead.
>> This will ensure that those work items will not be run on CPUs marked
>> as isolated.
>>
>> Beside the WQ_UNBOUND flag, the other major difference between system_wq
>> and system_unbound_wq is their max_active count. The system_unbound_wq
>> has a max_active of WQ_MAX_ACTIVE (512) while system_wq's max_active
>> is WQ_DFL_ACTIVE (256) which is half of WQ_MAX_ACTIVE.
>>
>> Reported-by: Vratislav Bendel <vbendel@redhat.com>
> I've seen this problem a while ago and reported to the list:
>
> 	https://lore.kernel.org/all/Zp906X7VJGNKl5fW@gmail.com/
>
> I've just applied this test, and run my workload for 2 hours without
> hitting this issue. Thanks for solving it.
>
> Tested-by: Breno Leitao <leitao@debian.org>

Thanks for testing this patch. So it is not just us who saw this problem.

Cheers,
Longman
Re: [PATCH] rcu: Use system_unbound_wq to avoid disturbing isolated CPUs
Posted by Uladzislau Rezki 1 month, 2 weeks ago
On Wed, Jul 24, 2024 at 06:30:29AM -0700, Breno Leitao wrote:
> On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
> > It was discovered that isolated CPUs could sometimes be disturbed by
> > kworkers processing kfree_rcu() works causing higher than expected
> > latency. It is because the RCU core uses "system_wq" which doesn't have
> > the WQ_UNBOUND flag to handle all its work items. Fix this violation of
> > latency limits by using "system_unbound_wq" in the RCU core instead.
> > This will ensure that those work items will not be run on CPUs marked
> > as isolated.
> > 
> > Beside the WQ_UNBOUND flag, the other major difference between system_wq
> > and system_unbound_wq is their max_active count. The system_unbound_wq
> > has a max_active of WQ_MAX_ACTIVE (512) while system_wq's max_active
> > is WQ_DFL_ACTIVE (256) which is half of WQ_MAX_ACTIVE.
> > 
> > Reported-by: Vratislav Bendel <vbendel@redhat.com>
> 
> I've seen this problem a while ago and reported to the list:
> 
> 	https://lore.kernel.org/all/Zp906X7VJGNKl5fW@gmail.com/
> 
> I've just applied this test, and run my workload for 2 hours without
> hitting this issue. Thanks for solving it.
> 
> Tested-by: Breno Leitao <leitao@debian.org>
>
Thank you for testing this! I saw your recent email about that :)

--
Uladzislau Rezki
Re: [PATCH] rcu: Use system_unbound_wq to avoid disturbing isolated CPUs
Posted by Uladzislau Rezki 1 month, 2 weeks ago
On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
> It was discovered that isolated CPUs could sometimes be disturbed by
> kworkers processing kfree_rcu() works causing higher than expected
> latency. It is because the RCU core uses "system_wq" which doesn't have
> the WQ_UNBOUND flag to handle all its work items. Fix this violation of
> latency limits by using "system_unbound_wq" in the RCU core instead.
> This will ensure that those work items will not be run on CPUs marked
> as isolated.
> 
> Beside the WQ_UNBOUND flag, the other major difference between system_wq
> and system_unbound_wq is their max_active count. The system_unbound_wq
> has a max_active of WQ_MAX_ACTIVE (512) while system_wq's max_active
> is WQ_DFL_ACTIVE (256) which is half of WQ_MAX_ACTIVE.
> 
> Reported-by: Vratislav Bendel <vbendel@redhat.com>
> Closes: https://issues.redhat.com/browse/RHEL-50220
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/rcu/tasks.h | 4 ++--
>  kernel/rcu/tree.c  | 8 ++++----
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index e641cc681901..494aa9513d0b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3539,10 +3539,10 @@ schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
>  	if (delayed_work_pending(&krcp->monitor_work)) {
>  		delay_left = krcp->monitor_work.timer.expires - jiffies;
>  		if (delay < delay_left)
> -			mod_delayed_work(system_wq, &krcp->monitor_work, delay);
> +			mod_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
>  		return;
>  	}
> -	queue_delayed_work(system_wq, &krcp->monitor_work, delay);
> +	queue_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
>  }
>  
>  static void
> @@ -3634,7 +3634,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
>  			// be that the work is in the pending state when
>  			// channels have been detached following by each
>  			// other.
> -			queue_rcu_work(system_wq, &krwp->rcu_work);
> +			queue_rcu_work(system_unbound_wq, &krwp->rcu_work);
>  		}
>  	}
>  
> @@ -3704,7 +3704,7 @@ run_page_cache_worker(struct kfree_rcu_cpu *krcp)
>  	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
>  			!atomic_xchg(&krcp->work_in_progress, 1)) {
>  		if (atomic_read(&krcp->backoff_page_cache_fill)) {
> -			queue_delayed_work(system_wq,
> +			queue_delayed_work(system_unbound_wq,
>  				&krcp->page_cache_work,
>  					msecs_to_jiffies(rcu_delay_page_cache_fill_msec));
>  		} else {
> -- 
> 2.43.5
> 
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Thanks!

--
Uladzislau Rezki