Make use of the new local_*lock_n*() and local_queue_work_on() interfaces
to improve performance & latency on PREEMPT_RT kernels.

For functions whose work may run on a different CPU, replace
local_*lock*() with local_*lock_n*(), and replace queue_work_on() with
local_queue_work_on(). Likewise, flush_work() becomes local_flush_work().

This should have no relevant performance impact on non-RT kernels: for
these functions, the local_*lock's this_cpu_ptr() simply becomes
per_cpu_ptr(..., smp_processor_id()).
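As a rough illustration only (the actual helpers are defined in an
earlier patch of this series and may differ in detail), local_lock_n()
and local_unlock_n() can be thought of as something like:

/*
 * Hypothetical sketch, not the series' real definition: the target cpu
 * is passed explicitly, so the per-cpu pointer lookup no longer has to
 * run on that cpu.
 */
#ifdef CONFIG_PREEMPT_RT
/* On RT a local_lock_t is a per-cpu spinlock_t and can be taken remotely. */
#define local_lock_n(lock, cpu)		spin_lock(per_cpu_ptr(lock, cpu))
#define local_unlock_n(lock, cpu)	spin_unlock(per_cpu_ptr(lock, cpu))
#else
/* On !RT the work item still runs on @cpu, so this stays a plain local lock. */
#define local_lock_n(lock, cpu)		local_lock(lock)
#define local_unlock_n(lock, cpu)	local_unlock(lock)
#endif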
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
mm/swap.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/mm/swap.c b/mm/swap.c
index cd8f0150ba3a..a79f2091eae5 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -760,11 +760,11 @@ void lru_add_drain(void)
* the same cpu. It shouldn't be a problem in !SMP case since
* the core is only one and the locks will disable preemption.
*/
-static void lru_add_and_bh_lrus_drain(void)
+static void lru_add_and_bh_lrus_drain(int cpu)
{
- local_lock(&cpu_fbatches.lock);
- lru_add_drain_cpu(smp_processor_id());
- local_unlock(&cpu_fbatches.lock);
+ local_lock_n(&cpu_fbatches.lock, cpu);
+ lru_add_drain_cpu(cpu);
+ local_unlock_n(&cpu_fbatches.lock, cpu);
invalidate_bh_lrus_cpu();
mlock_drain_local();
}
@@ -782,9 +782,9 @@ void lru_add_drain_cpu_zone(struct zone *zone)
static DEFINE_PER_CPU(struct work_struct, lru_add_drain_work);
-static void lru_add_drain_per_cpu(struct work_struct *dummy)
+static void lru_add_drain_per_cpu(struct work_struct *w)
{
- lru_add_and_bh_lrus_drain();
+ lru_add_and_bh_lrus_drain(w->data.counter);
}
static bool cpu_needs_drain(unsigned int cpu)
@@ -888,13 +888,13 @@ static inline void __lru_add_drain_all(bool force_all_cpus)
if (cpu_needs_drain(cpu)) {
INIT_WORK(work, lru_add_drain_per_cpu);
- queue_work_on(cpu, mm_percpu_wq, work);
+ local_queue_work_on(cpu, mm_percpu_wq, work);
__cpumask_set_cpu(cpu, &has_work);
}
}
for_each_cpu(cpu, &has_work)
- flush_work(&per_cpu(lru_add_drain_work, cpu));
+ local_flush_work(&per_cpu(lru_add_drain_work, cpu));
done:
mutex_unlock(&lock);
@@ -941,7 +941,7 @@ void lru_cache_disable(void)
#ifdef CONFIG_SMP
__lru_add_drain_all(true);
#else
- lru_add_and_bh_lrus_drain();
+ lru_add_and_bh_lrus_drain(smp_processor_id());
#endif
}
--
2.41.0
On Sat, Jul 29, 2023 at 05:37:33AM -0300, Leonardo Bras wrote:
> Make use of the new local_*lock_n*() and local_queue_work_on() interfaces
> to improve performance & latency on PREEMPT_RT kernels.
>
> For functions whose work may run on a different CPU, replace
> local_*lock*() with local_*lock_n*(), and replace queue_work_on() with
> local_queue_work_on(). Likewise, flush_work() becomes local_flush_work().
>
> This should have no relevant performance impact on non-RT kernels: for
> these functions, the local_*lock's this_cpu_ptr() simply becomes
> per_cpu_ptr(..., smp_processor_id()).
>
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
> mm/swap.c | 18 +++++++++---------
> 1 file changed, 9 insertions(+), 9 deletions(-)
Leo,
I think the interruptions should rather be removed for both
CONFIG_PREEMPT_RT AND !CONFIG_PREEMPT_RT.
The impact of grabbing locks must be properly analyzed and not
"rejected blindly".
Example:
commit 01b44456a7aa7c3b24fa9db7d1714b208b8ef3d8
Author: Mel Gorman <mgorman@techsingularity.net>
Date: Fri Jun 24 13:54:23 2022 +0100
mm/page_alloc: replace local_lock with normal spinlock
struct per_cpu_pages is no longer strictly local as PCP lists can be
drained remotely using a lock for protection. While the use of local_lock
works, it goes against the intent of local_lock which is for "pure CPU
local concurrency control mechanisms and not suited for inter-CPU
concurrency control" (Documentation/locking/locktypes.rst)
local_lock protects against migration between when the percpu pointer is
accessed and the pcp->lock acquired. The lock acquisition is a preemption
point so in the worst case, a task could migrate to another NUMA node and
accidentally allocate remote memory. The main requirement is to pin the
task to a CPU that is suitable for PREEMPT_RT and !PREEMPT_RT.
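That pattern can be illustrated roughly as follows (hypothetical,
simplified names; not the actual mm/page_alloc.c code): the per-cpu data
embeds a normal spinlock, so a remote CPU can drain it directly instead
of scheduling work on the owning CPU.

struct pcp_example {
	spinlock_t lock;		/* protects @pages */
	struct list_head pages;
};
static DEFINE_PER_CPU(struct pcp_example, pcp_example);

static void drain_cpu_pages_example(int cpu)
{
	struct pcp_example *pcp = per_cpu_ptr(&pcp_example, cpu);

	spin_lock(&pcp->lock);		/* no IPI, no remote work item */
	/* ... return pcp->pages back to the shared pool ... */
	spin_unlock(&pcp->lock);
}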
On Tue, 2023-08-08 at 16:39 -0300, Marcelo Tosatti wrote:
> On Sat, Jul 29, 2023 at 05:37:33AM -0300, Leonardo Bras wrote:
> > [...]
>
> Leo,
>
> I think the interruptions should rather be removed for both
> CONFIG_PREEMPT_RT AND !CONFIG_PREEMPT_RT.
>
> The impact of grabbing locks must be properly analyzed and not
> "rejected blindly".

Yes, I agree with the idea, but we have been seeing a lot of pushback
against ideas like this lately, as the maintainers perceive them as a
big change.

My idea here is to provide a general way to improve CPU isolation in
the PREEMPT_RT scenario, even though there is resistance for the
!PREEMPT_RT case.

As I commented, PREEMPT_RT's local_lock hotpaths already take spinlocks,
and yet we still use schedule_work_on() to get remote work done. This
patch tries to solve that by taking the remote CPU's per-cpu spinlock
(spin_lock(remote_cpu_lock)) instead, saving a lot of cycles and
reducing IPIs on the remote CPU.

It looks like a simple solution, improving isolation and performance on
PREEMPT_RT with no visible drawbacks. I agree the interface is not
ideal, and that is where I really need your help.

I also understand that once this change is merged we will have more
precedent to discuss the performance of the !PREEMPT_RT case.

What do you think?

Thanks!
Leo

> Example:
>
> commit 01b44456a7aa7c3b24fa9db7d1714b208b8ef3d8 ("mm/page_alloc:
> replace local_lock with normal spinlock")
> [...]
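As a side note, the idea Leo describes above (doing the drain under the
remote CPU's lock on PREEMPT_RT instead of queueing work there) could be
sketched, hypothetically, as follows; this is not the series' actual
local_queue_work_on()/local_flush_work() code, just the per-CPU step it
amounts to:

/*
 * Sketch only: on PREEMPT_RT the per-cpu lock is a real spinlock, so
 * the "remote" drain can run on the requesting CPU under the target
 * CPU's lock, avoiding the IPI and the wakeup on the isolated CPU.
 */
static void drain_one_cpu(int cpu)
{
	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
		/* runs here, serialized by @cpu's per-cpu lock */
		lru_add_and_bh_lrus_drain(cpu);
	} else {
		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);

		/* unchanged behaviour: queue and wait for the work on @cpu */
		INIT_WORK(work, lru_add_drain_per_cpu);
		queue_work_on(cpu, mm_percpu_wq, work);
		flush_work(work);
	}
}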