In scenarios where a large number of containers are created at the
same time, many tasks are created within a short window, and they
are written into cgroup.procs.

copy_process() takes the cgroup_threadgroup_rwsem read lock, while
cgroup_procs_write() takes the cgroup_threadgroup_rwsem write lock.
Because readers pre-increment read_count and only then check for
writers, the writer can starve, especially when there is a steady
stream of readers.

To alleviate this problem, add one more check for waiting writers
before incrementing read_count, so that writers can take the lock
faster.
Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
---
kernel/locking/percpu-rwsem.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index 6083883c4fe0..66bf18c28b43 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -47,6 +47,11 @@ EXPORT_SYMBOL_GPL(percpu_free_rwsem);
 
 static bool __percpu_down_read_trylock(struct percpu_rw_semaphore *sem)
 {
+	if (unlikely(atomic_read_acquire(&sem->block))) {
+		rcuwait_wake_up(&sem->writer);
+		return false;
+	}
+
 	this_cpu_inc(*sem->read_count);
 
 	/*
--
2.43.5
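[Since the hunk above is cut off, here is a sketch of what the whole
trylock fast path would look like with this change applied,
reconstructed from the mainline version of
kernel/locking/percpu-rwsem.c. It is lightly abridged and may not
match the exact tree the patch was generated against.]

static bool __percpu_down_read_trylock(struct percpu_rw_semaphore *sem)
{
	/*
	 * New in this patch: bail out before touching read_count when
	 * a writer already holds, or is in the middle of taking, the
	 * lock.
	 */
	if (unlikely(atomic_read_acquire(&sem->block))) {
		rcuwait_wake_up(&sem->writer);
		return false;
	}

	this_cpu_inc(*sem->read_count);

	/*
	 * Either the reader sees the writer's store to sem->block, or
	 * the writer's per-CPU sum sees this increment; the full
	 * barrier below is what enforces that pairing.
	 */
	smp_mb(); /* A matches D */

	/*
	 * If !sem->block the critical section starts here, matched by
	 * the release in percpu_up_write().
	 */
	if (likely(!atomic_read_acquire(&sem->block)))
		return true;

	/*
	 * A writer raced in after the increment: back out and prod the
	 * writer to re-evaluate readers_active_check().
	 */
	this_cpu_dec(*sem->read_count);
	rcuwait_wake_up(&sem->writer);

	return false;
}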
On 3/27/25 11:05 PM, Cruz Zhao wrote:
> diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
> index 6083883c4fe0..66bf18c28b43 100644
> --- a/kernel/locking/percpu-rwsem.c
> +++ b/kernel/locking/percpu-rwsem.c
> @@ -47,6 +47,11 @@ EXPORT_SYMBOL_GPL(percpu_free_rwsem);
>  
>  static bool __percpu_down_read_trylock(struct percpu_rw_semaphore *sem)
>  {
> +	if (unlikely(atomic_read_acquire(&sem->block))) {
> +		rcuwait_wake_up(&sem->writer);
> +		return false;
> +	}
> +
>  	this_cpu_inc(*sem->read_count);
>  
>  	/*
The specific sequence of events is there for a reason. If we disturb
the sequence like that, there is a possibility that percpu_up_write()
may miss a waiting reader, for example. So a more careful analysis has
to be done.
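[To make the concern concrete, here is a rough sketch of the ordering
the current code relies on, paraphrased from the comments in
kernel/locking/percpu-rwsem.c; simplified, not an exact excerpt.]

/*
 * Reader fast path (current code)     Writer (percpu_down_write())
 * -------------------------------     ----------------------------
 * this_cpu_inc(*sem->read_count);     set sem->block = 1, with a full
 * smp_mb();                    (A)      barrier (writer exclusion) (D)
 * read sem->block; if it is set,      sum per-CPU read_count; if it is
 *   back out the increment and          nonzero, wait for readers on
 *   wake the writer                     &sem->writer
 *
 * With the increment ordered before the sem->block check, either the
 * reader sees sem->block and backs out, or the writer's per-CPU sum
 * sees the increment and keeps waiting; the writer can never miss an
 * active reader.  Checking sem->block before the increment breaks
 * that pairing, so every ordering-sensitive path (including
 * percpu_up_write()) has to be re-audited against the new fast path.
 */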
BTW, how much performance benefit did you gain by making this change? We
certainly need to see some performance metrics.
The design of percpu rwsem strongly favors readers, with much lower
read-side overhead than a regular rwsem, and it assumes that writers
come along only once in a while. Where more fairness to writers is
needed, a regular rwsem is used instead.
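[For contrast, this is the read-mostly usage pattern percpu rwsem is
built for. The function names are made up for illustration, but the
API calls are the real ones from <linux/percpu-rwsem.h>.]

#include <linux/percpu-rwsem.h>

static DEFINE_STATIC_PERCPU_RWSEM(example_sem);

/*
 * Hot path: taken constantly by many tasks, so it must stay cheap.
 * The read side is roughly a per-CPU increment plus a barrier.
 */
static void example_read_side(void)
{
	percpu_down_read(&example_sem);
	/* ... read shared state ... */
	percpu_up_read(&example_sem);
}

/*
 * Cold path: expected to be rare.  The write side pays for an RCU
 * grace period (rcu_sync) plus a sum over all CPUs' read_count.
 */
static void example_write_side(void)
{
	percpu_down_write(&example_sem);
	/* ... modify shared state ... */
	percpu_up_write(&example_sem);
}

If writers are frequent, as in the container-creation workload above,
that write-side cost dominates, which is exactly the trade-off being
described.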
Cheers,
Longman