[PATCH v6 0/3] sched: Restructure task_mm_cid_work for predictability

Gabriele Monaco posted 3 patches 10 months, 1 week ago
There is a newer version of this series
include/linux/mm_types.h                      |  23 +-
include/linux/sched.h                         |   1 -
kernel/sched/core.c                           |  66 +-----
kernel/sched/sched.h                          |  32 ++-
tools/testing/selftests/rseq/.gitignore       |   1 +
tools/testing/selftests/rseq/Makefile         |   2 +-
.../selftests/rseq/mm_cid_compaction_test.c   | 200 ++++++++++++++++++
7 files changed, 246 insertions(+), 79 deletions(-)
create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c
[PATCH v6 0/3] sched: Restructure task_mm_cid_work for predictability
Posted by Gabriele Monaco 10 months, 1 week ago
This patchset moves the task_mm_cid_work to a preemptible and migratable
context. This reduces the impact of this task to the scheduling latency
of real time tasks.
The change makes the recurrence of the task a bit more predictable.
We also add optimisation and fixes to make sure the task_mm_cid_work
works as intended.

The behaviour causing latency was introduced in commit 223baf9d17f2
("sched: Fix performance regression introduced by mm_cid") which
introduced a task work tied to the scheduler tick.
That approach presents two possible issues:
* the task work runs before returning to user and causes, in fact, a
  scheduling latency (with order of magnitude significant in PREEMPT_RT)
* periodic tasks with short runtime are less likely to run during the
  tick, hence they might not run the task work at all

Patch 1 allows the mm_cids to be actually compacted when a process
reduces its number of threads, which was not the case since the same
mm_cids were reused to improve cache locality, more details in [4].

Patch 2 contains the main changes, removing the task_work on the
scheduler tick and using a delayed_work instead.
Additionally, we terminate the call immediately if we see that no mm_cid
is actually active, which could happen on processes sleeping for long
time or which exited but whose mm has not been freed yet.

Patch 3 adds a selftest to validate the functionality of the
task_mm_cid_work (i.e. to compact the mm_cids). The test fails if patch
1 is not applied and is flaky without patch 2. We expect it to always
pass with the entire patchset applied.

Changes since V5:
* Punctuation

Changes since V4 [1]:
* Fixes on the selftest
    * Polished memory allocation and cleanup
    * Handle the test failure in main

Changes since V3 [2]:
* Fixes on the selftest
    * Minor style issues in comments and indentation
    * Use of perror where possible
    * Add a barrier to align threads execution
    * Improve test failure and error handling

Changes since V2 [3]:
* Change the order of the patches
* Merge patches changing the main delayed_work logic
* Improved self-test to spawn 1 less thread and use the main one instead

Changes since V1 [4]:
* Re-arm the delayed_work at each invocation
* Cancel the work synchronously at mmdrop
* Remove next scan fields and completely rely on the delayed_work
* Shrink mm_cid allocation with nr thread/affinity (Mathieu Desnoyers)
* Add self test

Overhead comparison in [4]

[1] - https://lore.kernel.org/lkml/20250113074231.61638-4-gmonaco@redhat.com
[2] - https://lore.kernel.org/lkml/20241216130909.240042-1-gmonaco@redhat.com
[3] - https://lore.kernel.org/lkml/20241213095407.271357-1-gmonaco@redhat.com
[4] - https://lore.kernel.org/lkml/20241205083110.180134-2-gmonaco@redhat.com

To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Peter Zijlstra <peterz@infradead.org>
To: Ingo Molnar <mingo@kernel.org>
To: Paul E. McKenney <paulmck@kernel.org>
To: Shuah Khan <shuah@kernel.org>

Gabriele Monaco (2):
  sched: Move task_mm_cid_work to mm delayed work
  rseq/selftests: Add test for mm_cid compaction

Mathieu Desnoyers (1):
  sched: Compact RSEQ concurrency IDs with reduced threads and affinity

 include/linux/mm_types.h                      |  23 +-
 include/linux/sched.h                         |   1 -
 kernel/sched/core.c                           |  66 +-----
 kernel/sched/sched.h                          |  32 ++-
 tools/testing/selftests/rseq/.gitignore       |   1 +
 tools/testing/selftests/rseq/Makefile         |   2 +-
 .../selftests/rseq/mm_cid_compaction_test.c   | 200 ++++++++++++++++++
 7 files changed, 246 insertions(+), 79 deletions(-)
 create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c


base-commit: a64dcfb451e254085a7daee5fe51bf22959d52d3
-- 
2.48.1
Re: [PATCH v6 0/3] sched: Restructure task_mm_cid_work for predictability
Posted by Mathieu Desnoyers 10 months, 1 week ago
On 2025-02-10 16:32, Gabriele Monaco wrote:
> This patchset moves the task_mm_cid_work to a preemptible and migratable
> context. This reduces the impact of this task to the scheduling latency
> of real time tasks.
> The change makes the recurrence of the task a bit more predictable.
> We also add optimisation and fixes to make sure the task_mm_cid_work
> works as intended.

Ingo, Peter, this series is ready to be pulled into tip.

Thanks!

Mathieu


> 
> The behaviour causing latency was introduced in commit 223baf9d17f2
> ("sched: Fix performance regression introduced by mm_cid") which
> introduced a task work tied to the scheduler tick.
> That approach presents two possible issues:
> * the task work runs before returning to user and causes, in fact, a
>    scheduling latency (with order of magnitude significant in PREEMPT_RT)
> * periodic tasks with short runtime are less likely to run during the
>    tick, hence they might not run the task work at all
> 
> Patch 1 allows the mm_cids to be actually compacted when a process
> reduces its number of threads, which was not the case since the same
> mm_cids were reused to improve cache locality, more details in [4].
> 
> Patch 2 contains the main changes, removing the task_work on the
> scheduler tick and using a delayed_work instead.
> Additionally, we terminate the call immediately if we see that no mm_cid
> is actually active, which could happen on processes sleeping for long
> time or which exited but whose mm has not been freed yet.
> 
> Patch 3 adds a selftest to validate the functionality of the
> task_mm_cid_work (i.e. to compact the mm_cids). The test fails if patch
> 1 is not applied and is flaky without patch 2. We expect it to always
> pass with the entire patchset applied.
> 
> Changes since V5:
> * Punctuation
> 
> Changes since V4 [1]:
> * Fixes on the selftest
>      * Polished memory allocation and cleanup
>      * Handle the test failure in main
> 
> Changes since V3 [2]:
> * Fixes on the selftest
>      * Minor style issues in comments and indentation
>      * Use of perror where possible
>      * Add a barrier to align threads execution
>      * Improve test failure and error handling
> 
> Changes since V2 [3]:
> * Change the order of the patches
> * Merge patches changing the main delayed_work logic
> * Improved self-test to spawn 1 less thread and use the main one instead
> 
> Changes since V1 [4]:
> * Re-arm the delayed_work at each invocation
> * Cancel the work synchronously at mmdrop
> * Remove next scan fields and completely rely on the delayed_work
> * Shrink mm_cid allocation with nr thread/affinity (Mathieu Desnoyers)
> * Add self test
> 
> Overhead comparison in [4]
> 
> [1] - https://lore.kernel.org/lkml/20250113074231.61638-4-gmonaco@redhat.com
> [2] - https://lore.kernel.org/lkml/20241216130909.240042-1-gmonaco@redhat.com
> [3] - https://lore.kernel.org/lkml/20241213095407.271357-1-gmonaco@redhat.com
> [4] - https://lore.kernel.org/lkml/20241205083110.180134-2-gmonaco@redhat.com
> 
> To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> To: Peter Zijlstra <peterz@infradead.org>
> To: Ingo Molnar <mingo@kernel.org>
> To: Paul E. McKenney <paulmck@kernel.org>
> To: Shuah Khan <shuah@kernel.org>
> 
> Gabriele Monaco (2):
>    sched: Move task_mm_cid_work to mm delayed work
>    rseq/selftests: Add test for mm_cid compaction
> 
> Mathieu Desnoyers (1):
>    sched: Compact RSEQ concurrency IDs with reduced threads and affinity
> 
>   include/linux/mm_types.h                      |  23 +-
>   include/linux/sched.h                         |   1 -
>   kernel/sched/core.c                           |  66 +-----
>   kernel/sched/sched.h                          |  32 ++-
>   tools/testing/selftests/rseq/.gitignore       |   1 +
>   tools/testing/selftests/rseq/Makefile         |   2 +-
>   .../selftests/rseq/mm_cid_compaction_test.c   | 200 ++++++++++++++++++
>   7 files changed, 246 insertions(+), 79 deletions(-)
>   create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c
> 
> 
> base-commit: a64dcfb451e254085a7daee5fe51bf22959d52d3


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com