[PATCH v3] sched_ext: Use WRITE_ONCE() for the write side of scx_enable helper pointer

zhidao su posted 1 patch 1 month ago
Posted by zhidao su 1 month ago
scx_enable() uses double-checked locking to lazily initialize a static
kthread_worker pointer. The fast path reads helper locklessly:

    if (!READ_ONCE(helper)) {          // lockless read -- no helper_mutex

The write side initializes helper under helper_mutex, but previously
used a plain assignment:

        helper = kthread_run_worker(0, "scx_enable_helper");
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                 plain write -- KCSAN data race with READ_ONCE() above

Because the READ_ONCE() on the fast path and the plain write on the
initialization path access the same variable without a common lock,
they constitute a data race. KCSAN expects every access that can race
with a lockless reader to be marked, so the store needs WRITE_ONCE()
to pair with the READ_ONCE() above.
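
For illustration only, the pairing of marked accesses can be sketched in
userspace. The macros below are simplified stand-ins built on volatile
casts, not the kernel's definitions (which also carry type checks and
KCSAN instrumentation); shared_ptr, payload, publish() and peek() are
hypothetical names that do not appear in ext.c.

```c
#include <assert.h>

/*
 * Simplified userspace stand-ins for READ_ONCE()/WRITE_ONCE().
 * Assumption: a single volatile access is enough for this sketch; the
 * real kernel macros do more than this.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static int *shared_ptr;    /* hypothetical pointer that is read locklessly */
static int payload = 42;   /* hypothetical object being published */

/* Writer side: marked store, pairs with the marked load in peek(). */
static void publish(void)
{
	WRITE_ONCE(shared_ptr, &payload);
}

/* Reader side: marked load; returns -1 if nothing has been published. */
static int peek(void)
{
	int *p = READ_ONCE(shared_ptr);

	return p ? *p : -1;
}

/* Single-threaded walk-through of the two marked accesses. */
void marked_access_demo(void)
{
	assert(peek() == -1);   /* nothing published yet */
	publish();
	assert(peek() == 42);   /* reader now sees the published object */
}
```

Both sides go through a marked access, which is the property the plain
assignment to helper was missing.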

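Use a temporary variable to stage the result of kthread_run_worker(),
and only WRITE_ONCE() it into helper after confirming the pointer is
valid and calling sched_set_fifo() on the worker task. The old code
stored the raw return value in helper first and reset it to NULL on
error, leaving a window where a concurrent caller on the fast path
could observe an ERR pointer via READ_ONCE(helper) before the error
check completes. An ERR pointer is non-NULL, so such a caller would
skip initialization and treat it as a valid worker.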
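
The staging pattern described above can be sketched as a userspace
analogue. This is a rough illustration under stated assumptions, not the
code in ext.c: C11 acquire/release atomics and a pthread mutex are
swapped in for READ_ONCE()/WRITE_ONCE() and the kernel mutex, and
struct worker, create_worker(), ensure_helper() and peek_helper() are
hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kthread_worker. */
struct worker {
	int id;
};

/* Shared pointer, read without the mutex on the fast path. */
static _Atomic(struct worker *) helper;
static pthread_mutex_t helper_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical creator. It reports failure as NULL; the kernel function
 * may return an ERR pointer instead, which is why staging matters there.
 */
static struct worker *create_worker(int fail)
{
	struct worker *w;

	if (fail)
		return NULL;
	w = malloc(sizeof(*w));
	if (w)
		w->id = 1;
	return w;
}

/* Double-checked lazy init; 'fail' forces the creation step to fail. */
int ensure_helper(int fail)
{
	if (!atomic_load_explicit(&helper, memory_order_acquire)) {
		pthread_mutex_lock(&helper_mutex);
		if (!atomic_load_explicit(&helper, memory_order_relaxed)) {
			/* Stage the result in a local first. */
			struct worker *w = create_worker(fail);

			if (!w) {
				/* helper was never written, so readers still see NULL. */
				pthread_mutex_unlock(&helper_mutex);
				return -ENOMEM;
			}
			/* Publish only after the result is known to be valid. */
			atomic_store_explicit(&helper, w, memory_order_release);
		}
		pthread_mutex_unlock(&helper_mutex);
	}
	return 0;
}

/* Lockless reader, as a fast-path caller would see the pointer. */
struct worker *peek_helper(void)
{
	return atomic_load_explicit(&helper, memory_order_acquire);
}
```

In this sketch a failed ensure_helper(1) leaves peek_helper() returning
NULL, and a later ensure_helper(0) publishes a fully set-up worker; no
intermediate value is ever stored in the shared pointer.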

Fixes: b06ccbabe250 ("sched_ext: Fix starvation of scx_enable() under fair-class saturation")
Signed-off-by: zhidao su <suzhidao@xiaomi.com>
---
v3: Use a temporary variable to stage kthread_run_worker() result;
    only write to helper after confirming validity, eliminating the
    window where an ERR pointer could be observed via the lockless
    fast path (Tejun Heo)
v2: Add missing Fixes: tag (Andrea Righi)
---
 kernel/sched/ext.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 9a1471ad5ae7..63dc7cf1e2a7 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -5355,13 +5355,14 @@ static int scx_enable(struct sched_ext_ops *ops, struct bpf_link *link)
 	if (!READ_ONCE(helper)) {
 		mutex_lock(&helper_mutex);
 		if (!helper) {
-			helper = kthread_run_worker(0, "scx_enable_helper");
-			if (IS_ERR_OR_NULL(helper)) {
-				helper = NULL;
+			struct kthread_worker *w =
+				kthread_run_worker(0, "scx_enable_helper");
+			if (IS_ERR_OR_NULL(w)) {
 				mutex_unlock(&helper_mutex);
 				return -ENOMEM;
 			}
-			sched_set_fifo(helper->task);
+			sched_set_fifo(w->task);
+			WRITE_ONCE(helper, w);
 		}
 		mutex_unlock(&helper_mutex);
 	}
-- 
2.43.0
Re: [PATCH v3] sched_ext: Use WRITE_ONCE() for the write side of scx_enable helper pointer
Posted by Tejun Heo 1 month ago
> sched_ext: Use WRITE_ONCE() for the write side of scx_enable helper pointer

Applied to sched_ext/for-7.0-fixes.

Thanks.

--
tejun
Re: [PATCH v3] sched_ext: Use WRITE_ONCE() for the write side of scx_enable helper pointer
Posted by zhidao su 1 month ago
Tested v3 on a KCSAN-enabled kernel (CONFIG_KCSAN=y, non-strict mode)
built from this tree and booted via virtme-ng:

  Kernel: 7.0.0-rc2
  CPUs: 4 (QEMU/KVM)
  Memory: 4G

All 28 sched_ext selftests pass:

  PASSED:  28
  SKIPPED: 0
  FAILED:  0

No KCSAN reports related to the scx_enable helper pointer were
observed during the test run.

The fix (staging kthread_run_worker() result in a local variable and
only calling WRITE_ONCE(helper, w) after confirming validity) correctly
closes the window where a concurrent lockless reader could observe an
ERR pointer via READ_ONCE(helper).

zhidao su