Preemption was disabled for the entire duration of smp_call_function_many_cond()
primarily for the following reasons:
- To prevent the remote online CPU from going offline. Specifically, we
want to ensure that no new csds are queued after smpcfd_dying_cpu() has
finished. Therefore, preemption must be disabled until all necessary IPIs
are sent.
- To prevent the current CPU from going offline. If the task were migrated to
  another CPU and then called csd_lock_wait(), it could hit a use-after-free
  (UAF) via smpcfd_dead_cpu() while the original CPU goes offline.
- To protect the per-cpu cfd_data from concurrent modification by other
  tasks on the current CPU. cfd_data contains cpumasks and per-cpu csds.
  Before enqueueing a csd, we block in csd_lock() to ensure the previous
  async csd->func() has completed, and only then initialize csd->func and
  csd->info. After sending the IPI, we spin-wait for the remote CPU to call
  csd_unlock(). In fact, the csd_lock mechanism already guarantees csd
  serialization: if preemption occurs during csd_lock_wait(), other concurrent
  smp_call_function_many_cond() calls simply block until the previous
  csd->func() completes:
task A                          task B
csd->func = func_a;
send ipis
preempted by B
  --------------->
                                csd_lock(csd); // block until the last
                                               // func_a has finished
                                csd->func = func_b;
                                csd->info = info;
                                ...
                                send ipis
switch back to A
  <---------------
csd_lock_wait(csd); // block until remote CPUs finish func_*
Previous patches replaced the per-cpu cfd->cpumask with a task-local cpumask,
and the per-cpu csd is allocated only once and never freed, so the csd can
always be accessed safely. We can therefore enable preemption before
csd_lock_wait(), which makes the potentially long csd_lock_wait()
preemptible and migratable.
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
kernel/smp.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)
diff --git a/kernel/smp.c b/kernel/smp.c
index 2a33877dd812..4ddb1ec1e43e 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -844,7 +844,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
unsigned int scf_flags,
smp_cond_func_t cond_func)
{
- int cpu, last_cpu, this_cpu = smp_processor_id();
+ int cpu, last_cpu, this_cpu;
struct call_function_data *cfd;
bool wait = scf_flags & SCF_WAIT;
struct cpumask *cpumask, *task_mask;
@@ -852,10 +852,10 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
int nr_cpus = 0;
bool run_remote = false;
- lockdep_assert_preemption_disabled();
-
task_mask = smp_task_ipi_mask(current);
preemptible_wait = task_mask && preemptible();
+
+ this_cpu = get_cpu();
cfd = this_cpu_ptr(&cfd_data);
cpumask = preemptible_wait ? task_mask : cfd->cpumask;
@@ -937,6 +937,19 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
local_irq_restore(flags);
}
+ /*
+ * We may block in csd_lock_wait() for a significant amount of time,
+ * especially when interrupts are disabled or with a large number of
+ * remote CPUs. Try to enable preemption before csd_lock_wait().
+ *
+ * Use the task-local cpumask instead of cfd->cpumask to avoid concurrent
+ * modification by other tasks on the same cpu. If preemption occurs during
+ * csd_lock_wait, other concurrent smp_call_function_many_cond() calls
+ * will simply block until the previous csd->func() completes.
+ */
+ if (preemptible_wait)
+ put_cpu();
+
if (run_remote && wait) {
for_each_cpu(cpu, cpumask) {
call_single_data_t *csd;
@@ -945,6 +958,9 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
csd_lock_wait(csd);
}
}
+
+ if (!preemptible_wait)
+ put_cpu();
}
/**
--
2.20.1