From nobody Mon Oct 6 20:58:59 2025
From: Gabriele Monaco
To: linux-kernel@vger.kernel.org, Ingo Molnar, Peter Zijlstra, sched-ext@lists.linux.dev
Cc: Gabriele Monaco, Mathieu Desnoyers, Ingo Molnar
Subject: [PATCH v2 1/4] sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes
Date: Wed, 16 Jul 2025 18:06:05 +0200
Message-ID: <20250716160603.138385-7-gmonaco@redhat.com>
In-Reply-To: <20250716160603.138385-6-gmonaco@redhat.com>
References: <20250716160603.138385-6-gmonaco@redhat.com>
Content-Type: text/plain; charset="utf-8"

The fair scheduling class relies on prev_sum_exec_runtime to compute the
duration of the task's runtime since it was last scheduled. This value is
currently not required by other scheduling classes but can be useful to
understand long running tasks and take certain actions (e.g. during a
scheduler tick).

Add support for prev_sum_exec_runtime to the RT, deadline and sched_ext
classes by simply assigning the sum_exec_runtime at each set_next_task.

Reviewed-by: Mathieu Desnoyers
Signed-off-by: Gabriele Monaco
---
 kernel/sched/deadline.c | 1 +
 kernel/sched/ext.c      | 1 +
 kernel/sched/rt.c       | 1 +
 3 files changed, 3 insertions(+)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 89019a1408264..65ecd86bae37d 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2389,6 +2389,7 @@ static void set_next_task_dl(struct rq *rq, struct task_struct *p, bool first)
 	p->se.exec_start = rq_clock_task(rq);
 	if (on_dl_rq(&p->dl))
 		update_stats_wait_end_dl(dl_rq, dl_se);
+	p->se.prev_sum_exec_runtime = p->se.sum_exec_runtime;
 
 	/* You can't push away the running task */
 	dequeue_pushable_dl_task(rq, p);
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index b498d867ba210..a4ac4386b9795 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -3255,6 +3255,7 @@ static void set_next_task_scx(struct rq *rq, struct task_struct *p, bool first)
 	}
 
 	p->se.exec_start = rq_clock_task(rq);
+	p->se.prev_sum_exec_runtime = p->se.sum_exec_runtime;
 
 	/* see dequeue_task_scx() on why we skip when !QUEUED */
 	if (SCX_HAS_OP(sch, running) && (p->scx.flags & SCX_TASK_QUEUED))
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index e40422c370335..2c70ff2042ee9 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1693,6 +1693,7 @@ static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, bool f
 	p->se.exec_start = rq_clock_task(rq);
 	if (on_rt_rq(&p->rt))
 		update_stats_wait_end_rt(rt_rq, rt_se);
+	p->se.prev_sum_exec_runtime = p->se.sum_exec_runtime;
 
 	/* The running task is never eligible for pushing */
 	dequeue_pushable_task(rq, p);
-- 
2.50.1

From nobody Mon Oct 6 20:58:59 2025
From: Gabriele Monaco
To: linux-kernel@vger.kernel.org, Andrew Morton, David Hildenbrand, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers, "Paul E. McKenney", linux-mm@kvack.org
Cc: Gabriele Monaco, Ingo Molnar
Subject: [PATCH v2 2/4] rseq: Run the mm_cid_compaction from rseq_handle_notify_resume()
Date: Wed, 16 Jul 2025 18:06:06 +0200
Message-ID: <20250716160603.138385-8-gmonaco@redhat.com>
In-Reply-To: <20250716160603.138385-6-gmonaco@redhat.com>
References: <20250716160603.138385-6-gmonaco@redhat.com>
Content-Type: text/plain; charset="utf-8"

Currently, the mm_cid compaction is triggered by the scheduler tick and
runs in a task_work. Its behaviour is less predictable for periodic tasks
with short runtimes, which may rarely be running during a tick.

Run the mm_cid compaction from the rseq_handle_notify_resume() call, which
runs from resume_user_mode_work(). Since this is the same context in which
the task_work would run, skip the task_work and call the compaction
function directly. The compaction function still returns early when the
scan is not required, that is, when the 100ms pseudo-period has not
elapsed.

Keep a tick handler for long-running tasks that are never preempted (i.e.
that never call rseq_handle_notify_resume()); only in that case does it
trigger a compaction and an mm_cid update.
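The tick-side rule for long-running, never-preempted tasks can be modeled in user space to make its conditions explicit. This is an illustrative sketch, not the kernel implementation: `struct model_task`, `model_tick()` and the `MODEL_*` constants are hypothetical, jiffies are modeled as a plain tick counter with HZ assumed to be 1000, and both 100 ms thresholds are assumed to match the 100ms pseudo-period mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of the tick-side decision; this is not the
 * kernel code. Assumes HZ = 1000, so 100 ms corresponds to 100 ticks.
 */
#define MODEL_UNPREEMPTED_THRESHOLD_NS (100ULL * 1000000) /* 100 ms */
#define MODEL_SCAN_DELAY_TICKS 100UL                      /* 100 ms */

enum model_action {
	MODEL_NOTHING,            /* leave the task alone */
	MODEL_TRIGGER_COMPACTION, /* request a compaction on return to user */
	MODEL_REFRESH_CID,        /* re-acquire the mm_cid after a compaction */
};

struct model_task {
	bool user_task;                 /* has an mm, not exiting, not a kthread */
	bool notify_resume_pending;     /* resume work already requested */
	uint64_t sum_exec_runtime;      /* total runtime, ns */
	uint64_t prev_sum_exec_runtime; /* total runtime when last scheduled in */
	unsigned long last_cid_reset;   /* tick of the last mm_cid assignment */
};

/* Wrap-safe "a is later than b", in the spirit of the kernel's time_after(). */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* scan_due stands in for "the mm-wide compaction deadline has passed". */
enum model_action model_tick(const struct model_task *t, unsigned long now,
			     bool scan_due)
{
	uint64_t rtime;

	/* Accumulated runtime never goes backwards. */
	assert(t->sum_exec_runtime >= t->prev_sum_exec_runtime);
	/* Runtime accumulated since the task was last scheduled in. */
	rtime = t->sum_exec_runtime - t->prev_sum_exec_runtime;

	if (!t->user_task || t->notify_resume_pending)
		return MODEL_NOTHING;
	/* Short runs since the last schedule-in are left to the resume path. */
	if (rtime < MODEL_UNPREEMPTED_THRESHOLD_NS)
		return MODEL_NOTHING;
	if (scan_due)
		return MODEL_TRIGGER_COMPACTION;
	if (model_time_after(now, t->last_cid_reset + MODEL_SCAN_DELAY_TICKS))
		return MODEL_REFRESH_CID;
	return MODEL_NOTHING;
}
```

For instance, a task with 250 ms of uninterrupted runtime gets MODEL_TRIGGER_COMPACTION while a scan is due, and MODEL_REFRESH_CID once more than 100 ticks have passed since its last mm_cid assignment. Passing the deadline in as `scan_due` keeps the model independent of how the mm-wide scan time is stored.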
Signed-off-by: Gabriele Monaco
---
 include/linux/mm.h       |  2 ++
 include/linux/mm_types.h | 11 ++++++++
 include/linux/sched.h    |  2 +-
 kernel/rseq.c            |  2 ++
 kernel/sched/core.c      | 55 +++++++++++++++++++++++++---------------
 kernel/sched/sched.h     |  2 ++
 6 files changed, 53 insertions(+), 21 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index fa538feaa8d95..cc8c1c9ae26c1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2294,6 +2294,7 @@ void sched_mm_cid_before_execve(struct task_struct *t);
 void sched_mm_cid_after_execve(struct task_struct *t);
 void sched_mm_cid_fork(struct task_struct *t);
 void sched_mm_cid_exit_signals(struct task_struct *t);
+void task_mm_cid_work(struct task_struct *t);
 static inline int task_mm_cid(struct task_struct *t)
 {
 	return t->mm_cid;
@@ -2303,6 +2304,7 @@ static inline void sched_mm_cid_before_execve(struct task_struct *t) { }
 static inline void sched_mm_cid_after_execve(struct task_struct *t) { }
 static inline void sched_mm_cid_fork(struct task_struct *t) { }
 static inline void sched_mm_cid_exit_signals(struct task_struct *t) { }
+static inline void task_mm_cid_work(struct task_struct *t) { }
 static inline int task_mm_cid(struct task_struct *t)
 {
 	/*
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index d6b91e8a66d6d..e6d6e468e64b4 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1420,6 +1420,13 @@ static inline void mm_set_cpus_allowed(struct mm_struct *mm, const struct cpumas
 	WRITE_ONCE(mm->nr_cpus_allowed, cpumask_weight(mm_allowed));
 	raw_spin_unlock(&mm->cpus_allowed_lock);
 }
+
+static inline bool mm_cid_needs_scan(struct mm_struct *mm)
+{
+	if (!mm)
+		return false;
+	return time_after(jiffies, READ_ONCE(mm->mm_cid_next_scan));
+}
 #else /* CONFIG_SCHED_MM_CID */
 static inline void mm_init_cid(struct mm_struct *mm, struct task_struct *p) { }
 static inline int mm_alloc_cid(struct mm_struct *mm, struct task_struct *p) { return 0; }
@@ -1430,6 +1437,10 @@ static inline unsigned int mm_cid_size(void)
 	return 0;
 }
 static inline void mm_set_cpus_allowed(struct mm_struct *mm, const struct cpumask *cpumask) { }
+static inline bool mm_cid_needs_scan(struct mm_struct *mm)
+{
+	return false;
+}
 #endif /* CONFIG_SCHED_MM_CID */
 
 struct mmu_gather;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index aa9c5be7a6325..a75f61cea2271 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1428,7 +1428,7 @@ struct task_struct {
 	int				last_mm_cid;	/* Most recent cid in mm */
 	int				migrate_from_cpu;
 	int				mm_cid_active;	/* Whether cid bitmap is active */
-	struct callback_head		cid_work;
+	unsigned long			last_cid_reset;	/* Time of last reset in jiffies */
 #endif
 
 	struct tlbflush_unmap_batch	tlb_ubc;
diff --git a/kernel/rseq.c b/kernel/rseq.c
index b7a1ec327e811..100f81e330dc6 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -441,6 +441,8 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
 	}
 	if (unlikely(rseq_update_cpu_node_id(t)))
 		goto error;
+	/* The mm_cid compaction returns prematurely if scan is not needed. */
+	task_mm_cid_work(t);
 	return;
 
 error:
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 81c6df746df17..27b856a1cb0a9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10589,22 +10589,13 @@ static void sched_mm_cid_remote_clear_weight(struct mm_struct *mm, int cpu,
 	sched_mm_cid_remote_clear(mm, pcpu_cid, cpu);
 }
 
-static void task_mm_cid_work(struct callback_head *work)
+void task_mm_cid_work(struct task_struct *t)
 {
 	unsigned long now = jiffies, old_scan, next_scan;
-	struct task_struct *t = current;
 	struct cpumask *cidmask;
-	struct mm_struct *mm;
 	int weight, cpu;
+	struct mm_struct *mm = t->mm;
 
-	WARN_ON_ONCE(t != container_of(work, struct task_struct, cid_work));
-
-	work->next = work;	/* Prevent double-add */
-	if (t->flags & PF_EXITING)
-		return;
-	mm = t->mm;
-	if (!mm)
-		return;
 	old_scan = READ_ONCE(mm->mm_cid_next_scan);
 	next_scan = now + msecs_to_jiffies(MM_CID_SCAN_DELAY);
 	if (!old_scan) {
@@ -10643,23 +10634,47 @@ void init_sched_mm_cid(struct task_struct *t)
 		if (mm_users == 1)
 			mm->mm_cid_next_scan = jiffies + msecs_to_jiffies(MM_CID_SCAN_DELAY);
 	}
-	t->cid_work.next = &t->cid_work;	/* Protect against double add */
-	init_task_work(&t->cid_work, task_mm_cid_work);
 }
 
 void task_tick_mm_cid(struct rq *rq, struct task_struct *curr)
 {
-	struct callback_head *work = &curr->cid_work;
-	unsigned long now = jiffies;
+	u64 rtime = curr->se.sum_exec_runtime - curr->se.prev_sum_exec_runtime;
 
+	/*
+	 * If a task is running unpreempted for a long time, it won't get its
+	 * mm_cid compacted and won't update its mm_cid value after a
+	 * compaction occurs.
+	 * For such a task, this function does two things:
+	 * A) trigger the mm_cid recompaction,
+	 * B) trigger an update of the task's rseq->mm_cid field at some point
+	 * after recompaction, so it can get a mm_cid value closer to 0.
+	 * A change in the mm_cid triggers an rseq_preempt.
+	 *
+	 * B occurs once after the compaction work completes, neither A nor B
+	 * run as long as the compaction work is pending, the task is exiting
+	 * or is not a userspace task.
+	 */
 	if (!curr->mm || (curr->flags & (PF_EXITING | PF_KTHREAD)) ||
-	    work->next != work)
+	    test_tsk_thread_flag(curr, TIF_NOTIFY_RESUME))
 		return;
-	if (time_before(now, READ_ONCE(curr->mm->mm_cid_next_scan)))
+	if (rtime < RSEQ_UNPREEMPTED_THRESHOLD)
 		return;
-
-	/* No page allocation under rq lock */
-	task_work_add(curr, work, TWA_RESUME);
+	if (mm_cid_needs_scan(curr->mm)) {
+		/* Trigger mm_cid recompaction */
+		rseq_set_notify_resume(curr);
+	} else if (time_after(jiffies, curr->last_cid_reset +
+			      msecs_to_jiffies(MM_CID_SCAN_DELAY))) {
+		/* Update mm_cid field */
+		int old_cid = curr->mm_cid;
+
+		if (!curr->mm_cid_active)
+			return;
+		mm_cid_snapshot_time(rq, curr->mm);
+		mm_cid_put_lazy(curr);
+		curr->last_mm_cid = curr->mm_cid = mm_cid_get(rq, curr, curr->mm);
+		if (old_cid != curr->mm_cid)
+			rseq_preempt(curr);
+	}
 }
 
 void sched_mm_cid_exit_signals(struct task_struct *t)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 475bb5998295e..90a5b58188232 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3606,6 +3606,7 @@ extern const char *preempt_modes[];
 
 #define SCHED_MM_CID_PERIOD_NS	(100ULL * 1000000)	/* 100ms */
 #define MM_CID_SCAN_DELAY	100			/* 100ms */
+#define RSEQ_UNPREEMPTED_THRESHOLD	SCHED_MM_CID_PERIOD_NS
 
 extern raw_spinlock_t cid_lock;
 extern int use_cid_lock;
@@ -3809,6 +3810,7 @@ static inline int mm_cid_get(struct rq *rq, struct task_struct *t,
 	int cid;
 
 	lockdep_assert_rq_held(rq);
+	t->last_cid_reset = jiffies;
 	cpumask = mm_cidmask(mm);
 	cid = __this_cpu_read(pcpu_cid->cid);
 	if (mm_cid_is_valid(cid)) {
-- 
2.50.1

From nobody Mon Oct 6 20:58:59 2025
From: Gabriele Monaco
To: linux-kernel@vger.kernel.org, Andrew Morton, David Hildenbrand, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers, linux-mm@kvack.org
Cc: Gabriele Monaco, Ingo Molnar
Subject: [PATCH v2 3/4] sched: Compact RSEQ concurrency IDs in batches
Date: Wed, 16 Jul 2025 18:06:07 +0200
Message-ID: <20250716160603.138385-9-gmonaco@redhat.com>
In-Reply-To: <20250716160603.138385-6-gmonaco@redhat.com>
References: <20250716160603.138385-6-gmonaco@redhat.com>
Content-Type: text/plain; charset="utf-8"

Currently, task_mm_cid_work() is called from resume_user_mode_work().
This can delay the execution of the corresponding thread for the entire
duration of the function, negatively affecting the response time of
real-time tasks. In practice, we observe task_mm_cid_work increasing the
latency by 30-35us on a 128-core system; this order of magnitude is
meaningful under PREEMPT_RT.

Run the task_mm_cid_work in batches of up to CONFIG_RSEQ_CID_SCAN_BATCH
CPUs; this reduces the duration of the delay for each scan.

The task_mm_cid_work contains a mechanism to avoid running more
frequently than every 100ms. Keep this pseudo-periodicity only on
complete scans. This means each call to task_mm_cid_work returns
prematurely if the period did not elapse and a scan is not ongoing (i.e.
the next batch to scan is the first one). This way, full scans are not
excessively delayed, while each run, and the latency it introduces, is
kept short.

Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
Signed-off-by: Gabriele Monaco
---
 include/linux/mm_types.h | 15 +++++++++++++++
 init/Kconfig             | 12 ++++++++++++
 kernel/sched/core.c      | 37 ++++++++++++++++++++++++++++++++++---
 3 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index e6d6e468e64b4..a822966a584f3 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -995,6 +995,13 @@ struct mm_struct {
 		 * When the next mm_cid scan is due (in jiffies).
 		 */
 		unsigned long mm_cid_next_scan;
+		/*
+		 * @mm_cid_scan_batch: Counter for batch used in the next scan.
+		 *
+		 * Scan in batches of CONFIG_RSEQ_CID_SCAN_BATCH. This field
+		 * increments at each scan and reset when all batches are done.
+		 */
+		unsigned int mm_cid_scan_batch;
 		/**
 		 * @nr_cpus_allowed: Number of CPUs allowed for mm.
 		 *
@@ -1385,6 +1392,7 @@ static inline void mm_init_cid(struct mm_struct *mm, struct task_struct *p)
 	raw_spin_lock_init(&mm->cpus_allowed_lock);
 	cpumask_copy(mm_cpus_allowed(mm), &p->cpus_mask);
 	cpumask_clear(mm_cidmask(mm));
+	mm->mm_cid_scan_batch = 0;
 }
 
 static inline int mm_alloc_cid_noprof(struct mm_struct *mm, struct task_struct *p)
@@ -1423,8 +1431,15 @@ static inline void mm_set_cpus_allowed(struct mm_struct *mm, const struct cpumas
 
 static inline bool mm_cid_needs_scan(struct mm_struct *mm)
 {
+	unsigned int next_batch;
+
 	if (!mm)
 		return false;
+	next_batch = READ_ONCE(mm->mm_cid_scan_batch);
+	/* Always needs scan unless it's the first batch. */
+	if (CONFIG_RSEQ_CID_SCAN_BATCH * next_batch < num_possible_cpus() &&
+	    next_batch)
+		return true;
 	return time_after(jiffies, READ_ONCE(mm->mm_cid_next_scan));
 }
 #else /* CONFIG_SCHED_MM_CID */
diff --git a/init/Kconfig b/init/Kconfig
index 666783eb50abd..98d7f078cd6df 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1860,6 +1860,18 @@ config DEBUG_RSEQ
 
 	  If unsure, say N.
 
+config RSEQ_CID_SCAN_BATCH
+	int "Number of CPUs to scan at every mm_cid compaction attempt"
+	range 1 NR_CPUS
+	default 8
+	depends on SCHED_MM_CID
+	help
+	  CPUs are scanned pseudo-periodically to compact the CID of each task,
+	  this operation can take a longer amount of time on systems with many
+	  CPUs, resulting in higher scheduling latency for the current task.
+	  A higher value means the CID is compacted faster, but results in
+	  higher scheduling latency.
+
 config CACHESTAT_SYSCALL
 	bool "Enable cachestat() system call" if EXPERT
 	default y
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 27b856a1cb0a9..eae4c8faf980b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10591,11 +10591,26 @@ static void sched_mm_cid_remote_clear_weight(struct mm_struct *mm, int cpu,
 
 void task_mm_cid_work(struct task_struct *t)
 {
+	int weight, cpu, from_cpu, this_batch, next_batch, idx;
 	unsigned long now = jiffies, old_scan, next_scan;
 	struct cpumask *cidmask;
-	int weight, cpu;
 	struct mm_struct *mm = t->mm;
 
+	/*
+	 * This function is called from __rseq_handle_notify_resume, which
+	 * makes sure t is a user thread and is not exiting.
+	 */
+	this_batch = READ_ONCE(mm->mm_cid_scan_batch);
+	next_batch = this_batch + 1;
+	from_cpu = cpumask_nth(this_batch * CONFIG_RSEQ_CID_SCAN_BATCH,
+			       cpu_possible_mask);
+	if (from_cpu >= nr_cpu_ids) {
+		from_cpu = 0;
+		next_batch = 1;
+	}
+	/* Delay scan only if we are done with all cpus. */
+	if (from_cpu != 0)
+		goto cid_compact;
 	old_scan = READ_ONCE(mm->mm_cid_next_scan);
 	next_scan = now + msecs_to_jiffies(MM_CID_SCAN_DELAY);
 	if (!old_scan) {
@@ -10611,17 +10626,33 @@ void task_mm_cid_work(struct task_struct *t)
 		return;
 	if (!try_cmpxchg(&mm->mm_cid_next_scan, &old_scan, next_scan))
 		return;
+
+cid_compact:
+	if (!try_cmpxchg(&mm->mm_cid_scan_batch, &this_batch, next_batch))
+		return;
 	cidmask = mm_cidmask(mm);
 	/* Clear cids that were not recently used. */
-	for_each_possible_cpu(cpu)
+	idx = 0;
+	cpu = from_cpu;
+	for_each_cpu_from(cpu, cpu_possible_mask) {
+		if (idx == CONFIG_RSEQ_CID_SCAN_BATCH)
+			break;
 		sched_mm_cid_remote_clear_old(mm, cpu);
+		++idx;
+	}
 	weight = cpumask_weight(cidmask);
 	/*
 	 * Clear cids that are greater or equal to the cidmask weight to
 	 * recompact it.
 	 */
-	for_each_possible_cpu(cpu)
+	idx = 0;
+	cpu = from_cpu;
+	for_each_cpu_from(cpu, cpu_possible_mask) {
+		if (idx == CONFIG_RSEQ_CID_SCAN_BATCH)
+			break;
 		sched_mm_cid_remote_clear_weight(mm, cpu, weight);
+		++idx;
+	}
 }
 
 void init_sched_mm_cid(struct task_struct *t)
-- 
2.50.1

From nobody Mon Oct 6 20:58:59 2025
From: Gabriele Monaco
To: linux-kernel@vger.kernel.org, Mathieu Desnoyers, Peter Zijlstra, "Paul E. McKenney", Shuah Khan, linux-kselftest@vger.kernel.org
Cc: Gabriele Monaco, Shuah Khan, Ingo Molnar
Subject: [PATCH v2 4/4] selftests/rseq: Add test for mm_cid compaction
Date: Wed, 16 Jul 2025 18:06:08 +0200
Message-ID: <20250716160603.138385-10-gmonaco@redhat.com>
In-Reply-To: <20250716160603.138385-6-gmonaco@redhat.com>
References: <20250716160603.138385-6-gmonaco@redhat.com>
Content-Type: text/plain; charset="utf-8"

A task in the kernel (task_mm_cid_work) runs somewhat periodically to
compact the mm_cid for each process. Add a test to validate that it runs
correctly and in a timely manner.

The test spawns 1 thread pinned to each CPU, then each thread, including
the main one, runs in short bursts for some time. During this period, the
mm_cids should span all numbers between 0 and nproc.

At the end of this phase, a thread with a high enough mm_cid (>= nproc/2)
is selected to be the new leader and all other threads terminate.

After some time, the only running thread should see 0 as its mm_cid; if
that doesn't happen, the compaction mechanism didn't work and the test
fails.

The test never fails if only 1 core is available: in that case there is
nothing to test, as the only available mm_cid is 0.
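The selection and pass/fail rules described above can be sketched as two small pure functions. This is a minimal illustration, not the selftest's code: `leader_eligible()`, `compaction_verdict()` and the idea of feeding in an array of mm_cid samples (standing in for the leader's periodic reads of its own mm_cid, one per check) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of the compaction check; not the selftest's API. */
enum cid_verdict { CID_PASS, CID_FAIL };

/* A thread may become the leader if its mm_cid is in the upper half. */
bool leader_eligible(int mm_cid, int nproc)
{
	return mm_cid >= nproc / 2;
}

/*
 * samples[i] is the leader's mm_cid observed at the i-th check (the real
 * test would sleep between checks). With a single CPU the only possible
 * mm_cid is 0, so the check can never fail.
 */
enum cid_verdict compaction_verdict(int nproc, const int *samples,
				    size_t nsamples)
{
	assert(samples != NULL || nsamples == 0);
	if (nproc <= 1)
		return CID_PASS;
	for (size_t i = 0; i < nsamples; i++) {
		if (samples[i] == 0)
			return CID_PASS; /* compaction moved the leader to 0 */
	}
	return CID_FAIL; /* ran out of checks: mm_cid never reached 0 */
}
```

Modeling the observations as an array keeps the rule deterministic and checkable without threads or rseq; the number of samples plays the role of the timeout.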
Acked-by: Shuah Khan
Signed-off-by: Gabriele Monaco
---
 tools/testing/selftests/rseq/.gitignore       |   1 +
 tools/testing/selftests/rseq/Makefile         |   2 +-
 .../selftests/rseq/mm_cid_compaction_test.c   | 204 ++++++++++++++++++
 3 files changed, 206 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c

diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
index 0fda241fa62b0..b3920c59bf401 100644
--- a/tools/testing/selftests/rseq/.gitignore
+++ b/tools/testing/selftests/rseq/.gitignore
@@ -3,6 +3,7 @@ basic_percpu_ops_test
 basic_percpu_ops_mm_cid_test
 basic_test
 basic_rseq_op_test
+mm_cid_compaction_test
 param_test
 param_test_benchmark
 param_test_compare_twice
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 0d0a5fae59547..bc4d940f66d40 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -17,7 +17,7 @@ OVERRIDE_TARGETS = 1
 TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
 		param_test_benchmark param_test_compare_twice param_test_mm_cid \
 		param_test_mm_cid_benchmark param_test_mm_cid_compare_twice \
-		syscall_errors_test
+		syscall_errors_test mm_cid_compaction_test
 
 TEST_GEN_PROGS_EXTENDED = librseq.so
 
diff --git a/tools/testing/selftests/rseq/mm_cid_compaction_test.c b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
new file mode 100644
index 0000000000000..d13623625f5a9
--- /dev/null
+++ b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
@@ -0,0 +1,204 @@
+// SPDX-License-Identifier: LGPL-2.1
+#define _GNU_SOURCE
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "../kselftest.h"
+#include "rseq.h"
+
+#define VERBOSE 0
+#define printf_verbose(fmt, ...)			\
+	do {						\
+		if (VERBOSE)				\
+			printf(fmt, ##__VA_ARGS__);	\
+	} while (0)
+
+/* 50 ms */
+#define RUNNER_PERIOD 50000
+/*
+ * Number of runs before we terminate or get the token.
+ * The number increases slowly with the number of CPUs, as the compaction
+ * process can take longer on larger systems. This is an arbitrary value.
+ */
+#define THREAD_RUNS (3 + args->num_cpus/8)
+
+/*
+ * Number of times we check that the mm_cids were compacted.
+ * Checks are repeated every RUNNER_PERIOD.
+ */
+#define MM_CID_COMPACT_TIMEOUT 10
+
+struct thread_args {
+	int cpu;
+	int num_cpus;
+	pthread_mutex_t *token;
+	pthread_barrier_t *barrier;
+	pthread_t *tinfo;
+	struct thread_args *args_head;
+};
+
+static void __noreturn *thread_runner(void *arg)
+{
+	struct thread_args *args = arg;
+	int i, ret, curr_mm_cid;
+	cpu_set_t cpumask;
+
+	CPU_ZERO(&cpumask);
+	CPU_SET(args->cpu, &cpumask);
+	ret = pthread_setaffinity_np(pthread_self(), sizeof(cpumask), &cpumask);
+	if (ret) {
+		errno = ret;
+		perror("Error: failed to set affinity");
+		abort();
+	}
+	pthread_barrier_wait(args->barrier);
+
+	for (i = 0; i < THREAD_RUNS; i++)
+		usleep(RUNNER_PERIOD);
+	curr_mm_cid = rseq_current_mm_cid();
+	/*
+	 * We select one thread with high enough mm_cid to be the new leader.
+	 * All other threads (including the main thread) will terminate.
+	 * After some time, the mm_cid of the only remaining thread should
+	 * converge to 0; if not, the test fails.
+	 */
+	if (curr_mm_cid >= args->num_cpus / 2 &&
+	    !pthread_mutex_trylock(args->token)) {
+		printf_verbose(
+			"cpu%d has mm_cid=%d and will be the new leader.\n",
+			sched_getcpu(), curr_mm_cid);
+		for (i = 0; i < args->num_cpus; i++) {
+			if (args->tinfo[i] == pthread_self())
+				continue;
+			ret = pthread_join(args->tinfo[i], NULL);
+			if (ret) {
+				errno = ret;
+				perror("Error: failed to join thread");
+				abort();
+			}
+		}
+		pthread_barrier_destroy(args->barrier);
+		free(args->tinfo);
+		free(args->token);
+		free(args->barrier);
+		free(args->args_head);
+
+		for (i = 0; i < MM_CID_COMPACT_TIMEOUT; i++) {
+			curr_mm_cid = rseq_current_mm_cid();
+			printf_verbose("run %d: mm_cid=%d on cpu%d.\n", i,
+				       curr_mm_cid, sched_getcpu());
+			if (curr_mm_cid == 0)
+				exit(EXIT_SUCCESS);
+			usleep(RUNNER_PERIOD);
+		}
+		exit(EXIT_FAILURE);
+	}
+	printf_verbose("cpu%d has mm_cid=%d and is going to terminate.\n",
+		       sched_getcpu(), curr_mm_cid);
+	pthread_exit(NULL);
+}
+
+int test_mm_cid_compaction(void)
+{
+	cpu_set_t affinity;
+	int i, j, ret = 0, num_threads;
+	pthread_t *tinfo;
+	pthread_mutex_t *token;
+	pthread_barrier_t *barrier;
+	struct thread_args *args;
+
+	sched_getaffinity(0, sizeof(affinity), &affinity);
+	num_threads = CPU_COUNT(&affinity);
+	tinfo = calloc(num_threads, sizeof(*tinfo));
+	if (!tinfo) {
+		perror("Error: failed to allocate tinfo");
+		return -1;
+	}
+	args = calloc(num_threads, sizeof(*args));
+	if (!args) {
+		perror("Error: failed to allocate args");
+		ret = -1;
+		goto out_free_tinfo;
+	}
+	token = malloc(sizeof(*token));
+	if (!token) {
+		perror("Error: failed to allocate token");
+		ret = -1;
+		goto out_free_args;
+	}
+	barrier = malloc(sizeof(*barrier));
+	if (!barrier) {
+		perror("Error: failed to allocate barrier");
+		ret = -1;
+		goto out_free_token;
+	}
+	if (num_threads == 1) {
+		fprintf(stderr, "Cannot test on a single cpu. "
+			"Skipping mm_cid_compaction test.\n");
+		/* only skipping the test; this is not a failure */
+		goto out_free_barrier;
+	}
+	pthread_mutex_init(token, NULL);
+	ret = pthread_barrier_init(barrier, NULL, num_threads);
+	if (ret) {
+		errno = ret;
+		perror("Error: failed to initialise barrier");
+		goto out_free_barrier;
+	}
+	for (i = 0, j = 0; i < CPU_SETSIZE && j < num_threads; i++) {
+		if (!CPU_ISSET(i, &affinity))
+			continue;
+		args[j].num_cpus = num_threads;
+		args[j].tinfo = tinfo;
+		args[j].token = token;
+		args[j].barrier = barrier;
+		args[j].cpu = i;
+		args[j].args_head = args;
+		if (!j) {
+			/* The first thread is the main one */
+			tinfo[0] = pthread_self();
+			++j;
+			continue;
+		}
+		ret = pthread_create(&tinfo[j], NULL, thread_runner, &args[j]);
+		if (ret) {
+			errno = ret;
+			perror("Error: failed to create thread");
+			abort();
+		}
+		++j;
+	}
+	printf_verbose("Started %d threads.\n", num_threads);
+
+	/* The main thread also terminates if it is not selected as leader */
+	thread_runner(&args[0]);
+
+	/* only reached in case of errors */
+out_free_barrier:
+	free(barrier);
+out_free_token:
+	free(token);
+out_free_args:
+	free(args);
+out_free_tinfo:
+	free(tinfo);
+
+	return ret;
+}
+
+int main(int argc, char **argv)
+{
+	if (!rseq_mm_cid_available()) {
+		fprintf(stderr, "Error: rseq_mm_cid unavailable\n");
+		return -1;
+	}
+	if (test_mm_cid_compaction())
+		return -1;
+	return 0;
+}
-- 
2.50.1