[PATCH v2] xen/sched: rtds: re-arm repl_timer after timer re-initialization

Posted by Oleksii Moisieiev 1 week, 2 days ago
In RTDS, removing the last eligible pCPU can kill repl_timer.
When a pCPU is later re-added, rt_switch_sched() reinitializes the
timer object, but pending entries may already exist in replq.

Without re-arming from replq head, replenishment can remain inactive
until some unrelated event programs the timer again. This may stall
budget replenishment for non-extratime units.

Fix this by re-arming repl_timer in rt_switch_sched() immediately after
init_timer() when replq is non-empty, using the earliest pending
deadline.

This keeps behavior unchanged when replq is empty.

Signed-off-by: Oleksii Moisieiev <oleksii_moisieiev@epam.com>
---

Changes in v2:
- update commit description, remove unneeded paragraph

 xen/common/sched/rt.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/xen/common/sched/rt.c b/xen/common/sched/rt.c
index 7b1f64a779..59021e1110 100644
--- a/xen/common/sched/rt.c
+++ b/xen/common/sched/rt.c
@@ -741,8 +741,17 @@ rt_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     if ( prv->repl_timer.status == TIMER_STATUS_invalid ||
          prv->repl_timer.status == TIMER_STATUS_killed )
     {
+        struct list_head *replq = rt_replq(new_ops);
+
         init_timer(&prv->repl_timer, repl_timer_handler, (void *)new_ops, cpu);
         dprintk(XENLOG_DEBUG, "RTDS: timer initialized on cpu %u\n", cpu);
+
+        /*
+         * When re-adding CPUs after all RTDS CPUs were removed, replq may
+         * already contain pending replenishment events. Re-arm immediately.
+         */
+        if ( !list_empty(replq) )
+            set_timer(&prv->repl_timer, replq_elem(replq->next)->cur_deadline);
     }
 
     sched_idle_unit(cpu)->priv = vdata;
-- 
2.43.0

base-commit: a7bf8ff218ca05eb3674fdfd2817f6cff471e96a
Re: [PATCH v2] xen/sched: rtds: re-arm repl_timer after timer re-initialization
Posted by Jürgen Groß 2 days, 10 hours ago
On 03.04.26 11:29, Oleksii Moisieiev wrote:
> In RTDS, removing the last eligible pCPU can kill repl_timer.
> When a pCPU is later re-added, rt_switch_sched() reinitializes the
> timer object, but pending entries may already exist in replq.

Did you experience this behavior?

I'm asking because I don't see how this could happen. A CPU can't be
removed from a cpupool as long as there are domains in that cpupool.
So how would replq contain entries after the last CPU of the cpupool
has been removed?

But maybe I'm missing something.


Juergen
Re: [PATCH v2] xen/sched: rtds: re-arm repl_timer after timer re-initialization
Posted by Oleksii Moisieiev 2 days, 9 hours ago
Hi Juergen,

During our safety certification analysis work, we identified this as a
potential issue. While we haven't encountered this problem in practice
yet, it could occur in the future, so I believe it should be addressed
proactively.
--
Oleksii.


On 10/04/2026 13:14, Jürgen Groß wrote:
> On 03.04.26 11:29, Oleksii Moisieiev wrote:
>> In RTDS, removing the last eligible pCPU can kill repl_timer.
>> When a pCPU is later re-added, rt_switch_sched() reinitializes the
>> timer object, but pending entries may already exist in replq.
>
> Did you experience this behavior?
>
> I'm asking because I don't see how this could happen. A CPU can't be
> removed from a cpupool as long as there are domains in that cpupool.
> So how would replq contain entries after the last CPU of the cpupool
> has been removed?
>
> But maybe I'm missing something.
>
>
> Juergen

Re: [PATCH v2] xen/sched: rtds: re-arm repl_timer after timer re-initialization
Posted by Juergen Gross 2 days, 8 hours ago
On 10.04.26 14:04, Oleksii Moisieiev wrote:
> Hi Juergen,
> 
> During our safety certification analysis work, we identified this as a potential
> issue. While we haven't encountered this problem in practice yet, it could occur
> in the future, so I believe it should be addressed proactively.

For being able to occur in future, the handling of removing a cpu from a
cpupool would need to be changed. Considering the refusal to remove the
last cpu from a populated cpupool is on purpose (this avoids leaving a
domain without any cpu to run on), adding the code as you suggest would
just be an addition without any benefit.

It isn't doing any harm (other than adding code without purpose), so I
won't explicitly NAK the patch, but I won't Ack it either.


Juergen

> -- 
> Oleksii.
> 
> 
> On 10/04/2026 13:14, Jürgen Groß wrote:
>> On 03.04.26 11:29, Oleksii Moisieiev wrote:
>>> In RTDS, removing the last eligible pCPU can kill repl_timer.
>>> When a pCPU is later re-added, rt_switch_sched() reinitializes the
>>> timer object, but pending entries may already exist in replq.
>>
>> Did you experience this behavior?
>>
>> I'm asking because I don't see how this could happen. A CPU can't be
>> removed from a cpupool as long as there are domains in that cpupool.
>> So how would replq contain entries after the last CPU of the cpupool
>> has been removed?
>>
>> But maybe I'm missing something.
>>
>>
>> Juergen
> 

Re: [PATCH v2] xen/sched: rtds: re-arm repl_timer after timer re-initialization
Posted by Juergen Gross 2 days, 8 hours ago
On 10.04.26 14:13, Juergen Gross wrote:
> On 10.04.26 14:04, Oleksii Moisieiev wrote:
>> Hi Juergen,
>>
>> During our safety certification analysis work, we identified this as a potential
>> issue. While we haven't encountered this problem in practice yet, it could occur
>> in the future, so I believe it should be addressed proactively.
> 
> For being able to occur in future, the handling of removing a cpu from a
> cpupool would need to be changed. Considering the refusal to remove the
> last cpu from a populated cpupool is on purpose (this avoids leaving a
> domain without any cpu to run on), adding the code as you suggest would
> just be an addition without any benefit.
> 
> It isn't doing any harm (other than adding code without purpose), so I
> won't explicitly NAK the patch, but I won't Ack it either.

One further remark: I would ack the addition of an ASSERT(list_empty(replq))
instead of the conditional set_timer() call.


Juergen
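Concretely, the alternative suggested above would replace the conditional re-arm with an assertion that replq is empty at this point. A hypothetical sketch against the same hunk (not an actually posted patch):

```diff
-        /*
-         * When re-adding CPUs after all RTDS CPUs were removed, replq may
-         * already contain pending replenishment events. Re-arm immediately.
-         */
-        if ( !list_empty(replq) )
-            set_timer(&prv->repl_timer, replq_elem(replq->next)->cur_deadline);
+        /*
+         * With current cpupool semantics the last CPU of a populated pool
+         * cannot be removed, so replq must be empty whenever the timer is
+         * (re-)initialized here.
+         */
+        ASSERT(list_empty(replq));
```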
Re: [PATCH v2] xen/sched: rtds: re-arm repl_timer after timer re-initialization
Posted by Oleksii Moisieiev 2 days, 5 hours ago
On 10/04/2026 15:16, Juergen Gross wrote:
> On 10.04.26 14:13, Juergen Gross wrote:
>> On 10.04.26 14:04, Oleksii Moisieiev wrote:
>>> Hi Juergen,
>>>
>>> During our safety certification analysis work, we identified this as a
>>> potential issue. While we haven't encountered this problem in practice
>>> yet, it could occur in the future, so I believe it should be addressed
>>> proactively.
>>
>> For being able to occur in future, the handling of removing a cpu from a
>> cpupool would need to be changed. Considering the refusal to remove the
>> last cpu from a populated cpupool is on purpose (this avoids leaving a
>> domain without any cpu to run on), adding the code as you suggest would
>> just be an addition without any benefit.
>>
>> It isn't doing any harm (other than adding code without purpose), so I
>> won't explicitly NAK the patch, but I won't Ack it either.
>
> One further remark: I would ack the addition of an 
> ASSERT(list_empty(replq))
> instead of the conditional set_timer() call.
>
You're right: with the current cpupool semantics, when the timer is
re-initialized in this path, replq is expected to be empty. In that case
there is nothing to re-arm, and the timer can be programmed later when a
new replenishment event is queued.

Now I see that it would probably be better to update the cpupool logic
to prohibit removing the last pCPU from a cpupool. In that case, this
fix, even with the ASSERT, seems to be no longer relevant.

I think I'd rather post an update for the cpupool semantics and drop
this patch. Or I can send a v3 with the ASSERT if you think that is
still reasonable.

Thank you for the review!

--

Oleksii

>
> Juergen

Re: [PATCH v2] xen/sched: rtds: re-arm repl_timer after timer re-initialization
Posted by Jürgen Groß 2 days, 5 hours ago
On 10.04.26 17:12, Oleksii Moisieiev wrote:
> 
> On 10/04/2026 15:16, Juergen Gross wrote:
>> On 10.04.26 14:13, Juergen Gross wrote:
>>> On 10.04.26 14:04, Oleksii Moisieiev wrote:
>>>> Hi Juergen,
>>>>
>>>> During our safety certification analysis work, we identified this as a
>>>> potential issue. While we haven't encountered this problem in practice
>>>> yet, it could occur in the future, so I believe it should be addressed
>>>> proactively.
>>>
>>> For being able to occur in future, the handling of removing a cpu from a
>>> cpupool would need to be changed. Considering the refusal to remove the
>>> last cpu from a populated cpupool is on purpose (this avoids leaving a
>>> domain without any cpu to run on), adding the code as you suggest would
>>> just be an addition without any benefit.
>>>
>>> It isn't doing any harm (other than adding code without purpose), so I
>>> won't explicitly NAK the patch, but I won't Ack it either.
>>
>> One further remark: I would ack the addition of an ASSERT(list_empty(replq))
>> instead of the conditional set_timer() call.
>>
> You're right: with the current cpupool semantics, when the timer is
> re-initialized in this path, replq is expected to be empty. In that case
> there is nothing to re-arm, and the timer can be programmed later when a
> new replenishment event is queued.
> 
> Now I see that it would probably be better to update the cpupool logic
> to prohibit removing the last pCPU from a cpupool. In that case, this
> fix, even with the ASSERT, seems to be no longer relevant.
> 
> I think I'd rather post an update for the cpupool semantics and drop
> this patch. Or I can send a v3 with the ASSERT if you think that is
> still reasonable.

The cpupool semantics already exist; I wrote it this way when I
introduced cpupools.


Juergen