From: Juergen Gross
To: xen-devel@lists.xenproject.org
Cc: Juergen Gross, Stefano Stabellini, Julien Grall, Wei Liu,
    Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
    Tim Deegan, Jan Beulich, Dario Faggioli, Volodymyr Babchuk,
    Roger Pau Monné
Date: Wed, 2 Oct 2019 09:27:26 +0200
Message-Id: <20191002072745.24919-2-jgross@suse.com>
In-Reply-To: <20191002072745.24919-1-jgross@suse.com>
References: <20191002072745.24919-1-jgross@suse.com>
Subject: [Xen-devel] [PATCH v6 01/20] xen/sched: add code to sync scheduling
 of all vcpus of a sched unit

When switching sched units, synchronize all vcpus of the new unit so they
are scheduled at the same time. A new variable sched_granularity holds the
number of vcpus per schedule unit.

As tasklets require scheduling the idle unit, the tasklet_work_scheduled
parameter of do_schedule() has to be set to true if any cpu covered by the
current schedule() call has pending tasklet work.

For joining the other vcpus of a schedule unit, add a new softirq
SCHED_SLAVE_SOFTIRQ. It provides a way to initiate a context switch without
calling the generic schedule() function to select the vcpu to switch to, as
we already know which vcpu we want to run. This has the additional
advantage of not losing any concurrent SCHEDULE_SOFTIRQ events.
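
(Illustrative aside, not part of the patch: the two-counter rendezvous
described above can be modelled in stand-alone C11. The thread count, the
pthread/stdatomic plumbing and all names below are invented for the demo;
only the counter protocol mirrors the description - in-counter taken under
a lock, out-counter initialised to the cpu count plus 1 and decremented
atomically, the member reaching 1 doing the final work.)

    /* Stand-alone model of the sched unit rendezvous; not Xen code. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define GRANULARITY 4           /* assumed vcpus per sched unit */

    static pthread_mutex_t sched_lock = PTHREAD_MUTEX_INITIALIZER;
    static unsigned int rendezvous_in_cnt = GRANULARITY;
    static atomic_int rendezvous_out_cnt;

    static void *cpu_thread(void *arg)
    {
        long cpu = (long)arg;

        /* Rendezvous in: the last cpu to arrive takes the decision. */
        pthread_mutex_lock(&sched_lock);
        if ( !--rendezvous_in_cnt )
        {
            printf("cpu %ld picks the next unit\n", cpu);
            atomic_store(&rendezvous_out_cnt, GRANULARITY + 1);
        }
        pthread_mutex_unlock(&sched_lock);

        /* Wait (lock dropped) until the decision has been taken. */
        while ( atomic_load(&rendezvous_out_cnt) == 0 )
            ;

        /* Rendezvous out: whoever decrements the counter to 1 does the
         * final work and then releases the others by zeroing it. */
        if ( atomic_fetch_sub(&rendezvous_out_cnt, 1) == 2 )
        {
            printf("cpu %ld saves the previous context\n", cpu);
            atomic_store(&rendezvous_out_cnt, 0);
        }
        else
            while ( atomic_load(&rendezvous_out_cnt) )
                ;

        return NULL;
    }

    int main(void)
    {
        pthread_t t[GRANULARITY];

        for ( long i = 0; i < GRANULARITY; i++ )
            pthread_create(&t[i], NULL, cpu_thread, (void *)i);
        for ( int i = 0; i < GRANULARITY; i++ )
            pthread_join(t[i], NULL);

        return 0;
    }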
Signed-off-by: Juergen Gross
Reviewed-by: Dario Faggioli
Acked-by: Jan Beulich
---
RFC V2:
- move syncing after context_switch() to schedule.c
V2:
- don't run tasklets directly from sched_wait_rendezvous_in()
V3:
- adapt array size in sched_move_domain() (Jan Beulich)
- int -> unsigned int (Jan Beulich)
V4:
- renamed sd to sr in several places (Jan Beulich)
- swap stop_timer() and NOW() calls (Jan Beulich)
- context_switch() on ARM returns - handle that (Jan Beulich)
---
 xen/arch/arm/domain.c      |   2 +-
 xen/arch/x86/domain.c      |   3 +-
 xen/common/schedule.c      | 353 +++++++++++++++++++++++++++++++++++--------
 xen/common/softirq.c       |   6 +-
 xen/include/xen/sched-if.h |   1 +
 xen/include/xen/sched.h    |  16 +-
 xen/include/xen/softirq.h  |   1 +
 7 files changed, 294 insertions(+), 88 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index f0ee5a2140..460e968e97 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -318,7 +318,7 @@ static void schedule_tail(struct vcpu *prev)
 
     local_irq_enable();
 
-    context_saved(prev);
+    sched_context_switched(prev, current);
 
     update_runstate_area(current);
 
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index c7fa224c89..27f99d3bcc 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1784,7 +1784,6 @@ static void __context_switch(void)
     per_cpu(curr_vcpu, cpu) = n;
 }
 
-
 void context_switch(struct vcpu *prev, struct vcpu *next)
 {
     unsigned int cpu = smp_processor_id();
@@ -1860,7 +1859,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
         }
     }
 
-    context_saved(prev);
+    sched_context_switched(prev, next);
 
     _update_runstate_area(next);
     /* Must be done with interrupts enabled */
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 4711ece1ef..ff67fb3633 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -61,6 +61,9 @@ boolean_param("sched_smt_power_savings", sched_smt_power_savings);
 int sched_ratelimit_us = SCHED_DEFAULT_RATELIMIT_US;
 integer_param("sched_ratelimit_us", sched_ratelimit_us);
 
+/* Number of vcpus per struct sched_unit. */
+static unsigned int __read_mostly sched_granularity = 1;
+
 /* Common lock for free cpus. */
 static DEFINE_SPINLOCK(sched_free_cpu_lock);
 
@@ -532,8 +535,8 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     if ( IS_ERR(domdata) )
         return PTR_ERR(domdata);
 
-    /* TODO: fix array size with multiple vcpus per unit. */
-    unit_priv = xzalloc_array(void *, d->max_vcpus);
+    unit_priv = xzalloc_array(void *,
+                              DIV_ROUND_UP(d->max_vcpus, sched_granularity));
     if ( unit_priv == NULL )
     {
         sched_free_domdata(c->sched, domdata);
@@ -1714,133 +1717,325 @@ void vcpu_set_periodic_timer(struct vcpu *v, s_time_t value)
     spin_unlock(&v->periodic_timer_lock);
 }
 
-/*
- * The main function
- * - deschedule the current domain (scheduler independent).
- * - pick a new domain (scheduler dependent).
- */
-static void schedule(void)
+static void sched_switch_units(struct sched_resource *sr,
+                               struct sched_unit *next, struct sched_unit *prev,
+                               s_time_t now)
 {
-    struct sched_unit *prev = current->sched_unit, *next = NULL;
-    s_time_t now;
-    struct scheduler *sched;
-    unsigned long *tasklet_work = &this_cpu(tasklet_work_to_do);
-    bool tasklet_work_scheduled = false;
-    struct sched_resource *sd;
-    spinlock_t *lock;
-    int cpu = smp_processor_id();
+    sr->curr = next;
 
-    ASSERT_NOT_IN_ATOMIC();
+    TRACE_3D(TRC_SCHED_SWITCH_INFPREV, prev->domain->domain_id, prev->unit_id,
+             now - prev->state_entry_time);
+    TRACE_4D(TRC_SCHED_SWITCH_INFNEXT, next->domain->domain_id, next->unit_id,
+             (next->vcpu_list->runstate.state == RUNSTATE_runnable) ?
+             (now - next->state_entry_time) : 0, prev->next_time);
 
-    SCHED_STAT_CRANK(sched_run);
+    ASSERT(prev->vcpu_list->runstate.state == RUNSTATE_running);
+
+    TRACE_4D(TRC_SCHED_SWITCH, prev->domain->domain_id, prev->unit_id,
+             next->domain->domain_id, next->unit_id);
+
+    sched_unit_runstate_change(prev, false, now);
+
+    ASSERT(next->vcpu_list->runstate.state != RUNSTATE_running);
+    sched_unit_runstate_change(next, true, now);
 
-    sd = get_sched_res(cpu);
+    /*
+     * NB. Don't add any trace records from here until the actual context
+     * switch, else lost_records resume will not work properly.
+     */
+
+    ASSERT(!next->is_running);
+    next->vcpu_list->is_running = 1;
+    next->is_running = true;
+    next->state_entry_time = now;
+}
+
+static bool sched_tasklet_check_cpu(unsigned int cpu)
+{
+    unsigned long *tasklet_work = &per_cpu(tasklet_work_to_do, cpu);
 
-    /* Update tasklet scheduling status. */
     switch ( *tasklet_work )
     {
     case TASKLET_enqueued:
         set_bit(_TASKLET_scheduled, tasklet_work);
         /* fallthrough */
     case TASKLET_enqueued|TASKLET_scheduled:
-        tasklet_work_scheduled = true;
+        return true;
         break;
     case TASKLET_scheduled:
         clear_bit(_TASKLET_scheduled, tasklet_work);
+        /* fallthrough */
     case 0:
-        /*tasklet_work_scheduled = false;*/
+        /* return false; */
         break;
     default:
         BUG();
     }
 
-    lock = pcpu_schedule_lock_irq(cpu);
+    return false;
+}
 
-    now = NOW();
+static bool sched_tasklet_check(unsigned int cpu)
+{
+    bool tasklet_work_scheduled = false;
+    const cpumask_t *mask = get_sched_res(cpu)->cpus;
+    unsigned int cpu_iter;
+
+    for_each_cpu ( cpu_iter, mask )
+        if ( sched_tasklet_check_cpu(cpu_iter) )
+            tasklet_work_scheduled = true;
 
-    stop_timer(&sd->s_timer);
+    return tasklet_work_scheduled;
+}
+
+static struct sched_unit *do_schedule(struct sched_unit *prev, s_time_t now,
+                                      unsigned int cpu)
+{
+    struct scheduler *sched = per_cpu(scheduler, cpu);
+    struct sched_resource *sr = get_sched_res(cpu);
+    struct sched_unit *next;
 
     /* get policy-specific decision on scheduling... */
-    sched = this_cpu(scheduler);
-    sched->do_schedule(sched, prev, now, tasklet_work_scheduled);
+    sched->do_schedule(sched, prev, now, sched_tasklet_check(cpu));
 
     next = prev->next_task;
 
-    sd->curr = next;
-
     if ( prev->next_time >= 0 ) /* -ve means no limit */
-        set_timer(&sd->s_timer, now + prev->next_time);
+        set_timer(&sr->s_timer, now + prev->next_time);
+
+    if ( likely(prev != next) )
+        sched_switch_units(sr, next, prev, now);
+
+    return next;
+}
+
+static void context_saved(struct vcpu *prev)
+{
+    struct sched_unit *unit = prev->sched_unit;
+
+    /* Clear running flag /after/ writing context to memory. */
+    smp_wmb();
+
+    prev->is_running = 0;
+    unit->is_running = false;
+    unit->state_entry_time = NOW();
+
+    /* Check for migration request /after/ clearing running flag. */
+    smp_mb();
+
+    sched_context_saved(vcpu_scheduler(prev), unit);
 
-    if ( unlikely(prev == next) )
+    sched_unit_migrate_finish(unit);
+}
+
+/*
+ * Rendezvous on end of context switch.
+ * As no lock is protecting this rendezvous function we need to use atomic
+ * access functions on the counter.
+ * The counter will be 0 in case no rendezvous is needed. For the rendezvous
+ * case it is initialised to the number of cpus to rendezvous plus 1. Each
+ * member entering decrements the counter. The last one will decrement it to
+ * 1 and perform the final needed action in that case (call of context_saved()
+ * if vcpu was switched), and then set the counter to zero. The other members
+ * will wait until the counter becomes zero until they proceed.
+ */
+void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext)
+{
+    struct sched_unit *next = vnext->sched_unit;
+
+    if ( atomic_read(&next->rendezvous_out_cnt) )
+    {
+        int cnt = atomic_dec_return(&next->rendezvous_out_cnt);
+
+        /* Call context_saved() before releasing other waiters. */
+        if ( cnt == 1 )
+        {
+            if ( vprev != vnext )
+                context_saved(vprev);
+            atomic_set(&next->rendezvous_out_cnt, 0);
+        }
+        else
+            while ( atomic_read(&next->rendezvous_out_cnt) )
+                cpu_relax();
+    }
+    else if ( vprev != vnext )
+        context_saved(vprev);
+}
+
+static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext,
+                                 s_time_t now)
+{
+    if ( unlikely(vprev == vnext) )
     {
-        pcpu_schedule_unlock_irq(lock, cpu);
         TRACE_4D(TRC_SCHED_SWITCH_INFCONT,
-                 next->domain->domain_id, next->unit_id,
-                 now - prev->state_entry_time,
-                 prev->next_time);
-        trace_continue_running(next->vcpu_list);
-        return continue_running(prev->vcpu_list);
+                 vnext->domain->domain_id, vnext->sched_unit->unit_id,
+                 now - vprev->runstate.state_entry_time,
+                 vprev->sched_unit->next_time);
+        sched_context_switched(vprev, vnext);
+        trace_continue_running(vnext);
+        return continue_running(vprev);
     }
 
-    TRACE_3D(TRC_SCHED_SWITCH_INFPREV,
-             prev->domain->domain_id, prev->unit_id,
-             now - prev->state_entry_time);
-    TRACE_4D(TRC_SCHED_SWITCH_INFNEXT,
-             next->domain->domain_id, next->unit_id,
-             (next->vcpu_list->runstate.state == RUNSTATE_runnable) ?
-             (now - next->state_entry_time) : 0,
-             prev->next_time);
+    SCHED_STAT_CRANK(sched_ctx);
 
-    ASSERT(prev->vcpu_list->runstate.state == RUNSTATE_running);
+    stop_timer(&vprev->periodic_timer);
 
-    TRACE_4D(TRC_SCHED_SWITCH,
-             prev->domain->domain_id, prev->unit_id,
-             next->domain->domain_id, next->unit_id);
+    if ( vnext->sched_unit->migrated )
+        vcpu_move_irqs(vnext);
 
-    sched_unit_runstate_change(prev, false, now);
+    vcpu_periodic_timer_work(vnext);
 
-    ASSERT(next->vcpu_list->runstate.state != RUNSTATE_running);
-    sched_unit_runstate_change(next, true, now);
+    context_switch(vprev, vnext);
+}
 
-    /*
-     * NB. Don't add any trace records from here until the actual context
-     * switch, else lost_records resume will not work properly.
-     */
+/*
+ * Rendezvous before taking a scheduling decision.
+ * Called with schedule lock held, so all accesses to the rendezvous counter
+ * can be normal ones (no atomic accesses needed).
+ * The counter is initialized to the number of cpus to rendezvous initially.
+ * Each cpu entering will decrement the counter. In case the counter becomes
+ * zero do_schedule() is called and the rendezvous counter for leaving
+ * context_switch() is set. All other members will wait until the counter is
+ * becoming zero, dropping the schedule lock in between.
+ */
+static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
+                                                   spinlock_t **lock, int cpu,
+                                                   s_time_t now)
+{
+    struct sched_unit *next;
 
-    ASSERT(!next->is_running);
-    next->vcpu_list->is_running = 1;
-    next->is_running = true;
-    next->state_entry_time = now;
+    if ( !--prev->rendezvous_in_cnt )
+    {
+        next = do_schedule(prev, now, cpu);
+        atomic_set(&next->rendezvous_out_cnt, sched_granularity + 1);
+        return next;
+    }
 
-    pcpu_schedule_unlock_irq(lock, cpu);
+    while ( prev->rendezvous_in_cnt )
+    {
+        /*
+         * Coming from idle might need to do tasklet work.
+         * In order to avoid deadlocks we can't do that here, but have to
+         * continue the idle loop.
+         * Undo the rendezvous_in_cnt decrement and schedule another call of
+         * sched_slave().
+         */
+        if ( is_idle_unit(prev) && sched_tasklet_check_cpu(cpu) )
+        {
+            struct vcpu *vprev = current;
 
-    SCHED_STAT_CRANK(sched_ctx);
+            prev->rendezvous_in_cnt++;
+            atomic_set(&prev->rendezvous_out_cnt, 0);
+
+            pcpu_schedule_unlock_irq(*lock, cpu);
+
+            raise_softirq(SCHED_SLAVE_SOFTIRQ);
+            sched_context_switch(vprev, vprev, now);
+
+            return NULL;         /* ARM only. */
+        }
 
-    stop_timer(&prev->vcpu_list->periodic_timer);
+        pcpu_schedule_unlock_irq(*lock, cpu);
 
-    if ( next->migrated )
-        vcpu_move_irqs(next->vcpu_list);
+        cpu_relax();
 
-    vcpu_periodic_timer_work(next->vcpu_list);
+        *lock = pcpu_schedule_lock_irq(cpu);
+    }
 
-    context_switch(prev->vcpu_list, next->vcpu_list);
+    return prev->next_task;
 }
 
-void context_saved(struct vcpu *prev)
+static void sched_slave(void)
 {
-    /* Clear running flag /after/ writing context to memory. */
-    smp_wmb();
+    struct vcpu *vprev = current;
+    struct sched_unit *prev = vprev->sched_unit, *next;
+    s_time_t now;
+    spinlock_t *lock;
+    unsigned int cpu = smp_processor_id();
 
-    prev->is_running = 0;
-    prev->sched_unit->is_running = false;
-    prev->sched_unit->state_entry_time = NOW();
+    ASSERT_NOT_IN_ATOMIC();
 
-    /* Check for migration request /after/ clearing running flag. */
-    smp_mb();
+    lock = pcpu_schedule_lock_irq(cpu);
 
-    sched_context_saved(vcpu_scheduler(prev), prev->sched_unit);
+    now = NOW();
+
+    if ( !prev->rendezvous_in_cnt )
+    {
+        pcpu_schedule_unlock_irq(lock, cpu);
+        return;
+    }
+
+    stop_timer(&get_sched_res(cpu)->s_timer);
+
+    next = sched_wait_rendezvous_in(prev, &lock, cpu, now);
+    if ( !next )
+        return;
+
+    pcpu_schedule_unlock_irq(lock, cpu);
 
-    sched_unit_migrate_finish(prev->sched_unit);
+    sched_context_switch(vprev, next->vcpu_list, now);
+}
+
+/*
+ * The main function
+ * - deschedule the current domain (scheduler independent).
+ * - pick a new domain (scheduler dependent).
+ */
+static void schedule(void)
+{
+    struct vcpu *vnext, *vprev = current;
+    struct sched_unit *prev = vprev->sched_unit, *next = NULL;
+    s_time_t now;
+    struct sched_resource *sr;
+    spinlock_t *lock;
+    int cpu = smp_processor_id();
+
+    ASSERT_NOT_IN_ATOMIC();
+
+    SCHED_STAT_CRANK(sched_run);
+
+    sr = get_sched_res(cpu);
+
+    lock = pcpu_schedule_lock_irq(cpu);
+
+    if ( prev->rendezvous_in_cnt )
+    {
+        /*
+         * We have a race: sched_slave() should be called, so raise a softirq
+         * in order to re-enter schedule() later and call sched_slave() now.
+         */
+        pcpu_schedule_unlock_irq(lock, cpu);
+
+        raise_softirq(SCHEDULE_SOFTIRQ);
+        return sched_slave();
+    }
+
+    stop_timer(&sr->s_timer);
+
+    now = NOW();
+
+    if ( sched_granularity > 1 )
+    {
+        cpumask_t mask;
+
+        prev->rendezvous_in_cnt = sched_granularity;
+        cpumask_andnot(&mask, sr->cpus, cpumask_of(cpu));
+        cpumask_raise_softirq(&mask, SCHED_SLAVE_SOFTIRQ);
+        next = sched_wait_rendezvous_in(prev, &lock, cpu, now);
+        if ( !next )
+            return;
+    }
+    else
+    {
+        prev->rendezvous_in_cnt = 0;
+        next = do_schedule(prev, now, cpu);
+        atomic_set(&next->rendezvous_out_cnt, 0);
+    }
+
+    pcpu_schedule_unlock_irq(lock, cpu);
+
+    vnext = next->vcpu_list;
+    sched_context_switch(vprev, vnext, now);
 }
 
 /* The scheduler timer: force a run through the scheduler */
@@ -1881,6 +2076,7 @@ static int cpu_schedule_up(unsigned int cpu)
     if ( sr == NULL )
         return -ENOMEM;
     sr->master_cpu = cpu;
+    sr->cpus = cpumask_of(cpu);
     set_sched_res(cpu, sr);
 
     per_cpu(scheduler, cpu) = &sched_idle_ops;
@@ -1901,6 +2097,8 @@ static int cpu_schedule_up(unsigned int cpu)
     if ( idle_vcpu[cpu] == NULL )
         return -ENOMEM;
 
+    idle_vcpu[cpu]->sched_unit->rendezvous_in_cnt = 0;
+
     /*
      * No need to allocate any scheduler data, as cpus coming online are
     * free initially and the idle scheduler doesn't need any data areas
@@ -2001,6 +2199,7 @@ void __init scheduler_init(void)
     int i;
 
     open_softirq(SCHEDULE_SOFTIRQ, schedule);
+    open_softirq(SCHED_SLAVE_SOFTIRQ, sched_slave);
 
     for ( i = 0; i < NUM_SCHEDULERS; i++)
     {
diff --git a/xen/common/softirq.c b/xen/common/softirq.c
index 83c3c09bd5..2d66193203 100644
--- a/xen/common/softirq.c
+++ b/xen/common/softirq.c
@@ -33,8 +33,8 @@ static void __do_softirq(unsigned long ignore_mask)
     for ( ; ; )
     {
         /*
-         * Initialise @cpu on every iteration: SCHEDULE_SOFTIRQ may move
-         * us to another processor.
+         * Initialise @cpu on every iteration: SCHEDULE_SOFTIRQ or
+         * SCHED_SLAVE_SOFTIRQ may move us to another processor.
          */
         cpu = smp_processor_id();
 
@@ -55,7 +55,7 @@ void process_pending_softirqs(void)
 {
     ASSERT(!in_irq() && local_irq_is_enabled());
     /* Do not enter scheduler as it can preempt the calling context. */
-    __do_softirq(1ul << SCHEDULE_SOFTIRQ);
+    __do_softirq((1ul << SCHEDULE_SOFTIRQ) | (1ul << SCHED_SLAVE_SOFTIRQ));
 }

From: Juergen Gross
To: xen-devel@lists.xenproject.org
Cc: Juergen Gross, Stefano Stabellini, Julien Grall, Wei Liu,
    Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
    Robert VanVossen, Tim Deegan, Josh Whitehead, Meng Xu, Jan Beulich,
    Dario Faggioli
Date: Wed, 2 Oct 2019 09:27:27 +0200
Message-Id: <20191002072745.24919-3-jgross@suse.com>
In-Reply-To: <20191002072745.24919-1-jgross@suse.com>
References: <20191002072745.24919-1-jgross@suse.com>
Subject: [Xen-devel] [PATCH v6 02/20] xen/sched: introduce
 unit_runnable_state()

Today the runstate of a newly scheduled vcpu is always set to "running",
even if at that time vcpu_runnable() is already returning false due to a
race (e.g. with pausing the vcpu).

With core scheduling this can no longer work, as not all vcpus of a
schedule unit have to be "running" when the unit is being scheduled. So
the vcpu's new runstate has to be selected at the same time as the
runnability of the related schedule unit is probed.

For this purpose introduce a new helper unit_runnable_state() which saves
the new runstate of all tested vcpus in a new field of the vcpu struct.
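
(Illustrative aside, not part of the patch: a stand-alone sketch of the
selection logic just described. The flag and state names mirror the patch;
the toy_vcpu structure and everything else are invented for the demo. The
point is that the "runnable?" probe and the recording of the runstate each
vcpu will enter happen in one pass, so a vcpu paused in between cannot be
marked "running" incorrectly.)

    /* Stand-alone sketch of the new_state selection; not Xen code. */
    #include <stdbool.h>
    #include <stdio.h>

    enum runstate { RUNSTATE_running, RUNSTATE_runnable, RUNSTATE_blocked,
                    RUNSTATE_offline };

    struct toy_vcpu {
        bool runnable;          /* stands in for vcpu_runnable(v) */
        bool blocked;           /* stands in for v->pause_flags & VPF_blocked */
        enum runstate new_state;
    };

    /* Probe all vcpus of a unit once, latching each one's next runstate,
     * and report whether any of them can actually run. */
    static bool unit_runnable_state(struct toy_vcpu *v, unsigned int nr)
    {
        bool ret = false;

        for ( unsigned int i = 0; i < nr; i++ )
        {
            v[i].new_state = v[i].runnable ? RUNSTATE_running
                                           : v[i].blocked ? RUNSTATE_blocked
                                                          : RUNSTATE_offline;
            ret |= v[i].runnable;
        }

        return ret;
    }

    int main(void)
    {
        struct toy_vcpu unit[2] = { { .runnable = true }, { .blocked = true } };

        printf("unit runnable: %d, states: %d %d\n",
               unit_runnable_state(unit, 2),
               unit[0].new_state, unit[1].new_state);
        return 0;
    }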
Signed-off-by: Juergen Gross
Reviewed-by: Dario Faggioli
---
RFC V2:
- new patch
V3:
- add vcpu loop to unit_runnable_state() right now instead of doing so in
  next patch (Jan Beulich, Dario Faggioli)
- make new_state unsigned int (Jan Beulich)
V4:
- add comment explaining unit_runnable_state() (Jan Beulich)
---
 xen/common/domain.c         |  1 +
 xen/common/sched_arinc653.c |  2 +-
 xen/common/sched_credit.c   | 49 +++++++++++++++++++++++--------------------
 xen/common/sched_credit2.c  |  7 ++++---
 xen/common/sched_null.c     |  3 ++-
 xen/common/sched_rt.c       |  8 +++++++-
 xen/common/schedule.c       |  2 +-
 xen/include/xen/sched-if.h  | 30 +++++++++++++++++++++++++++
 xen/include/xen/sched.h     |  1 +
 9 files changed, 73 insertions(+), 30 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 601da28c9c..a9882509ed 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -157,6 +157,7 @@ struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id)
     if ( is_idle_domain(d) )
     {
         v->runstate.state = RUNSTATE_running;
+        v->new_state = RUNSTATE_running;
     }
     else
     {
diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index fcf81db19a..dd5876eacd 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -563,7 +563,7 @@ a653sched_do_schedule(
     if ( !((new_task != NULL)
            && (AUNIT(new_task) != NULL)
            && AUNIT(new_task)->awake
-           && unit_runnable(new_task)) )
+           && unit_runnable_state(new_task)) )
         new_task = IDLETASK(cpu);
     BUG_ON(new_task == NULL);
 
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 299eff21ac..00beac3ea4 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1894,7 +1894,7 @@ static void csched_schedule(
     if ( !test_bit(CSCHED_FLAG_UNIT_YIELD, &scurr->flags)
          && !tasklet_work_scheduled
          && prv->ratelimit
-         && unit_runnable(unit)
+         && unit_runnable_state(unit)
          && !is_idle_unit(unit)
          && runtime < prv->ratelimit )
     {
@@ -1939,33 +1939,36 @@ static void csched_schedule(
             dec_nr_runnable(sched_cpu);
     }
 
-    snext = __runq_elem(runq->next);
-
-    /* Tasklet work (which runs in idle UNIT context) overrides all else. */
-    if ( tasklet_work_scheduled )
-    {
-        TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
-        snext = CSCHED_UNIT(sched_idle_unit(sched_cpu));
-        snext->pri = CSCHED_PRI_TS_BOOST;
-    }
-
     /*
      * Clear YIELD flag before scheduling out
      */
     clear_bit(CSCHED_FLAG_UNIT_YIELD, &scurr->flags);
 
-    /*
-     * SMP Load balance:
-     *
-     * If the next highest priority local runnable UNIT has already eaten
-     * through its credits, look on other PCPUs to see if we have more
-     * urgent work... If not, csched_load_balance() will return snext, but
-     * already removed from the runq.
-     */
-    if ( snext->pri > CSCHED_PRI_TS_OVER )
-        __runq_remove(snext);
-    else
-        snext = csched_load_balance(prv, sched_cpu, snext, &migrated);
+    do {
+        snext = __runq_elem(runq->next);
+
+        /* Tasklet work (which runs in idle UNIT context) overrides all else. */
+        if ( tasklet_work_scheduled )
+        {
+            TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
+            snext = CSCHED_UNIT(sched_idle_unit(sched_cpu));
+            snext->pri = CSCHED_PRI_TS_BOOST;
+        }
+
+        /*
+         * SMP Load balance:
+         *
+         * If the next highest priority local runnable UNIT has already eaten
+         * through its credits, look on other PCPUs to see if we have more
+         * urgent work... If not, csched_load_balance() will return snext, but
+         * already removed from the runq.
+         */
+        if ( snext->pri > CSCHED_PRI_TS_OVER )
+            __runq_remove(snext);
+        else
+            snext = csched_load_balance(prv, sched_cpu, snext, &migrated);
+
+    } while ( !unit_runnable_state(snext->unit) );
 
     /*
      * Update idlers mask if necessary. When we're idling, other CPUs
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 87d142bbe4..0e29e56d5a 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -3291,7 +3291,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
      * In fact, it may be the case that scurr is about to spin, and there's
      * no point forcing it to do so until rate limiting expires.
      */
-    if ( !yield && prv->ratelimit_us && unit_runnable(scurr->unit) &&
+    if ( !yield && prv->ratelimit_us && unit_runnable_state(scurr->unit) &&
         (now - scurr->unit->state_entry_time) < MICROSECS(prv->ratelimit_us) )
     {
         if ( unlikely(tb_init_done) )
@@ -3345,7 +3345,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
      *
      * Of course, we also default to idle also if scurr is not runnable.
      */
-    if ( unit_runnable(scurr->unit) && !soft_aff_preempt )
+    if ( unit_runnable_state(scurr->unit) && !soft_aff_preempt )
         snext = scurr;
     else
         snext = csched2_unit(sched_idle_unit(cpu));
@@ -3405,7 +3405,8 @@ runq_candidate(struct csched2_runqueue_data *rqd,
          * some budget, then choose it.
          */
         if ( (yield || svc->credit > snext->credit) &&
-             (!has_cap(svc) || unit_grab_budget(svc)) )
+             (!has_cap(svc) || unit_grab_budget(svc)) &&
+             unit_runnable_state(svc->unit) )
             snext = svc;
 
         /* In any case, if we got this far, break. */
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index 80a7d45935..3dde1dcd00 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -864,7 +864,8 @@ static void null_schedule(const struct scheduler *ops, struct sched_unit *prev,
             cpumask_set_cpu(sched_cpu, &prv->cpus_free);
     }
 
-    if ( unlikely(prev->next_task == NULL || !unit_runnable(prev->next_task)) )
+    if ( unlikely(prev->next_task == NULL ||
+                  !unit_runnable_state(prev->next_task)) )
         prev->next_task = sched_idle_unit(sched_cpu);
 
     NULL_UNIT_CHECK(prev->next_task);
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index cfd7d334fa..fd882f2ca4 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -1092,12 +1092,18 @@ rt_schedule(const struct scheduler *ops, struct sched_unit *currunit,
     else
     {
         snext = runq_pick(ops, cpumask_of(sched_cpu));
+
         if ( snext == NULL )
             snext = rt_unit(sched_idle_unit(sched_cpu));
+        else if ( !unit_runnable_state(snext->unit) )
+        {
+            q_remove(snext);
+            snext = rt_unit(sched_idle_unit(sched_cpu));
+        }
 
         /* if scurr has higher priority and budget, still pick scurr */
         if ( !is_idle_unit(currunit) &&
-             unit_runnable(currunit) &&
+             unit_runnable_state(currunit) &&
              scurr->cur_budget > 0 &&
              ( is_idle_unit(snext->unit) ||
                compare_unit_priority(scurr, snext) > 0 ) )
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index ff67fb3633..9c1b044b49 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -280,7 +280,7 @@ static inline void sched_unit_runstate_change(struct sched_unit *unit,
     for_each_sched_unit_vcpu ( unit, v )
     {
         if ( running )
-            vcpu_runstate_change(v, RUNSTATE_running, new_entry_time);
+            vcpu_runstate_change(v, v->new_state, new_entry_time);
         else
             vcpu_runstate_change(v,
                                  ((v->pause_flags & VPF_blocked) ?
                                   RUNSTATE_blocked :
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index c65dfa943b..7e568a9d9f 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -93,6 +93,36 @@ static inline bool unit_runnable(const struct sched_unit *unit)
     return false;
 }
 
+/*
+ * Returns whether a sched_unit is runnable and sets new_state for each of its
+ * vcpus. It is mandatory to determine the new runstate for all vcpus of a unit
+ * without dropping the schedule lock (which happens when synchronizing the
+ * context switch of the vcpus of a unit) in order to avoid races with e.g.
+ * vcpu_sleep().
+ */
+static inline bool unit_runnable_state(const struct sched_unit *unit)
+{
+    struct vcpu *v;
+    bool runnable, ret = false;
+
+    if ( is_idle_unit(unit) )
+        return true;
+
+    for_each_sched_unit_vcpu ( unit, v )
+    {
+        runnable = vcpu_runnable(v);
+
+        v->new_state = runnable ? RUNSTATE_running
+                                : (v->pause_flags & VPF_blocked)
+                                  ? RUNSTATE_blocked : RUNSTATE_offline;
+
+        if ( runnable )
+            ret = true;
+    }
+
+    return ret;
+}
+
 static inline void sched_set_res(struct sched_unit *unit,
                                  struct sched_resource *res)
 {
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index c770ab4aa0..12f00cd78d 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -174,6 +174,7 @@ struct vcpu
         XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat;
     } runstate_guest; /* guest address */
 #endif
+    unsigned int     new_state;
 
     /* Has the FPU been initialised? */
     bool             fpu_initialised;
--
2.16.4
From: Juergen Gross
To: xen-devel@lists.xenproject.org
Cc: Juergen Gross, Stefano Stabellini, Julien Grall, Wei Liu,
    Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
    Tim Deegan, Jan Beulich, Dario Faggioli
Date: Wed, 2 Oct 2019 09:27:28 +0200
Message-Id: <20191002072745.24919-4-jgross@suse.com>
In-Reply-To: <20191002072745.24919-1-jgross@suse.com>
References: <20191002072745.24919-1-jgross@suse.com>
Subject: [Xen-devel] [PATCH v6 03/20] xen/sched: add support for multiple
 vcpus per sched unit where missing

In several places support for multiple vcpus per sched unit is missing.

Add that missing support (with the exception of initial allocation) and
the helpers it needs.
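
(Illustrative aside, not part of the patch: one of the additions below
makes sched_set_res() walk the resource's cpu mask and give each vcpu of
the unit the next cpu of that mask. A stand-alone model of that walk,
with a toy bitmask standing in for cpumask_first()/cpumask_next(); the
mask value and sizes are invented for the demo.)

    /* Stand-alone model of multi-vcpu processor assignment; not Xen code. */
    #include <assert.h>
    #include <stdio.h>

    #define NR_CPUS 8

    /* Toy stand-in for cpumask_next(): next set bit after prev. */
    static int mask_next(unsigned int mask, int prev)
    {
        for ( int cpu = prev + 1; cpu < NR_CPUS; cpu++ )
            if ( mask & (1u << cpu) )
                return cpu;
        return NR_CPUS;
    }

    int main(void)
    {
        unsigned int res_cpus = 0x0c;   /* resource spans cpus 2 and 3 */
        int processor[2];               /* two vcpus in the unit */
        int cpu = mask_next(res_cpus, -1);

        for ( int i = 0; i < 2; i++ )
        {
            assert(cpu < NR_CPUS);      /* the unit must fit the resource */
            processor[i] = cpu;
            cpu = mask_next(res_cpus, cpu);
        }

        printf("vcpu0 -> cpu%d, vcpu1 -> cpu%d\n", processor[0], processor[1]);
        return 0;
    }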
Signed-off-by: Juergen Gross
Reviewed-by: Dario Faggioli
Acked-by: Jan Beulich
---
RFC V2:
- fix vcpu_runstate_helper()
V1:
- add special handling for idle unit in unit_runnable() and
  unit_runnable_state()
V2:
- handle affinity_broken correctly (Jan Beulich)
V3:
- type for cpu -> unsigned int (Jan Beulich)
---
 xen/common/domain.c        |  5 ++++-
 xen/common/schedule.c      |  9 +++++----
 xen/include/xen/sched-if.h | 16 +++++++++++++++-
 3 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index a9882509ed..93aa856bcb 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1273,7 +1273,10 @@ int vcpu_reset(struct vcpu *v)
     v->async_exception_mask = 0;
     memset(v->async_exception_state, 0, sizeof(v->async_exception_state));
 #endif
-    v->affinity_broken = 0;
+    if ( v->affinity_broken & VCPU_AFFINITY_OVERRIDE )
+        vcpu_temporary_affinity(v, NR_CPUS, VCPU_AFFINITY_OVERRIDE);
+    if ( v->affinity_broken & VCPU_AFFINITY_WAIT )
+        vcpu_temporary_affinity(v, NR_CPUS, VCPU_AFFINITY_WAIT);
     clear_bit(_VPF_blocked, &v->pause_flags);
     clear_bit(_VPF_in_reset, &v->pause_flags);
 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 9c1b044b49..3094ff6838 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -252,8 +252,9 @@ static inline void vcpu_runstate_change(
     s_time_t delta;
     struct sched_unit *unit = v->sched_unit;
 
-    ASSERT(v->runstate.state != new_state);
     ASSERT(spin_is_locked(get_sched_res(v->processor)->schedule_lock));
+    if ( v->runstate.state == new_state )
+        return;
 
     vcpu_urgent_count_update(v);
 
@@ -1729,14 +1730,14 @@ static void sched_switch_units(struct sched_resource *sr,
              (next->vcpu_list->runstate.state == RUNSTATE_runnable) ?
             (now - next->state_entry_time) : 0, prev->next_time);
 
-    ASSERT(prev->vcpu_list->runstate.state == RUNSTATE_running);
+    ASSERT(unit_running(prev));
 
     TRACE_4D(TRC_SCHED_SWITCH, prev->domain->domain_id, prev->unit_id,
              next->domain->domain_id, next->unit_id);
 
     sched_unit_runstate_change(prev, false, now);
 
-    ASSERT(next->vcpu_list->runstate.state != RUNSTATE_running);
+    ASSERT(!unit_running(next));
     sched_unit_runstate_change(next, true, now);
 
     /*
@@ -1858,7 +1859,7 @@ void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext)
             while ( atomic_read(&next->rendezvous_out_cnt) )
                 cpu_relax();
     }
-    else if ( vprev != vnext )
+    else if ( vprev != vnext && sched_granularity == 1 )
         context_saved(vprev);
 }
 
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 7e568a9d9f..983f2ece83 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -81,6 +81,11 @@ static inline bool is_unit_online(const struct sched_unit *unit)
     return false;
 }
 
+static inline unsigned int unit_running(const struct sched_unit *unit)
+{
+    return unit->runstate_cnt[RUNSTATE_running];
+}
+
 /* Returns true if at least one vcpu of the unit is runnable. */
 static inline bool unit_runnable(const struct sched_unit *unit)
 {
@@ -126,7 +131,16 @@ static inline bool unit_runnable_state(const struct sched_unit *unit)
 static inline void sched_set_res(struct sched_unit *unit,
                                  struct sched_resource *res)
 {
-    unit->vcpu_list->processor = res->master_cpu;
+    unsigned int cpu = cpumask_first(res->cpus);
+    struct vcpu *v;
+
+    for_each_sched_unit_vcpu ( unit, v )
+    {
+        ASSERT(cpu < nr_cpu_ids);
+        v->processor = cpu;
+        cpu = cpumask_next(cpu, res->cpus);
+    }
+
     unit->res = res;
 }
 
--
2.16.4
From: Juergen Gross
To: xen-devel@lists.xenproject.org
Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Julien Grall, Wei Liu,
    Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
    Robert VanVossen, Dario Faggioli, Josh Whitehead, Meng Xu, Jan Beulich
Date: Wed, 2 Oct 2019 09:27:29 +0200
Message-Id: <20191002072745.24919-5-jgross@suse.com>
In-Reply-To: <20191002072745.24919-1-jgross@suse.com>
References: <20191002072745.24919-1-jgross@suse.com>
Subject: [Xen-devel] [PATCH v6 04/20] xen/sched: modify
 cpupool_domain_cpumask() to be an unit mask

cpupool_domain_cpumask() is used by the scheduler to select cpus or to
iterate over cpus. In order to support scheduling units spanning multiple
cpus, rename cpupool_domain_cpumask() to cpupool_domain_master_cpumask()
and let it return a cpumask with only one bit set per scheduling resource.
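
(Illustrative aside, not part of the patch: the relation between a pool's
cpu_valid mask and the new res_valid mask is a plain bitwise AND with
sched_res_mask. Granularity 2 and all mask values below are assumed for
the demo, so sched_res_mask keeps only the first cpu of each resource.)

    /* Stand-alone model of cpu_valid vs. res_valid; not Xen code. */
    #include <stdio.h>

    int main(void)
    {
        /* Assumed granularity 2: one scheduling resource per two cpus. */
        unsigned int sched_res_mask = 0x55; /* cpus 0,2,4,6 master resources */
        unsigned int cpu_valid = 0x3c;      /* pool owns cpus 2-5 */

        /* What cpupool_assign_cpu_locked() and the unassign path recompute: */
        unsigned int res_valid = cpu_valid & sched_res_mask;

        printf("cpu_valid=%#x res_valid=%#x\n", cpu_valid, res_valid);
        return 0;                           /* prints 0x3c and 0x14 */
    }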
Signed-off-by: Juergen Gross Reviewed-by: Dario Faggioli --- V4: - rename to cpupool_domain_master_cpumask() (Jan Beulich) - check return value of zalloc_cpumask_var() (Jan Beulich) --- xen/common/cpupool.c | 27 ++++++++++++++++++--------- xen/common/domain.c | 2 +- xen/common/domctl.c | 2 +- xen/common/sched_arinc653.c | 2 +- xen/common/sched_credit.c | 4 ++-- xen/common/sched_credit2.c | 22 +++++++++++----------- xen/common/sched_null.c | 8 ++++---- xen/common/sched_rt.c | 8 ++++---- xen/common/schedule.c | 13 +++++++------ xen/include/xen/sched-if.h | 9 ++++++--- 10 files changed, 55 insertions(+), 42 deletions(-) diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index fd30040922..441a26f16c 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -36,26 +36,33 @@ static DEFINE_SPINLOCK(cpupool_lock); =20 DEFINE_PER_CPU(struct cpupool *, cpupool); =20 +static void free_cpupool_struct(struct cpupool *c) +{ + if ( c ) + { + free_cpumask_var(c->res_valid); + free_cpumask_var(c->cpu_valid); + } + xfree(c); +} + static struct cpupool *alloc_cpupool_struct(void) { struct cpupool *c =3D xzalloc(struct cpupool); =20 - if ( !c || !zalloc_cpumask_var(&c->cpu_valid) ) + if ( !c ) + return NULL; + + if ( !zalloc_cpumask_var(&c->cpu_valid) || + !zalloc_cpumask_var(&c->res_valid) ) { - xfree(c); + free_cpupool_struct(c); c =3D NULL; } =20 return c; } =20 -static void free_cpupool_struct(struct cpupool *c) -{ - if ( c ) - free_cpumask_var(c->cpu_valid); - xfree(c); -} - /* * find a cpupool by it's id. to be called with cpupool lock held * if exact is not specified, the first cpupool with an id larger or equal= to @@ -269,6 +276,7 @@ static int cpupool_assign_cpu_locked(struct cpupool *c,= unsigned int cpu) cpupool_cpu_moving =3D NULL; } cpumask_set_cpu(cpu, c->cpu_valid); + cpumask_and(c->res_valid, c->cpu_valid, sched_res_mask); =20 rcu_read_lock(&domlist_read_lock); for_each_domain_in_cpupool(d, c) @@ -361,6 +369,7 @@ static int cpupool_unassign_cpu_start(struct cpupool *c= , unsigned int cpu) atomic_inc(&c->refcnt); cpupool_cpu_moving =3D c; cpumask_clear_cpu(cpu, c->cpu_valid); + cpumask_and(c->res_valid, c->cpu_valid, sched_res_mask); =20 out: spin_unlock(&cpupool_lock); diff --git a/xen/common/domain.c b/xen/common/domain.c index 93aa856bcb..9c7360ed2a 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -584,7 +584,7 @@ void domain_update_node_affinity(struct domain *d) return; } =20 - online =3D cpupool_domain_cpumask(d); + online =3D cpupool_domain_master_cpumask(d); =20 spin_lock(&d->node_affinity_lock); =20 diff --git a/xen/common/domctl.c b/xen/common/domctl.c index 8a694e0d37..d597a09f98 100644 --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -619,7 +619,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_d= omctl) if ( op->cmd =3D=3D XEN_DOMCTL_setvcpuaffinity ) { cpumask_var_t new_affinity, old_affinity; - cpumask_t *online =3D cpupool_domain_cpumask(v->domain); + cpumask_t *online =3D cpupool_domain_master_cpumask(v->domain); =20 /* * We want to be able to restore hard affinity if we are trying diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c index dd5876eacd..45c05c6cd9 100644 --- a/xen/common/sched_arinc653.c +++ b/xen/common/sched_arinc653.c @@ -614,7 +614,7 @@ a653sched_pick_resource(const struct scheduler *ops, * If present, prefer unit's current processor, else * just find the first valid unit. 
*/ - online =3D cpupool_domain_cpumask(unit->domain); + online =3D cpupool_domain_master_cpumask(unit->domain); =20 cpu =3D cpumask_first(online); =20 diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c index 00beac3ea4..a6dff8ec62 100644 --- a/xen/common/sched_credit.c +++ b/xen/common/sched_credit.c @@ -361,7 +361,7 @@ static inline void __runq_tickle(struct csched_unit *ne= w) ASSERT(cur); cpumask_clear(&mask); =20 - online =3D cpupool_domain_cpumask(new->sdom->dom); + online =3D cpupool_domain_master_cpumask(new->sdom->dom); cpumask_and(&idle_mask, prv->idlers, online); idlers_empty =3D cpumask_empty(&idle_mask); =20 @@ -724,7 +724,7 @@ _csched_cpu_pick(const struct scheduler *ops, const str= uct sched_unit *unit, /* We must always use cpu's scratch space */ cpumask_t *cpus =3D cpumask_scratch_cpu(cpu); cpumask_t idlers; - cpumask_t *online =3D cpupool_domain_cpumask(unit->domain); + cpumask_t *online =3D cpupool_domain_master_cpumask(unit->domain); struct csched_pcpu *spc =3D NULL; int balance_step; =20 diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c index 0e29e56d5a..d51df05887 100644 --- a/xen/common/sched_credit2.c +++ b/xen/common/sched_credit2.c @@ -705,7 +705,7 @@ static int get_fallback_cpu(struct csched2_unit *svc) =20 affinity_balance_cpumask(unit, bs, cpumask_scratch_cpu(cpu)); cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), - cpupool_domain_cpumask(unit->domain)); + cpupool_domain_master_cpumask(unit->domain)); =20 /* * This is cases 1 or 3 (depending on bs): if processor is (still) @@ -1440,7 +1440,7 @@ runq_tickle(const struct scheduler *ops, struct csche= d2_unit *new, s_time_t now) struct sched_unit *unit =3D new->unit; unsigned int bs, cpu =3D sched_unit_master(unit); struct csched2_runqueue_data *rqd =3D c2rqd(ops, cpu); - cpumask_t *online =3D cpupool_domain_cpumask(unit->domain); + cpumask_t *online =3D cpupool_domain_master_cpumask(unit->domain); cpumask_t mask; =20 ASSERT(new->rqd =3D=3D rqd); @@ -2243,7 +2243,7 @@ csched2_res_pick(const struct scheduler *ops, const s= truct sched_unit *unit) } =20 cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity, - cpupool_domain_cpumask(unit->domain)); + cpupool_domain_master_cpumask(unit->domain)); =20 /* * First check to see if we're here because someone else suggested a p= lace @@ -2358,8 +2358,8 @@ csched2_res_pick(const struct scheduler *ops, const s= truct sched_unit *unit) * ok because: * - we know that unit->cpu_hard_affinity and ->cpu_soft_affinity = have * a non-empty intersection (because has_soft is true); - * - we have unit->cpu_hard_affinity & cpupool_domain_cpumask() al= ready - * in cpumask_scratch, we do save a lot doing like this. + * - we have unit->cpu_hard_affinity & cpupool_domain_master_cpuma= sk() + * already in cpumask_scratch, we do save a lot doing like this. * * It's kind of like open coding affinity_balance_cpumask() but, in * this specific case, calling that would mean a lot of (unnecessa= ry) @@ -2378,7 +2378,7 @@ csched2_res_pick(const struct scheduler *ops, const s= truct sched_unit *unit) * affinity, so go for it. * * cpumask_scratch already has unit->cpu_hard_affinity & - * cpupool_domain_cpumask() in it, so it's enough that we filter + * cpupool_domain_master_cpumask() in it, so it's enough that we f= ilter * with the cpus of the runq. 
*/ cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), @@ -2513,7 +2513,7 @@ static void migrate(const struct scheduler *ops, _runq_deassign(svc); =20 cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity, - cpupool_domain_cpumask(unit->domain)); + cpupool_domain_master_cpumask(unit->domain)); cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), &trqd->active); sched_set_res(unit, @@ -2547,7 +2547,7 @@ static bool unit_is_migrateable(struct csched2_unit *= svc, int cpu =3D sched_unit_master(unit); =20 cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity, - cpupool_domain_cpumask(unit->domain)); + cpupool_domain_master_cpumask(unit->domain)); =20 return !(svc->flags & CSFLAG_runq_migrate_request) && cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active); @@ -2763,7 +2763,7 @@ csched2_unit_migrate( * v->processor will be chosen, and during actual domain unpause that * the unit will be assigned to and added to the proper runqueue. */ - if ( unlikely(!cpumask_test_cpu(new_cpu, cpupool_domain_cpumask(d))) ) + if ( unlikely(!cpumask_test_cpu(new_cpu, cpupool_domain_master_cpumask= (d))) ) { ASSERT(system_state =3D=3D SYS_STATE_suspend); if ( unit_on_runq(svc) ) @@ -3069,7 +3069,7 @@ csched2_alloc_domdata(const struct scheduler *ops, st= ruct domain *dom) sdom->nr_units =3D 0; =20 init_timer(&sdom->repl_timer, replenish_domain_budget, sdom, - cpumask_any(cpupool_domain_cpumask(dom))); + cpumask_any(cpupool_domain_master_cpumask(dom))); spin_lock_init(&sdom->budget_lock); INIT_LIST_HEAD(&sdom->parked_units); =20 @@ -3317,7 +3317,7 @@ runq_candidate(struct csched2_runqueue_data *rqd, cpumask_scratch); if ( unlikely(!cpumask_test_cpu(cpu, cpumask_scratch)) ) { - cpumask_t *online =3D cpupool_domain_cpumask(scurr->unit->doma= in); + cpumask_t *online =3D cpupool_domain_master_cpumask(scurr->uni= t->domain); =20 /* Ok, is any of the pcpus in scurr soft-affinity idle? 
*/ cpumask_and(cpumask_scratch, cpumask_scratch, &rqd->idle); diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c index 3dde1dcd00..2525464a7c 100644 --- a/xen/common/sched_null.c +++ b/xen/common/sched_null.c @@ -125,7 +125,7 @@ static inline bool unit_check_affinity(struct sched_uni= t *unit, { affinity_balance_cpumask(unit, balance_step, cpumask_scratch_cpu(cpu)); cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), - cpupool_domain_cpumask(unit->domain)); + cpupool_domain_master_cpumask(unit->domain)); =20 return cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu)); } @@ -266,7 +266,7 @@ pick_res(struct null_private *prv, const struct sched_u= nit *unit) { unsigned int bs; unsigned int cpu =3D sched_unit_master(unit), new_cpu; - cpumask_t *cpus =3D cpupool_domain_cpumask(unit->domain); + cpumask_t *cpus =3D cpupool_domain_master_cpumask(unit->domain); =20 ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock)); =20 @@ -467,7 +467,7 @@ static void null_unit_insert(const struct scheduler *op= s, lock =3D unit_schedule_lock(unit); =20 cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity, - cpupool_domain_cpumask(unit->domain)); + cpupool_domain_master_cpumask(unit->domain)); =20 /* If the pCPU is free, we assign unit to it */ if ( likely(per_cpu(npc, cpu).unit =3D=3D NULL) ) @@ -579,7 +579,7 @@ static void null_unit_wake(const struct scheduler *ops, spin_unlock(&prv->waitq_lock); =20 cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity, - cpupool_domain_cpumask(unit->domain)); + cpupool_domain_master_cpumask(unit->domain)); =20 if ( !cpumask_intersects(&prv->cpus_free, cpumask_scratch_cpu(cpu)= ) ) { diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c index fd882f2ca4..d21c416cae 100644 --- a/xen/common/sched_rt.c +++ b/xen/common/sched_rt.c @@ -326,7 +326,7 @@ rt_dump_unit(const struct scheduler *ops, const struct = rt_unit *svc) */ mask =3D cpumask_scratch_cpu(sched_unit_master(svc->unit)); =20 - cpupool_mask =3D cpupool_domain_cpumask(svc->unit->domain); + cpupool_mask =3D cpupool_domain_master_cpumask(svc->unit->domain); cpumask_and(mask, cpupool_mask, svc->unit->cpu_hard_affinity); printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime")," " cur_b=3D%"PRI_stime" cur_d=3D%"PRI_stime" last_start=3D%"PRI_= stime"\n" @@ -642,7 +642,7 @@ rt_res_pick(const struct scheduler *ops, const struct s= ched_unit *unit) cpumask_t *online; int cpu; =20 - online =3D cpupool_domain_cpumask(unit->domain); + online =3D cpupool_domain_master_cpumask(unit->domain); cpumask_and(&cpus, online, unit->cpu_hard_affinity); =20 cpu =3D cpumask_test_cpu(sched_unit_master(unit), &cpus) @@ -1016,7 +1016,7 @@ runq_pick(const struct scheduler *ops, const cpumask_= t *mask) iter_svc =3D q_elem(iter); =20 /* mask cpu_hard_affinity & cpupool & mask */ - online =3D cpupool_domain_cpumask(iter_svc->unit->domain); + online =3D cpupool_domain_master_cpumask(iter_svc->unit->domain); cpumask_and(&cpu_common, online, iter_svc->unit->cpu_hard_affinity= ); cpumask_and(&cpu_common, mask, &cpu_common); if ( cpumask_empty(&cpu_common) ) @@ -1191,7 +1191,7 @@ runq_tickle(const struct scheduler *ops, struct rt_un= it *new) if ( new =3D=3D NULL || is_idle_unit(new->unit) ) return; =20 - online =3D cpupool_domain_cpumask(new->unit->domain); + online =3D cpupool_domain_master_cpumask(new->unit->domain); cpumask_and(¬_tickled, online, new->unit->cpu_hard_affinity); cpumask_andnot(¬_tickled, ¬_tickled, &prv->tickled); =20 diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 
3094ff6838..36b1d3df6e 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -63,6 +63,7 @@ integer_param("sched_ratelimit_us", sched_ratelimit_us); =20 /* Number of vcpus per struct sched_unit. */ static unsigned int __read_mostly sched_granularity =3D 1; +const cpumask_t *sched_res_mask =3D &cpumask_all; =20 /* Common lock for free cpus. */ static DEFINE_SPINLOCK(sched_free_cpu_lock); @@ -188,7 +189,7 @@ static inline struct scheduler *vcpu_scheduler(const st= ruct vcpu *v) { return unit_scheduler(v->sched_unit); } -#define VCPU2ONLINE(_v) cpupool_domain_cpumask((_v)->domain) +#define VCPU2ONLINE(_v) cpupool_domain_master_cpumask((_v)->domain) =20 static inline void trace_runstate_change(struct vcpu *v, int new_state) { @@ -425,9 +426,9 @@ static unsigned int sched_select_initial_cpu(const stru= ct vcpu *v) cpumask_clear(cpus); for_each_node_mask ( node, d->node_affinity ) cpumask_or(cpus, cpus, &node_to_cpumask(node)); - cpumask_and(cpus, cpus, cpupool_domain_cpumask(d)); + cpumask_and(cpus, cpus, d->cpupool->cpu_valid); if ( cpumask_empty(cpus) ) - cpumask_copy(cpus, cpupool_domain_cpumask(d)); + cpumask_copy(cpus, d->cpupool->cpu_valid); =20 if ( v->vcpu_id =3D=3D 0 ) cpu_ret =3D cpumask_first(cpus); @@ -973,7 +974,7 @@ void restore_vcpu_affinity(struct domain *d) lock =3D unit_schedule_lock_irq(unit); =20 cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity, - cpupool_domain_cpumask(d)); + cpupool_domain_master_cpumask(d)); if ( cpumask_empty(cpumask_scratch_cpu(cpu)) ) { if ( sched_check_affinity_broken(unit) ) @@ -981,7 +982,7 @@ void restore_vcpu_affinity(struct domain *d) sched_set_affinity(unit, unit->cpu_hard_affinity_saved, NU= LL); sched_reset_affinity_broken(unit); cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affin= ity, - cpupool_domain_cpumask(d)); + cpupool_domain_master_cpumask(d)); } =20 if ( cpumask_empty(cpumask_scratch_cpu(cpu)) ) @@ -991,7 +992,7 @@ void restore_vcpu_affinity(struct domain *d) unit->vcpu_list); sched_set_affinity(unit, &cpumask_all, NULL); cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affin= ity, - cpupool_domain_cpumask(d)); + cpupool_domain_master_cpumask(d)); } } =20 diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index 983f2ece83..1b296b150f 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h @@ -22,6 +22,8 @@ extern cpumask_t cpupool_free_cpus; #define SCHED_DEFAULT_RATELIMIT_US 1000 extern int sched_ratelimit_us; =20 +/* Scheduling resource mask. */ +extern const cpumask_t *sched_res_mask; =20 /* * In order to allow a scheduler to remap the lock->cpu mapping, @@ -535,6 +537,7 @@ struct cpupool int cpupool_id; unsigned int n_dom; cpumask_var_t cpu_valid; /* all cpus assigned to pool */ + cpumask_var_t res_valid; /* all scheduling resources of pool */ struct cpupool *next; struct scheduler *sched; atomic_t refcnt; @@ -543,14 +546,14 @@ struct cpupool #define cpupool_online_cpumask(_pool) \ (((_pool) =3D=3D NULL) ? &cpu_online_map : (_pool)->cpu_valid) =20 -static inline cpumask_t *cpupool_domain_cpumask(const struct domain *d) +static inline cpumask_t *cpupool_domain_master_cpumask(const struct domain= *d) { /* * d->cpupool is NULL only for the idle domain, and no one should * be interested in calling this for the idle domain. 
*/ ASSERT(d->cpupool !=3D NULL); - return d->cpupool->cpu_valid; + return d->cpupool->res_valid; } =20 /* @@ -590,7 +593,7 @@ static inline cpumask_t *cpupool_domain_cpumask(const s= truct domain *d) static inline int has_soft_affinity(const struct sched_unit *unit) { return unit->soft_aff_effective && - !cpumask_subset(cpupool_domain_cpumask(unit->domain), + !cpumask_subset(cpupool_domain_master_cpumask(unit->domain), unit->cpu_soft_affinity); } =20 --=20 2.16.4

From nobody Sat Apr 20 03:55:02 2024
From: Juergen Gross
To: xen-devel@lists.xenproject.org
Date: Wed, 2 Oct 2019 09:27:30 +0200
Message-Id: <20191002072745.24919-6-jgross@suse.com>
In-Reply-To: <20191002072745.24919-1-jgross@suse.com>
References: <20191002072745.24919-1-jgross@suse.com>
Subject: [Xen-devel] [PATCH v6 05/20] xen/sched: support allocating multiple vcpus into one sched unit
Cc: Juergen Gross , George Dunlap , Dario Faggioli

With a scheduling granularity greater than 1, multiple vcpus share the same struct sched_unit. Support that.

Setting the initial processor must be done carefully: we can't use sched_set_res(), as that relies on for_each_sched_unit_vcpu(), which in turn needs the vcpu to already be a member of the domain's vcpu linked list, which isn't the case here.

Signed-off-by: Juergen Gross
Reviewed-by: Dario Faggioli
---
V4:
- merge patch 36 of V3 into this one (Jan Beulich)
- add some comments (Jan Beulich)
- use unit_id instead of vcpu_list->vcpu_id (Jan Beulich)
---
xen/common/schedule.c | 97 ++++++++++++++++++++++++++++++++++++++++-------= ---- 1 file changed, 76 insertions(+), 21 deletions(-)
diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 36b1d3df6e..37002b4c0e 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -349,7 +349,7 @@ static void sched_spin_unlock_double(spinlock_t *lock1,= spinlock_t *lock2, spin_unlock_irqrestore(lock1, flags); } =20 -static void sched_free_unit(struct sched_unit *unit) +static void sched_free_unit_mem(struct sched_unit *unit) { struct sched_unit *prev_unit; struct domain *d =3D unit->domain; @@ -368,8 +368,6 @@ static void sched_free_unit(struct sched_unit *unit) } } =20 - unit->vcpu_list->sched_unit =3D NULL; - free_cpumask_var(unit->cpu_hard_affinity); free_cpumask_var(unit->cpu_hard_affinity_saved); free_cpumask_var(unit->cpu_soft_affinity); @@ -377,18 +375,65 @@ static void sched_free_unit(struct sched_unit *unit) xfree(unit); } =20 +static void sched_free_unit(struct sched_unit *unit, struct vcpu *v) +{ + struct vcpu *vunit; + unsigned int cnt =3D 0; + + /* Don't count the vcpu to be released, it might not be in the vcpu list yet. */ + for_each_sched_unit_vcpu ( unit, vunit ) + if ( vunit !=3D v ) + cnt++; + + v->sched_unit =3D NULL; + unit->runstate_cnt[v->runstate.state]--; + + if ( unit->vcpu_list =3D=3D v ) + unit->vcpu_list =3D v->next_in_list; + + if ( !cnt ) + sched_free_unit_mem(unit); +} + +static void sched_unit_add_vcpu(struct sched_unit *unit, struct vcpu *v) +{ + v->sched_unit =3D unit; + + /* All but idle vcpus are allocated with sequential vcpu_id. */ + if ( !unit->vcpu_list || unit->vcpu_list->vcpu_id > v->vcpu_id ) + { + unit->vcpu_list =3D v; + /* + * unit_id is always the same as lowest vcpu_id of unit. + * This is used for stopping for_each_sched_unit_vcpu() loop and in + * order to support cpupools with different granularities.
+ */ + unit->unit_id =3D v->vcpu_id; + } + unit->runstate_cnt[v->runstate.state]++; +} + static struct sched_unit *sched_alloc_unit(struct vcpu *v) { struct sched_unit *unit, **prev_unit; struct domain *d =3D v->domain; =20 + for_each_sched_unit ( d, unit ) + if ( unit->unit_id / sched_granularity =3D=3D + v->vcpu_id / sched_granularity ) + break; + + if ( unit ) + { + sched_unit_add_vcpu(unit, v); + return unit; + } + if ( (unit =3D xzalloc(struct sched_unit)) =3D=3D NULL ) return NULL; =20 - unit->vcpu_list =3D v; - unit->unit_id =3D v->vcpu_id; unit->domain =3D d; - unit->runstate_cnt[v->runstate.state]++; + sched_unit_add_vcpu(unit, v); =20 for ( prev_unit =3D &d->sched_unit_list; *prev_unit; prev_unit =3D &(*prev_unit)->next_in_list ) @@ -404,12 +449,10 @@ static struct sched_unit *sched_alloc_unit(struct vcp= u *v) !zalloc_cpumask_var(&unit->cpu_soft_affinity) ) goto fail; =20 - v->sched_unit =3D unit; - return unit; =20 fail: - sched_free_unit(unit); + sched_free_unit(unit, v); return NULL; } =20 @@ -459,21 +502,26 @@ int sched_init_vcpu(struct vcpu *v) else processor =3D sched_select_initial_cpu(v); =20 - sched_set_res(unit, get_sched_res(processor)); - /* Initialise the per-vcpu timers. */ spin_lock_init(&v->periodic_timer_lock); - init_timer(&v->periodic_timer, vcpu_periodic_timer_fn, - v, v->processor); - init_timer(&v->singleshot_timer, vcpu_singleshot_timer_fn, - v, v->processor); - init_timer(&v->poll_timer, poll_timer_fn, - v, v->processor); + init_timer(&v->periodic_timer, vcpu_periodic_timer_fn, v, processor); + init_timer(&v->singleshot_timer, vcpu_singleshot_timer_fn, v, processo= r); + init_timer(&v->poll_timer, poll_timer_fn, v, processor); + + /* If this is not the first vcpu of the unit we are done. */ + if ( unit->priv !=3D NULL ) + { + v->processor =3D processor; + return 0; + } + + /* The first vcpu of an unit can be set via sched_set_res(). */ + sched_set_res(unit, get_sched_res(processor)); =20 unit->priv =3D sched_alloc_udata(dom_scheduler(d), unit, d->sched_priv= ); if ( unit->priv =3D=3D NULL ) { - sched_free_unit(unit); + sched_free_unit(unit, v); return 1; } =20 @@ -633,9 +681,16 @@ void sched_destroy_vcpu(struct vcpu *v) kill_timer(&v->poll_timer); if ( test_and_clear_bool(v->is_urgent) ) atomic_dec(&per_cpu(sched_urgent_count, v->processor)); - sched_remove_unit(vcpu_scheduler(v), unit); - sched_free_udata(vcpu_scheduler(v), unit->priv); - sched_free_unit(unit); + /* + * Vcpus are being destroyed top-down. So being the first vcpu of an u= nit + * is the same as being the only one. 
+ */ + if ( unit->vcpu_list =3D=3D v ) + { + sched_remove_unit(vcpu_scheduler(v), unit); + sched_free_udata(vcpu_scheduler(v), unit->priv); + sched_free_unit(unit, v); + } } =20 int sched_init_domain(struct domain *d, int poolid) --=20 2.16.4
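
To illustrate the vcpu-to-unit mapping resulting from sched_alloc_unit() and sched_unit_add_vcpu() above, a minimal standalone sketch; the granularity of 2 is hypothetical and not part of the patch:

    /* Sketch only: vcpus with equal vcpu_id / granularity share a unit;
     * unit_id is the lowest vcpu_id of the unit, which for sequentially
     * allocated vcpu ids is quotient * granularity. */
    #include <stdio.h>

    int main(void)
    {
        unsigned int granularity = 2;   /* hypothetical sched_granularity */
        unsigned int vcpu_id;

        for ( vcpu_id = 0; vcpu_id < 5; vcpu_id++ )
            printf("vcpu %u -> unit %u\n", vcpu_id,
                   vcpu_id / granularity * granularity);
        /* Prints: 0->0, 1->0, 2->2, 3->2, 4->4 */
        return 0;
    }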
From nobody Sat Apr 20 03:55:02 2024
From: Juergen Gross
To: xen-devel@lists.xenproject.org
Date: Wed, 2 Oct 2019 09:27:31 +0200
Message-Id: <20191002072745.24919-7-jgross@suse.com>
In-Reply-To: <20191002072745.24919-1-jgross@suse.com>
References: <20191002072745.24919-1-jgross@suse.com>
Subject: [Xen-devel] [PATCH v6 06/20] xen/sched: add a percpu resource index
Cc: Juergen Gross , George Dunlap , Dario Faggioli

Add a percpu variable holding the index of the cpu in the current sched_resource structure. This index is used to get the correct vcpu of a sched_unit on a specific cpu. For now this index will be zero for all cpus, but with core scheduling it will be possible to have higher values, too.
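
As an illustrative sketch of how this index selects a vcpu (the unit_id of 2 and the two sibling threads are hypothetical; the helper mirrors the sched_unit2vcpu_cpu() added below):

    /* Sketch only: vcpu_id = unit_id + per-cpu resource index. */
    static unsigned int demo_res_idx[2] = { 0, 1 };  /* two sibling threads */

    static unsigned int demo_unit2vcpu_id(unsigned int unit_id,
                                          unsigned int cpu)
    {
        return unit_id + demo_res_idx[cpu];
    }
    /* For unit_id 2: cpu 0 runs vcpu 2, cpu 1 runs vcpu 3. */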
Signed-off-by: Juergen Gross
Reviewed-by: Dario Faggioli
---
RFC V2: new patch (carved out from RFC V1 patch 49)
V4: - make function parameter const (Jan Beulich)
---
xen/common/schedule.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 37002b4c0e..c8e2999407 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -77,6 +77,7 @@ static void poll_timer_fn(void *data); /* This is global for now so that private implementations can reach it */ DEFINE_PER_CPU(struct scheduler *, scheduler); DEFINE_PER_CPU_READ_MOSTLY(struct sched_resource *, sched_res); +static DEFINE_PER_CPU_READ_MOSTLY(unsigned int, sched_res_idx); =20 /* Scratch space for cpumasks. */ DEFINE_PER_CPU(cpumask_t, cpumask_scratch); @@ -144,6 +145,12 @@ static struct scheduler sched_idle_ops =3D { .switch_sched =3D sched_idle_switch_sched, }; =20 +static inline struct vcpu *sched_unit2vcpu_cpu(const struct sched_unit *un= it, + unsigned int cpu) +{ + return unit->domain->vcpu[unit->unit_id + per_cpu(sched_res_idx, cpu)]; +} + static inline struct scheduler *dom_scheduler(const struct domain *d) { if ( likely(d->cpupool !=3D NULL) ) @@ -2030,7 +2037,7 @@ static void sched_slave(void) =20 pcpu_schedule_unlock_irq(lock, cpu); =20 - sched_context_switch(vprev, next->vcpu_list, now); + sched_context_switch(vprev, sched_unit2vcpu_cpu(next, cpu), now); } =20 /* @@ -2091,7 +2098,7 @@ static void schedule(void) =20 pcpu_schedule_unlock_irq(lock, cpu); =20 - vnext =3D next->vcpu_list; + vnext =3D sched_unit2vcpu_cpu(next, cpu); sched_context_switch(vprev, vnext, now); } =20 --=20 2.16.4

From nobody Sat Apr 20 03:55:02 2024
From: Juergen Gross
To: xen-devel@lists.xenproject.org
Date: Wed, 2 Oct 2019 09:27:32 +0200
Message-Id: <20191002072745.24919-8-jgross@suse.com>
In-Reply-To: <20191002072745.24919-1-jgross@suse.com>
References: <20191002072745.24919-1-jgross@suse.com>
Subject: [Xen-devel] [PATCH v6 07/20] xen/sched: add fall back to idle vcpu when scheduling unit
Cc: Juergen Gross , Stefano Stabellini , Julien Grall , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Tim Deegan , Jan Beulich , Dario Faggioli , Volodymyr Babchuk , Roger Pau Monné

When scheduling a unit with multiple vcpus there is no guarantee that all vcpus are available (e.g. above maxvcpus, or a vcpu being offline). Fall back to the idle vcpu of the current cpu in that case. This requires storing the correct sched_unit pointer in the idle vcpu for as long as it is used as a fallback vcpu.

In order to modify the runstates of the correct vcpus when switching schedule units, merge sched_unit_runstate_change() into sched_switch_units() and loop over the affected physical cpus instead of the unit's vcpus. This in turn requires an access function for the current variable of other cpus.

Today context_saved() is called when previous and next vcpus differ during a context switch. With an idle vcpu being capable of substituting for an offline vcpu, this is problematic when switching to an idle scheduling unit. An idle previous vcpu leaves us in doubt which schedule unit was active previously, so save the previous unit pointer in the per-sched-resource area. If it is NULL the unit has not changed and we don't have to mark the previous unit as not running.

When running an idle vcpu in a non-idle scheduling unit, use a specific guest idle loop not performing any non-softirq tasklets or livepatching, in order to avoid populating the cpu caches with memory used by other domains (as far as possible). Softirqs are considered to be safe.
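
As a concrete illustration of the fallback (numbers are hypothetical; the logic mirrors the unit2vcpu_cpu()/sched_unit2vcpu_cpu() helpers added below): assume a granularity of 2, a unit with unit_id 2, and a domain with max_vcpus 3, so vcpu 3 does not exist.

    /* Sketch only: per-cpu vcpu selection with idle fallback. */
    idx = unit->unit_id + per_cpu(sched_res_idx, cpu);
    v = (idx < d->max_vcpus) ? d->vcpu[idx] : NULL;
    /* cpu with sched_res_idx 0: idx = 2 -> runs d->vcpu[2] if runnable. */
    /* cpu with sched_res_idx 1: idx = 3 -> no such vcpu -> idle_vcpu[cpu]. */
    if ( !v || v->new_state != RUNSTATE_running )
        v = idle_vcpu[cpu];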
In order to avoid livepatching when going to guest idle another variant of reset_stack_and_jump() not calling check_for_livepatch_work is needed. Signed-off-by: Juergen Gross Acked-by: Julien Grall Reviewed-by: Dario Faggioli Acked-by: Jan Beulich --- RFC V2: - new patch (Andrew Cooper) V1: - use urgent_count to select correct idle routine (Jan Beulich) V2: - set vcpu->is_running in context_saved() - introduce reset_stack_and_jump_nolp() (Jan Beulich) - readd scrubbing (Jan Beulich, Andrew Cooper) - get_cpu_current() _NOT_ moved to include/asm-x86/current.h as the needed reference of stack_base[] results in a #include hell V3: - split context_saved() into unit_context_saved() and vcpu_context_saved() V4: - rename sd -> sr (Jan Beulich) - use unsigned int for cpu (Jan Beulich) - add comment in sched_context_switch() (Jan Beulich) - add comment before definition of get_cpu_current() (Jan Beulich) V5: - add comment (Dario Faggioli) --- xen/arch/x86/domain.c | 23 +++++ xen/common/schedule.c | 195 +++++++++++++++++++++++++++++---------= ---- xen/include/asm-arm/current.h | 1 + xen/include/asm-x86/current.h | 19 +++- xen/include/asm-x86/smp.h | 7 ++ xen/include/xen/sched-if.h | 4 +- xen/include/xen/sched.h | 1 + 7 files changed, 187 insertions(+), 63 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 27f99d3bcc..c8d7f491ea 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -159,6 +159,25 @@ static void idle_loop(void) } } =20 +/* + * Idle loop for siblings in active schedule units. + * We don't do any standard idle work like tasklets or livepatching. + */ +static void guest_idle_loop(void) +{ + unsigned int cpu =3D smp_processor_id(); + + for ( ; ; ) + { + ASSERT(!cpu_is_offline(cpu)); + + if ( !softirq_pending(cpu) && !scrub_free_pages() && + !softirq_pending(cpu)) + sched_guest_idle(pm_idle, cpu); + do_softirq(); + } +} + void startup_cpu_idle_loop(void) { struct vcpu *v =3D current; @@ -172,6 +191,10 @@ void startup_cpu_idle_loop(void) =20 static void noreturn continue_idle_domain(struct vcpu *v) { + /* Idle vcpus might be attached to non-idle units! */ + if ( !is_idle_domain(v->sched_unit->domain) ) + reset_stack_and_jump_nolp(guest_idle_loop); + reset_stack_and_jump(idle_loop); } =20 diff --git a/xen/common/schedule.c b/xen/common/schedule.c index c8e2999407..b4c4b04ebe 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -145,10 +145,21 @@ static struct scheduler sched_idle_ops =3D { .switch_sched =3D sched_idle_switch_sched, }; =20 +static inline struct vcpu *unit2vcpu_cpu(const struct sched_unit *unit, + unsigned int cpu) +{ + unsigned int idx =3D unit->unit_id + per_cpu(sched_res_idx, cpu); + const struct domain *d =3D unit->domain; + + return (idx < d->max_vcpus) ? d->vcpu[idx] : NULL; +} + static inline struct vcpu *sched_unit2vcpu_cpu(const struct sched_unit *un= it, unsigned int cpu) { - return unit->domain->vcpu[unit->unit_id + per_cpu(sched_res_idx, cpu)]; + struct vcpu *v =3D unit2vcpu_cpu(unit, cpu); + + return (v && v->new_state =3D=3D RUNSTATE_running) ? 
v : idle_vcpu[cpu= ]; } =20 static inline struct scheduler *dom_scheduler(const struct domain *d) @@ -268,8 +279,11 @@ static inline void vcpu_runstate_change( =20 trace_runstate_change(v, new_state); =20 - unit->runstate_cnt[v->runstate.state]--; - unit->runstate_cnt[new_state]++; + if ( !is_idle_vcpu(v) ) + { + unit->runstate_cnt[v->runstate.state]--; + unit->runstate_cnt[new_state]++; + } =20 delta =3D new_entry_time - v->runstate.state_entry_time; if ( delta > 0 ) @@ -281,21 +295,18 @@ static inline void vcpu_runstate_change( v->runstate.state =3D new_state; } =20 -static inline void sched_unit_runstate_change(struct sched_unit *unit, - bool running, s_time_t new_entry_time) +void sched_guest_idle(void (*idle) (void), unsigned int cpu) { - struct vcpu *v; - - for_each_sched_unit_vcpu ( unit, v ) - { - if ( running ) - vcpu_runstate_change(v, v->new_state, new_entry_time); - else - vcpu_runstate_change(v, - ((v->pause_flags & VPF_blocked) ? RUNSTATE_blocked : - (vcpu_runnable(v) ? RUNSTATE_runnable : RUNSTATE_offline)= ), - new_entry_time); - } + /* + * Another vcpu of the unit is active in guest context while this one = is + * idle. In case of a scheduling event we don't want to have high late= ncies + * due to a cpu needing to wake up from deep C state for joining the + * rendezvous, so avoid those deep C states by incrementing the urgent + * count of the cpu. + */ + atomic_inc(&per_cpu(sched_urgent_count, cpu)); + idle(); + atomic_dec(&per_cpu(sched_urgent_count, cpu)); } =20 void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate) @@ -545,6 +556,7 @@ int sched_init_vcpu(struct vcpu *v) if ( is_idle_domain(d) ) { get_sched_res(v->processor)->curr =3D unit; + get_sched_res(v->processor)->sched_unit_idle =3D unit; v->is_running =3D 1; unit->is_running =3D true; unit->state_entry_time =3D NOW(); @@ -877,7 +889,7 @@ static void sched_unit_move_locked(struct sched_unit *u= nit, * * sched_unit_migrate_finish() will do the work now if it can, or simply * return if it can't (because unit is still running); in that case - * sched_unit_migrate_finish() will be called by context_saved(). + * sched_unit_migrate_finish() will be called by unit_context_saved(). */ static void sched_unit_migrate_start(struct sched_unit *unit) { @@ -900,7 +912,7 @@ static void sched_unit_migrate_finish(struct sched_unit= *unit) =20 /* * If the unit is currently running, this will be handled by - * context_saved(); and in any case, if the bit is cleared, then + * unit_context_saved(); and in any case, if the bit is cleared, then * someone else has already done the work so we don't need to. */ if ( unit->is_running ) @@ -1785,33 +1797,66 @@ static void sched_switch_units(struct sched_resourc= e *sr, struct sched_unit *next, struct sched_unit = *prev, s_time_t now) { - sr->curr =3D next; - - TRACE_3D(TRC_SCHED_SWITCH_INFPREV, prev->domain->domain_id, prev->unit= _id, - now - prev->state_entry_time); - TRACE_4D(TRC_SCHED_SWITCH_INFNEXT, next->domain->domain_id, next->unit= _id, - (next->vcpu_list->runstate.state =3D=3D RUNSTATE_runnable) ? 
- (now - next->state_entry_time) : 0, prev->next_time); + unsigned int cpu; =20 ASSERT(unit_running(prev)); =20 - TRACE_4D(TRC_SCHED_SWITCH, prev->domain->domain_id, prev->unit_id, - next->domain->domain_id, next->unit_id); + if ( prev !=3D next ) + { + sr->curr =3D next; + sr->prev =3D prev; =20 - sched_unit_runstate_change(prev, false, now); + TRACE_3D(TRC_SCHED_SWITCH_INFPREV, prev->domain->domain_id, + prev->unit_id, now - prev->state_entry_time); + TRACE_4D(TRC_SCHED_SWITCH_INFNEXT, next->domain->domain_id, + next->unit_id, + (next->vcpu_list->runstate.state =3D=3D RUNSTATE_runnable= ) ? + (now - next->state_entry_time) : 0, prev->next_time); + TRACE_4D(TRC_SCHED_SWITCH, prev->domain->domain_id, prev->unit_id, + next->domain->domain_id, next->unit_id); =20 - ASSERT(!unit_running(next)); - sched_unit_runstate_change(next, true, now); + ASSERT(!unit_running(next)); =20 - /* - * NB. Don't add any trace records from here until the actual context - * switch, else lost_records resume will not work properly. - */ + /* + * NB. Don't add any trace records from here until the actual cont= ext + * switch, else lost_records resume will not work properly. + */ + + ASSERT(!next->is_running); + next->is_running =3D true; + next->state_entry_time =3D now; + + if ( is_idle_unit(prev) ) + { + prev->runstate_cnt[RUNSTATE_running] =3D 0; + prev->runstate_cnt[RUNSTATE_runnable] =3D sched_granularity; + } + if ( is_idle_unit(next) ) + { + next->runstate_cnt[RUNSTATE_running] =3D sched_granularity; + next->runstate_cnt[RUNSTATE_runnable] =3D 0; + } + } + + for_each_cpu ( cpu, sr->cpus ) + { + struct vcpu *vprev =3D get_cpu_current(cpu); + struct vcpu *vnext =3D sched_unit2vcpu_cpu(next, cpu); + + if ( vprev !=3D vnext || vprev->runstate.state !=3D vnext->new_sta= te ) + { + vcpu_runstate_change(vprev, + ((vprev->pause_flags & VPF_blocked) ? RUNSTATE_blocked : + (vcpu_runnable(vprev) ? RUNSTATE_runnable : RUNSTATE_offl= ine)), + now); + vcpu_runstate_change(vnext, vnext->new_state, now); + } =20 - ASSERT(!next->is_running); - next->vcpu_list->is_running =3D 1; - next->is_running =3D true; - next->state_entry_time =3D now; + vnext->is_running =3D 1; + + if ( is_idle_vcpu(vnext) ) + vnext->sched_unit =3D next; + } } =20 static bool sched_tasklet_check_cpu(unsigned int cpu) @@ -1867,29 +1912,39 @@ static struct sched_unit *do_schedule(struct sched_= unit *prev, s_time_t now, if ( prev->next_time >=3D 0 ) /* -ve means no limit */ set_timer(&sr->s_timer, now + prev->next_time); =20 - if ( likely(prev !=3D next) ) - sched_switch_units(sr, next, prev, now); + sched_switch_units(sr, next, prev, now); =20 return next; } =20 -static void context_saved(struct vcpu *prev) +static void vcpu_context_saved(struct vcpu *vprev, struct vcpu *vnext) { - struct sched_unit *unit =3D prev->sched_unit; - /* Clear running flag /after/ writing context to memory. */ smp_wmb(); =20 - prev->is_running =3D 0; + if ( vprev !=3D vnext ) + vprev->is_running =3D 0; +} + +static void unit_context_saved(struct sched_resource *sr) +{ + struct sched_unit *unit =3D sr->prev; + + if ( !unit ) + return; + unit->is_running =3D false; unit->state_entry_time =3D NOW(); + sr->prev =3D NULL; =20 /* Check for migration request /after/ clearing running flag. */ smp_mb(); =20 - sched_context_saved(vcpu_scheduler(prev), unit); + sched_context_saved(unit_scheduler(unit), unit); =20 - sched_unit_migrate_finish(unit); + /* Idle never migrates and idle vcpus might belong to other units. 
*/ + if ( !is_idle_unit(unit) ) + sched_unit_migrate_finish(unit); } =20 /* @@ -1899,35 +1954,44 @@ static void context_saved(struct vcpu *prev) * The counter will be 0 in case no rendezvous is needed. For the rendezvo= us * case it is initialised to the number of cpus to rendezvous plus 1. Each * member entering decrements the counter. The last one will decrement it = to - * 1 and perform the final needed action in that case (call of context_sav= ed() - * if vcpu was switched), and then set the counter to zero. The other memb= ers + * 1 and perform the final needed action in that case (call of + * unit_context_saved()), and then set the counter to zero. The other memb= ers * will wait until the counter becomes zero until they proceed. */ void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext) { struct sched_unit *next =3D vnext->sched_unit; + struct sched_resource *sr =3D get_sched_res(smp_processor_id()); =20 if ( atomic_read(&next->rendezvous_out_cnt) ) { int cnt =3D atomic_dec_return(&next->rendezvous_out_cnt); =20 - /* Call context_saved() before releasing other waiters. */ + vcpu_context_saved(vprev, vnext); + + /* Call unit_context_saved() before releasing other waiters. */ if ( cnt =3D=3D 1 ) { - if ( vprev !=3D vnext ) - context_saved(vprev); + unit_context_saved(sr); atomic_set(&next->rendezvous_out_cnt, 0); } else while ( atomic_read(&next->rendezvous_out_cnt) ) cpu_relax(); } - else if ( vprev !=3D vnext && sched_granularity =3D=3D 1 ) - context_saved(vprev); + else + { + vcpu_context_saved(vprev, vnext); + if ( sched_granularity =3D=3D 1 ) + unit_context_saved(sr); + } + + if ( is_idle_vcpu(vprev) && vprev !=3D vnext ) + vprev->sched_unit =3D sr->sched_unit_idle; } =20 static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext, - s_time_t now) + bool reset_idle_unit, s_time_t now) { if ( unlikely(vprev =3D=3D vnext) ) { @@ -1936,6 +2000,17 @@ static void sched_context_switch(struct vcpu *vprev,= struct vcpu *vnext, now - vprev->runstate.state_entry_time, vprev->sched_unit->next_time); sched_context_switched(vprev, vnext); + + /* + * We are switching from a non-idle to an idle unit. + * A vcpu of the idle unit might have been running before due to + * the guest vcpu being blocked. We must adjust the unit of the id= le + * vcpu which might have been set to the guest's one. + */ + if ( reset_idle_unit ) + vnext->sched_unit =3D + get_sched_res(smp_processor_id())->sched_unit_idle; + trace_continue_running(vnext); return continue_running(vprev); } @@ -1994,7 +2069,7 @@ static struct sched_unit *sched_wait_rendezvous_in(st= ruct sched_unit *prev, pcpu_schedule_unlock_irq(*lock, cpu); =20 raise_softirq(SCHED_SLAVE_SOFTIRQ); - sched_context_switch(vprev, vprev, now); + sched_context_switch(vprev, vprev, false, now); =20 return NULL; /* ARM only. 
*/ } @@ -2037,7 +2112,8 @@ static void sched_slave(void) =20 pcpu_schedule_unlock_irq(lock, cpu); =20 - sched_context_switch(vprev, sched_unit2vcpu_cpu(next, cpu), now); + sched_context_switch(vprev, sched_unit2vcpu_cpu(next, cpu), + is_idle_unit(next) && !is_idle_unit(prev), now); } =20 /* @@ -2099,7 +2175,8 @@ static void schedule(void) pcpu_schedule_unlock_irq(lock, cpu); =20 vnext =3D sched_unit2vcpu_cpu(next, cpu); - sched_context_switch(vprev, vnext, now); + sched_context_switch(vprev, vnext, + !is_idle_unit(prev) && is_idle_unit(next), now); } =20 /* The scheduler timer: force a run through the scheduler */ @@ -2170,6 +2247,7 @@ static int cpu_schedule_up(unsigned int cpu) */ =20 sr->curr =3D idle_vcpu[cpu]->sched_unit; + sr->sched_unit_idle =3D idle_vcpu[cpu]->sched_unit; =20 sr->sched_priv =3D NULL; =20 @@ -2339,6 +2417,7 @@ void __init scheduler_init(void) if ( vcpu_create(idle_domain, 0) =3D=3D NULL ) BUG(); get_sched_res(0)->curr =3D idle_vcpu[0]->sched_unit; + get_sched_res(0)->sched_unit_idle =3D idle_vcpu[0]->sched_unit; } =20 /* diff --git a/xen/include/asm-arm/current.h b/xen/include/asm-arm/current.h index 1653e89d30..88beb4645a 100644 --- a/xen/include/asm-arm/current.h +++ b/xen/include/asm-arm/current.h @@ -18,6 +18,7 @@ DECLARE_PER_CPU(struct vcpu *, curr_vcpu); =20 #define current (this_cpu(curr_vcpu)) #define set_current(vcpu) do { current =3D (vcpu); } while (0) +#define get_cpu_current(cpu) (per_cpu(curr_vcpu, cpu)) =20 /* Per-VCPU state that lives at the top of the stack */ struct cpu_info { diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h index f3508c3c08..0b47485337 100644 --- a/xen/include/asm-x86/current.h +++ b/xen/include/asm-x86/current.h @@ -77,6 +77,11 @@ struct cpu_info { /* get_stack_bottom() must be 16-byte aligned */ }; =20 +static inline struct cpu_info *get_cpu_info_from_stack(unsigned long sp) +{ + return (struct cpu_info *)((sp | (STACK_SIZE - 1)) + 1) - 1; +} + static inline struct cpu_info *get_cpu_info(void) { #ifdef __clang__ @@ -87,7 +92,7 @@ static inline struct cpu_info *get_cpu_info(void) register unsigned long sp asm("rsp"); #endif =20 - return (struct cpu_info *)((sp | (STACK_SIZE - 1)) + 1) - 1; + return get_cpu_info_from_stack(sp); } =20 #define get_current() (get_cpu_info()->current_vcpu) @@ -124,16 +129,22 @@ unsigned long get_stack_dump_bottom (unsigned long sp= ); # define CHECK_FOR_LIVEPATCH_WORK "" #endif =20 -#define reset_stack_and_jump(__fn) \ +#define switch_stack_and_jump(fn, instr) \ ({ \ __asm__ __volatile__ ( \ "mov %0,%%"__OP"sp;" \ - CHECK_FOR_LIVEPATCH_WORK \ + instr \ "jmp %c1" \ - : : "r" (guest_cpu_user_regs()), "i" (__fn) : "memory" ); \ + : : "r" (guest_cpu_user_regs()), "i" (fn) : "memory" ); \ unreachable(); \ }) =20 +#define reset_stack_and_jump(fn) \ + switch_stack_and_jump(fn, CHECK_FOR_LIVEPATCH_WORK) + +#define reset_stack_and_jump_nolp(fn) \ + switch_stack_and_jump(fn, "") + /* * Which VCPU's state is currently running on each CPU? * This is not necesasrily the same as 'current' as a CPU may be diff --git a/xen/include/asm-x86/smp.h b/xen/include/asm-x86/smp.h index 61446d0efd..dbeed2fd41 100644 --- a/xen/include/asm-x86/smp.h +++ b/xen/include/asm-x86/smp.h @@ -77,6 +77,13 @@ void set_nr_sockets(void); /* Representing HT and core siblings in each socket. */ extern cpumask_t **socket_cpumask; =20 +/* + * To be used only while no context switch can occur on the cpu, i.e. + * by certain scheduling code only. 
+ */ +#define get_cpu_current(cpu) \ + (get_cpu_info_from_stack((unsigned long)stack_base[cpu])->current_vcpu) + #endif /* !__ASSEMBLY__ */ =20 #endif diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index 1b296b150f..41a1083a08 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h @@ -39,6 +39,8 @@ struct sched_resource { spinlock_t *schedule_lock, _lock; struct sched_unit *curr; + struct sched_unit *sched_unit_idle; + struct sched_unit *prev; void *sched_priv; struct timer s_timer; /* scheduling timer = */ =20 @@ -194,7 +196,7 @@ static inline void sched_clear_pause_flags_atomic(struc= t sched_unit *unit, =20 static inline struct sched_unit *sched_idle_unit(unsigned int cpu) { - return idle_vcpu[cpu]->sched_unit; + return get_sched_res(cpu)->sched_unit_idle; } =20 static inline unsigned int sched_get_resource_cpu(unsigned int cpu) diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 12f00cd78d..ce4329db72 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -929,6 +929,7 @@ void restore_vcpu_affinity(struct domain *d); =20 void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate= ); uint64_t get_cpu_idle_time(unsigned int cpu); +void sched_guest_idle(void (*idle) (void), unsigned int cpu); =20 /* * Used by idle loop to decide whether there is work to do: --=20 2.16.4
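
A worked walkthrough of the rendezvous counters this patch relies on (illustrative only, assuming a granularity of 2, i.e. two sibling cpus A and B):

    /*
     * rendezvous_in_cnt starts at 2: A and B each decrement it on entering
     * schedule()/sched_slave(); whoever reaches 0 picks the next unit.
     * rendezvous_out_cnt starts at 3 (cpus + 1): A and B each decrement it
     * after their context switch; the cpu that sees the value 1 calls
     * unit_context_saved() and then resets the counter to 0, releasing the
     * sibling spinning on it.
     */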
From nobody Sat Apr 20 03:55:02 2024
From: Juergen Gross
To: xen-devel@lists.xenproject.org
Date: Wed, 2 Oct 2019 09:27:33 +0200
Message-Id: <20191002072745.24919-9-jgross@suse.com>
In-Reply-To: <20191002072745.24919-1-jgross@suse.com>
References: <20191002072745.24919-1-jgross@suse.com>
Subject: [Xen-devel] [PATCH v6 08/20] xen/sched: make vcpu_wake() and vcpu_sleep() core scheduling aware
Cc: Juergen Gross , Tim Deegan , Stefano Stabellini , Julien Grall , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Dario Faggioli , Jan Beulich

vcpu_wake() and vcpu_sleep() need to be made core scheduling aware: they might need to switch a single vcpu of an already scheduled unit between running and not running.

Especially when vcpu_sleep() for a vcpu is being called by a vcpu of the same scheduling unit, special care must be taken in order to avoid a deadlock: the vcpu to be put to sleep must be forced through a context switch without doing so for the calling vcpu.

For this purpose add a vcpu flag handled in sched_slave() and in sched_wait_rendezvous_in() allowing a vcpu of the currently running unit to switch state at a higher priority than a normal schedule event.

Use the same mechanism when waking up a vcpu of a currently active unit.

While at it make vcpu_sleep_nosync_locked() static, as it is used in schedule.c only.
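
A minimal sketch of the scenario being avoided (illustrative; a granularity of 2 is assumed):

    /*
     * vcpu 0 and vcpu 1 share one unit and are both running.
     * vcpu 0 calls vcpu_sleep() for vcpu 1: a full schedule event would
     * rendezvous both cpus, but vcpu 0 is the caller and can't context
     * switch here. Instead, only vcpu 1's cpu is kicked:
     */
    v->force_context_switch = true;        /* handled in sched_slave() */
    cpu_raise_softirq(v->processor, SCHED_SLAVE_SOFTIRQ);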
Signed-off-by: Juergen Gross Reviewed-by: Dario Faggioli --- RFC V2: add vcpu_sleep() handling and force_context_switch flag V2: fix runstate change in sched_force_context_switch() V4: - use unit_scheduler() where appropriate (Jan Beulich) - make cpu parameter unsigned int (Jan Beulich) - comments (Jan Beulich) - use true instead 1 for setting bool (Jan Beulich) - const parameter (Jan Beulich) V5: - add comments (Dario Faggioli) --- xen/common/schedule.c | 134 +++++++++++++++++++++++++++++++++++++++++= ++-- xen/include/xen/sched-if.h | 9 ++- xen/include/xen/sched.h | 2 + 3 files changed, 136 insertions(+), 9 deletions(-) diff --git a/xen/common/schedule.c b/xen/common/schedule.c index b4c4b04ebe..9442be1c83 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -751,8 +751,10 @@ void sched_destroy_domain(struct domain *d) } } =20 -void vcpu_sleep_nosync_locked(struct vcpu *v) +static void vcpu_sleep_nosync_locked(struct vcpu *v) { + struct sched_unit *unit =3D v->sched_unit; + ASSERT(spin_is_locked(get_sched_res(v->processor)->schedule_lock)); =20 if ( likely(!vcpu_runnable(v)) ) @@ -760,7 +762,15 @@ void vcpu_sleep_nosync_locked(struct vcpu *v) if ( v->runstate.state =3D=3D RUNSTATE_runnable ) vcpu_runstate_change(v, RUNSTATE_offline, NOW()); =20 - sched_sleep(vcpu_scheduler(v), v->sched_unit); + /* Only put unit to sleep in case all vcpus are not runnable. */ + if ( likely(!unit_runnable(unit)) ) + sched_sleep(unit_scheduler(unit), unit); + else if ( unit_running(unit) > 1 && v->is_running && + !v->force_context_switch ) + { + v->force_context_switch =3D true; + cpu_raise_softirq(v->processor, SCHED_SLAVE_SOFTIRQ); + } } } =20 @@ -792,16 +802,27 @@ void vcpu_wake(struct vcpu *v) { unsigned long flags; spinlock_t *lock; + struct sched_unit *unit =3D v->sched_unit; =20 TRACE_2D(TRC_SCHED_WAKE, v->domain->domain_id, v->vcpu_id); =20 - lock =3D unit_schedule_lock_irqsave(v->sched_unit, &flags); + lock =3D unit_schedule_lock_irqsave(unit, &flags); =20 if ( likely(vcpu_runnable(v)) ) { if ( v->runstate.state >=3D RUNSTATE_blocked ) vcpu_runstate_change(v, RUNSTATE_runnable, NOW()); - sched_wake(vcpu_scheduler(v), v->sched_unit); + /* + * Call sched_wake() unconditionally, even if unit is running alre= ady. + * We might have not been de-scheduled after vcpu_sleep_nosync_loc= ked() + * and are now to be woken up again. + */ + sched_wake(unit_scheduler(unit), unit); + if ( unit->is_running && !v->is_running && !v->force_context_switc= h ) + { + v->force_context_switch =3D true; + cpu_raise_softirq(v->processor, SCHED_SLAVE_SOFTIRQ); + } } else if ( !(v->pause_flags & VPF_blocked) ) { @@ -809,7 +830,7 @@ void vcpu_wake(struct vcpu *v) vcpu_runstate_change(v, RUNSTATE_offline, NOW()); } =20 - unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit); + unit_schedule_unlock_irqrestore(lock, flags, unit); } =20 void vcpu_unblock(struct vcpu *v) @@ -2027,6 +2048,65 @@ static void sched_context_switch(struct vcpu *vprev,= struct vcpu *vnext, context_switch(vprev, vnext); } =20 +/* + * Force a context switch of a single vcpu of an unit. + * Might be called either if a vcpu of an already running unit is woken up + * or if a vcpu of a running unit is put asleep with other vcpus of the sa= me + * unit still running. + * Returns either NULL if v is already in the correct state or the vcpu to + * run next. 
+static struct vcpu *sched_force_context_switch(struct vcpu *vprev, + struct vcpu *v, + unsigned int cpu, s_time_t = now) +{ + v->force_context_switch =3D false; + + if ( vcpu_runnable(v) =3D=3D v->is_running ) + return NULL; + + if ( vcpu_runnable(v) ) + { + if ( is_idle_vcpu(vprev) ) + { + vcpu_runstate_change(vprev, RUNSTATE_runnable, now); + vprev->sched_unit =3D get_sched_res(cpu)->sched_unit_idle; + } + vcpu_runstate_change(v, RUNSTATE_running, now); + } + else + { + /* Make sure not to switch last vcpu of an unit away. */ + if ( unit_running(v->sched_unit) =3D=3D 1 ) + return NULL; + + v->new_state =3D vcpu_runstate_blocked(v); + vcpu_runstate_change(v, v->new_state, now); + v =3D sched_unit2vcpu_cpu(vprev->sched_unit, cpu); + if ( v !=3D vprev ) + { + if ( is_idle_vcpu(vprev) ) + { + vcpu_runstate_change(vprev, RUNSTATE_runnable, now); + vprev->sched_unit =3D get_sched_res(cpu)->sched_unit_idle; + } + else + { + v->sched_unit =3D vprev->sched_unit; + vcpu_runstate_change(v, RUNSTATE_running, now); + } + } + } + + /* This vcpu will be switched to. */ + v->is_running =3D true; + + /* Make sure not to lose another slave call. */ + raise_softirq(SCHED_SLAVE_SOFTIRQ); + + return v; +} + /* * Rendezvous before taking a scheduling decision. * Called with schedule lock held, so all accesses to the rendezvous count= er @@ -2042,6 +2122,7 @@ static struct sched_unit *sched_wait_rendezvous_in(st= ruct sched_unit *prev, s_time_t now) { struct sched_unit *next; + struct vcpu *v; =20 if ( !--prev->rendezvous_in_cnt ) { @@ -2050,8 +2131,28 @@ static struct sched_unit *sched_wait_rendezvous_in(s= truct sched_unit *prev, return next; } =20 + v =3D unit2vcpu_cpu(prev, cpu); while ( prev->rendezvous_in_cnt ) { + if ( v && v->force_context_switch ) + { + struct vcpu *vprev =3D current; + + v =3D sched_force_context_switch(vprev, v, cpu, now); + + if ( v ) + { + /* We'll come back another time, so adjust rendezvous_in_c= nt. */ + prev->rendezvous_in_cnt++; + atomic_set(&prev->rendezvous_out_cnt, 0); + + pcpu_schedule_unlock_irq(*lock, cpu); + + sched_context_switch(vprev, v, false, now); + } + + v =3D unit2vcpu_cpu(prev, cpu); + } /* * Coming from idle might need to do tasklet work. * In order to avoid deadlocks we can't do that here, but have to @@ -2086,10 +2187,11 @@ static struct sched_unit *sched_wait_rendezvous_in(= struct sched_unit *prev, =20 static void sched_slave(void) { - struct vcpu *vprev =3D current; + struct vcpu *v, *vprev =3D current; struct sched_unit *prev =3D vprev->sched_unit, *next; s_time_t now; spinlock_t *lock; + bool do_softirq =3D false; unsigned int cpu =3D smp_processor_id(); =20 ASSERT_NOT_IN_ATOMIC(); @@ -2098,9 +2200,29 @@ static void sched_slave(void) =20 now =3D NOW(); =20 + v =3D unit2vcpu_cpu(prev, cpu); + if ( v && v->force_context_switch ) + { + v =3D sched_force_context_switch(vprev, v, cpu, now); + + if ( v ) + { + pcpu_schedule_unlock_irq(lock, cpu); + + sched_context_switch(vprev, v, false, now); + } + + do_softirq =3D true; + } + if ( !prev->rendezvous_in_cnt ) { pcpu_schedule_unlock_irq(lock, cpu); + + /* Check for failed forced context switch.
*/ + if ( do_softirq ) + raise_softirq(SCHEDULE_SOFTIRQ); + return; } =20 diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index 41a1083a08..021c1d7c2c 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h @@ -102,6 +102,11 @@ static inline bool unit_runnable(const struct sched_un= it *unit) return false; } =20 +static inline int vcpu_runstate_blocked(const struct vcpu *v) +{ + return (v->pause_flags & VPF_blocked) ? RUNSTATE_blocked : RUNSTATE_of= fline; +} + /* * Returns whether a sched_unit is runnable and sets new_state for each of= its * vcpus. It is mandatory to determine the new runstate for all vcpus of a= unit @@ -121,9 +126,7 @@ static inline bool unit_runnable_state(const struct sch= ed_unit *unit) { runnable =3D vcpu_runnable(v); =20 - v->new_state =3D runnable ? RUNSTATE_running - : (v->pause_flags & VPF_blocked) - ? RUNSTATE_blocked : RUNSTATE_offline; + v->new_state =3D runnable ? RUNSTATE_running : vcpu_runstate_block= ed(v); =20 if ( runnable ) ret =3D true; diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index ce4329db72..f97303668a 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -186,6 +186,8 @@ struct vcpu bool is_running; /* VCPU should wake fast (do not deep sleep the CPU). */ bool is_urgent; + /* VCPU must context_switch without scheduling unit. */ + bool force_context_switch; =20 #ifdef VCPU_TRAP_LAST #define VCPU_TRAP_NONE 0 --=20 2.16.4
From nobody Sat Apr 20 03:55:02 2024
From: Juergen Gross
To: xen-devel@lists.xenproject.org
Date: Wed, 2 Oct 2019 09:27:34 +0200
Message-Id: <20191002072745.24919-10-jgross@suse.com>
In-Reply-To: <20191002072745.24919-1-jgross@suse.com>
References: <20191002072745.24919-1-jgross@suse.com>
Subject: [Xen-devel] [PATCH v6 09/20] xen/sched: move per-cpu variable scheduler to struct sched_resource
Cc: Juergen Gross , Tim Deegan , Stefano Stabellini , Julien Grall , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Dario Faggioli , Jan Beulich

Having a pointer to struct scheduler in struct sched_resource instead of in a per-cpu variable is enough.

Signed-off-by: Juergen Gross
Reviewed-by: Jan Beulich
Reviewed-by: Dario Faggioli
---
V1: new patch
V4: - several renames sd -> sr (Jan Beulich) - use ops instead of sr->scheduler (Jan Beulich)
---
xen/common/sched_credit.c | 18 +++++++++++------- xen/common/sched_credit2.c | 3 ++- xen/common/schedule.c | 15 +++++++-------- xen/include/xen/sched-if.h | 2 +- 4 files changed, 21 insertions(+), 17 deletions(-)
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c index a6dff8ec62..86603adcb6 100644 --- a/xen/common/sched_credit.c +++ b/xen/common/sched_credit.c @@ -352,9 +352,10 @@ DEFINE_PER_CPU(unsigned int, last_tickle_cpu); static inline void __runq_tickle(struct csched_unit *new) { unsigned int cpu =3D sched_unit_master(new->unit); + struct sched_resource *sr =3D get_sched_res(cpu); struct sched_unit *unit =3D new->unit; struct csched_unit * const cur =3D CSCHED_UNIT(curr_on_cpu(cpu)); - struct csched_private *prv =3D CSCHED_PRIV(per_cpu(scheduler, cpu)); + struct csched_private *prv =3D CSCHED_PRIV(sr->scheduler); cpumask_t mask, idle_mask, *online; int balance_step, idlers_empty; =20 @@ -931,7 +932,8 @@ csched_unit_acct(struct csched_private *prv, unsigned i= nt cpu) { struct sched_unit *currunit =3D current->sched_unit; struct csched_unit * const svc =3D CSCHED_UNIT(currunit); - const struct scheduler *ops =3D per_cpu(scheduler, cpu); + struct sched_resource *sr =3D get_sched_res(cpu); + const struct scheduler *ops =3D sr->scheduler; =20 ASSERT( sched_unit_master(currunit) =3D=3D cpu ); ASSERT( svc->sdom !=3D NULL ); @@ -987,8 +989,7 @@ csched_unit_acct(struct csched_private *prv, unsigned i= nt cpu) * idlers. But, if we are here, it means there is someone runn= ing * on it, and hence the bit must be zero already.
*/ - ASSERT(!cpumask_test_cpu(cpu, - CSCHED_PRIV(per_cpu(scheduler, cpu))-= >idlers)); + ASSERT(!cpumask_test_cpu(cpu, CSCHED_PRIV(ops)->idlers)); cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ); } } @@ -1083,6 +1084,7 @@ csched_unit_sleep(const struct scheduler *ops, struct= sched_unit *unit) { struct csched_unit * const svc =3D CSCHED_UNIT(unit); unsigned int cpu =3D sched_unit_master(unit); + struct sched_resource *sr =3D get_sched_res(cpu); =20 SCHED_STAT_CRANK(unit_sleep); =20 @@ -1095,7 +1097,7 @@ csched_unit_sleep(const struct scheduler *ops, struct= sched_unit *unit) * But, we are here because unit is going to sleep while running o= n cpu, * so the bit must be zero already. */ - ASSERT(!cpumask_test_cpu(cpu, CSCHED_PRIV(per_cpu(scheduler, cpu))= ->idlers)); + ASSERT(!cpumask_test_cpu(cpu, CSCHED_PRIV(sr->scheduler)->idlers)); cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ); } else if ( __unit_on_runq(svc) ) @@ -1575,8 +1577,9 @@ static void csched_tick(void *_cpu) { unsigned int cpu =3D (unsigned long)_cpu; + struct sched_resource *sr =3D get_sched_res(cpu); struct csched_pcpu *spc =3D CSCHED_PCPU(cpu); - struct csched_private *prv =3D CSCHED_PRIV(per_cpu(scheduler, cpu)); + struct csched_private *prv =3D CSCHED_PRIV(sr->scheduler); =20 spc->tick++; =20 @@ -1601,7 +1604,8 @@ csched_tick(void *_cpu) static struct csched_unit * csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step) { - const struct csched_private * const prv =3D CSCHED_PRIV(per_cpu(schedu= ler, cpu)); + struct sched_resource *sr =3D get_sched_res(cpu); + const struct csched_private * const prv =3D CSCHED_PRIV(sr->scheduler); const struct csched_pcpu * const peer_pcpu =3D CSCHED_PCPU(peer_cpu); struct csched_unit *speer; struct list_head *iter; diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c index d51df05887..af58ee161d 100644 --- a/xen/common/sched_credit2.c +++ b/xen/common/sched_credit2.c @@ -3268,8 +3268,9 @@ runq_candidate(struct csched2_runqueue_data *rqd, unsigned int *skipped) { struct list_head *iter, *temp; + struct sched_resource *sr =3D get_sched_res(cpu); struct csched2_unit *snext =3D NULL; - struct csched2_private *prv =3D csched2_priv(per_cpu(scheduler, cpu)); + struct csched2_private *prv =3D csched2_priv(sr->scheduler); bool yield =3D false, soft_aff_preempt =3D false; =20 *skipped =3D 0; diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 9442be1c83..5e9cee1f82 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -75,7 +75,6 @@ static void vcpu_singleshot_timer_fn(void *data); static void poll_timer_fn(void *data); =20 /* This is global for now so that private implementations can reach it */ -DEFINE_PER_CPU(struct scheduler *, scheduler); DEFINE_PER_CPU_READ_MOSTLY(struct sched_resource *, sched_res); static DEFINE_PER_CPU_READ_MOSTLY(unsigned int, sched_res_idx); =20 @@ -200,7 +199,7 @@ static inline struct scheduler *unit_scheduler(const st= ruct sched_unit *unit) */ =20 ASSERT(is_idle_domain(d)); - return per_cpu(scheduler, unit->res->master_cpu); + return unit->res->scheduler; } =20 static inline struct scheduler *vcpu_scheduler(const struct vcpu *v) @@ -1921,8 +1920,8 @@ static bool sched_tasklet_check(unsigned int cpu) static struct sched_unit *do_schedule(struct sched_unit *prev, s_time_t no= w, unsigned int cpu) { - struct scheduler *sched =3D per_cpu(scheduler, cpu); struct sched_resource *sr =3D get_sched_res(cpu); + struct scheduler *sched =3D sr->scheduler; struct sched_unit *next; =20 /* get policy-specific decision on scheduling... 
*/ @@ -2342,7 +2341,7 @@ static int cpu_schedule_up(unsigned int cpu) sr->cpus =3D cpumask_of(cpu); set_sched_res(cpu, sr); =20 - per_cpu(scheduler, cpu) =3D &sched_idle_ops; + sr->scheduler =3D &sched_idle_ops; spin_lock_init(&sr->_lock); sr->schedule_lock =3D &sched_free_cpu_lock; init_timer(&sr->s_timer, s_timer_fn, NULL, cpu); @@ -2553,7 +2552,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpup= ool *c) { struct vcpu *idle; void *ppriv, *ppriv_old, *vpriv, *vpriv_old; - struct scheduler *old_ops =3D per_cpu(scheduler, cpu); + struct scheduler *old_ops =3D get_sched_res(cpu)->scheduler; struct scheduler *new_ops =3D (c =3D=3D NULL) ? &sched_idle_ops : c->s= ched; struct cpupool *old_pool =3D per_cpu(cpupool, cpu); struct sched_resource *sd =3D get_sched_res(cpu); @@ -2617,7 +2616,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpup= ool *c) ppriv_old =3D sd->sched_priv; new_lock =3D sched_switch_sched(new_ops, cpu, ppriv, vpriv); =20 - per_cpu(scheduler, cpu) =3D new_ops; + sd->scheduler =3D new_ops; sd->sched_priv =3D ppriv; =20 /* @@ -2717,7 +2716,7 @@ void sched_tick_suspend(void) struct scheduler *sched; unsigned int cpu =3D smp_processor_id(); =20 - sched =3D per_cpu(scheduler, cpu); + sched =3D get_sched_res(cpu)->scheduler; sched_do_tick_suspend(sched, cpu); rcu_idle_enter(cpu); rcu_idle_timer_start(); @@ -2730,7 +2729,7 @@ void sched_tick_resume(void) =20 rcu_idle_timer_stop(); rcu_idle_exit(cpu); - sched =3D per_cpu(scheduler, cpu); + sched =3D get_sched_res(cpu)->scheduler; sched_do_tick_resume(sched, cpu); } =20 diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index 021c1d7c2c..01821b3e5b 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h @@ -36,6 +36,7 @@ extern const cpumask_t *sched_res_mask; * as the rest of the struct. 
Just have the scheduler point to the * one it wants (This may be the one right in front of it).*/ struct sched_resource { + struct scheduler *scheduler; spinlock_t *schedule_lock, _lock; struct sched_unit *curr; @@ -49,7 +50,6 @@ struct sched_resource { const cpumask_t *cpus; /* cpus covered by this struct = */ }; =20 -DECLARE_PER_CPU(struct scheduler *, scheduler); DECLARE_PER_CPU(struct cpupool *, cpupool); DECLARE_PER_CPU(struct sched_resource *, sched_res); =20 --=20 2.16.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel From nobody Sat Apr 20 03:55:02 2024 Delivered-To: importer@patchew.org Received-SPF: none (zoho.com: 192.237.175.120 is neither permitted nor denied by domain of lists.xenproject.org) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; spf=none (zoho.com: 192.237.175.120 is neither permitted nor denied by domain of lists.xenproject.org) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org ARC-Seal: i=1; a=rsa-sha256; t=1570001357; cv=none; d=zoho.com; s=zohoarc; b=AlSUfY85vVO/9QpKeK3CQNDB9kbo4AA5mEMNiQF9pryLS0KCBLfZwJ8Jj0g9T99hM1MDvzcnQBT3BzS1k1xz9H+NkdgRMii6FffX/xTi16ijyvujfptwUTmm00nvJSxWpNDt2ZDZhrT8iFbxqnH4YpFzPMpUC994CY2l1TTonvU= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com; s=zohoarc; t=1570001357; h=Content-Type:Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To:ARC-Authentication-Results; bh=UpXLqbFXyMwNw2ef8ng6ZsKQk6zDf0Z+/6Y5fLQVsDI=; b=BGwWX70jkMPsUM2KoW5oMeQpRjD9QO9NcvW8FUW36rM23kEkpsmvEP7+76jslvt1E7H9+Ap5eziJsvMVh3YdLkYiKOuknb2bRxL4gQm1Ldnau67oHHpvvZDAVVK6hT2bde5pub98LPcXm+21M24PFK1zNBf6FqQGqXMpycLhtks= ARC-Authentication-Results: i=1; mx.zoho.com; spf=none (zoho.com: 192.237.175.120 is neither permitted nor denied by domain of lists.xenproject.org) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1570001357603269.3396827809013; Wed, 2 Oct 2019 00:29:17 -0700 (PDT) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iFZ3V-0001Tc-0P; Wed, 02 Oct 2019 07:28:13 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iFZ3T-0001SP-Mv for xen-devel@lists.xenproject.org; Wed, 02 Oct 2019 07:28:11 +0000 Received: from mx1.suse.de (unknown [195.135.220.15]) by localhost (Halon) with ESMTPS id 245ca942-e4e6-11e9-9710-12813bfff9fa; Wed, 02 Oct 2019 07:27:52 +0000 (UTC) Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id D5007B016; Wed, 2 Oct 2019 07:27:50 +0000 (UTC) X-Inumbo-ID: 245ca942-e4e6-11e9-9710-12813bfff9fa X-Virus-Scanned: by amavisd-new at test-mx.suse.de From: Juergen Gross To: xen-devel@lists.xenproject.org Date: Wed, 2 Oct 2019 09:27:35 +0200 Message-Id: <20191002072745.24919-11-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20191002072745.24919-1-jgross@suse.com> References: <20191002072745.24919-1-jgross@suse.com> Subject: [Xen-devel] [PATCH v6 10/20] xen/sched: move per-cpu variable cpupool to struct sched_resource X-BeenThere: 
xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , Tim Deegan , Stefano Stabellini , Julien Grall , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Dario Faggioli , Meng Xu , Jan Beulich MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" Having a pointer to struct cpupool in struct sched_resource instead of per cpu is enough. Signed-off-by: Juergen Gross Reviewed-by: Jan Beulich Reviewed-by: Dario Faggioli --- V1: new patch --- xen/common/cpupool.c | 4 +--- xen/common/sched_credit.c | 2 +- xen/common/sched_rt.c | 2 +- xen/common/schedule.c | 8 ++++---- xen/include/xen/sched-if.h | 2 +- 5 files changed, 8 insertions(+), 10 deletions(-) diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index 441a26f16c..60a85f50e1 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -34,8 +34,6 @@ static cpumask_t cpupool_locked_cpus; =20 static DEFINE_SPINLOCK(cpupool_lock); =20 -DEFINE_PER_CPU(struct cpupool *, cpupool); - static void free_cpupool_struct(struct cpupool *c) { if ( c ) @@ -504,7 +502,7 @@ static int cpupool_cpu_add(unsigned int cpu) * (or unplugging would have failed) and that is the default behavior * anyway. */ - per_cpu(cpupool, cpu) =3D NULL; + get_sched_res(cpu)->cpupool =3D NULL; ret =3D cpupool_assign_cpu_locked(cpupool0, cpu); =20 spin_unlock(&cpupool_lock); diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c index 86603adcb6..31fdcd6a2f 100644 --- a/xen/common/sched_credit.c +++ b/xen/common/sched_credit.c @@ -1681,7 +1681,7 @@ static struct csched_unit * csched_load_balance(struct csched_private *prv, int cpu, struct csched_unit *snext, bool *stolen) { - struct cpupool *c =3D per_cpu(cpupool, cpu); + struct cpupool *c =3D get_sched_res(cpu)->cpupool; struct csched_unit *speer; cpumask_t workers; cpumask_t *online; diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c index d21c416cae..6e93e50acb 100644 --- a/xen/common/sched_rt.c +++ b/xen/common/sched_rt.c @@ -774,7 +774,7 @@ rt_deinit_pdata(const struct scheduler *ops, void *pcpu= , int cpu) =20 if ( prv->repl_timer.cpu =3D=3D cpu ) { - struct cpupool *c =3D per_cpu(cpupool, cpu); + struct cpupool *c =3D get_sched_res(cpu)->cpupool; unsigned int new_cpu =3D cpumask_cycle(cpu, cpupool_online_cpumask= (c)); =20 /* diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 5e9cee1f82..249ff8a882 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -1120,7 +1120,7 @@ int cpu_disable_scheduler(unsigned int cpu) cpumask_t online_affinity; int ret =3D 0; =20 - c =3D per_cpu(cpupool, cpu); + c =3D get_sched_res(cpu)->cpupool; if ( c =3D=3D NULL ) return ret; =20 @@ -1189,7 +1189,7 @@ static int cpu_disable_scheduler_check(unsigned int c= pu) struct vcpu *v; struct cpupool *c; =20 - c =3D per_cpu(cpupool, cpu); + c =3D get_sched_res(cpu)->cpupool; if ( c =3D=3D NULL ) return 0; =20 @@ -2554,8 +2554,8 @@ int schedule_cpu_switch(unsigned int cpu, struct cpup= ool *c) void *ppriv, *ppriv_old, *vpriv, *vpriv_old; struct scheduler *old_ops =3D get_sched_res(cpu)->scheduler; struct scheduler *new_ops =3D (c =3D=3D NULL) ? 
&sched_idle_ops : c->s= ched; - struct cpupool *old_pool =3D per_cpu(cpupool, cpu); struct sched_resource *sd =3D get_sched_res(cpu); + struct cpupool *old_pool =3D sd->cpupool; spinlock_t *old_lock, *new_lock; unsigned long flags; =20 @@ -2637,7 +2637,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpup= ool *c) sched_free_udata(old_ops, vpriv_old); sched_free_pdata(old_ops, ppriv_old, cpu); =20 - per_cpu(cpupool, cpu) =3D c; + get_sched_res(cpu)->cpupool =3D c; /* When a cpu is added to a pool, trigger it to go pick up some work */ if ( c !=3D NULL ) cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ); diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index 01821b3e5b..e675061290 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h @@ -37,6 +37,7 @@ extern const cpumask_t *sched_res_mask; * one it wants (This may be the one right in front of it).*/ struct sched_resource { struct scheduler *scheduler; + struct cpupool *cpupool; spinlock_t *schedule_lock, _lock; struct sched_unit *curr; @@ -50,7 +51,6 @@ struct sched_resource { const cpumask_t *cpus; /* cpus covered by this struct = */ }; =20 -DECLARE_PER_CPU(struct cpupool *, cpupool); DECLARE_PER_CPU(struct sched_resource *, sched_res); =20 static inline struct sched_resource *get_sched_res(unsigned int cpu) --=20 2.16.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel From nobody Sat Apr 20 03:55:02 2024 Delivered-To: importer@patchew.org Received-SPF: none (zoho.com: 192.237.175.120 is neither permitted nor denied by domain of lists.xenproject.org) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; spf=none (zoho.com: 192.237.175.120 is neither permitted nor denied by domain of lists.xenproject.org) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org ARC-Seal: i=1; a=rsa-sha256; t=1570001352; cv=none; d=zoho.com; s=zohoarc; b=ZU+36Ub2FHZsaJenbZTgvfvtQ+gQSKexzGjHflu66DZjIQ0NNVOexKQQo03QtwlLSpqxBaLq8vqwkSOv+/fzhYK1SbFiHkg8CQSSq2+ttbZpItXjmBydSdP5RJnQE/hn7uthiS+m3z/7n4sMkzTwqBuBIiIHSVTzmgC/5KCmRhg= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com; s=zohoarc; t=1570001352; h=Content-Type:Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To:ARC-Authentication-Results; bh=1rcn478HlGHF1o+OfL0om000jwo7QhWZVsMLVBxIse4=; b=eT1r+J/3llJTfHNDdgFJxXg6QII2xK+s2gsEi+P0tdIcrq1CVrrlZEYjNIkDUbJt3XHeDDmNSUYQ1Ot1bVi81b2xk6cGbMmaboZwHCfdQObjsTzViQd9qdkklZRMPN+Jlhi/+wpK09fdsLTc8ybwTipQAfFc7trPVsawHJPkAoU= ARC-Authentication-Results: i=1; mx.zoho.com; spf=none (zoho.com: 192.237.175.120 is neither permitted nor denied by domain of lists.xenproject.org) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1570001352427879.816905734293; Wed, 2 Oct 2019 00:29:12 -0700 (PDT) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iFZ3Y-0001XG-HW; Wed, 02 Oct 2019 07:28:16 +0000 Received: from us1-rack-iad1.inumbo.com ([172.99.69.81]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iFZ3X-0001We-MK for xen-devel@lists.xenproject.org; Wed, 02 Oct 2019 07:28:15 +0000 Received: from 
mx1.suse.de (unknown [195.135.220.15]) by localhost (Halon) with ESMTPS id 249dfdde-e4e6-11e9-b588-bc764e2007e4; Wed, 02 Oct 2019 07:27:52 +0000 (UTC) Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 36916B022; Wed, 2 Oct 2019 07:27:51 +0000 (UTC) X-Inumbo-ID: 249dfdde-e4e6-11e9-b588-bc764e2007e4 X-Virus-Scanned: by amavisd-new at test-mx.suse.de From: Juergen Gross To: xen-devel@lists.xenproject.org Date: Wed, 2 Oct 2019 09:27:36 +0200 Message-Id: <20191002072745.24919-12-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20191002072745.24919-1-jgross@suse.com> References: <20191002072745.24919-1-jgross@suse.com> Subject: [Xen-devel] [PATCH v6 11/20] xen/sched: reject switching smt on/off with core scheduling active X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , Stefano Stabellini , Julien Grall , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Tim Deegan , Jan Beulich , Dario Faggioli , =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" When core or socket scheduling is active, enabling or disabling SMT is not possible, as that would require a major host reconfiguration. Add a bool sched_disable_smt_switching which will be set for core or socket scheduling. Signed-off-by: Juergen Gross Acked-by: Jan Beulich Acked-by: Dario Faggioli --- V1: - new patch V2: - EBUSY as return code (Jan Beulich, Dario Faggioli) - __read_mostly for sched_disable_smt_switching (Jan Beulich) --- xen/arch/x86/sysctl.c | 5 +++++ xen/common/schedule.c | 1 + xen/include/xen/sched.h | 1 + 3 files changed, 7 insertions(+) diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c index 3742ede61b..4a76f0f47f 100644 --- a/xen/arch/x86/sysctl.c +++ b/xen/arch/x86/sysctl.c @@ -209,6 +209,11 @@ long arch_do_sysctl( ret =3D -EOPNOTSUPP; break; } + if ( sched_disable_smt_switching ) + { + ret =3D -EBUSY; + break; + } plug =3D op =3D=3D XEN_SYSCTL_CPU_HOTPLUG_SMT_ENABLE; fn =3D smt_up_down_helper; hcpu =3D _p(plug); diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 249ff8a882..0dcf004d78 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -63,6 +63,7 @@ integer_param("sched_ratelimit_us", sched_ratelimit_us); =20 /* Number of vcpus per struct sched_unit. */ static unsigned int __read_mostly sched_granularity =3D 1; +bool __read_mostly sched_disable_smt_switching; const cpumask_t *sched_res_mask =3D &cpumask_all; =20 /* Common lock for free cpus.
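The guard itself is a one-flag pattern; a standalone sketch under simplified names, where smt_switch_disabled stands in for sched_disable_smt_switching:

#include <errno.h>
#include <stdbool.h>

/* Set once when core/socket granularity is chosen, never cleared. */
static bool smt_switch_disabled;

static long smt_hotplug_sketch(bool enable)
{
    if ( smt_switch_disabled )
        return -EBUSY;   /* a full host reconfiguration would be needed */
    /* ... hand the request to the actual hotplug helper here ... */
    (void)enable;
    return 0;
}

-EBUSY tells the caller the host is currently configured in a way that forbids the operation, as opposed to -EOPNOTSUPP, which the surrounding sysctl code already uses for "SMT switching is unsupported here at all".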
*/ diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index f97303668a..aa8257edc9 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -1037,6 +1037,7 @@ static inline bool is_iommu_enabled(const struct doma= in *d) } =20 extern bool sched_smt_power_savings; +extern bool sched_disable_smt_switching; =20 extern enum cpufreq_controller { FREQCTL_none, FREQCTL_dom0_kernel, FREQCTL_xen --=20 2.16.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel From nobody Sat Apr 20 03:55:02 2024 Delivered-To: importer@patchew.org Received-SPF: none (zoho.com: 192.237.175.120 is neither permitted nor denied by domain of lists.xenproject.org) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; spf=none (zoho.com: 192.237.175.120 is neither permitted nor denied by domain of lists.xenproject.org) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org ARC-Seal: i=1; a=rsa-sha256; t=1570001366; cv=none; d=zoho.com; s=zohoarc; b=YmgXiKDSMgmlScNk34alnz7rFt8TfyMPY2AKlo7MXZLUjZi3JIXpg2WIYfCvFc4GitOyxdrJdD9oxodTRHb5IPBVmYMeD5AqK85ShhqpZwaQbRf5cbAbdR542+qo39Q5Rv5R5z6nLRyL3QZOBQHfVEuD7VBbH284SpbXLh9jSr8= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com; s=zohoarc; t=1570001366; h=Content-Type:Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To:ARC-Authentication-Results; bh=rxPO0CT65wczWK7yC6X53dc2yLMKv8metxwamXhKE0Y=; b=XUeZumAXkP2wN2yzsTBDsDNlZnSGurN4TzLhS6Rhu3YUmwri0Bfdy6QqtQafBY+DK7Ch0KIzEnIKGpZSrmdoIyNJD9V5T3hG0m4PCqVT1rDUBwxBORbrqvIuXCr23mm9dMXubUSQxT8QqqGkFk375QI/dr5LTBH/63PvqUSPCd8= ARC-Authentication-Results: i=1; mx.zoho.com; spf=none (zoho.com: 192.237.175.120 is neither permitted nor denied by domain of lists.xenproject.org) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1570001366909483.5398644296273; Wed, 2 Oct 2019 00:29:26 -0700 (PDT) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iFZ3e-0001ed-VH; Wed, 02 Oct 2019 07:28:22 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iFZ3d-0001dD-MZ for xen-devel@lists.xenproject.org; Wed, 02 Oct 2019 07:28:21 +0000 Received: from mx1.suse.de (unknown [195.135.220.15]) by localhost (Halon) with ESMTPS id 24a0f1f6-e4e6-11e9-9710-12813bfff9fa; Wed, 02 Oct 2019 07:27:52 +0000 (UTC) Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 820DAB03E; Wed, 2 Oct 2019 07:27:51 +0000 (UTC) X-Inumbo-ID: 24a0f1f6-e4e6-11e9-9710-12813bfff9fa X-Virus-Scanned: by amavisd-new at test-mx.suse.de From: Juergen Gross To: xen-devel@lists.xenproject.org Date: Wed, 2 Oct 2019 09:27:37 +0200 Message-Id: <20191002072745.24919-13-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20191002072745.24919-1-jgross@suse.com> References: <20191002072745.24919-1-jgross@suse.com> Subject: [Xen-devel] [PATCH v6 12/20] xen/sched: prepare per-cpupool scheduling granularity X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list 
List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , Tim Deegan , Stefano Stabellini , Julien Grall , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Dario Faggioli , Jan Beulich MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" On- and offlining cpus with core scheduling is rather complicated as the cpus are taken on- or offline one by one, but scheduling wants them rather to be handled per core. As the future plan is to be able to select scheduling granularity per cpupool, prepare that by storing the granularity in struct sched_resource (we need it there for free cpus which are not associated to any cpupool). Free cpus will always use granularity 1. Store the selected granularity option (cpu, core or socket) in the cpupool, as we will need it to select the appropriate cpu mask when populating the cpupool with cpus. This will make on- and offlining of cpus much easier and avoid writing code which would need to be thrown away later. Move the granularity related variables to cpupool.c as they are now used from there only. Signed-off-by: Juergen Gross Reviewed-by: Dario Faggioli --- V1: new patch V4: - move opt_sched_granularity and sched_granularity to cpupool.c (Jan Beulich) - rename c->opt_sched_granularity, drop c->granularity (Jan Beulich) --- xen/common/cpupool.c | 9 +++++++++ xen/common/schedule.c | 27 ++++++++++++++++----------- xen/include/xen/sched-if.h | 11 +++++++++++ 3 files changed, 36 insertions(+), 11 deletions(-) diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index 60a85f50e1..51f0ff0d88 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -34,6 +34,14 @@ static cpumask_t cpupool_locked_cpus; =20 static DEFINE_SPINLOCK(cpupool_lock); =20 +static enum sched_gran __read_mostly opt_sched_granularity =3D SCHED_GRAN_= cpu; +static unsigned int __read_mostly sched_granularity =3D 1; + +unsigned int cpupool_get_granularity(const struct cpupool *c) +{ + return c ? sched_granularity : 1; +} + static void free_cpupool_struct(struct cpupool *c) { if ( c ) @@ -173,6 +181,7 @@ static struct cpupool *cpupool_create( return NULL; } } + c->gran =3D opt_sched_granularity; =20 *q =3D c; =20 diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 0dcf004d78..5257225050 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -62,7 +62,6 @@ int sched_ratelimit_us =3D SCHED_DEFAULT_RATELIMIT_US; integer_param("sched_ratelimit_us", sched_ratelimit_us); =20 /* Number of vcpus per struct sched_unit.
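The NULL check in cpupool_get_granularity() is what keeps free cpus at granularity 1; a small standalone sketch of the behaviour, with simplified names and a granularity of 2 assumed for illustration:

#include <assert.h>

struct pool_sketch { int id; };

static unsigned int pool_gran = 2;   /* e.g. core granularity on 2-thread SMT */

static unsigned int gran_sketch(const struct pool_sketch *c)
{
    return c ? pool_gran : 1;   /* free (pool-less) cpus schedule alone */
}

int main(void)
{
    struct pool_sketch p = { 0 };

    assert(gran_sketch(&p) == 2);    /* pooled cpu: whole core as one unit */
    assert(gran_sketch(NULL) == 1);  /* free cpu: per-cpu scheduling */
    return 0;
}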
*/ -static unsigned int __read_mostly sched_granularity =3D 1; bool __read_mostly sched_disable_smt_switching; const cpumask_t *sched_res_mask =3D &cpumask_all; =20 @@ -435,10 +434,10 @@ static struct sched_unit *sched_alloc_unit(struct vcp= u *v) { struct sched_unit *unit, **prev_unit; struct domain *d =3D v->domain; + unsigned int gran =3D cpupool_get_granularity(d->cpupool); =20 for_each_sched_unit ( d, unit ) - if ( unit->unit_id / sched_granularity =3D=3D - v->vcpu_id / sched_granularity ) + if ( unit->unit_id / gran =3D=3D v->vcpu_id / gran ) break; =20 if ( unit ) @@ -593,6 +592,7 @@ int sched_move_domain(struct domain *d, struct cpupool = *c) void *unitdata; struct scheduler *old_ops; void *old_domdata; + unsigned int gran =3D cpupool_get_granularity(c); =20 for_each_vcpu ( d, v ) { @@ -604,8 +604,7 @@ int sched_move_domain(struct domain *d, struct cpupool = *c) if ( IS_ERR(domdata) ) return PTR_ERR(domdata); =20 - unit_priv =3D xzalloc_array(void *, - DIV_ROUND_UP(d->max_vcpus, sched_granularity= )); + unit_priv =3D xzalloc_array(void *, DIV_ROUND_UP(d->max_vcpus, gran)); if ( unit_priv =3D=3D NULL ) { sched_free_domdata(c->sched, domdata); @@ -1850,11 +1849,11 @@ static void sched_switch_units(struct sched_resourc= e *sr, if ( is_idle_unit(prev) ) { prev->runstate_cnt[RUNSTATE_running] =3D 0; - prev->runstate_cnt[RUNSTATE_runnable] =3D sched_granularity; + prev->runstate_cnt[RUNSTATE_runnable] =3D sr->granularity; } if ( is_idle_unit(next) ) { - next->runstate_cnt[RUNSTATE_running] =3D sched_granularity; + next->runstate_cnt[RUNSTATE_running] =3D sr->granularity; next->runstate_cnt[RUNSTATE_runnable] =3D 0; } } @@ -2003,7 +2002,7 @@ void sched_context_switched(struct vcpu *vprev, struc= t vcpu *vnext) else { vcpu_context_saved(vprev, vnext); - if ( sched_granularity =3D=3D 1 ) + if ( sr->granularity =3D=3D 1 ) unit_context_saved(sr); } =20 @@ -2123,11 +2122,12 @@ static struct sched_unit *sched_wait_rendezvous_in(= struct sched_unit *prev, { struct sched_unit *next; struct vcpu *v; + unsigned int gran =3D get_sched_res(cpu)->granularity; =20 if ( !--prev->rendezvous_in_cnt ) { next =3D do_schedule(prev, now, cpu); - atomic_set(&next->rendezvous_out_cnt, sched_granularity + 1); + atomic_set(&next->rendezvous_out_cnt, gran + 1); return next; } =20 @@ -2251,6 +2251,7 @@ static void schedule(void) struct sched_resource *sr; spinlock_t *lock; int cpu =3D smp_processor_id(); + unsigned int gran =3D get_sched_res(cpu)->granularity; =20 ASSERT_NOT_IN_ATOMIC(); =20 @@ -2276,11 +2277,11 @@ static void schedule(void) =20 now =3D NOW(); =20 - if ( sched_granularity > 1 ) + if ( gran > 1 ) { cpumask_t mask; =20 - prev->rendezvous_in_cnt =3D sched_granularity; + prev->rendezvous_in_cnt =3D gran; cpumask_andnot(&mask, sr->cpus, cpumask_of(cpu)); cpumask_raise_softirq(&mask, SCHED_SLAVE_SOFTIRQ); next =3D sched_wait_rendezvous_in(prev, &lock, cpu, now); @@ -2348,6 +2349,9 @@ static int cpu_schedule_up(unsigned int cpu) init_timer(&sr->s_timer, s_timer_fn, NULL, cpu); atomic_set(&per_cpu(sched_urgent_count, cpu), 0); =20 + /* We start with cpu granularity. */ + sr->granularity =3D 1; + /* Boot CPU is dealt with later in scheduler_init(). 
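The rendezvous counters visible in the hunks above follow a last-one-in pattern: each of the gran cpus decrements rendezvous_in_cnt, and whoever reaches zero runs the real scheduling decision and arms the out-counter with gran + 1 (the extra count covers the final decrement by the deciding cpu itself). A standalone sketch of that counting with C11 atomics; types and the waiting logic are simplified stand-ins, not the hypervisor's code:

#include <stdatomic.h>
#include <stddef.h>

struct unit_sketch {
    atomic_uint in_cnt;    /* cpus still to arrive at the rendezvous */
    atomic_uint out_cnt;   /* cpus still to leave, plus one */
};

static struct unit_sketch idle_sketch;

static struct unit_sketch *pick_next_sketch(void)
{
    return &idle_sketch;   /* stands in for do_schedule() */
}

static struct unit_sketch *rendezvous_in_sketch(struct unit_sketch *prev,
                                                unsigned int gran)
{
    if ( atomic_fetch_sub(&prev->in_cnt, 1) == 1 )
    {
        struct unit_sketch *next = pick_next_sketch();

        atomic_store(&next->out_cnt, gran + 1);
        return next;       /* last cpu in: it picked the next unit */
    }
    return NULL;           /* the real code waits here for the decision */
}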
*/ if ( cpu =3D=3D 0 ) return 0; @@ -2638,6 +2642,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpup= ool *c) sched_free_udata(old_ops, vpriv_old); sched_free_pdata(old_ops, ppriv_old, cpu); =20 + get_sched_res(cpu)->granularity =3D cpupool_get_granularity(c); get_sched_res(cpu)->cpupool =3D c; /* When a cpu is added to a pool, trigger it to go pick up some work */ if ( c !=3D NULL ) diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index e675061290..f8f0f484cb 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h @@ -25,6 +25,13 @@ extern int sched_ratelimit_us; /* Scheduling resource mask. */ extern const cpumask_t *sched_res_mask; =20 +/* Number of vcpus per struct sched_unit. */ +enum sched_gran { + SCHED_GRAN_cpu, + SCHED_GRAN_core, + SCHED_GRAN_socket +}; + /* * In order to allow a scheduler to remap the lock->cpu mapping, * we have a per-cpu pointer, along with a pre-allocated set of @@ -48,6 +55,7 @@ struct sched_resource { =20 /* Cpu with lowest id in scheduling resource. */ unsigned int master_cpu; + unsigned int granularity; const cpumask_t *cpus; /* cpus covered by this struct = */ }; =20 @@ -546,6 +554,7 @@ struct cpupool struct cpupool *next; struct scheduler *sched; atomic_t refcnt; + enum sched_gran gran; }; =20 #define cpupool_online_cpumask(_pool) \ @@ -561,6 +570,8 @@ static inline cpumask_t *cpupool_domain_master_cpumask(= const struct domain *d) return d->cpupool->res_valid; } =20 +unsigned int cpupool_get_granularity(const struct cpupool *c); + /* * Hard and soft affinity load balancing. * --=20 2.16.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel From nobody Sat Apr 20 03:55:02 2024 Delivered-To: importer@patchew.org Received-SPF: none (zoho.com: 192.237.175.120 is neither permitted nor denied by domain of lists.xenproject.org) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; spf=none (zoho.com: 192.237.175.120 is neither permitted nor denied by domain of lists.xenproject.org) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org ARC-Seal: i=1; a=rsa-sha256; t=1570001362; cv=none; d=zoho.com; s=zohoarc; b=b20F1xNSvs4pf2vMPNgyapYyO3LzNGkCTzJTedZrppHqh714W9PPmQuXujfVTXBLy3Nj0453ckrGf/jFEipN5L1HT9GJ3AuRtIEr0AszywkPUqbjWHWHay6pOfp70IcD5oQ2n/JBtVCUFzxjC378eCMrW5EcVAdvFmCB3PGPk78= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com; s=zohoarc; t=1570001362; h=Content-Type:Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To:ARC-Authentication-Results; bh=mwIrYjMSuN5B23oOwnfROF+hglyB3HoY+Xs7fUudbbE=; b=ZJwkcSoxrfzcqbq0MO7qOFZyzvDXrrs/3uEQt8494hEuP+0h32ofgHG7InVWNyC+ODzzCGdXJdB/+dFzc5ERkeTwbG+3MVUnEjyZ308wJ+oIKlXe1xoAqd41GSpezjFWhNZWtQPdXzy6qCXgWc8GWNJ9pRhx3DdJNpHyvUHIUpo= ARC-Authentication-Results: i=1; mx.zoho.com; spf=none (zoho.com: 192.237.175.120 is neither permitted nor denied by domain of lists.xenproject.org) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1570001362846229.9121802985777; Wed, 2 Oct 2019 00:29:22 -0700 (PDT) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 
1iFZ3e-0001dg-AE; Wed, 02 Oct 2019 07:28:22 +0000 Received: from us1-rack-iad1.inumbo.com ([172.99.69.81]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iFZ3c-0001c7-Mh for xen-devel@lists.xenproject.org; Wed, 02 Oct 2019 07:28:20 +0000 Received: from mx1.suse.de (unknown [195.135.220.15]) by localhost (Halon) with ESMTPS id 24eddc64-e4e6-11e9-bf31-bc764e2007e4; Wed, 02 Oct 2019 07:27:53 +0000 (UTC) Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id C4214AF93; Wed, 2 Oct 2019 07:27:51 +0000 (UTC) X-Inumbo-ID: 24eddc64-e4e6-11e9-bf31-bc764e2007e4 X-Virus-Scanned: by amavisd-new at test-mx.suse.de From: Juergen Gross To: xen-devel@lists.xenproject.org Date: Wed, 2 Oct 2019 09:27:38 +0200 Message-Id: <20191002072745.24919-14-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20191002072745.24919-1-jgross@suse.com> References: <20191002072745.24919-1-jgross@suse.com> Subject: [Xen-devel] [PATCH v6 13/20] xen/sched: split schedule_cpu_switch() X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , Tim Deegan , Stefano Stabellini , Julien Grall , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Dario Faggioli , Jan Beulich MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" Instead of letting schedule_cpu_switch() handle moving cpus from and to cpupools, split it into schedule_cpu_add() and schedule_cpu_rm(). This will allow us to drop allocating/freeing scheduler data for free cpus as the idle scheduler doesn't need such data. 
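A sketch of the resulting interface shape, with stub bodies only; the point of the split is that the direction is explicit at the call site instead of being encoded in a NULL cpupool argument:

struct pool_split_sketch { int id; };

static int cpu_add_sketch(unsigned int cpu, struct pool_split_sketch *c)
{
    /* allocate scheduler data, switch away from the idle scheduler, ... */
    (void)cpu; (void)c;
    return 0;
}

static int cpu_rm_sketch(unsigned int cpu)
{
    /* switch back to the idle scheduler, free scheduler data, ... */
    (void)cpu;
    return 0;
}

Each function can then assert exactly the preconditions of its own direction, which is why the combined ASSERT logic of schedule_cpu_switch() falls away in the diff below.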
Signed-off-by: Juergen Gross Reviewed-by: Dario Faggioli --- V1: new patch V4: - rename sd -> sr (Jan Beulich) --- xen/common/cpupool.c | 4 +- xen/common/schedule.c | 133 +++++++++++++++++++++++++++-----------------= ---- xen/include/xen/sched.h | 3 +- 3 files changed, 78 insertions(+), 62 deletions(-) diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index 51f0ff0d88..02825e779d 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -271,7 +271,7 @@ static int cpupool_assign_cpu_locked(struct cpupool *c,= unsigned int cpu) =20 if ( (cpupool_moving_cpu =3D=3D cpu) && (c !=3D cpupool_cpu_moving) ) return -EADDRNOTAVAIL; - ret =3D schedule_cpu_switch(cpu, c); + ret =3D schedule_cpu_add(cpu, c); if ( ret ) return ret; =20 @@ -321,7 +321,7 @@ static int cpupool_unassign_cpu_finish(struct cpupool *= c) */ if ( !ret ) { - ret =3D schedule_cpu_switch(cpu, NULL); + ret =3D schedule_cpu_rm(cpu); if ( ret ) cpumask_clear_cpu(cpu, &cpupool_free_cpus); else diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 5257225050..a96fc82282 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -93,15 +93,6 @@ static struct scheduler __read_mostly ops; static void sched_set_affinity( struct sched_unit *unit, const cpumask_t *hard, const cpumask_t *soft); =20 -static spinlock_t * -sched_idle_switch_sched(struct scheduler *new_ops, unsigned int cpu, - void *pdata, void *vdata) -{ - sched_idle_unit(cpu)->priv =3D NULL; - - return &sched_free_cpu_lock; -} - static struct sched_resource * sched_idle_res_pick(const struct scheduler *ops, const struct sched_unit *= unit) { @@ -141,7 +132,6 @@ static struct scheduler sched_idle_ops =3D { =20 .alloc_udata =3D sched_idle_alloc_udata, .free_udata =3D sched_idle_free_udata, - .switch_sched =3D sched_idle_switch_sched, }; =20 static inline struct vcpu *unit2vcpu_cpu(const struct sched_unit *unit, @@ -2547,36 +2537,22 @@ void __init scheduler_init(void) } =20 /* - * Move a pCPU outside of the influence of the scheduler of its current - * cpupool, or subject it to the scheduler of a new cpupool. - * - * For the pCPUs that are removed from their cpupool, their scheduler beco= mes - * &sched_idle_ops (the idle scheduler). + * Move a pCPU from free cpus (running the idle scheduler) to a cpupool + * using any "real" scheduler. + * The cpu is still marked as "free" and not yet valid for its cpupool. */ -int schedule_cpu_switch(unsigned int cpu, struct cpupool *c) +int schedule_cpu_add(unsigned int cpu, struct cpupool *c) { struct vcpu *idle; - void *ppriv, *ppriv_old, *vpriv, *vpriv_old; - struct scheduler *old_ops =3D get_sched_res(cpu)->scheduler; - struct scheduler *new_ops =3D (c =3D=3D NULL) ? &sched_idle_ops : c->s= ched; - struct sched_resource *sd =3D get_sched_res(cpu); - struct cpupool *old_pool =3D sd->cpupool; + void *ppriv, *vpriv; + struct scheduler *new_ops =3D c->sched; + struct sched_resource *sr =3D get_sched_res(cpu); spinlock_t *old_lock, *new_lock; unsigned long flags; =20 - /* - * pCPUs only move from a valid cpupool to free (i.e., out of any pool= ), - * or from free to a valid cpupool. In the former case (which happens = when - * c is NULL), we want the CPU to have been marked as free already, as - * well as to not be valid for the source pool any longer, when we get= to - * here. In the latter case (which happens when c is a valid cpupool),= we - * want the CPU to still be marked as free, as well as to not yet be v= alid - * for the destination pool. 
- */ - ASSERT(c !=3D old_pool && (c !=3D NULL || old_pool !=3D NULL)); ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus)); - ASSERT((c =3D=3D NULL && !cpumask_test_cpu(cpu, old_pool->cpu_valid)) = || - (c !=3D NULL && !cpumask_test_cpu(cpu, c->cpu_valid))); + ASSERT(!cpumask_test_cpu(cpu, c->cpu_valid)); + ASSERT(get_sched_res(cpu)->cpupool =3D=3D NULL); =20 /* * To setup the cpu for the new scheduler we need: @@ -2601,52 +2577,91 @@ int schedule_cpu_switch(unsigned int cpu, struct cp= upool *c) return -ENOMEM; } =20 - sched_do_tick_suspend(old_ops, cpu); - /* - * The actual switch, including (if necessary) the rerouting of the - * scheduler lock to whatever new_ops prefers, needs to happen in one - * critical section, protected by old_ops' lock, or races are possible. - * It is, in fact, the lock of another scheduler that we are taking (t= he - * scheduler of the cpupool that cpu still belongs to). But that is ok - * as, anyone trying to schedule on this cpu will spin until when we - * release that lock (bottom of this function). When he'll get the lock - * --thanks to the loop inside *_schedule_lock() functions-- he'll not= ice - * that the lock itself changed, and retry acquiring the new one (which - * will be the correct, remapped one, at that point). + * The actual switch, including the rerouting of the scheduler lock to + * whatever new_ops prefers, needs to happen in one critical section, + * protected by old_ops' lock, or races are possible. + * It is, in fact, the lock of the idle scheduler that we are taking. + * But that is ok as anyone trying to schedule on this cpu will spin u= ntil + * when we release that lock (bottom of this function). When he'll get= the + * lock --thanks to the loop inside *_schedule_lock() functions-- he'll + * notice that the lock itself changed, and retry acquiring the new one + * (which will be the correct, remapped one, at that point). */ old_lock =3D pcpu_schedule_lock_irqsave(cpu, &flags); =20 - vpriv_old =3D idle->sched_unit->priv; - ppriv_old =3D sd->sched_priv; new_lock =3D sched_switch_sched(new_ops, cpu, ppriv, vpriv); =20 - sd->scheduler =3D new_ops; - sd->sched_priv =3D ppriv; + sr->scheduler =3D new_ops; + sr->sched_priv =3D ppriv; =20 /* - * The data above is protected under new_lock, which may be unlocked. - * Another CPU can take new_lock as soon as sd->schedule_lock is visib= le, - * and must observe all prior initialisation. + * Reroute the lock to the per pCPU lock as /last/ thing. In fact, + * if it is free (and it can be) we want that anyone that manages + * taking it, finds all the initializations we've done above in place. */ smp_wmb(); - sd->schedule_lock =3D new_lock; + sr->schedule_lock =3D new_lock; =20 - /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */ + /* _Not_ pcpu_schedule_unlock(): schedule_lock has changed! */ spin_unlock_irqrestore(old_lock, flags); =20 sched_do_tick_resume(new_ops, cpu); =20 + sr->granularity =3D cpupool_get_granularity(c); + sr->cpupool =3D c; + /* The cpu is added to a pool, trigger it to go pick up some work */ + cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ); + + return 0; +} + +/* + * Remove a pCPU from its cpupool. Its scheduler becomes &sched_idle_ops + * (the idle scheduler). + * The cpu is already marked as "free" and not valid any longer for its + * cpupool. 
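The spin-and-recheck protocol described in the rerouting comment of schedule_cpu_add() above can be shown standalone; the lock type below is a toy stand-in for Xen's spinlock, and the loop mirrors what the *_schedule_lock() helpers do:

#include <stdatomic.h>

typedef struct { atomic_flag held; } toy_lock_t;

static void toy_lock(toy_lock_t *l)
{
    while ( atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire) )
        ;   /* spin */
}

static void toy_unlock(toy_lock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct rerouted_res_sketch { toy_lock_t * _Atomic schedule_lock; };

static toy_lock_t *res_lock_sketch(struct rerouted_res_sketch *r)
{
    for ( ; ; )
    {
        toy_lock_t *l = atomic_load(&r->schedule_lock);

        toy_lock(l);
        if ( l == atomic_load(&r->schedule_lock) )
            return l;      /* pointer unchanged: we hold the right lock */
        toy_unlock(l);     /* rerouted while we were spinning: retry */
    }
}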
+ */ +int schedule_cpu_rm(unsigned int cpu) +{ + struct vcpu *idle; + void *ppriv_old, *vpriv_old; + struct sched_resource *sr =3D get_sched_res(cpu); + struct scheduler *old_ops =3D sr->scheduler; + spinlock_t *old_lock; + unsigned long flags; + + ASSERT(sr->cpupool !=3D NULL); + ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus)); + ASSERT(!cpumask_test_cpu(cpu, sr->cpupool->cpu_valid)); + + idle =3D idle_vcpu[cpu]; + + sched_do_tick_suspend(old_ops, cpu); + + /* See comment in schedule_cpu_add() regarding lock switching. */ + old_lock =3D pcpu_schedule_lock_irqsave(cpu, &flags); + + vpriv_old =3D idle->sched_unit->priv; + ppriv_old =3D sr->sched_priv; + + idle->sched_unit->priv =3D NULL; + sr->scheduler =3D &sched_idle_ops; + sr->sched_priv =3D NULL; + + smp_mb(); + sr->schedule_lock =3D &sched_free_cpu_lock; + + /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */ + spin_unlock_irqrestore(old_lock, flags); + sched_deinit_pdata(old_ops, ppriv_old, cpu); =20 sched_free_udata(old_ops, vpriv_old); sched_free_pdata(old_ops, ppriv_old, cpu); =20 - get_sched_res(cpu)->granularity =3D cpupool_get_granularity(c); - get_sched_res(cpu)->cpupool =3D c; - /* When a cpu is added to a pool, trigger it to go pick up some work */ - if ( c !=3D NULL ) - cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ); + sr->granularity =3D 1; + sr->cpupool =3D NULL; =20 return 0; } diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index aa8257edc9..a40bd5fb56 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -920,7 +920,8 @@ struct scheduler; struct scheduler *scheduler_get_default(void); struct scheduler *scheduler_alloc(unsigned int sched_id, int *perr); void scheduler_free(struct scheduler *sched); -int schedule_cpu_switch(unsigned int cpu, struct cpupool *c); +int schedule_cpu_add(unsigned int cpu, struct cpupool *c); +int schedule_cpu_rm(unsigned int cpu); void vcpu_set_periodic_timer(struct vcpu *v, s_time_t value); int cpu_disable_scheduler(unsigned int cpu); void sched_setup_dom0_vcpus(struct domain *d); --=20 2.16.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel From nobody Sat Apr 20 03:55:02 2024 Delivered-To: importer@patchew.org Received-SPF: none (zoho.com: 192.237.175.120 is neither permitted nor denied by domain of lists.xenproject.org) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; spf=none (zoho.com: 192.237.175.120 is neither permitted nor denied by domain of lists.xenproject.org) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org ARC-Seal: i=1; a=rsa-sha256; t=1570001360; cv=none; d=zoho.com; s=zohoarc; b=E8xAdCeQRaTxZfzRJzP0MBp+JXVB3HvXfnRHSTJ87A/J/IB6XjO/9IgTzhd2xBzHBzGTnOt9xranXqhHKynKy2b6cyt2dPjR7MSI2iFMO5ASc6oJiwAc8n7zFvDZmeFLNr+PbBy3OE8sLqi1CyqqN7IbwrnEitQe26rQkUfE1DQ= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com; s=zohoarc; t=1570001360; h=Content-Type:Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To:ARC-Authentication-Results; bh=2Ob4Z5rQinycDSNsM70cMTcGj4vXIZQhxqlpN0S/FbU=; b=FY/3Bgaaa2Lnjuu8t70pVtC/PvLoVYKhG9VWlceBbd33K6sTHSqfCQzg/rBDXTW3iKTMdvkI1l5U72CR0nF3kgjfrbMpmjqYMhbGe7rqNMDAXb081hiqtELcZi57P5xQbJ4z3Vi2jjedk9VFhT/VCOT0r35y+HPre+UUDmYGiGg= ARC-Authentication-Results: i=1; mx.zoho.com; spf=none 
(zoho.com: 192.237.175.120 is neither permitted nor denied by domain of lists.xenproject.org) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1570001360473591.608087862734; Wed, 2 Oct 2019 00:29:20 -0700 (PDT) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iFZ3v-0001zi-1b; Wed, 02 Oct 2019 07:28:39 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iFZ3s-0001wP-Ns for xen-devel@lists.xenproject.org; Wed, 02 Oct 2019 07:28:36 +0000 Received: from mx1.suse.de (unknown [195.135.220.15]) by localhost (Halon) with ESMTPS id 24f85b9e-e4e6-11e9-9710-12813bfff9fa; Wed, 02 Oct 2019 07:27:53 +0000 (UTC) Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 28E5EB04F; Wed, 2 Oct 2019 07:27:52 +0000 (UTC) X-Inumbo-ID: 24f85b9e-e4e6-11e9-9710-12813bfff9fa X-Virus-Scanned: by amavisd-new at test-mx.suse.de From: Juergen Gross To: xen-devel@lists.xenproject.org Date: Wed, 2 Oct 2019 09:27:39 +0200 Message-Id: <20191002072745.24919-15-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20191002072745.24919-1-jgross@suse.com> References: <20191002072745.24919-1-jgross@suse.com> Subject: [Xen-devel] [PATCH v6 14/20] xen/sched: protect scheduling resource via rcu X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , Tim Deegan , Stefano Stabellini , Julien Grall , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Dario Faggioli , Jan Beulich MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" In order to be able to move cpus to cpupools with core scheduling active, it is mandatory to merge multiple cpus into one scheduling resource or to split a scheduling resource with multiple cpus in it into multiple scheduling resources. This in turn requires modifying the cpu <-> scheduling resource relation. In order to be able to free unused resources, protect struct sched_resource via RCU. This ensures there are no users left when freeing such a resource. Signed-off-by: Juergen Gross Reviewed-by: Dario Faggioli --- V1: new patch --- xen/common/cpupool.c | 4 + xen/common/schedule.c | 187 ++++++++++++++++++++++++++++++++++++++++-= ---- xen/include/xen/sched-if.h | 7 +- 3 files changed, 178 insertions(+), 20 deletions(-) diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index 02825e779d..7228ca84b4 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -511,8 +511,10 @@ static int cpupool_cpu_add(unsigned int cpu) * (or unplugging would have failed) and that is the default behavior * anyway.
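The life cycle this patch enforces, condensed into a standalone sketch: the comments name the real primitives from the hunks, while the body uses plain C11 stand-ins (the free() is only safe here because the sketch is single-threaded; the real code defers it via call_rcu()):

#include <stdatomic.h>
#include <stdlib.h>

struct rcu_res_sketch { int cpu; };

static struct rcu_res_sketch * _Atomic published_res;

static int reader_sketch(void)
{
    /* cf. rcu_read_lock(&sched_res_rculock); */
    struct rcu_res_sketch *r = atomic_load(&published_res);
    int cpu = r ? r->cpu : -1;
    /* cf. rcu_read_unlock(&sched_res_rculock); */
    return cpu;
}

static void retire_sketch(struct rcu_res_sketch *old)
{
    atomic_store(&published_res, NULL);   /* cf. set_sched_res(cpu, NULL) */
    /*
     * cf. call_rcu(&old->rcu, sched_res_free): the real code frees only
     * after every reader that could still see the old pointer has left
     * its read-side section.
     */
    free(old);
}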
*/ + rcu_read_lock(&sched_res_rculock); get_sched_res(cpu)->cpupool =3D NULL; ret =3D cpupool_assign_cpu_locked(cpupool0, cpu); + rcu_read_unlock(&sched_res_rculock); =20 spin_unlock(&cpupool_lock); =20 @@ -597,7 +599,9 @@ static void cpupool_cpu_remove_forced(unsigned int cpu) } } =20 + rcu_read_lock(&sched_res_rculock); sched_rm_cpu(cpu); + rcu_read_unlock(&sched_res_rculock); } =20 /* diff --git a/xen/common/schedule.c b/xen/common/schedule.c index a96fc82282..1f23bf0e83 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -77,6 +77,7 @@ static void poll_timer_fn(void *data); /* This is global for now so that private implementations can reach it */ DEFINE_PER_CPU_READ_MOSTLY(struct sched_resource *, sched_res); static DEFINE_PER_CPU_READ_MOSTLY(unsigned int, sched_res_idx); +DEFINE_RCU_READ_LOCK(sched_res_rculock); =20 /* Scratch space for cpumasks. */ DEFINE_PER_CPU(cpumask_t, cpumask_scratch); @@ -300,10 +301,12 @@ void sched_guest_idle(void (*idle) (void), unsigned i= nt cpu) =20 void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate) { - spinlock_t *lock =3D likely(v =3D=3D current) - ? NULL : unit_schedule_lock_irq(v->sched_unit); + spinlock_t *lock; s_time_t delta; =20 + rcu_read_lock(&sched_res_rculock); + + lock =3D likely(v =3D=3D current) ? NULL : unit_schedule_lock_irq(v->s= ched_unit); memcpy(runstate, &v->runstate, sizeof(*runstate)); delta =3D NOW() - runstate->state_entry_time; if ( delta > 0 ) @@ -311,6 +314,8 @@ void vcpu_runstate_get(struct vcpu *v, struct vcpu_runs= tate_info *runstate) =20 if ( unlikely(lock !=3D NULL) ) unit_schedule_unlock_irq(lock, v->sched_unit); + + rcu_read_unlock(&sched_res_rculock); } =20 uint64_t get_cpu_idle_time(unsigned int cpu) @@ -522,6 +527,8 @@ int sched_init_vcpu(struct vcpu *v) return 0; } =20 + rcu_read_lock(&sched_res_rculock); + /* The first vcpu of an unit can be set via sched_set_res(). 
*/ sched_set_res(unit, get_sched_res(processor)); =20 @@ -529,6 +536,7 @@ int sched_init_vcpu(struct vcpu *v) if ( unit->priv =3D=3D NULL ) { sched_free_unit(unit, v); + rcu_read_unlock(&sched_res_rculock); return 1; } =20 @@ -555,6 +563,8 @@ int sched_init_vcpu(struct vcpu *v) sched_insert_unit(dom_scheduler(d), unit); } =20 + rcu_read_unlock(&sched_res_rculock); + return 0; } =20 @@ -583,6 +593,7 @@ int sched_move_domain(struct domain *d, struct cpupool = *c) struct scheduler *old_ops; void *old_domdata; unsigned int gran =3D cpupool_get_granularity(c); + int ret =3D 0; =20 for_each_vcpu ( d, v ) { @@ -590,15 +601,21 @@ int sched_move_domain(struct domain *d, struct cpupoo= l *c) return -EBUSY; } =20 + rcu_read_lock(&sched_res_rculock); + domdata =3D sched_alloc_domdata(c->sched, d); if ( IS_ERR(domdata) ) - return PTR_ERR(domdata); + { + ret =3D PTR_ERR(domdata); + goto out; + } =20 unit_priv =3D xzalloc_array(void *, DIV_ROUND_UP(d->max_vcpus, gran)); if ( unit_priv =3D=3D NULL ) { sched_free_domdata(c->sched, domdata); - return -ENOMEM; + ret =3D -ENOMEM; + goto out; } =20 unit_idx =3D 0; @@ -611,7 +628,8 @@ int sched_move_domain(struct domain *d, struct cpupool = *c) sched_free_udata(c->sched, unit_priv[unit_idx]); xfree(unit_priv); sched_free_domdata(c->sched, domdata); - return -ENOMEM; + ret =3D -ENOMEM; + goto out; } unit_idx++; } @@ -677,7 +695,10 @@ int sched_move_domain(struct domain *d, struct cpupool= *c) =20 xfree(unit_priv); =20 - return 0; +out: + rcu_read_unlock(&sched_res_rculock); + + return ret; } =20 void sched_destroy_vcpu(struct vcpu *v) @@ -695,9 +716,13 @@ void sched_destroy_vcpu(struct vcpu *v) */ if ( unit->vcpu_list =3D=3D v ) { + rcu_read_lock(&sched_res_rculock); + sched_remove_unit(vcpu_scheduler(v), unit); sched_free_udata(vcpu_scheduler(v), unit->priv); sched_free_unit(unit, v); + + rcu_read_unlock(&sched_res_rculock); } } =20 @@ -715,7 +740,12 @@ int sched_init_domain(struct domain *d, int poolid) SCHED_STAT_CRANK(dom_init); TRACE_1D(TRC_SCHED_DOM_ADD, d->domain_id); =20 + rcu_read_lock(&sched_res_rculock); + sdom =3D sched_alloc_domdata(dom_scheduler(d), d); + + rcu_read_unlock(&sched_res_rculock); + if ( IS_ERR(sdom) ) return PTR_ERR(sdom); =20 @@ -733,9 +763,13 @@ void sched_destroy_domain(struct domain *d) SCHED_STAT_CRANK(dom_destroy); TRACE_1D(TRC_SCHED_DOM_REM, d->domain_id); =20 + rcu_read_lock(&sched_res_rculock); + sched_free_domdata(dom_scheduler(d), d->sched_priv); d->sched_priv =3D NULL; =20 + rcu_read_unlock(&sched_res_rculock); + cpupool_rm_domain(d); } } @@ -770,11 +804,15 @@ void vcpu_sleep_nosync(struct vcpu *v) =20 TRACE_2D(TRC_SCHED_SLEEP, v->domain->domain_id, v->vcpu_id); =20 + rcu_read_lock(&sched_res_rculock); + lock =3D unit_schedule_lock_irqsave(v->sched_unit, &flags); =20 vcpu_sleep_nosync_locked(v); =20 unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit); + + rcu_read_unlock(&sched_res_rculock); } =20 void vcpu_sleep_sync(struct vcpu *v) @@ -795,6 +833,8 @@ void vcpu_wake(struct vcpu *v) =20 TRACE_2D(TRC_SCHED_WAKE, v->domain->domain_id, v->vcpu_id); =20 + rcu_read_lock(&sched_res_rculock); + lock =3D unit_schedule_lock_irqsave(unit, &flags); =20 if ( likely(vcpu_runnable(v)) ) @@ -820,6 +860,8 @@ void vcpu_wake(struct vcpu *v) } =20 unit_schedule_unlock_irqrestore(lock, flags, unit); + + rcu_read_unlock(&sched_res_rculock); } =20 void vcpu_unblock(struct vcpu *v) @@ -853,6 +895,8 @@ static void sched_unit_move_locked(struct sched_unit *u= nit, unsigned int old_cpu =3D unit->res->master_cpu; struct vcpu *v; =20 + 
rcu_read_lock(&sched_res_rculock); + /* * Transfer urgency status to new CPU before switching CPUs, as * once the switch occurs, v->is_urgent is no longer protected by @@ -872,6 +916,8 @@ static void sched_unit_move_locked(struct sched_unit *u= nit, * pointer can't change while the current lock is held. */ sched_migrate(unit_scheduler(unit), unit, new_cpu); + + rcu_read_unlock(&sched_res_rculock); } =20 /* @@ -1039,6 +1085,8 @@ void restore_vcpu_affinity(struct domain *d) =20 ASSERT(system_state =3D=3D SYS_STATE_resume); =20 + rcu_read_lock(&sched_res_rculock); + for_each_sched_unit ( d, unit ) { spinlock_t *lock; @@ -1095,6 +1143,8 @@ void restore_vcpu_affinity(struct domain *d) sched_move_irqs(unit); } =20 + rcu_read_unlock(&sched_res_rculock); + domain_update_node_affinity(d); } =20 @@ -1110,9 +1160,11 @@ int cpu_disable_scheduler(unsigned int cpu) cpumask_t online_affinity; int ret =3D 0; =20 + rcu_read_lock(&sched_res_rculock); + c =3D get_sched_res(cpu)->cpupool; if ( c =3D=3D NULL ) - return ret; + goto out; =20 for_each_domain_in_cpupool ( d, c ) { @@ -1170,6 +1222,9 @@ int cpu_disable_scheduler(unsigned int cpu) } } =20 +out: + rcu_read_unlock(&sched_res_rculock); + return ret; } =20 @@ -1201,7 +1256,9 @@ static int cpu_disable_scheduler_check(unsigned int c= pu) static void sched_set_affinity( struct sched_unit *unit, const cpumask_t *hard, const cpumask_t *soft) { + rcu_read_lock(&sched_res_rculock); sched_adjust_affinity(dom_scheduler(unit->domain), unit, hard, soft); + rcu_read_unlock(&sched_res_rculock); =20 if ( hard ) cpumask_copy(unit->cpu_hard_affinity, hard); @@ -1221,6 +1278,8 @@ static int vcpu_set_affinity( spinlock_t *lock; int ret =3D 0; =20 + rcu_read_lock(&sched_res_rculock); + lock =3D unit_schedule_lock_irq(unit); =20 if ( v->affinity_broken ) @@ -1249,6 +1308,8 @@ static int vcpu_set_affinity( =20 sched_unit_migrate_finish(unit); =20 + rcu_read_unlock(&sched_res_rculock); + return ret; } =20 @@ -1375,11 +1436,16 @@ static long do_poll(struct sched_poll *sched_poll) long vcpu_yield(void) { struct vcpu * v=3Dcurrent; - spinlock_t *lock =3D unit_schedule_lock_irq(v->sched_unit); + spinlock_t *lock; + + rcu_read_lock(&sched_res_rculock); =20 + lock =3D unit_schedule_lock_irq(v->sched_unit); sched_yield(vcpu_scheduler(v), v->sched_unit); unit_schedule_unlock_irq(lock, v->sched_unit); =20 + rcu_read_unlock(&sched_res_rculock); + SCHED_STAT_CRANK(vcpu_yield); =20 TRACE_2D(TRC_SCHED_YIELD, current->domain->domain_id, current->vcpu_id= ); @@ -1476,6 +1542,8 @@ int vcpu_temporary_affinity(struct vcpu *v, unsigned = int cpu, uint8_t reason) int ret =3D -EINVAL; bool migrate; =20 + rcu_read_lock(&sched_res_rculock); + lock =3D unit_schedule_lock_irq(unit); =20 if ( cpu =3D=3D NR_CPUS ) @@ -1515,6 +1583,8 @@ int vcpu_temporary_affinity(struct vcpu *v, unsigned = int cpu, uint8_t reason) if ( migrate ) sched_unit_migrate_finish(unit); =20 + rcu_read_unlock(&sched_res_rculock); + return ret; } =20 @@ -1726,9 +1796,13 @@ long sched_adjust(struct domain *d, struct xen_domct= l_scheduler_op *op) =20 /* NB: the pluggable scheduler code needs to take care * of locking by itself. 
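Several hunks above also convert early returns into goto out so that a single exit point can drop the RCU read lock on every path; the shape, standalone and simplified (the allocation failure is hard-coded just to exercise the error path):

#include <errno.h>
#include <stddef.h>

static void read_lock_sketch(void) { /* rcu_read_lock(...) stand-in */ }
static void read_unlock_sketch(void) { /* rcu_read_unlock(...) stand-in */ }
static void *alloc_sketch(void) { return NULL; /* pretend allocation failed */ }

static int move_sketch(void)
{
    int ret = 0;

    read_lock_sketch();

    if ( alloc_sketch() == NULL )
    {
        ret = -ENOMEM;
        goto out;
    }
    /* ... further steps, each failure jumping to out ... */

 out:
    read_unlock_sketch();
    return ret;
}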
*/ + rcu_read_lock(&sched_res_rculock); + if ( (ret = sched_adjust_dom(dom_scheduler(d), d, op)) == 0 ) TRACE_1D(TRC_SCHED_ADJDOM, d->domain_id); + rcu_read_unlock(&sched_res_rculock); + return ret; }
@@ -1749,9 +1823,13 @@ long sched_adjust_global(struct xen_sysctl_scheduler_op *op) if ( pool == NULL ) return -ESRCH; + rcu_read_lock(&sched_res_rculock); + rc = ((op->sched_id == pool->sched->sched_id) ? sched_adjust_cpupool(pool->sched, op) : -EINVAL); + rcu_read_unlock(&sched_res_rculock); + cpupool_put(pool); return rc;
@@ -1971,7 +2049,11 @@ static void unit_context_saved(struct sched_resource *sr) void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext) { struct sched_unit *next = vnext->sched_unit; - struct sched_resource *sr = get_sched_res(smp_processor_id()); + struct sched_resource *sr; + + rcu_read_lock(&sched_res_rculock); + + sr = get_sched_res(smp_processor_id()); if ( atomic_read(&next->rendezvous_out_cnt) ) {
@@ -1998,6 +2080,8 @@ void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext) if ( is_idle_vcpu(vprev) && vprev != vnext ) vprev->sched_unit = sr->sched_unit_idle; + + rcu_read_unlock(&sched_res_rculock); }
static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext,
@@ -2021,6 +2105,8 @@ static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext, vnext->sched_unit = get_sched_res(smp_processor_id())->sched_unit_idle; + rcu_read_unlock(&sched_res_rculock); + trace_continue_running(vnext); return continue_running(vprev); }
@@ -2034,6 +2120,8 @@ static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext, vcpu_periodic_timer_work(vnext); + rcu_read_unlock(&sched_res_rculock); + context_switch(vprev, vnext); }
@@ -2186,6 +2274,8 @@ static void sched_slave(void) ASSERT_NOT_IN_ATOMIC(); + rcu_read_lock(&sched_res_rculock); + lock = pcpu_schedule_lock_irq(cpu); now = NOW();
@@ -2209,6 +2299,8 @@ static void sched_slave(void) { pcpu_schedule_unlock_irq(lock, cpu); + rcu_read_unlock(&sched_res_rculock); + /* Check for failed forced context switch. */ if ( do_softirq ) raise_softirq(SCHEDULE_SOFTIRQ);
@@ -2241,13 +2333,16 @@ static void schedule(void) struct sched_resource *sr; spinlock_t *lock; int cpu = smp_processor_id(); - unsigned int gran = get_sched_res(cpu)->granularity; + unsigned int gran; ASSERT_NOT_IN_ATOMIC(); SCHED_STAT_CRANK(sched_run); + rcu_read_lock(&sched_res_rculock); + sr = get_sched_res(cpu); + gran = sr->granularity; lock = pcpu_schedule_lock_irq(cpu);
@@ -2259,6 +2354,8 @@ static void schedule(void) */ pcpu_schedule_unlock_irq(lock, cpu); + rcu_read_unlock(&sched_res_rculock); + raise_softirq(SCHEDULE_SOFTIRQ); return sched_slave(); }
@@ -2370,14 +2467,27 @@ static int cpu_schedule_up(unsigned int cpu) return 0; } +static void sched_res_free(struct rcu_head *head) +{ + struct sched_resource *sr = container_of(head, struct sched_resource, rcu); + + xfree(sr); +} + static void cpu_schedule_down(unsigned int cpu) { - struct sched_resource *sr = get_sched_res(cpu); + struct sched_resource *sr; + + rcu_read_lock(&sched_res_rculock); + + sr = get_sched_res(cpu); kill_timer(&sr->s_timer); set_sched_res(cpu, NULL); - xfree(sr); + call_rcu(&sr->rcu, sched_res_free); + + rcu_read_unlock(&sched_res_rculock); }
void sched_rm_cpu(unsigned int cpu)
@@ -2397,6 +2507,8 @@ static int cpu_schedule_callback( unsigned int cpu = (unsigned long)hcpu; int rc = 0; + rcu_read_lock(&sched_res_rculock); + /* * From the scheduler perspective, bringing up a pCPU requires * allocating and initializing the per-pCPU scheduler specific data,
@@ -2443,6 +2555,8 @@ static int cpu_schedule_callback( break; } + rcu_read_unlock(&sched_res_rculock); + return !rc ? NOTIFY_DONE : notifier_from_errno(rc); }
@@ -2532,8 +2646,13 @@ void __init scheduler_init(void) idle_domain->max_vcpus = nr_cpu_ids; if ( vcpu_create(idle_domain, 0) == NULL ) BUG(); + + rcu_read_lock(&sched_res_rculock); + get_sched_res(0)->curr = idle_vcpu[0]->sched_unit; get_sched_res(0)->sched_unit_idle = idle_vcpu[0]->sched_unit; + + rcu_read_unlock(&sched_res_rculock); }
/*
@@ -2546,9 +2665,14 @@ int schedule_cpu_add(unsigned int cpu, struct cpupool *c) struct vcpu *idle; void *ppriv, *vpriv; struct scheduler *new_ops = c->sched; - struct sched_resource *sr = get_sched_res(cpu); + struct sched_resource *sr; spinlock_t *old_lock, *new_lock; unsigned long flags; + int ret = 0; + + rcu_read_lock(&sched_res_rculock); + + sr = get_sched_res(cpu); ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus)); ASSERT(!cpumask_test_cpu(cpu, c->cpu_valid));
@@ -2568,13 +2692,18 @@ int schedule_cpu_add(unsigned int cpu, struct cpupool *c) idle = idle_vcpu[cpu]; ppriv = sched_alloc_pdata(new_ops, cpu); if ( IS_ERR(ppriv) ) - return PTR_ERR(ppriv); + { + ret = PTR_ERR(ppriv); + goto out; + } + vpriv = sched_alloc_udata(new_ops, idle->sched_unit, idle->domain->sched_priv); if ( vpriv == NULL ) { sched_free_pdata(new_ops, ppriv, cpu); - return -ENOMEM; + ret = -ENOMEM; + goto out; }
/*
@@ -2613,7 +2742,10 @@ int schedule_cpu_add(unsigned int cpu, struct cpupool *c) /* The cpu is added to a pool, trigger it to go pick up some work */ cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ); - return 0; +out: + rcu_read_unlock(&sched_res_rculock); + + return ret; }
/*
@@ -2626,11 +2758,16 @@ int schedule_cpu_rm(unsigned int cpu) { struct vcpu *idle; void *ppriv_old, *vpriv_old; - struct sched_resource *sr = get_sched_res(cpu); - struct scheduler *old_ops = sr->scheduler; + struct sched_resource *sr; + struct scheduler *old_ops; spinlock_t *old_lock; unsigned long flags; + rcu_read_lock(&sched_res_rculock); + + sr = get_sched_res(cpu); + old_ops = sr->scheduler; + ASSERT(sr->cpupool != NULL); ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus)); ASSERT(!cpumask_test_cpu(cpu, sr->cpupool->cpu_valid));
@@ -2663,6 +2800,8 @@ int schedule_cpu_rm(unsigned int cpu) sr->granularity = 1; sr->cpupool = NULL; + rcu_read_unlock(&sched_res_rculock); + return 0; }
@@ -2711,6 +2850,8 @@ void schedule_dump(struct cpupool *c) /* Locking, if necessary, must be handled within each scheduler */ + rcu_read_lock(&sched_res_rculock); + if ( c != NULL ) { sched = c->sched;
@@ -2730,6 +2871,8 @@ void schedule_dump(struct cpupool *c) for_each_cpu (i, cpus) sched_dump_cpu_state(sched, i); } + + rcu_read_unlock(&sched_res_rculock); }
void sched_tick_suspend(void)
@@ -2737,10 +2880,14 @@ void sched_tick_suspend(void) struct scheduler *sched; unsigned int cpu = smp_processor_id(); + rcu_read_lock(&sched_res_rculock); + sched = get_sched_res(cpu)->scheduler; sched_do_tick_suspend(sched, cpu); rcu_idle_enter(cpu); rcu_idle_timer_start(); + + rcu_read_unlock(&sched_res_rculock); }
void sched_tick_resume(void)
@@ -2748,10 +2895,14 @@ void sched_tick_resume(void) struct scheduler *sched; unsigned int cpu = smp_processor_id(); + rcu_read_lock(&sched_res_rculock); + rcu_idle_timer_stop(); rcu_idle_exit(cpu); sched = get_sched_res(cpu)->scheduler; sched_do_tick_resume(sched, cpu); + + rcu_read_unlock(&sched_res_rculock); }
void wait(void)
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index f8f0f484cb..3988985ee6 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h
@@ -10,6 +10,7 @@ #include #include +#include
/* A global pointer to the initial cpupool (POOL0). */
extern struct cpupool *cpupool0;
@@ -57,18 +58,20 @@ struct sched_resource { unsigned int master_cpu; unsigned int granularity; const cpumask_t *cpus; /* cpus covered by this struct */ + struct rcu_head rcu; };
DECLARE_PER_CPU(struct sched_resource *, sched_res); +extern rcu_read_lock_t sched_res_rculock;
static inline struct sched_resource *get_sched_res(unsigned int cpu) { - return per_cpu(sched_res, cpu); + return rcu_dereference(per_cpu(sched_res, cpu)); }
static inline void set_sched_res(unsigned int cpu, struct sched_resource *res) { - per_cpu(sched_res, cpu) = res; + rcu_assign_pointer(per_cpu(sched_res, cpu), res); }
static inline struct sched_unit *curr_on_cpu(unsigned int cpu)
-- 
2.16.4

From nobody Sat Apr 20 03:55:02 2024
From: Juergen Gross
To: xen-devel@lists.xenproject.org
Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson, Dario Faggioli, Jan Beulich
Date: Wed, 2 Oct 2019 09:27:40 +0200
Message-Id: <20191002072745.24919-16-jgross@suse.com>
In-Reply-To: <20191002072745.24919-1-jgross@suse.com>
References: <20191002072745.24919-1-jgross@suse.com>
Subject: [Xen-devel] [PATCH v6 15/20] xen/sched: support multiple cpus per scheduling resource

Prepare supporting multiple cpus per scheduling resource by allocating the cpumask per resource dynamically. Modify sched_res_mask to have only one bit per scheduling resource set.
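To make the changed mask semantics concrete, here is a hypothetical standalone sketch (plain C, made-up cpu numbers, uint64_t standing in for cpumask_t; not hypervisor code): with two threads per core only each resource's master cpu keeps its bit in sched_res_mask, so the res_valid computation below yields exactly one bit per scheduling resource.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Assumed layout: cpus 0-3 in the pool, two threads per core. */
        uint64_t cpu_valid      = 0x0f; /* all four cpus are pool members */
        uint64_t sched_res_mask = 0x05; /* master cpus 0 and 2 only       */
        uint64_t res_valid      = cpu_valid & sched_res_mask;

        /* Mirrors cpumask_and(c->res_valid, c->cpu_valid, &sched_res_mask). */
        printf("res_valid = %#llx\n", (unsigned long long)res_valid); /* 0x5 */
        return 0;
    }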
Signed-off-by: Juergen Gross
Reviewed-by: Dario Faggioli
---
V1: new patch (carved out from other patch)
V4: - use cpumask_t for sched_res_mask (Jan Beulich) - clear cpu in sched_res_mask when taking cpu away (Jan Beulich)
---
xen/common/cpupool.c | 4 ++-- xen/common/schedule.c | 15 +++++++++++++-- xen/include/xen/sched-if.h | 4 ++-- 3 files changed, 17 insertions(+), 6 deletions(-)
diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index 7228ca84b4..13dffaadcf 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c
@@ -283,7 +283,7 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu) cpupool_cpu_moving = NULL; } cpumask_set_cpu(cpu, c->cpu_valid); - cpumask_and(c->res_valid, c->cpu_valid, sched_res_mask); + cpumask_and(c->res_valid, c->cpu_valid, &sched_res_mask); rcu_read_lock(&domlist_read_lock); for_each_domain_in_cpupool(d, c)
@@ -376,7 +376,7 @@ static int cpupool_unassign_cpu_start(struct cpupool *c, unsigned int cpu) atomic_inc(&c->refcnt); cpupool_cpu_moving = c; cpumask_clear_cpu(cpu, c->cpu_valid); - cpumask_and(c->res_valid, c->cpu_valid, sched_res_mask); + cpumask_and(c->res_valid, c->cpu_valid, &sched_res_mask); out: spin_unlock(&cpupool_lock);
diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 1f23bf0e83..efe077b01f 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c
@@ -63,7 +63,7 @@ integer_param("sched_ratelimit_us", sched_ratelimit_us); /* Number of vcpus per struct sched_unit. */ bool __read_mostly sched_disable_smt_switching; -const cpumask_t *sched_res_mask = &cpumask_all; +cpumask_t sched_res_mask; /* Common lock for free cpus. */ static DEFINE_SPINLOCK(sched_free_cpu_lock);
@@ -2426,8 +2426,14 @@ static int cpu_schedule_up(unsigned int cpu) sr = xzalloc(struct sched_resource); if ( sr == NULL ) return -ENOMEM; + if ( !zalloc_cpumask_var(&sr->cpus) ) + { + xfree(sr); + return -ENOMEM; + } + sr->master_cpu = cpu; - sr->cpus = cpumask_of(cpu); + cpumask_copy(sr->cpus, cpumask_of(cpu)); set_sched_res(cpu, sr); sr->scheduler = &sched_idle_ops;
@@ -2439,6 +2445,8 @@ static int cpu_schedule_up(unsigned int cpu) /* We start with cpu granularity. */ sr->granularity = 1; + cpumask_set_cpu(cpu, &sched_res_mask); + /* Boot CPU is dealt with later in scheduler_init(). */ if ( cpu == 0 ) return 0;
@@ -2471,6 +2479,7 @@ static void sched_res_free(struct rcu_head *head) { struct sched_resource *sr = container_of(head, struct sched_resource, rcu); + free_cpumask_var(sr->cpus); xfree(sr); }
@@ -2484,7 +2493,9 @@ static void cpu_schedule_down(unsigned int cpu) kill_timer(&sr->s_timer); + cpumask_clear_cpu(cpu, &sched_res_mask); set_sched_res(cpu, NULL); + call_rcu(&sr->rcu, sched_res_free); rcu_read_unlock(&sched_res_rculock);
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index 3988985ee6..780735dda3 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h
@@ -24,7 +24,7 @@ extern cpumask_t cpupool_free_cpus; extern int sched_ratelimit_us; /* Scheduling resource mask. */ -extern const cpumask_t *sched_res_mask; +extern cpumask_t sched_res_mask; /* Number of vcpus per struct sched_unit. */ enum sched_gran {
@@ -57,7 +57,7 @@ struct sched_resource { /* Cpu with lowest id in scheduling resource. */ unsigned int master_cpu; unsigned int granularity; - const cpumask_t *cpus; /* cpus covered by this struct */ + cpumask_var_t cpus; /* cpus covered by this struct */ struct rcu_head rcu; };
-- 
2.16.4

From nobody Sat Apr 20 03:55:02 2024
From: Juergen Gross
To: xen-devel@lists.xenproject.org
Cc: Juergen Gross, George Dunlap, Dario Faggioli
Date: Wed, 2 Oct 2019 09:27:41 +0200
Message-Id: <20191002072745.24919-17-jgross@suse.com>
In-Reply-To: <20191002072745.24919-1-jgross@suse.com>
References: <20191002072745.24919-1-jgross@suse.com>
Subject: [Xen-devel] [PATCH v6 16/20] xen/sched: support differing granularity in schedule_cpu_[add/rm]()

With core scheduling active schedule_cpu_[add/rm]() has to cope with different scheduling granularity: a cpu not in any cpupool is subject to granularity 1 (cpu scheduling), while a cpu in a cpupool might be in a scheduling resource with more than one cpu. Handle that by having arrays of old/new pdata and vdata and looping over those where appropriate. Additionally the scheduling resource(s) must either be merged or split.
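The split direction has to allocate granularity - 1 fresh resources and unwind cleanly if any allocation fails; a minimal standalone C sketch of that allocate-with-rollback pattern (hypothetical struct and helper names, not the Xen code):

    #include <stdlib.h>

    struct res { int dummy; };

    /* Allocate n resources, or return NULL after freeing partial work. */
    static struct res **alloc_res_array(unsigned int n)
    {
        struct res **arr = malloc(n * sizeof(*arr));
        unsigned int i;

        if ( !arr )
            return NULL;
        for ( i = 0; i < n; i++ )
        {
            arr[i] = calloc(1, sizeof(**arr));
            if ( !arr[i] )
            {
                while ( i-- )      /* roll back in reverse order */
                    free(arr[i]);
                free(arr);
                return NULL;
            }
        }
        return arr;
    }

The sr_new[] handling in schedule_cpu_rm() below follows the same shape, with sched_res_free() as the per-element destructor.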
Signed-off-by: Juergen Gross
Reviewed-by: Dario Faggioli
---
xen/common/cpupool.c | 18 ++-- xen/common/schedule.c | 226 +++++++++++++++++++++++++++++++++++++++++++------- 2 files changed, 204 insertions(+), 40 deletions(-)
diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index 13dffaadcf..04c3b3c04b 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c
@@ -536,6 +536,7 @@ static void cpupool_cpu_remove(unsigned int cpu) ret = cpupool_unassign_cpu_finish(cpupool0); BUG_ON(ret); } + cpumask_clear_cpu(cpu, &cpupool_free_cpus); }
/*
@@ -585,20 +586,19 @@ static void cpupool_cpu_remove_forced(unsigned int cpu) struct cpupool **c; int ret; - if ( cpumask_test_cpu(cpu, &cpupool_free_cpus) ) - cpumask_clear_cpu(cpu, &cpupool_free_cpus); - else + for_each_cpupool ( c ) { - for_each_cpupool(c) + if ( cpumask_test_cpu(cpu, (*c)->cpu_valid) ) { - if ( cpumask_test_cpu(cpu, (*c)->cpu_valid) ) - { - ret = cpupool_unassign_cpu(*c, cpu); - BUG_ON(ret); - } + ret = cpupool_unassign_cpu_start(*c, cpu); + BUG_ON(ret); + ret = cpupool_unassign_cpu_finish(*c); + BUG_ON(ret); } } + cpumask_clear_cpu(cpu, &cpupool_free_cpus); + rcu_read_lock(&sched_res_rculock); sched_rm_cpu(cpu); rcu_read_unlock(&sched_res_rculock);
diff --git a/xen/common/schedule.c b/xen/common/schedule.c index efe077b01f..e411b6d03e 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c
@@ -425,27 +425,30 @@ static void sched_unit_add_vcpu(struct sched_unit *unit, struct vcpu *v) unit->runstate_cnt[v->runstate.state]++; } -static struct sched_unit *sched_alloc_unit(struct vcpu *v) +static struct sched_unit *sched_alloc_unit_mem(void) { - struct sched_unit *unit, **prev_unit; - struct domain *d = v->domain; - unsigned int gran = cpupool_get_granularity(d->cpupool); + struct sched_unit *unit; - for_each_sched_unit ( d, unit ) - if ( unit->unit_id / gran == v->vcpu_id / gran ) - break; + unit = xzalloc(struct sched_unit); + if ( !unit ) + return NULL; - if ( unit ) + if ( !zalloc_cpumask_var(&unit->cpu_hard_affinity) || + !zalloc_cpumask_var(&unit->cpu_hard_affinity_saved) || + !zalloc_cpumask_var(&unit->cpu_soft_affinity) ) { - sched_unit_add_vcpu(unit, v); - return unit; + sched_free_unit_mem(unit); + unit = NULL; } - if ( (unit = xzalloc(struct sched_unit)) == NULL ) - return NULL; + return unit; +} + +static void sched_domain_insert_unit(struct sched_unit *unit, struct domain *d) +{ + struct sched_unit **prev_unit; unit->domain = d; - sched_unit_add_vcpu(unit, v); for ( prev_unit = &d->sched_unit_list; *prev_unit; prev_unit = &(*prev_unit)->next_in_list )
@@ -455,17 +458,31 @@ static struct sched_unit *sched_alloc_unit(struct vcpu *v) unit->next_in_list = *prev_unit; *prev_unit = unit; +} - if ( !zalloc_cpumask_var(&unit->cpu_hard_affinity) || - !zalloc_cpumask_var(&unit->cpu_hard_affinity_saved) || - !zalloc_cpumask_var(&unit->cpu_soft_affinity) ) - goto fail; +static struct sched_unit *sched_alloc_unit(struct vcpu *v) +{ + struct sched_unit *unit; + struct domain *d = v->domain; + unsigned int gran = cpupool_get_granularity(d->cpupool); - return unit; + for_each_sched_unit ( d, unit ) + if ( unit->unit_id / gran == v->vcpu_id / gran ) + break; - fail: - sched_free_unit(unit, v); - return NULL; + if ( unit ) + { + sched_unit_add_vcpu(unit, v); + return unit; + } + + if ( (unit = sched_alloc_unit_mem()) == NULL ) + return NULL; + + sched_unit_add_vcpu(unit, v); + sched_domain_insert_unit(unit, d); + + return unit; }
static unsigned int sched_select_initial_cpu(const struct vcpu *v)
@@ -2419,18 +2436,28 @@ static void poll_timer_fn(void *data) vcpu_unblock(v); } -static int cpu_schedule_up(unsigned int cpu) +static struct sched_resource *sched_alloc_res(void) { struct sched_resource *sr; sr = xzalloc(struct sched_resource); if ( sr == NULL ) - return -ENOMEM; + return NULL; if ( !zalloc_cpumask_var(&sr->cpus) ) { xfree(sr); - return -ENOMEM; + return NULL; } + return sr; +} + +static int cpu_schedule_up(unsigned int cpu) +{ + struct sched_resource *sr; + + sr = sched_alloc_res(); + if ( sr == NULL ) + return -ENOMEM; sr->master_cpu = cpu; cpumask_copy(sr->cpus, cpumask_of(cpu));
@@ -2480,6 +2507,8 @@ static void sched_res_free(struct rcu_head *head) struct sched_resource *sr = container_of(head, struct sched_resource, rcu); free_cpumask_var(sr->cpus); + if ( sr->sched_unit_idle ) + sched_free_unit_mem(sr->sched_unit_idle); xfree(sr); }
@@ -2496,6 +2525,8 @@ static void cpu_schedule_down(unsigned int cpu) cpumask_clear_cpu(cpu, &sched_res_mask); set_sched_res(cpu, NULL); + /* Keep idle unit. */ + sr->sched_unit_idle = NULL; call_rcu(&sr->rcu, sched_res_free); rcu_read_unlock(&sched_res_rculock);
@@ -2575,6 +2606,30 @@ static struct notifier_block cpu_schedule_nfb = { .notifier_call = cpu_schedule_callback }; +static const cpumask_t *sched_get_opt_cpumask(enum sched_gran opt, + unsigned int cpu) +{ + const cpumask_t *mask; + + switch ( opt ) + { + case SCHED_GRAN_cpu: + mask = cpumask_of(cpu); + break; + case SCHED_GRAN_core: + mask = per_cpu(cpu_sibling_mask, cpu); + break; + case SCHED_GRAN_socket: + mask = per_cpu(cpu_core_mask, cpu); + break; + default: + ASSERT_UNREACHABLE(); + return NULL; + } + + return mask; +} + /* Initialise the data structures. */ void __init scheduler_init(void) {
@@ -2730,6 +2785,46 @@ int schedule_cpu_add(unsigned int cpu, struct cpupool *c) */ old_lock = pcpu_schedule_lock_irqsave(cpu, &flags); + if ( cpupool_get_granularity(c) > 1 ) + { + const cpumask_t *mask; + unsigned int cpu_iter, idx = 0; + struct sched_unit *old_unit, *master_unit; + struct sched_resource *sr_old; + + /* + * We need to merge multiple idle_vcpu units and sched_resource structs + * into one. As the free cpus all share the same lock we are fine doing + * that now. The worst which could happen would be someone waiting for + * the lock, thus dereferencing sched_res->schedule_lock. This is the + * reason we are freeing struct sched_res via call_rcu() to avoid the + * lock pointer suddenly disappearing. + */ + mask = sched_get_opt_cpumask(c->gran, cpu); + master_unit = idle_vcpu[cpu]->sched_unit; + + for_each_cpu ( cpu_iter, mask ) + { + if ( idx ) + cpumask_clear_cpu(cpu_iter, &sched_res_mask); + + per_cpu(sched_res_idx, cpu_iter) = idx++; + + if ( cpu == cpu_iter ) + continue; + + old_unit = idle_vcpu[cpu_iter]->sched_unit; + sr_old = get_sched_res(cpu_iter); + kill_timer(&sr_old->s_timer); + idle_vcpu[cpu_iter]->sched_unit = master_unit; + master_unit->runstate_cnt[RUNSTATE_running]++; + set_sched_res(cpu_iter, sr); + cpumask_set_cpu(cpu_iter, sr->cpus); + + call_rcu(&sr_old->rcu, sched_res_free); + } + } + new_lock = sched_switch_sched(new_ops, cpu, ppriv, vpriv); sr->scheduler = new_ops;
@@ -2767,33 +2862,100 @@ out: */ int schedule_cpu_rm(unsigned int cpu) { - struct vcpu *idle; void *ppriv_old, *vpriv_old; - struct sched_resource *sr; + struct sched_resource *sr, **sr_new = NULL; + struct sched_unit *unit; struct scheduler *old_ops; spinlock_t *old_lock; unsigned long flags; + int idx, ret = -ENOMEM; + unsigned int cpu_iter; rcu_read_lock(&sched_res_rculock); sr = get_sched_res(cpu); old_ops = sr->scheduler; + if ( sr->granularity > 1 ) + { + sr_new = xmalloc_array(struct sched_resource *, sr->granularity - 1); + if ( !sr_new ) + goto out; + for ( idx = 0; idx < sr->granularity - 1; idx++ ) + { + sr_new[idx] = sched_alloc_res(); + if ( sr_new[idx] ) + { + sr_new[idx]->sched_unit_idle = sched_alloc_unit_mem(); + if ( !sr_new[idx]->sched_unit_idle ) + { + sched_res_free(&sr_new[idx]->rcu); + sr_new[idx] = NULL; + } + } + if ( !sr_new[idx] ) + { + for ( idx--; idx >= 0; idx-- ) + sched_res_free(&sr_new[idx]->rcu); + goto out; + } + sr_new[idx]->curr = sr_new[idx]->sched_unit_idle; + sr_new[idx]->scheduler = &sched_idle_ops; + sr_new[idx]->granularity = 1; + + /* We want the lock not to change when replacing the resource. */ + sr_new[idx]->schedule_lock = sr->schedule_lock; + } + } + + ret = 0; ASSERT(sr->cpupool != NULL); ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus)); ASSERT(!cpumask_test_cpu(cpu, sr->cpupool->cpu_valid)); - idle = idle_vcpu[cpu]; - sched_do_tick_suspend(old_ops, cpu); /* See comment in schedule_cpu_add() regarding lock switching. */ old_lock = pcpu_schedule_lock_irqsave(cpu, &flags); - vpriv_old = idle->sched_unit->priv; + vpriv_old = idle_vcpu[cpu]->sched_unit->priv; ppriv_old = sr->sched_priv; - idle->sched_unit->priv = NULL; + idx = 0; + for_each_cpu ( cpu_iter, sr->cpus ) + { + per_cpu(sched_res_idx, cpu_iter) = 0; + if ( cpu_iter == cpu ) + { + idle_vcpu[cpu_iter]->sched_unit->priv = NULL; + } + else + { + /* Initialize unit. */
+ unit = sr_new[idx]->sched_unit_idle; + unit->res = sr_new[idx]; + unit->is_running = true; + sched_unit_add_vcpu(unit, idle_vcpu[cpu_iter]); + sched_domain_insert_unit(unit, idle_vcpu[cpu_iter]->domain); + + /* Adjust cpu masks of resources (old and new). */ + cpumask_clear_cpu(cpu_iter, sr->cpus); + cpumask_set_cpu(cpu_iter, sr_new[idx]->cpus); + + /* Init timer. */ + init_timer(&sr_new[idx]->s_timer, s_timer_fn, NULL, cpu_iter); + + /* Last resource initializations and insert resource pointer. */ + sr_new[idx]->master_cpu = cpu_iter; + set_sched_res(cpu_iter, sr_new[idx]); + + /* Last action: set the new lock pointer. */ + smp_mb(); + sr_new[idx]->schedule_lock = &sched_free_cpu_lock; + + idx++; + } } sr->scheduler = &sched_idle_ops; sr->sched_priv = NULL;
@@ -2811,9 +2973,11 @@ int schedule_cpu_rm(unsigned int cpu) sr->granularity = 1; sr->cpupool = NULL; +out: rcu_read_unlock(&sched_res_rculock); + xfree(sr_new); - return 0; + return ret; }
struct scheduler *scheduler_get_default(void)
-- 
2.16.4

From nobody Sat Apr 20 03:55:02 2024
From: Juergen Gross
To: xen-devel@lists.xenproject.org
Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson, Dario Faggioli, Jan Beulich
Date: Wed, 2 Oct 2019 09:27:42 +0200
Message-Id: <20191002072745.24919-18-jgross@suse.com>
In-Reply-To: <20191002072745.24919-1-jgross@suse.com>
References: <20191002072745.24919-1-jgross@suse.com>
Subject: [Xen-devel] [PATCH v6 17/20] xen/sched: support core scheduling for moving cpus to/from cpupools

With core scheduling active it is necessary to move multiple cpus at the same time to or from a cpupool in order to avoid split scheduling resources in between.
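A hypothetical plain-C view of the resulting mask arithmetic (uint64_t in place of cpumask_t, invented values): pool membership is updated with a resource's whole sibling mask at once rather than a single cpu bit.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t pool_valid = 0;
        uint64_t res_cpus   = 0x3; /* cpus 0 and 1 form one resource */

        pool_valid |= res_cpus;    /* assign: like cpumask_or() on cpu_valid */
        printf("after add:    %#llx\n", (unsigned long long)pool_valid);

        pool_valid &= ~res_cpus;   /* unassign: like cpumask_andnot() */
        printf("after remove: %#llx\n", (unsigned long long)pool_valid);
        return 0;
    }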
Signed-off-by: Juergen Gross
Reviewed-by: Dario Faggioli
---
V1: new patch
---
xen/common/cpupool.c | 100 ++++++++++++++++++++++++++++++++++------- xen/common/schedule.c | 3 +- xen/include/xen/sched-if.h | 1 + 3 files changed, 76 insertions(+), 28 deletions(-)
diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index 04c3b3c04b..f7a13c7a4c 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c
@@ -268,23 +268,30 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu) { int ret; struct domain *d; + const cpumask_t *cpus; + + cpus = sched_get_opt_cpumask(c->gran, cpu); if ( (cpupool_moving_cpu == cpu) && (c != cpupool_cpu_moving) ) return -EADDRNOTAVAIL; - ret = schedule_cpu_add(cpu, c); + ret = schedule_cpu_add(cpumask_first(cpus), c); if ( ret ) return ret; - cpumask_clear_cpu(cpu, &cpupool_free_cpus); + rcu_read_lock(&sched_res_rculock); + + cpumask_andnot(&cpupool_free_cpus, &cpupool_free_cpus, cpus); if (cpupool_moving_cpu == cpu) { cpupool_moving_cpu = -1; cpupool_put(cpupool_cpu_moving); cpupool_cpu_moving = NULL; } - cpumask_set_cpu(cpu, c->cpu_valid); + cpumask_or(c->cpu_valid, c->cpu_valid, cpus); cpumask_and(c->res_valid, c->cpu_valid, &sched_res_mask); + rcu_read_unlock(&sched_res_rculock); + rcu_read_lock(&domlist_read_lock); for_each_domain_in_cpupool(d, c) {
@@ -298,6 +305,7 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu) return ret; }
static int cpupool_unassign_cpu_finish(struct cpupool *c) { int cpu = cpupool_moving_cpu; + const cpumask_t *cpus; struct domain *d; int ret;
@@ -310,7 +318,10 @@ static int cpupool_unassign_cpu_finish(struct cpupool *c) */ rcu_read_lock(&domlist_read_lock); ret = cpu_disable_scheduler(cpu); - cpumask_set_cpu(cpu, &cpupool_free_cpus); + + rcu_read_lock(&sched_res_rculock); + cpus = get_sched_res(cpu)->cpus; + cpumask_or(&cpupool_free_cpus, &cpupool_free_cpus, cpus); /* * cpu_disable_scheduler() returning an error doesn't require resetting
@@ -323,7 +334,7 @@ static int cpupool_unassign_cpu_finish(struct cpupool *c) { ret = schedule_cpu_rm(cpu); if ( ret ) - cpumask_clear_cpu(cpu, &cpupool_free_cpus); + cpumask_andnot(&cpupool_free_cpus, &cpupool_free_cpus, cpus); else { cpupool_moving_cpu = -1;
@@ -331,6 +342,7 @@ static int cpupool_unassign_cpu_finish(struct cpupool *c) cpupool_cpu_moving = NULL; } } + rcu_read_unlock(&sched_res_rculock); for_each_domain_in_cpupool(d, c) {
@@ -345,6 +357,7 @@ static int cpupool_unassign_cpu_start(struct cpupool *c, unsigned int cpu) { int ret; struct domain *d; + const cpumask_t *cpus; spin_lock(&cpupool_lock); ret = -EADDRNOTAVAIL;
@@ -353,7 +366,11 @@ static int cpupool_unassign_cpu_start(struct cpupool *c, unsigned int cpu) goto out; ret = 0; + rcu_read_lock(&sched_res_rculock); + cpus = get_sched_res(cpu)->cpus; + - if ( (c->n_dom > 0) && (cpumask_weight(c->cpu_valid) == 1) && + if ( (c->n_dom > 0) && + (cpumask_weight(c->cpu_valid) == cpumask_weight(cpus)) && (cpu != cpupool_moving_cpu) ) { rcu_read_lock(&domlist_read_lock);
@@ -375,9 +392,10 @@ static int cpupool_unassign_cpu_start(struct cpupool *c, unsigned int cpu) cpupool_moving_cpu = cpu; atomic_inc(&c->refcnt); cpupool_cpu_moving = c; - cpumask_clear_cpu(cpu, c->cpu_valid); + cpumask_andnot(c->cpu_valid, c->cpu_valid, cpus); cpumask_and(c->res_valid, c->cpu_valid, &sched_res_mask); + rcu_read_unlock(&domlist_read_lock); out: spin_unlock(&cpupool_lock);
@@ -417,11 +435,13 @@ static int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu) { int work_cpu; int ret; + unsigned int master_cpu; debugtrace_printk("cpupool_unassign_cpu(pool=%d,cpu=%d)\n", c->cpupool_id, cpu); - ret = cpupool_unassign_cpu_start(c, cpu); + master_cpu = sched_get_resource_cpu(cpu); + ret = cpupool_unassign_cpu_start(c, master_cpu); if ( ret ) { debugtrace_printk("cpupool_unassign_cpu(pool=%d,cpu=%d) ret %d\n",
@@ -429,12 +449,12 @@ static int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu) return ret; } - work_cpu = smp_processor_id(); - if ( work_cpu == cpu ) + work_cpu = sched_get_resource_cpu(smp_processor_id()); + if ( work_cpu == master_cpu ) { work_cpu = cpumask_first(cpupool0->cpu_valid); - if ( work_cpu == cpu ) - work_cpu = cpumask_next(cpu, cpupool0->cpu_valid); + if ( work_cpu == master_cpu ) + work_cpu = cpumask_last(cpupool0->cpu_valid); } return continue_hypercall_on_cpu(work_cpu, cpupool_unassign_cpu_helper, c); }
@@ -500,6 +520,7 @@ void cpupool_rm_domain(struct domain *d) static int cpupool_cpu_add(unsigned int cpu) { int ret = 0; + const cpumask_t *cpus; spin_lock(&cpupool_lock); cpumask_clear_cpu(cpu, &cpupool_locked_cpus);
@@ -513,7 +534,11 @@ static int cpupool_cpu_add(unsigned int cpu) */ rcu_read_lock(&sched_res_rculock); get_sched_res(cpu)->cpupool = NULL; - ret = cpupool_assign_cpu_locked(cpupool0, cpu); + + cpus = sched_get_opt_cpumask(cpupool0->gran, cpu); + if ( cpumask_subset(cpus, &cpupool_free_cpus) ) + ret = cpupool_assign_cpu_locked(cpupool0, cpu); + rcu_read_unlock(&sched_res_rculock); spin_unlock(&cpupool_lock);
@@ -548,27 +573,33 @@ static void cpupool_cpu_remove(unsigned int cpu) static int cpupool_cpu_remove_prologue(unsigned int cpu) { int ret = 0; + cpumask_t *cpus; + unsigned int master_cpu; spin_lock(&cpupool_lock); - if ( cpumask_test_cpu(cpu, &cpupool_locked_cpus) ) + rcu_read_lock(&sched_res_rculock); + cpus = get_sched_res(cpu)->cpus; + master_cpu = sched_get_resource_cpu(cpu); + if ( cpumask_intersects(cpus, &cpupool_locked_cpus) ) ret = -EBUSY; else cpumask_set_cpu(cpu, &cpupool_locked_cpus); + rcu_read_unlock(&sched_res_rculock); spin_unlock(&cpupool_lock); if ( ret ) return ret; - if ( cpumask_test_cpu(cpu, cpupool0->cpu_valid) ) + if ( cpumask_test_cpu(master_cpu, cpupool0->cpu_valid) ) { /* Cpupool0 is populated only after all cpus are up. */ ASSERT(system_state == SYS_STATE_active); - ret = cpupool_unassign_cpu_start(cpupool0, cpu); + ret = cpupool_unassign_cpu_start(cpupool0, master_cpu); } - else if ( !cpumask_test_cpu(cpu, &cpupool_free_cpus) ) + else if ( !cpumask_test_cpu(master_cpu, &cpupool_free_cpus) ) ret = -ENODEV; return ret;
@@ -585,12 +616,13 @@ static void cpupool_cpu_remove_forced(unsigned int cpu) { struct cpupool **c; int ret; + unsigned int master_cpu = sched_get_resource_cpu(cpu); for_each_cpupool ( c ) { - if ( cpumask_test_cpu(cpu, (*c)->cpu_valid) ) + if ( cpumask_test_cpu(master_cpu, (*c)->cpu_valid) ) { - ret = cpupool_unassign_cpu_start(*c, cpu); + ret = cpupool_unassign_cpu_start(*c, master_cpu); BUG_ON(ret); ret = cpupool_unassign_cpu_finish(*c); BUG_ON(ret); } }
@@ -658,29 +690,45 @@ int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op) case XEN_SYSCTL_CPUPOOL_OP_ADDCPU: { unsigned cpu; + const cpumask_t *cpus; cpu = op->cpu; debugtrace_printk("cpupool_assign_cpu(pool=%d,cpu=%d)\n", op->cpupool_id, cpu); + spin_lock(&cpupool_lock); + + c = cpupool_find_by_id(op->cpupool_id); + ret = -ENOENT; + if ( c == NULL ) + goto addcpu_out; if ( cpu == XEN_SYSCTL_CPUPOOL_PAR_ANY ) - cpu = cpumask_first(&cpupool_free_cpus); + { + for_each_cpu ( cpu, &cpupool_free_cpus ) + { + cpus = sched_get_opt_cpumask(c->gran, cpu); + if ( cpumask_subset(cpus, &cpupool_free_cpus) ) + break; + } + ret = -ENODEV; + if ( cpu >= nr_cpu_ids ) + goto addcpu_out; + } ret = -EINVAL; if ( cpu >= nr_cpu_ids ) goto addcpu_out; ret = -ENODEV; - if ( !cpumask_test_cpu(cpu, &cpupool_free_cpus) || - cpumask_test_cpu(cpu, &cpupool_locked_cpus) ) - goto addcpu_out; - c = cpupool_find_by_id(op->cpupool_id); - ret = -ENOENT; - if ( c == NULL ) + cpus = sched_get_opt_cpumask(c->gran, cpu); + if ( !cpumask_subset(cpus, &cpupool_free_cpus) || + cpumask_intersects(cpus, &cpupool_locked_cpus) ) goto addcpu_out; ret = cpupool_assign_cpu_locked(c, cpu); + addcpu_out: spin_unlock(&cpupool_lock); debugtrace_printk("cpupool_assign_cpu(pool=%d,cpu=%d) ret %d\n", op->cpupool_id, cpu, ret); + } break;
diff --git a/xen/common/schedule.c b/xen/common/schedule.c index e411b6d03e..48ddbdfd7e 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c
@@ -2606,8 +2606,7 @@ static struct notifier_block cpu_schedule_nfb = { .notifier_call = cpu_schedule_callback }; -static const cpumask_t *sched_get_opt_cpumask(enum sched_gran opt, - unsigned int cpu) +const cpumask_t *sched_get_opt_cpumask(enum sched_gran opt, unsigned int cpu) { const cpumask_t *mask;
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index 780735dda3..cd731d7172 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h
@@ -638,5 +638,6 @@ affinity_balance_cpumask(const struct sched_unit *unit, int step, } void sched_rm_cpu(unsigned int cpu); +const cpumask_t *sched_get_opt_cpumask(enum sched_gran opt, unsigned int cpu); #endif /* __XEN_SCHED_IF_H__ */
-- 
2.16.4
client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; spf=none (zoho.com: 192.237.175.120 is neither permitted nor denied by domain of lists.xenproject.org) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org ARC-Seal: i=1; a=rsa-sha256; t=1570001353; cv=none; d=zoho.com; s=zohoarc; b=NreDe5vTnudoJj7ctgllsXX2UxQQVF/WR8GrE6NJdtTggmWTGF1BQjoFo63nnEUpJQUF/Z9ygyGLlQlrNor64PUEs8NjN/D1abahZN34zeKrLyRLsaw/rzf/qeM9CZkjaP0+cqRy2qAQZE1CB3M+etpljm2usEzVQu56+i9XKYY= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com; s=zohoarc; t=1570001353; h=Content-Type:Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To:ARC-Authentication-Results; bh=GxmwnH2XuZRJ3QezYUtWhaWD/u37JpObsZu3WosnlE4=; b=Vnfgd55AWe6CBUXT4gWY5I4H0QOxl8gN0fNdVSZPWQWheQRFRv73p6I0eb/ioSSUp14kDchPabQMJQsLGPF6/8ZKhsMdUYQhw2qJNDnmEgyl5WbGKs+BYno7mco/z9Guav/gO2fo7ctYyFWbAJSMd8z72GffTOW9C71/WFJX2QI= ARC-Authentication-Results: i=1; mx.zoho.com; spf=none (zoho.com: 192.237.175.120 is neither permitted nor denied by domain of lists.xenproject.org) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1570001353653825.2109169951851; Wed, 2 Oct 2019 00:29:13 -0700 (PDT) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iFZ3o-0001qB-AJ; Wed, 02 Oct 2019 07:28:32 +0000 Received: from us1-rack-iad1.inumbo.com ([172.99.69.81]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iFZ3m-0001oD-N6 for xen-devel@lists.xenproject.org; Wed, 02 Oct 2019 07:28:30 +0000 Received: from mx1.suse.de (unknown [195.135.220.15]) by localhost (Halon) with ESMTPS id 258ec854-e4e6-11e9-bf31-bc764e2007e4; Wed, 02 Oct 2019 07:27:54 +0000 (UTC) Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 3F240B090; Wed, 2 Oct 2019 07:27:53 +0000 (UTC) X-Inumbo-ID: 258ec854-e4e6-11e9-bf31-bc764e2007e4 X-Virus-Scanned: by amavisd-new at test-mx.suse.de From: Juergen Gross To: xen-devel@lists.xenproject.org Date: Wed, 2 Oct 2019 09:27:43 +0200 Message-Id: <20191002072745.24919-19-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20191002072745.24919-1-jgross@suse.com> References: <20191002072745.24919-1-jgross@suse.com> Subject: [Xen-devel] [PATCH v6 18/20] xen/sched: disable scheduling when entering ACPI deep sleep states X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , Stefano Stabellini , Julien Grall , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Tim Deegan , Jan Beulich , Dario Faggioli , =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" When entering deep sleep states all domains are paused resulting in all cpus only running idle vcpus. This enables us to stop scheduling completely in order to avoid synchronization problems with core scheduling when individual cpus are offlined. 
Disabling the scheduler is done by replacing the softirq handler with a dummy scheduling routine only enabling tasklets to run. Signed-off-by: Juergen Gross Acked-by: Jan Beulich Reviewed-by: Dario Faggioli --- V2: new patch --- xen/arch/x86/acpi/power.c | 4 ++++ xen/common/schedule.c | 31 +++++++++++++++++++++++++++++-- xen/include/xen/sched.h | 2 ++ 3 files changed, 35 insertions(+), 2 deletions(-) diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c index 01e6aec4e8..8078352312 100644 --- a/xen/arch/x86/acpi/power.c +++ b/xen/arch/x86/acpi/power.c @@ -145,12 +145,16 @@ static void freeze_domains(void) for_each_domain ( d ) domain_pause(d); rcu_read_unlock(&domlist_read_lock); + + scheduler_disable(); } =20 static void thaw_domains(void) { struct domain *d; =20 + scheduler_enable(); + rcu_read_lock(&domlist_read_lock); for_each_domain ( d ) { diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 48ddbdfd7e..dbffec8cf2 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -91,6 +91,8 @@ extern const struct scheduler *__start_schedulers_array[]= , *__end_schedulers_arr =20 static struct scheduler __read_mostly ops; =20 +static bool scheduler_active; + static void sched_set_affinity( struct sched_unit *unit, const cpumask_t *hard, const cpumask_t *soft); =20 @@ -2275,6 +2277,13 @@ static struct sched_unit *sched_wait_rendezvous_in(s= truct sched_unit *prev, cpu_relax(); =20 *lock =3D pcpu_schedule_lock_irq(cpu); + + if ( unlikely(!scheduler_active) ) + { + ASSERT(is_idle_unit(prev)); + atomic_set(&prev->next_task->rendezvous_out_cnt, 0); + prev->rendezvous_in_cnt =3D 0; + } } =20 return prev->next_task; @@ -2629,14 +2638,32 @@ const cpumask_t *sched_get_opt_cpumask(enum sched_g= ran opt, unsigned int cpu) return mask; } =20 +static void schedule_dummy(void) +{ + sched_tasklet_check_cpu(smp_processor_id()); +} + +void scheduler_disable(void) +{ + scheduler_active =3D false; + open_softirq(SCHEDULE_SOFTIRQ, schedule_dummy); + open_softirq(SCHED_SLAVE_SOFTIRQ, schedule_dummy); +} + +void scheduler_enable(void) +{ + open_softirq(SCHEDULE_SOFTIRQ, schedule); + open_softirq(SCHED_SLAVE_SOFTIRQ, sched_slave); + scheduler_active =3D true; +} + /* Initialise the data structures. 
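The mechanism amounts to swapping a function pointer; a self-contained C sketch (a hypothetical one-entry handler table, not the Xen softirq machinery):

    #include <stdio.h>

    static void schedule_real(void)  { puts("full scheduler run"); }
    static void schedule_dummy(void) { puts("tasklet check only"); }

    /* Stand-in for the handler slot that open_softirq() fills. */
    static void (*softirq_handler)(void) = schedule_real;

    static void scheduler_disable(void) { softirq_handler = schedule_dummy; }
    static void scheduler_enable(void)  { softirq_handler = schedule_real; }

    int main(void)
    {
        softirq_handler();   /* normal operation                 */
        scheduler_disable(); /* as freeze_domains() would do     */
        softirq_handler();   /* only tasklet work is serviced    */
        scheduler_enable();  /* as thaw_domains() would do       */
        softirq_handler();
        return 0;
    }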
Signed-off-by: Juergen Gross
Acked-by: Jan Beulich
Reviewed-by: Dario Faggioli
---
V2: new patch
---
xen/arch/x86/acpi/power.c | 4 ++++ xen/common/schedule.c | 31 +++++++++++++++++++++++++++++-- xen/include/xen/sched.h | 2 ++ 3 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c index 01e6aec4e8..8078352312 100644 --- a/xen/arch/x86/acpi/power.c +++ b/xen/arch/x86/acpi/power.c
@@ -145,12 +145,16 @@ static void freeze_domains(void) for_each_domain ( d ) domain_pause(d); rcu_read_unlock(&domlist_read_lock); + + scheduler_disable(); }
static void thaw_domains(void) { struct domain *d; + scheduler_enable(); + rcu_read_lock(&domlist_read_lock); for_each_domain ( d ) {
diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 48ddbdfd7e..dbffec8cf2 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c
@@ -91,6 +91,8 @@ extern const struct scheduler *__start_schedulers_array[], *__end_schedulers_arr static struct scheduler __read_mostly ops; +static bool scheduler_active; + static void sched_set_affinity( struct sched_unit *unit, const cpumask_t *hard, const cpumask_t *soft);
@@ -2275,6 +2277,13 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev, cpu_relax(); *lock = pcpu_schedule_lock_irq(cpu); + + if ( unlikely(!scheduler_active) ) + { + ASSERT(is_idle_unit(prev)); + atomic_set(&prev->next_task->rendezvous_out_cnt, 0); + prev->rendezvous_in_cnt = 0; + } } return prev->next_task;
@@ -2629,14 +2638,32 @@ const cpumask_t *sched_get_opt_cpumask(enum sched_gran opt, unsigned int cpu) return mask; } +static void schedule_dummy(void) +{ + sched_tasklet_check_cpu(smp_processor_id()); +} + +void scheduler_disable(void) +{ + scheduler_active = false; + open_softirq(SCHEDULE_SOFTIRQ, schedule_dummy); + open_softirq(SCHED_SLAVE_SOFTIRQ, schedule_dummy); +} + +void scheduler_enable(void) +{ + open_softirq(SCHEDULE_SOFTIRQ, schedule); + open_softirq(SCHED_SLAVE_SOFTIRQ, sched_slave); + scheduler_active = true; +} + /* Initialise the data structures. */ void __init scheduler_init(void) { struct domain *idle_domain; int i; - open_softirq(SCHEDULE_SOFTIRQ, schedule); - open_softirq(SCHED_SLAVE_SOFTIRQ, sched_slave); + scheduler_enable(); for ( i = 0; i < NUM_SCHEDULERS; i++) {
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index a40bd5fb56..629a4c52e0 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h
@@ -933,6 +933,8 @@ void restore_vcpu_affinity(struct domain *d); void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate); uint64_t get_cpu_idle_time(unsigned int cpu); void sched_guest_idle(void (*idle) (void), unsigned int cpu); +void scheduler_enable(void); +void scheduler_disable(void); /* * Used by idle loop to decide whether there is work to do:
-- 
2.16.4

From nobody Sat Apr 20 03:55:02 2024
From: Juergen Gross
To: xen-devel@lists.xenproject.org
Cc: Juergen Gross, Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich, Dario Faggioli, Roger Pau Monné
Date: Wed, 2 Oct 2019 09:27:44 +0200
Message-Id: <20191002072745.24919-20-jgross@suse.com>
In-Reply-To: <20191002072745.24919-1-jgross@suse.com>
References: <20191002072745.24919-1-jgross@suse.com>
Subject: [Xen-devel] [PATCH v6 19/20] xen/sched: add scheduling granularity enum

Add a scheduling granularity enum ("cpu", "core", "socket") for specification of the scheduling granularity. Initially it is set to "cpu"; this can be modified by the new boot parameter (x86 only) "sched-gran".

According to the selected granularity sched_granularity is set after all cpus are online.

A test is added checking that all sched resources hold the same number of cpus; if that is not the case, fall back to core- or cpu-scheduling.
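The symmetry test can be sketched standalone (plain C with invented sibling counts; the real code derives them via cpumask_weight() of sched_get_opt_cpumask()): a granularity is usable only if every prospective resource covers the same number of cpus.

    #include <stdio.h>

    /* Return the common sibling count, or 0 if the topology is asymmetric. */
    static unsigned int check_granularity(const unsigned int *siblings,
                                          unsigned int n)
    {
        unsigned int i, gran = 0;

        for ( i = 0; i < n; i++ )
        {
            if ( gran == 0 )
                gran = siblings[i];
            else if ( gran != siblings[i] )
                return 0; /* caller falls back to a finer granularity */
        }
        return gran;
    }

    int main(void)
    {
        unsigned int symmetric[]  = { 2, 2, 2, 2 };
        unsigned int asymmetric[] = { 2, 2, 1, 2 };

        printf("%u %u\n", check_granularity(symmetric, 4),
               check_granularity(asymmetric, 4)); /* prints "2 0" */
        return 0;
    }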
Signed-off-by: Juergen Gross
Reviewed-by: Dario Faggioli
---
RFC V2: - fixed freeing of sched_res when merging cpus - rename parameter to "sched-gran" (Jan Beulich) - rename parameter option from "thread" to "cpu" (Jan Beulich)
V1: - rename scheduler_smp_init() to scheduler_gran_init(), let it be called by cpupool_init() - avoid using literal cpu number 0 in scheduler_percpu_init() (Jan Beulich) - style correction (Jan Beulich) - fallback to smaller granularity instead of panic in case of unbalanced cpu configuration
V2: - style changes (Jan Beulich) - introduce CONFIG_HAS_SCHED_GRANULARITY (Jan Beulich)
V4: - move code to cpupool.c
---
xen/arch/x86/Kconfig | 1 + xen/common/Kconfig | 3 ++ xen/common/cpupool.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 84 insertions(+)
diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig index 288dc6c042..3f88adae97 100644 --- a/xen/arch/x86/Kconfig +++ b/xen/arch/x86/Kconfig
@@ -22,6 +22,7 @@ config X86 select HAS_PASSTHROUGH select HAS_PCI select HAS_PDX + select HAS_SCHED_GRANULARITY select HAS_UBSAN select HAS_VPCI if !PV_SHIM_EXCLUSIVE && HVM select NEEDS_LIBELF
diff --git a/xen/common/Kconfig b/xen/common/Kconfig index 16829f6274..e9247871a8 100644 --- a/xen/common/Kconfig +++ b/xen/common/Kconfig
@@ -63,6 +63,9 @@ config HAS_GDBSX config HAS_IOPORTS bool +config HAS_SCHED_GRANULARITY + bool + config NEEDS_LIBELF bool
diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index f7a13c7a4c..4d3adbdd8d 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c
@@ -17,6 +17,7 @@ #include #include #include +#include #include #include
@@ -37,6 +38,83 @@ static DEFINE_SPINLOCK(cpupool_lock); static enum sched_gran __read_mostly opt_sched_granularity = SCHED_GRAN_cpu; static unsigned int __read_mostly sched_granularity = 1; +#ifdef CONFIG_HAS_SCHED_GRANULARITY +static int __init sched_select_granularity(const char *str) +{ + if ( strcmp("cpu", str) == 0 ) + opt_sched_granularity = SCHED_GRAN_cpu; + else if ( strcmp("core", str) == 0 ) + opt_sched_granularity = SCHED_GRAN_core; + else if ( strcmp("socket", str) == 0 ) + opt_sched_granularity = SCHED_GRAN_socket; + else + return -EINVAL; + + return 0; +} +custom_param("sched-gran", sched_select_granularity); +#endif + +static unsigned int __init cpupool_check_granularity(void) +{ + unsigned int cpu; + unsigned int siblings, gran = 0; + + if ( opt_sched_granularity == SCHED_GRAN_cpu ) + return 1; + + for_each_online_cpu ( cpu ) + { + siblings = cpumask_weight(sched_get_opt_cpumask(opt_sched_granularity, cpu)); + if ( gran == 0 ) + gran = siblings; + else if ( gran != siblings ) + return 0; + } + + sched_disable_smt_switching = true; + + return gran; +} + +/* Setup data for selected scheduler granularity. */ +static void __init cpupool_gran_init(void) +{ + unsigned int gran = 0; + const char *fallback = NULL; + + while ( gran == 0 ) + { + gran = cpupool_check_granularity(); + + if ( gran == 0 ) + { + switch ( opt_sched_granularity ) + { + case SCHED_GRAN_core: + opt_sched_granularity = SCHED_GRAN_cpu; + fallback = "Asymmetric cpu configuration.\n" + "Falling back to sched-gran=cpu.\n"; + break; + case SCHED_GRAN_socket: + opt_sched_granularity = SCHED_GRAN_core; + fallback = "Asymmetric cpu configuration.\n" + "Falling back to sched-gran=core.\n"; + break; + default: + ASSERT_UNREACHABLE(); + break; + } + } + } + + if ( fallback ) + warning_add(fallback); + + sched_granularity = gran; +}
unsigned int cpupool_get_granularity(const struct cpupool *c) { return c ? sched_granularity : 1; }
@@ -871,6 +949,8 @@ static int __init cpupool_init(void) unsigned int cpu; int err; + cpupool_gran_init(); + cpupool0 = cpupool_create(0, 0, &err); BUG_ON(cpupool0 == NULL); cpupool_put(cpupool0);
-- 
2.16.4

From nobody Sat Apr 20 03:55:02 2024
From: Juergen Gross
To: xen-devel@lists.xenproject.org
Cc: Juergen Gross, Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, Jan Beulich
Date: Wed, 2 Oct 2019 09:27:45 +0200
Message-Id: <20191002072745.24919-21-jgross@suse.com>
In-Reply-To: <20191002072745.24919-1-jgross@suse.com>
References: <20191002072745.24919-1-jgross@suse.com>
Subject: [Xen-devel] [PATCH v6 20/20] docs: add "sched-gran" boot parameter documentation

Add documentation for the new "sched-gran" hypervisor boot parameter.
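As a usage illustration (an assumed GRUB entry, not taken from the patch), selecting core granularity just appends the option to the hypervisor command line:

    multiboot2 /boot/xen.gz sched-gran=core

With that in place vcpus are grouped per physical core as described under `core` in the documentation below.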
Signed-off-by: Juergen Gross
---
V6: - add a note regarding different AMD/Intel terminology (Jan Beulich)
---
docs/misc/xen-command-line.pandoc | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+)
diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc index fc64429064..74aceac5db 100644 --- a/docs/misc/xen-command-line.pandoc +++ b/docs/misc/xen-command-line.pandoc
@@ -1782,6 +1782,34 @@ Set the timeslice of the credit1 scheduler, in milliseconds. The default is 30ms. Reasonable values may include 10, 5, or even 1 for very latency-sensitive workloads. +### sched-gran (x86) +> `= cpu | core | socket` + +> Default: `sched-gran=cpu` + +Set the scheduling granularity. In case the granularity is larger than 1 (e.g. +`core` on an SMT-enabled system, or `socket`) multiple vcpus are assigned +statically to a "scheduling unit" which will then be subject to scheduling. +This assignment of vcpus to scheduling units is fixed. + +`cpu`: Vcpus will be scheduled individually on single cpus (e.g. a +hyperthread using x86/Intel terminology). + +`core`: As many vcpus as there are cpus on a physical core are scheduled +together on a physical core. + +`socket`: As many vcpus as there are cpus on a physical socket are scheduled +together on a physical socket. + +Note: a value other than `cpu` will result in rejecting a runtime modification +attempt of the "smt" setting. + +Note: for AMD x86 processors before Fam17 the terminology in the official data +sheets is different: a cpu is named "core" and multiple "cores" run in the +same "compute unit". As AMD uses the same names as Intel ("thread" and +"core") from Fam17 onwards, the topology levels are named "cpu", "core" and +"socket" even on older AMD processors. + ### sched_ratelimit_us > `= `
-- 
2.16.4