From: Juergen Gross
To: xen-devel@lists.xenproject.org
Cc: Juergen Gross, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
    George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall,
    Jan Beulich, Dario Faggioli, Roger Pau Monné
Subject: [Xen-devel] [PATCH 54/60] xen/sched: add minimalistic idle scheduler for free cpus
Date: Tue, 28 May 2019 12:33:07 +0200
Message-Id: <20190528103313.1343-55-jgross@suse.com>
In-Reply-To: <20190528103313.1343-1-jgross@suse.com>
References: <20190528103313.1343-1-jgross@suse.com>

Instead of having a full-blown scheduler running for the free cpus, add a
very minimalistic scheduler for that purpose which only ever schedules the
related idle vcpu. This has the big advantage of not needing any per-cpu,
per-domain or per-scheduling-unit data for free cpus, which in turn
simplifies moving cpus to and from cpupools a lot.

As this new scheduler is not user-selectable, don't register it as an
official scheduler; just include it directly in schedule.c.

Signed-off-by: Juergen Gross
---
V1: new patch
---
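[Editorial note, not part of the patch to be applied: for readers skimming
the diff, the whole new scheduler condenses to the fragment below, taken
from the xen/common/schedule.c hunk further down. The do_schedule hook
always hands back the pCPU's idle unit, and the vdata hooks are trivial
stubs, so no per-cpu, per-domain or per-unit allocations are ever needed
for free cpus.]

    /* Sketch only -- condensed from the hunk below, not extra code to apply. */
    static void sched_idle_schedule(
        const struct scheduler *ops, struct sched_unit *unit, s_time_t now,
        bool tasklet_work_scheduled)
    {
        const unsigned int cpu = smp_processor_id();

        /* No deadline, and the next task is always this cpu's idle unit. */
        unit->next_time = -1;
        unit->next_task = sched_idle_unit(sched_get_resource_cpu(cpu));
    }

    static struct scheduler sched_idle_ops = {
        .name           = "Idle Scheduler",
        .opt_name       = "idle",
        .pick_resource  = sched_idle_res_pick,    /* returns unit->res */
        .do_schedule    = sched_idle_schedule,
        .alloc_vdata    = sched_idle_alloc_vdata, /* returns a dummy non-NULL pointer */
        .free_vdata     = sched_idle_free_vdata,  /* no-op */
        .switch_sched   = sched_idle_switch_sched,
    };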
 xen/arch/arm/smpboot.c  |   2 -
 xen/arch/x86/smpboot.c  |   2 -
 xen/common/schedule.c   | 143 +++++++++++++++++++++++--------------------------
 xen/include/xen/sched.h |   1 -
 4 files changed, 67 insertions(+), 81 deletions(-)

diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index 9a6582f2a6..f756444362 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -350,8 +350,6 @@ void start_secondary(unsigned long boot_phys_offset,
 
     setup_cpu_sibling_map(cpuid);
 
-    scheduler_percpu_init(cpuid);
-
     /* Run local notifiers */
     notify_cpu_starting(cpuid);
     /*
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 7e95b2cdac..153bfbb4b7 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -382,8 +382,6 @@ void start_secondary(void *unused)
 
     set_cpu_sibling_map(cpu);
 
-    scheduler_percpu_init(cpu);
-
     init_percpu_time();
 
     setup_secondary_APIC_clock();
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 7a5ab4b1b6..d3e4ae226c 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -83,6 +83,57 @@ extern const struct scheduler *__start_schedulers_array[], *__end_schedulers_arr
 
 static struct scheduler __read_mostly ops;
 
+static spinlock_t *
+sched_idle_switch_sched(struct scheduler *new_ops, unsigned int cpu,
+                        void *pdata, void *vdata)
+{
+    sched_idle_unit(cpu)->priv = NULL;
+
+    return &sched_free_cpu_lock;
+}
+
+static struct sched_resource *
+sched_idle_res_pick(const struct scheduler *ops, struct sched_unit *unit)
+{
+    return unit->res;
+}
+
+static void *
+sched_idle_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
+                       void *dd)
+{
+    /* Any non-NULL pointer is fine here. */
+    return (void *)1UL;
+}
+
+static void
+sched_idle_free_vdata(const struct scheduler *ops, void *priv)
+{
+}
+
+static void sched_idle_schedule(
+    const struct scheduler *ops, struct sched_unit *unit, s_time_t now,
+    bool tasklet_work_scheduled)
+{
+    const unsigned int cpu = smp_processor_id();
+
+    unit->next_time = -1;
+    unit->next_task = sched_idle_unit(sched_get_resource_cpu(cpu));
+}
+
+static struct scheduler sched_idle_ops = {
+    .name           = "Idle Scheduler",
+    .opt_name       = "idle",
+    .sched_data     = NULL,
+
+    .pick_resource  = sched_idle_res_pick,
+    .do_schedule    = sched_idle_schedule,
+
+    .alloc_vdata    = sched_idle_alloc_vdata,
+    .free_vdata     = sched_idle_free_vdata,
+    .switch_sched   = sched_idle_switch_sched,
+};
+
 static inline struct vcpu *unit2vcpu_cpu(struct sched_unit *unit,
                                          unsigned int cpu)
 {
@@ -2141,7 +2192,6 @@ static void poll_timer_fn(void *data)
 static int cpu_schedule_up(unsigned int cpu)
 {
     struct sched_resource *sd;
-    void *sched_priv;
 
     sd = xzalloc(struct sched_resource);
     if ( sd == NULL )
@@ -2150,7 +2200,7 @@ static int cpu_schedule_up(unsigned int cpu)
     sd->cpus = cpumask_of(cpu);
     set_sched_res(cpu, sd);
 
-    sd->scheduler = &ops;
+    sd->scheduler = &sched_idle_ops;
     spin_lock_init(&sd->_lock);
     sd->schedule_lock = &sched_free_cpu_lock;
     init_timer(&sd->s_timer, s_timer_fn, NULL, cpu);
@@ -2171,20 +2221,10 @@ static int cpu_schedule_up(unsigned int cpu)
         struct sched_unit *unit = idle->sched_unit;
 
         /*
-         * During (ACPI?) suspend the idle vCPU for this pCPU is not freed,
-         * while its scheduler specific data (what is pointed by sched_priv)
-         * is. Also, at this stage of the resume path, we attach the pCPU
-         * to the default scheduler, no matter in what cpupool it was before
-         * suspend. To avoid inconsistency, let's allocate default scheduler
-         * data for the idle vCPU here. If the pCPU was in a different pool
-         * with a different scheduler, it is schedule_cpu_switch(), invoked
-         * later, that will set things up as appropriate.
+         * No need to allocate any scheduler data, as cpus coming online are
+         * free initially and the idle scheduler doesn't need any data areas
+         * allocated.
          */
-        ASSERT(unit->priv == NULL);
-
-        unit->priv = sched_alloc_vdata(&ops, unit, idle->domain->sched_priv);
-        if ( unit->priv == NULL )
-            return -ENOMEM;
 
         /* Update the resource pointer in the idle unit. */
         unit->res = sd;
@@ -2195,16 +2235,7 @@ static int cpu_schedule_up(unsigned int cpu)
     sd->curr = idle_vcpu[cpu]->sched_unit;
     sd->sched_unit_idle = idle_vcpu[cpu]->sched_unit;
 
-    /*
-     * We don't want to risk calling xfree() on an sd->sched_priv
-     * (e.g., inside free_pdata, from cpu_schedule_down() called
-     * during CPU_UP_CANCELLED) that contains an IS_ERR value.
-     */
-    sched_priv = sched_alloc_pdata(&ops, cpu);
-    if ( IS_ERR(sched_priv) )
-        return PTR_ERR(sched_priv);
-
-    sd->sched_priv = sched_priv;
+    sd->sched_priv = NULL;
 
     return 0;
 }
@@ -2212,13 +2243,6 @@ static int cpu_schedule_up(unsigned int cpu)
 static void cpu_schedule_down(unsigned int cpu)
 {
     struct sched_resource *sd = get_sched_res(cpu);
-    struct scheduler *sched = sd->scheduler;
-
-    sched_free_pdata(sched, sd->sched_priv, cpu);
-    sched_free_vdata(sched, idle_vcpu[cpu]->sched_unit->priv);
-
-    idle_vcpu[cpu]->sched_unit->priv = NULL;
-    sd->sched_priv = NULL;
 
     kill_timer(&sd->s_timer);
 
@@ -2226,26 +2250,14 @@ static void cpu_schedule_down(unsigned int cpu)
     xfree(sd);
 }
 
-void scheduler_percpu_init(unsigned int cpu)
-{
-    struct sched_resource *sd = get_sched_res(cpu);
-    struct scheduler *sched = sd->scheduler;
-
-    if ( system_state != SYS_STATE_resume )
-        sched_init_pdata(sched, sd->sched_priv, cpu);
-}
-
 void sched_rm_cpu(unsigned int cpu)
 {
     int rc;
-    struct sched_resource *sd = get_sched_res(cpu);
-    struct scheduler *sched = sd->scheduler;
 
     rcu_read_lock(&domlist_read_lock);
     rc = cpu_disable_scheduler(cpu);
     BUG_ON(rc);
     rcu_read_unlock(&domlist_read_lock);
-    sched_deinit_pdata(sched, sd->sched_priv, cpu);
     cpu_schedule_down(cpu);
 }
 
@@ -2260,32 +2272,22 @@ static int cpu_schedule_callback(
      * allocating and initializing the per-pCPU scheduler specific data,
      * as well as "registering" this pCPU to the scheduler (which may
      * involve modifying some scheduler wide data structures).
-     * This happens by calling the alloc_pdata and init_pdata hooks, in
-     * this order. A scheduler that does not need to allocate any per-pCPU
-     * data can avoid implementing alloc_pdata. init_pdata may, however, be
-     * necessary/useful in this case too (e.g., it can contain the "register
-     * the pCPU to the scheduler" part). alloc_pdata (if present) is called
-     * during CPU_UP_PREPARE. init_pdata (if present) is called before
-     * CPU_STARTING in scheduler_percpu_init().
+     * As new pCPUs always start as "free" cpus with the minimal idle
+     * scheduler being in charge, we don't need any of that.
      *
      * On the other hand, at teardown, we need to reverse what has been done
-     * during initialization, and then free the per-pCPU specific data. This
-     * happens by calling the deinit_pdata and free_pdata hooks, in this
+     * during initialization, and then free the per-pCPU specific data. A
+     * pCPU brought down is not forced through "free" cpus, so here we need to
+     * use the appropriate hooks.
+     *
+     * This happens by calling the deinit_pdata and free_pdata hooks, in this
      * order. If no per-pCPU memory was allocated, there is no need to
      * provide an implementation of free_pdata. deinit_pdata may, however,
      * be necessary/useful in this case too (e.g., it can undo something done
      * on scheduler wide data structure during init_pdata). Both deinit_pdata
      * and free_pdata are called during CPU_DEAD.
      *
-     * If someting goes wrong during bringup, we go to CPU_UP_CANCELLED
-     * *before* having called init_pdata. In this case, as there is no
-     * initialization needing undoing, only free_pdata should be called.
-     * This means it is possible to call free_pdata just after alloc_pdata,
-     * without a init_pdata/deinit_pdata "cycle" in between the two.
-     *
-     * So, in summary, the usage pattern should look either
-     *  - alloc_pdata-->init_pdata-->deinit_pdata-->free_pdata, or
-     *  - alloc_pdata-->free_pdata.
+     * If something goes wrong during bringup, we go to CPU_UP_CANCELLED.
      */
     switch ( action )
     {
@@ -2402,9 +2404,6 @@ void __init scheduler_init(void)
         BUG();
     get_sched_res(0)->curr = idle_vcpu[0]->sched_unit;
     get_sched_res(0)->sched_unit_idle = idle_vcpu[0]->sched_unit;
-    get_sched_res(0)->sched_priv = sched_alloc_pdata(&ops, 0);
-    BUG_ON(IS_ERR(get_sched_res(0)->sched_priv));
-    scheduler_percpu_init(0);
 }
 
 /*
@@ -2412,18 +2411,14 @@ void __init scheduler_init(void)
  * cpupool, or subject it to the scheduler of a new cpupool.
  *
  * For the pCPUs that are removed from their cpupool, their scheduler becomes
- * &ops (the default scheduler, selected at boot, which also services the
- * default cpupool). However, as these pCPUs are not really part of any pool,
- * there won't be any scheduling event on them, not even from the default
- * scheduler. Basically, they will just sit idle until they are explicitly
- * added back to a cpupool.
+ * &sched_idle_ops (the idle scheduler).
  */
 int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
 {
     struct vcpu *idle;
     void *ppriv, *ppriv_old, *vpriv, *vpriv_old;
     struct scheduler *old_ops = get_sched_res(cpu)->scheduler;
-    struct scheduler *new_ops = (c == NULL) ? &ops : c->sched;
+    struct scheduler *new_ops = (c == NULL) ? &sched_idle_ops : c->sched;
     struct sched_resource *sd = get_sched_res(cpu);
     struct cpupool *old_pool = sd->cpupool;
     spinlock_t *old_lock, *new_lock;
@@ -2443,9 +2438,6 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     ASSERT((c == NULL && !cpumask_test_cpu(cpu, old_pool->cpu_valid)) ||
            (c != NULL && !cpumask_test_cpu(cpu, c->cpu_valid)));
 
-    if ( old_ops == new_ops )
-        goto out;
-
     /*
      * To setup the cpu for the new scheduler we need:
      * - a valid instance of per-CPU scheduler specific data, as it is
@@ -2498,7 +2490,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
      * taking it, finds all the initializations we've done above in place.
      */
     smp_mb();
-    sd->schedule_lock = c ? new_lock : &sched_free_cpu_lock;
+    sd->schedule_lock = new_lock;
 
     /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
     spin_unlock_irqrestore(old_lock, flags);
@@ -2510,7 +2502,6 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     sched_free_vdata(old_ops, vpriv_old);
     sched_free_pdata(old_ops, ppriv_old, cpu);
 
- out:
     get_sched_res(cpu)->granularity = c ? c->granularity : 1;
     get_sched_res(cpu)->cpupool = c;
     /* When a cpu is added to a pool, trigger it to go pick up some work */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 7dc63c449b..e689bba361 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -677,7 +677,6 @@ void __domain_crash(struct domain *d);
 void noreturn asm_domain_crash_synchronous(unsigned long addr);
 
 void scheduler_init(void);
-void scheduler_percpu_init(unsigned int cpu);
 int sched_init_vcpu(struct vcpu *v);
 void sched_destroy_vcpu(struct vcpu *v);
 int sched_init_domain(struct domain *d, int poolid);
-- 
2.16.4
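[Editorial usage note, not part of the patch: with this change a pCPU runs
under sched_idle_ops exactly while it is a "free" cpu, i.e. from the moment
it is removed from a cpupool until it is added to one again, for example via
the existing toolstack commands:

    xl cpupool-cpu-remove Pool-0 3    # cpu 3 becomes a free cpu, idle scheduler takes over
    xl cpupool-cpu-add Pool-0 3       # cpu 3 is handed back to Pool-0's scheduler

Newly booted secondary CPUs likewise start out as free cpus, which is why the
scheduler_percpu_init() calls in the smpboot code can be dropped.]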