From: Juergen Gross
To: xen-devel@lists.xenproject.org
Cc: Juergen Gross, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
    George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall,
    Jan Beulich, Dario Faggioli, Roger Pau Monné
Subject: [Xen-devel] [PATCH 54/60] xen/sched: add minimalistic idle scheduler for free cpus
Date: Tue, 28 May 2019 12:33:07 +0200
Message-Id: <20190528103313.1343-55-jgross@suse.com>
In-Reply-To: <20190528103313.1343-1-jgross@suse.com>
References: <20190528103313.1343-1-jgross@suse.com>

Instead of having a full-blown scheduler running for the free cpus, add a
very minimalistic scheduler for that purpose which only ever schedules the
related idle vcpu. This has the big advantage of not needing any per-cpu,
per-domain or per-scheduling-unit data for free cpus, which in turn
simplifies moving cpus to and from cpupools a lot.

As this new scheduler is not user-selectable, don't register it as an
official scheduler; just include it directly in schedule.c.

Signed-off-by: Juergen Gross
---
V1: new patch
---
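[Editorial note, not part of the patch to be applied: for readers skimming
the diff, the whole new scheduler condenses to the fragment below, taken
from the xen/common/schedule.c hunk further down. The do_schedule hook
always hands back the pCPU's idle unit, and the vdata hooks are trivial
stubs, so no per-cpu, per-domain or per-unit allocations are ever needed
for free cpus.]

    /* Sketch only -- condensed from the hunk below, not extra code to apply. */
    static void sched_idle_schedule(
        const struct scheduler *ops, struct sched_unit *unit, s_time_t now,
        bool tasklet_work_scheduled)
    {
        const unsigned int cpu = smp_processor_id();

        /* No deadline, and the next task is always this cpu's idle unit. */
        unit->next_time = -1;
        unit->next_task = sched_idle_unit(sched_get_resource_cpu(cpu));
    }

    static struct scheduler sched_idle_ops = {
        .name           = "Idle Scheduler",
        .opt_name       = "idle",
        .pick_resource  = sched_idle_res_pick,    /* returns unit->res */
        .do_schedule    = sched_idle_schedule,
        .alloc_vdata    = sched_idle_alloc_vdata, /* returns a dummy non-NULL pointer */
        .free_vdata     = sched_idle_free_vdata,  /* no-op */
        .switch_sched   = sched_idle_switch_sched,
    };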
 xen/arch/arm/smpboot.c  |   2 -
 xen/arch/x86/smpboot.c  |   2 -
 xen/common/schedule.c   | 143 +++++++++++++++++++++++--------------------------
 xen/include/xen/sched.h |   1 -
 4 files changed, 67 insertions(+), 81 deletions(-)

diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index 9a6582f2a6..f756444362 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -350,8 +350,6 @@ void start_secondary(unsigned long boot_phys_offset,
 
     setup_cpu_sibling_map(cpuid);
 
-    scheduler_percpu_init(cpuid);
-
     /* Run local notifiers */
     notify_cpu_starting(cpuid);
     /*
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 7e95b2cdac..153bfbb4b7 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -382,8 +382,6 @@ void start_secondary(void *unused)
 
     set_cpu_sibling_map(cpu);
 
-    scheduler_percpu_init(cpu);
-
     init_percpu_time();
 
     setup_secondary_APIC_clock();
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 7a5ab4b1b6..d3e4ae226c 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -83,6 +83,57 @@ extern const struct scheduler *__start_schedulers_array[], *__end_schedulers_arr
 
 static struct scheduler __read_mostly ops;
 
+static spinlock_t *
+sched_idle_switch_sched(struct scheduler *new_ops, unsigned int cpu,
+                        void *pdata, void *vdata)
+{
+    sched_idle_unit(cpu)->priv = NULL;
+
+    return &sched_free_cpu_lock;
+}
+
+static struct sched_resource *
+sched_idle_res_pick(const struct scheduler *ops, struct sched_unit *unit)
+{
+    return unit->res;
+}
+
+static void *
+sched_idle_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit,
+                       void *dd)
+{
+    /* Any non-NULL pointer is fine here. */
+    return (void *)1UL;
+}
+
+static void
+sched_idle_free_vdata(const struct scheduler *ops, void *priv)
+{
+}
+
+static void sched_idle_schedule(
+    const struct scheduler *ops, struct sched_unit *unit, s_time_t now,
+    bool tasklet_work_scheduled)
+{
+    const unsigned int cpu = smp_processor_id();
+
+    unit->next_time = -1;
+    unit->next_task = sched_idle_unit(sched_get_resource_cpu(cpu));
+}
+
+static struct scheduler sched_idle_ops = {
+    .name           = "Idle Scheduler",
+    .opt_name       = "idle",
+    .sched_data     = NULL,
+
+    .pick_resource  = sched_idle_res_pick,
+    .do_schedule    = sched_idle_schedule,
+
+    .alloc_vdata    = sched_idle_alloc_vdata,
+    .free_vdata     = sched_idle_free_vdata,
+    .switch_sched   = sched_idle_switch_sched,
+};
+
 static inline struct vcpu *unit2vcpu_cpu(struct sched_unit *unit,
                                          unsigned int cpu)
 {
@@ -2141,7 +2192,6 @@ static void poll_timer_fn(void *data)
 static int cpu_schedule_up(unsigned int cpu)
 {
     struct sched_resource *sd;
-    void *sched_priv;
 
     sd = xzalloc(struct sched_resource);
     if ( sd == NULL )
@@ -2150,7 +2200,7 @@ static int cpu_schedule_up(unsigned int cpu)
     sd->cpus = cpumask_of(cpu);
     set_sched_res(cpu, sd);
 
-    sd->scheduler = &ops;
+    sd->scheduler = &sched_idle_ops;
     spin_lock_init(&sd->_lock);
     sd->schedule_lock = &sched_free_cpu_lock;
     init_timer(&sd->s_timer, s_timer_fn, NULL, cpu);
@@ -2171,20 +2221,10 @@ static int cpu_schedule_up(unsigned int cpu)
         struct sched_unit *unit = idle->sched_unit;
 
         /*
-         * During (ACPI?) suspend the idle vCPU for this pCPU is not freed,
-         * while its scheduler specific data (what is pointed by sched_priv)
-         * is. Also, at this stage of the resume path, we attach the pCPU
-         * to the default scheduler, no matter in what cpupool it was before
-         * suspend. To avoid inconsistency, let's allocate default scheduler
-         * data for the idle vCPU here. If the pCPU was in a different pool
-         * with a different scheduler, it is schedule_cpu_switch(), invoked
-         * later, that will set things up as appropriate.
+         * No need to allocate any scheduler data, as cpus coming online are
+         * free initially and the idle scheduler doesn't need any data areas
+         * allocated.
          */
-        ASSERT(unit->priv == NULL);
-
-        unit->priv = sched_alloc_vdata(&ops, unit, idle->domain->sched_priv);
-        if ( unit->priv == NULL )
-            return -ENOMEM;
 
         /* Update the resource pointer in the idle unit. */
         unit->res = sd;
@@ -2195,16 +2235,7 @@ static int cpu_schedule_up(unsigned int cpu)
     sd->curr = idle_vcpu[cpu]->sched_unit;
     sd->sched_unit_idle = idle_vcpu[cpu]->sched_unit;
 
-    /*
-     * We don't want to risk calling xfree() on an sd->sched_priv
-     * (e.g., inside free_pdata, from cpu_schedule_down() called
-     * during CPU_UP_CANCELLED) that contains an IS_ERR value.
-     */
-    sched_priv = sched_alloc_pdata(&ops, cpu);
-    if ( IS_ERR(sched_priv) )
-        return PTR_ERR(sched_priv);
-
-    sd->sched_priv = sched_priv;
+    sd->sched_priv = NULL;
 
     return 0;
 }
@@ -2212,13 +2243,6 @@ static int cpu_schedule_up(unsigned int cpu)
 static void cpu_schedule_down(unsigned int cpu)
 {
     struct sched_resource *sd = get_sched_res(cpu);
-    struct scheduler *sched = sd->scheduler;
-
-    sched_free_pdata(sched, sd->sched_priv, cpu);
-    sched_free_vdata(sched, idle_vcpu[cpu]->sched_unit->priv);
-
-    idle_vcpu[cpu]->sched_unit->priv = NULL;
-    sd->sched_priv = NULL;
 
     kill_timer(&sd->s_timer);
 
@@ -2226,26 +2250,14 @@ static void cpu_schedule_down(unsigned int cpu)
     xfree(sd);
 }
 
-void scheduler_percpu_init(unsigned int cpu)
-{
-    struct sched_resource *sd = get_sched_res(cpu);
-    struct scheduler *sched = sd->scheduler;
-
-    if ( system_state != SYS_STATE_resume )
-        sched_init_pdata(sched, sd->sched_priv, cpu);
-}
-
 void sched_rm_cpu(unsigned int cpu)
 {
     int rc;
-    struct sched_resource *sd = get_sched_res(cpu);
-    struct scheduler *sched = sd->scheduler;
 
     rcu_read_lock(&domlist_read_lock);
     rc = cpu_disable_scheduler(cpu);
     BUG_ON(rc);
     rcu_read_unlock(&domlist_read_lock);
-    sched_deinit_pdata(sched, sd->sched_priv, cpu);
     cpu_schedule_down(cpu);
 }
 
@@ -2260,32 +2272,22 @@ static int cpu_schedule_callback(
      * allocating and initializing the per-pCPU scheduler specific data,
      * as well as "registering" this pCPU to the scheduler (which may
      * involve modifying some scheduler wide data structures).
-     * This happens by calling the alloc_pdata and init_pdata hooks, in
-     * this order. A scheduler that does not need to allocate any per-pCPU
-     * data can avoid implementing alloc_pdata. init_pdata may, however, be
-     * necessary/useful in this case too (e.g., it can contain the "register
-     * the pCPU to the scheduler" part). alloc_pdata (if present) is called
-     * during CPU_UP_PREPARE. init_pdata (if present) is called before
-     * CPU_STARTING in scheduler_percpu_init().
+     * As new pCPUs always start as "free" cpus with the minimal idle
+     * scheduler being in charge, we don't need any of that.
      *
      * On the other hand, at teardown, we need to reverse what has been done
-     * during initialization, and then free the per-pCPU specific data. This
-     * happens by calling the deinit_pdata and free_pdata hooks, in this
+     * during initialization, and then free the per-pCPU specific data. A
+     * pCPU brought down is not forced through "free" cpus, so here we need to
+     * use the appropriate hooks.
+     *
+     * This happens by calling the deinit_pdata and free_pdata hooks, in this
      * order. If no per-pCPU memory was allocated, there is no need to
      * provide an implementation of free_pdata. deinit_pdata may, however,
      * be necessary/useful in this case too (e.g., it can undo something done
      * on scheduler wide data structure during init_pdata). Both deinit_pdata
      * and free_pdata are called during CPU_DEAD.
      *
-     * If someting goes wrong during bringup, we go to CPU_UP_CANCELLED
-     * *before* having called init_pdata. In this case, as there is no
-     * initialization needing undoing, only free_pdata should be called.
-     * This means it is possible to call free_pdata just after alloc_pdata,
-     * without a init_pdata/deinit_pdata "cycle" in between the two.
-     *
-     * So, in summary, the usage pattern should look either
-     *  - alloc_pdata-->init_pdata-->deinit_pdata-->free_pdata, or
-     *  - alloc_pdata-->free_pdata.
+     * If something goes wrong during bringup, we go to CPU_UP_CANCELLED.
      */
     switch ( action )
     {
@@ -2402,9 +2404,6 @@ void __init scheduler_init(void)
         BUG();
     get_sched_res(0)->curr = idle_vcpu[0]->sched_unit;
     get_sched_res(0)->sched_unit_idle = idle_vcpu[0]->sched_unit;
-    get_sched_res(0)->sched_priv = sched_alloc_pdata(&ops, 0);
-    BUG_ON(IS_ERR(get_sched_res(0)->sched_priv));
-    scheduler_percpu_init(0);
 }
 
 /*
@@ -2412,18 +2411,14 @@ void __init scheduler_init(void)
  * cpupool, or subject it to the scheduler of a new cpupool.
  *
  * For the pCPUs that are removed from their cpupool, their scheduler becomes
- * &ops (the default scheduler, selected at boot, which also services the
- * default cpupool). However, as these pCPUs are not really part of any pool,
- * there won't be any scheduling event on them, not even from the default
- * scheduler. Basically, they will just sit idle until they are explicitly
- * added back to a cpupool.
+ * &sched_idle_ops (the idle scheduler).
  */
 int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
 {
     struct vcpu *idle;
     void *ppriv, *ppriv_old, *vpriv, *vpriv_old;
     struct scheduler *old_ops = get_sched_res(cpu)->scheduler;
-    struct scheduler *new_ops = (c == NULL) ? &ops : c->sched;
+    struct scheduler *new_ops = (c == NULL) ? &sched_idle_ops : c->sched;
     struct sched_resource *sd = get_sched_res(cpu);
     struct cpupool *old_pool = sd->cpupool;
     spinlock_t *old_lock, *new_lock;
@@ -2443,9 +2438,6 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     ASSERT((c == NULL && !cpumask_test_cpu(cpu, old_pool->cpu_valid)) ||
            (c != NULL && !cpumask_test_cpu(cpu, c->cpu_valid)));
 
-    if ( old_ops == new_ops )
-        goto out;
-
     /*
      * To setup the cpu for the new scheduler we need:
      * - a valid instance of per-CPU scheduler specific data, as it is
@@ -2498,7 +2490,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
      * taking it, finds all the initializations we've done above in place.
      */
     smp_mb();
-    sd->schedule_lock = c ? new_lock : &sched_free_cpu_lock;
+    sd->schedule_lock = new_lock;
 
     /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
     spin_unlock_irqrestore(old_lock, flags);
@@ -2510,7 +2502,6 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     sched_free_vdata(old_ops, vpriv_old);
     sched_free_pdata(old_ops, ppriv_old, cpu);
 
- out:
     get_sched_res(cpu)->granularity = c ? c->granularity : 1;
     get_sched_res(cpu)->cpupool = c;
     /* When a cpu is added to a pool, trigger it to go pick up some work */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 7dc63c449b..e689bba361 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -677,7 +677,6 @@ void __domain_crash(struct domain *d);
 void noreturn asm_domain_crash_synchronous(unsigned long addr);
 
 void scheduler_init(void);
-void scheduler_percpu_init(unsigned int cpu);
 int sched_init_vcpu(struct vcpu *v);
 void sched_destroy_vcpu(struct vcpu *v);
 int sched_init_domain(struct domain *d, int poolid);
-- 
2.16.4
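[Editorial usage note, not part of the patch: with this change a pCPU runs
under sched_idle_ops exactly while it is a "free" cpu, i.e. from the moment
it is removed from a cpupool until it is added to one again, for example via
the existing toolstack commands:

    xl cpupool-cpu-remove Pool-0 3    # cpu 3 becomes a free cpu, idle scheduler takes over
    xl cpupool-cpu-add Pool-0 3       # cpu 3 is handed back to Pool-0's scheduler

Newly booted secondary CPUs likewise start out as free cpus, which is why the
scheduler_percpu_init() calls in the smpboot code can be dropped.]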