From: Juergen Gross
To: xen-devel@lists.xenproject.org
Cc: Juergen Gross, George Dunlap, Dario Faggioli
Date: Thu, 20 Feb 2020 14:39:49 +0100
Message-Id: <20200220133949.29832-1-jgross@suse.com>
Subject: [Xen-devel] [PATCH v2] xen/sched: rework credit2 run-queue allocation

Currently the memory for each run-queue of the credit2 scheduler is
allocated in the scheduler's init function: a struct
csched2_runqueue_data is allocated for each cpu in the system, even if
the scheduler only handles one physical cpu or is configured to work
with a single run-queue. As each struct contains 4 cpumasks, this
quickly adds up to a rather large amount of memory.

Rework the memory allocation for run-queues to be done only when
needed, i.e. when adding a physical cpu to the scheduler that requires
a new run-queue.

This also fixes a bug in credit2's run-queue handling:
cpu_to_runqueue() returns the first free or matching run-queue,
whichever is found first. So when a cpu is removed from credit2 this
could result in e.g.
run-queue 0 becoming free, so when another cpu is added it will always
be assigned to that free run-queue, even if it would have matched
another run-queue found later.

Signed-off-by: Juergen Gross
Reviewed-by: Dario Faggioli
---
V2:
- added two comments (Dario Faggioli)
---
 xen/common/sched/credit2.c | 371 ++++++++++++++++++++++-----------------
 1 file changed, 183 insertions(+), 188 deletions(-)

diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
index 7d104f15d0..8ae3b80d2e 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -467,8 +467,12 @@ custom_param("credit2_runqueue", parse_credit2_runqueue);
 struct csched2_runqueue_data {
     spinlock_t lock;           /* Lock for this runqueue */
 
+    struct list_head rql;      /* List of runqueues */
     struct list_head runq;     /* Ordered list of runnable vms */
+    unsigned int refcnt;       /* How many CPUs reference this runqueue */
+                               /* (including not yet active ones) */
     unsigned int nr_cpus;      /* How many CPUs are sharing this runqueue */
+                               /* (only active ones) */
     int id;                    /* ID of this runqueue (-1 if invalid) */
 
     int load;                  /* Instantaneous load (num of non-idle units) */
@@ -496,8 +500,8 @@ struct csched2_private {
     unsigned int load_window_shift;    /* Lenght of load decaying window */
     unsigned int ratelimit_us;         /* Rate limiting for this scheduler */
 
-    cpumask_t active_queues;           /* Runqueues with (maybe) active cpus */
-    struct csched2_runqueue_data *rqd; /* Data of the various runqueues */
+    unsigned int active_queues;        /* Number of active runqueues */
+    struct list_head rql;              /* List of runqueues */
 
     cpumask_t initialized;             /* CPUs part of this scheduler */
     struct list_head sdom;             /* List of domains (for debug key) */
@@ -508,7 +512,7 @@ struct csched2_private {
  */
 struct csched2_pcpu {
     cpumask_t sibling_mask;            /* Siblings in the same runqueue */
-    int runq_id;
+    struct csched2_runqueue_data *rqd; /* Runqueue for this CPU */
 };
 
 /*
@@ -586,14 +590,13 @@ static inline struct csched2_dom *csched2_dom(const struct domain *d)
 /* CPU to runq_id macro */
 static inline int c2r(unsigned int cpu)
 {
-    return csched2_pcpu(cpu)->runq_id;
+    return csched2_pcpu(cpu)->rqd->id;
 }
 
 /* CPU to runqueue struct macro */
-static inline struct csched2_runqueue_data *c2rqd(const struct scheduler *ops,
-                                                  unsigned int cpu)
+static inline struct csched2_runqueue_data *c2rqd(unsigned int cpu)
 {
-    return &csched2_priv(ops)->rqd[c2r(cpu)];
+    return csched2_pcpu(cpu)->rqd;
 }
 
 /* Does the domain of this unit have a cap? */
@@ -804,36 +807,6 @@ static inline struct csched2_unit * runq_elem(struct list_head *elem)
     return list_entry(elem, struct csched2_unit, runq_elem);
 }
 
-static void activate_runqueue(struct csched2_private *prv, int rqi)
-{
-    struct csched2_runqueue_data *rqd;
-
-    rqd = prv->rqd + rqi;
-
-    BUG_ON(!cpumask_empty(&rqd->active));
-
-    rqd->max_weight = 1;
-    rqd->id = rqi;
-    INIT_LIST_HEAD(&rqd->svc);
-    INIT_LIST_HEAD(&rqd->runq);
-    spin_lock_init(&rqd->lock);
-
-    __cpumask_set_cpu(rqi, &prv->active_queues);
-}
-
-static void deactivate_runqueue(struct csched2_private *prv, int rqi)
-{
-    struct csched2_runqueue_data *rqd;
-
-    rqd = prv->rqd + rqi;
-
-    BUG_ON(!cpumask_empty(&rqd->active));
-
-    rqd->id = -1;
-
-    __cpumask_clear_cpu(rqi, &prv->active_queues);
-}
-
 static inline bool same_node(unsigned int cpua, unsigned int cpub)
 {
     return cpu_to_node(cpua) == cpu_to_node(cpub);
@@ -850,51 +823,73 @@ static inline bool same_core(unsigned int cpua, unsigned int cpub)
            cpu_to_core(cpua) == cpu_to_core(cpub);
 }
 
-static unsigned int
-cpu_to_runqueue(const struct csched2_private *prv, unsigned int cpu)
+static struct csched2_runqueue_data *
+cpu_add_to_runqueue(struct csched2_private *prv, unsigned int cpu)
 {
-    const struct csched2_runqueue_data *rqd;
-    unsigned int rqi;
+    struct csched2_runqueue_data *rqd, *rqd_new;
+    struct list_head *rqd_ins;
+    unsigned long flags;
+    int rqi = 0;
+    bool rqi_unused = false, rqd_valid = false;
 
-    for ( rqi = 0; rqi < nr_cpu_ids; rqi++ )
+    /* Prealloc in case we need it - not allowed with interrupts off. */
+    rqd_new = xzalloc(struct csched2_runqueue_data);
+
+    write_lock_irqsave(&prv->lock, flags);
+
+    rqd_ins = &prv->rql;
+    list_for_each_entry ( rqd, &prv->rql, rql )
     {
         unsigned int peer_cpu;
 
-        /*
-         * As soon as we come across an uninitialized runqueue, use it.
-         * In fact, either:
-         *  - we are initializing the first cpu, and we assign it to
-         *    runqueue 0. This is handy, especially if we are dealing
-         *    with the boot cpu (if credit2 is the default scheduler),
-         *    as we would not be able to use cpu_to_socket() and similar
-         *    helpers anyway (they're result of which is not reliable yet);
-         *  - we have gone through all the active runqueues, and have not
-         *    found anyone whose cpus' topology matches the one we are
-         *    dealing with, so activating a new runqueue is what we want.
-         */
-        if ( prv->rqd[rqi].id == -1 )
-            break;
-
-        rqd = prv->rqd + rqi;
-        BUG_ON(cpumask_empty(&rqd->active));
+        /* Remember first unused queue index. */
+        if ( !rqi_unused && rqd->id > rqi )
+            rqi_unused = true;
 
-        peer_cpu = cpumask_first(&rqd->active);
+        peer_cpu = rqd->pick_bias;
         BUG_ON(cpu_to_socket(cpu) == XEN_INVALID_SOCKET_ID ||
                cpu_to_socket(peer_cpu) == XEN_INVALID_SOCKET_ID);
 
-        if (opt_runqueue == OPT_RUNQUEUE_CPU)
-            continue;
+        /* OPT_RUNQUEUE_CPU will never find an existing runqueue. */
         if ( opt_runqueue == OPT_RUNQUEUE_ALL ||
             (opt_runqueue == OPT_RUNQUEUE_CORE && same_core(peer_cpu, cpu)) ||
             (opt_runqueue == OPT_RUNQUEUE_SOCKET && same_socket(peer_cpu, cpu)) ||
             (opt_runqueue == OPT_RUNQUEUE_NODE && same_node(peer_cpu, cpu)) )
+        {
+            rqd_valid = true;
             break;
+        }
+
+        if ( !rqi_unused )
+        {
+            rqi++;
+            rqd_ins = &rqd->rql;
+        }
+    }
+
+    if ( !rqd_valid )
+    {
+        if ( !rqd_new )
+        {
+            rqd = ERR_PTR(-ENOMEM);
+            goto out;
+        }
+        rqd = rqd_new;
+        rqd_new = NULL;
+
+        list_add(&rqd->rql, rqd_ins);
+        rqd->pick_bias = cpu;
+        rqd->id = rqi;
     }
 
-    /* We really expect to be able to assign each cpu to a runqueue. */
-    BUG_ON(rqi >= nr_cpu_ids);
+    rqd->refcnt++;
 
-    return rqi;
+ out:
+    write_unlock_irqrestore(&prv->lock, flags);
+
+    xfree(rqd_new);
+
+    return rqd;
 }
 
 /* Find the domain with the highest weight. */
@@ -972,13 +967,13 @@ _runq_assign(struct csched2_unit *svc, struct csched2_runqueue_data *rqd)
 }
 
 static void
-runq_assign(const struct scheduler *ops, const struct sched_unit *unit)
+runq_assign(const struct sched_unit *unit)
 {
     struct csched2_unit *svc = unit->priv;
 
     ASSERT(svc->rqd == NULL);
 
-    _runq_assign(svc, c2rqd(ops, sched_unit_master(unit)));
+    _runq_assign(svc, c2rqd(sched_unit_master(unit)));
 }
 
 static void
@@ -999,11 +994,11 @@ _runq_deassign(struct csched2_unit *svc)
 }
 
 static void
-runq_deassign(const struct scheduler *ops, const struct sched_unit *unit)
+runq_deassign(const struct sched_unit *unit)
 {
     struct csched2_unit *svc = unit->priv;
 
-    ASSERT(svc->rqd == c2rqd(ops, sched_unit_master(unit)));
+    ASSERT(svc->rqd == c2rqd(sched_unit_master(unit)));
 
     _runq_deassign(svc);
 }
@@ -1272,12 +1267,11 @@ update_load(const struct scheduler *ops,
     update_svc_load(ops, svc, change, now);
 }
 
-static void
-runq_insert(const struct scheduler *ops, struct csched2_unit *svc)
+static void runq_insert(struct csched2_unit *svc)
 {
     struct list_head *iter;
     unsigned int cpu = sched_unit_master(svc->unit);
-    struct list_head * runq = &c2rqd(ops, cpu)->runq;
+    struct list_head *runq = &c2rqd(cpu)->runq;
     int pos = 0;
 
     ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
@@ -1366,7 +1360,7 @@ static inline bool is_preemptable(const struct csched2_unit *svc,
 static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
                              const struct csched2_unit *new, unsigned int cpu)
 {
-    struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
+    struct csched2_runqueue_data *rqd = c2rqd(cpu);
     struct csched2_unit * cur = csched2_unit(curr_on_cpu(cpu));
     const struct csched2_private *prv = csched2_priv(ops);
     s_time_t score;
@@ -1442,7 +1436,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
     s_time_t max = 0;
     struct sched_unit *unit = new->unit;
     unsigned int bs, cpu = sched_unit_master(unit);
-    struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
+    struct csched2_runqueue_data *rqd = c2rqd(cpu);
     const cpumask_t *online = cpupool_domain_master_cpumask(unit->domain);
     cpumask_t mask;
 
@@ -1618,10 +1612,9 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
 /*
  * Credit-related code
  */
-static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
-                         struct csched2_unit *snext)
+static void reset_credit(int cpu, s_time_t now, struct csched2_unit *snext)
 {
-    struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
+    struct csched2_runqueue_data *rqd = c2rqd(cpu);
     struct list_head *iter;
     int m;
 
@@ -1910,7 +1903,7 @@ unpark_parked_units(const struct scheduler *ops, struct list_head *units)
              * for the newly replenished budget.
             */
            ASSERT( svc->rqd != NULL );
-            ASSERT( c2rqd(ops, sched_unit_master(svc->unit)) == svc->rqd );
+            ASSERT( c2rqd(sched_unit_master(svc->unit)) == svc->rqd );
             __set_bit(__CSFLAG_delayed_runq_add, &svc->flags);
         }
         else if ( unit_runnable(svc->unit) )
@@ -1923,7 +1916,7 @@ unpark_parked_units(const struct scheduler *ops, struct list_head *units)
              */
             now = NOW();
             update_load(ops, svc->rqd, svc, 1, now);
-            runq_insert(ops, svc);
+            runq_insert(svc);
             runq_tickle(ops, svc, now);
         }
         list_del_init(&svc->parked_elem);
@@ -2088,7 +2081,7 @@ csched2_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
     }
     else if ( unit_on_runq(svc) )
     {
-        ASSERT(svc->rqd == c2rqd(ops, sched_unit_master(unit)));
+        ASSERT(svc->rqd == c2rqd(sched_unit_master(unit)));
         update_load(ops, svc->rqd, svc, -1, NOW());
         runq_remove(svc);
     }
@@ -2135,16 +2128,16 @@ csched2_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 
     /* Add into the new runqueue if necessary */
     if ( svc->rqd == NULL )
-        runq_assign(ops, unit);
+        runq_assign(unit);
     else
-        ASSERT(c2rqd(ops, sched_unit_master(unit)) == svc->rqd );
+        ASSERT(c2rqd(sched_unit_master(unit)) == svc->rqd );
 
     now = NOW();
 
     update_load(ops, svc->rqd, svc, 1, now);
 
     /* Put the UNIT on the runq */
-    runq_insert(ops, svc);
+    runq_insert(svc);
     runq_tickle(ops, svc, now);
 
 out:
@@ -2168,7 +2161,7 @@ csched2_context_saved(const struct scheduler *ops, struct sched_unit *unit)
     LIST_HEAD(were_parked);
 
     ASSERT(is_idle_unit(unit) ||
-           svc->rqd == c2rqd(ops, sched_unit_master(unit)));
+           svc->rqd == c2rqd(sched_unit_master(unit)));
 
     /* This unit is now eligible to be put on the runqueue again */
     __clear_bit(__CSFLAG_scheduled, &svc->flags);
@@ -2189,7 +2182,7 @@ csched2_context_saved(const struct scheduler *ops, struct sched_unit *unit)
     {
         ASSERT(!unit_on_runq(svc));
 
-        runq_insert(ops, svc);
+        runq_insert(svc);
         runq_tickle(ops, svc, now);
     }
     else if ( !is_idle_unit(unit) )
@@ -2205,13 +2198,13 @@ static struct sched_resource *
 csched2_res_pick(const struct scheduler *ops, const struct sched_unit *unit)
 {
     struct csched2_private *prv = csched2_priv(ops);
-    int i, min_rqi = -1, min_s_rqi = -1;
     unsigned int new_cpu, cpu = sched_unit_master(unit);
     struct csched2_unit *svc = csched2_unit(unit);
     s_time_t min_avgload = MAX_LOAD, min_s_avgload = MAX_LOAD;
     bool has_soft;
+    struct csched2_runqueue_data *rqd, *min_rqd = NULL, *min_s_rqd = NULL;
 
-    ASSERT(!cpumask_empty(&prv->active_queues));
+    ASSERT(!list_empty(&prv->rql));
 
     SCHED_STAT_CRANK(pick_resource);
 
@@ -2289,13 +2282,10 @@ csched2_res_pick(const struct scheduler *ops, const struct sched_unit *unit)
      * Find both runqueues in one pass.
      */
     has_soft = has_soft_affinity(unit);
-    for_each_cpu(i, &prv->active_queues)
+    list_for_each_entry ( rqd, &prv->rql, rql )
     {
-        struct csched2_runqueue_data *rqd;
         s_time_t rqd_avgload = MAX_LOAD;
 
-        rqd = prv->rqd + i;
-
         /*
          * If none of the cpus of this runqueue is in svc's hard-affinity,
          * skip the runqueue.
@@ -2338,18 +2328,18 @@ csched2_res_pick(const struct scheduler *ops, const struct sched_unit *unit)
             if ( cpumask_intersects(&mask, unit->cpu_soft_affinity) )
             {
                 min_s_avgload = rqd_avgload;
-                min_s_rqi = i;
+                min_s_rqd = rqd;
             }
         }
         /* In any case, keep the "hard-affinity minimum" updated too. */
         if ( rqd_avgload < min_avgload )
         {
             min_avgload = rqd_avgload;
-            min_rqi = i;
+            min_rqd = rqd;
         }
     }
 
-    if ( has_soft && min_s_rqi != -1 )
+    if ( has_soft && min_s_rqd )
     {
         /*
          * We have soft affinity, and we have a candidate runq, so go for it.
@@ -2369,9 +2359,9 @@ csched2_res_pick(const struct scheduler *ops, const struct sched_unit *unit)
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                     unit->cpu_soft_affinity);
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
-                    &prv->rqd[min_s_rqi].active);
+                    &min_s_rqd->active);
     }
-    else if ( min_rqi != -1 )
+    else if ( min_rqd )
     {
         /*
          * Either we don't have soft-affinity, or we do, but we did not find
@@ -2383,7 +2373,7 @@ csched2_res_pick(const struct scheduler *ops, const struct sched_unit *unit)
          * with the cpus of the runq.
          */
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
-                    &prv->rqd[min_rqi].active);
+                    &min_rqd->active);
     }
     else
    {
@@ -2392,14 +2382,13 @@ csched2_res_pick(const struct scheduler *ops, const struct sched_unit *unit)
          * contention).
          */
         new_cpu = get_fallback_cpu(svc);
-        min_rqi = c2r(new_cpu);
-        min_avgload = prv->rqd[min_rqi].b_avgload;
+        min_rqd = c2rqd(new_cpu);
+        min_avgload = min_rqd->b_avgload;
         goto out_up;
     }
 
-    new_cpu = cpumask_cycle(prv->rqd[min_rqi].pick_bias,
-                            cpumask_scratch_cpu(cpu));
-    prv->rqd[min_rqi].pick_bias = new_cpu;
+    new_cpu = cpumask_cycle(min_rqd->pick_bias, cpumask_scratch_cpu(cpu));
+    min_rqd->pick_bias = new_cpu;
     BUG_ON(new_cpu >= nr_cpu_ids);
 
 out_up:
@@ -2414,7 +2403,7 @@ csched2_res_pick(const struct scheduler *ops, const struct sched_unit *unit)
         } d;
         d.dom = unit->domain->domain_id;
         d.unit = unit->unit_id;
-        d.rq_id = min_rqi;
+        d.rq_id = min_rqd->id;
         d.b_avgload = min_avgload;
         d.new_cpu = new_cpu;
         __trace_var(TRC_CSCHED2_PICKED_CPU, 1,
@@ -2527,7 +2516,7 @@ static void migrate(const struct scheduler *ops,
     if ( on_runq )
     {
         update_load(ops, svc->rqd, NULL, 1, now);
-        runq_insert(ops, svc);
+        runq_insert(svc);
         runq_tickle(ops, svc, now);
         SCHED_STAT_CRANK(migrate_on_runq);
     }
@@ -2557,9 +2546,9 @@ static bool unit_is_migrateable(const struct csched2_unit *svc,
 static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)
 {
     struct csched2_private *prv = csched2_priv(ops);
-    int i, max_delta_rqi;
     struct list_head *push_iter, *pull_iter;
     bool inner_load_updated = 0;
+    struct csched2_runqueue_data *rqd, *max_delta_rqd;
 
     balance_state_t st = { .best_push_svc = NULL, .best_pull_svc = NULL };
 
@@ -2571,22 +2560,22 @@ static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)
      */
 
     ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
-    st.lrqd = c2rqd(ops, cpu);
+    st.lrqd = c2rqd(cpu);
 
     update_runq_load(ops, st.lrqd, 0, now);
 
 retry:
-    max_delta_rqi = -1;
+    max_delta_rqd = NULL;
     if ( !read_trylock(&prv->lock) )
         return;
 
     st.load_delta = 0;
 
-    for_each_cpu(i, &prv->active_queues)
+    list_for_each_entry ( rqd, &prv->rql, rql )
     {
         s_time_t delta;
 
-        st.orqd = prv->rqd + i;
+        st.orqd = rqd;
 
         if ( st.orqd == st.lrqd
             || !spin_trylock(&st.orqd->lock) )
@@ -2601,7 +2590,7 @@ retry:
         if ( delta > st.load_delta )
         {
             st.load_delta = delta;
-            max_delta_rqi = i;
+            max_delta_rqd = rqd;
         }
 
         spin_unlock(&st.orqd->lock);
@@ -2609,7 +2598,7 @@ retry:
 
     /* Minimize holding the private scheduler lock. */
     read_unlock(&prv->lock);
-    if ( max_delta_rqi == -1 )
+    if ( !max_delta_rqd )
         goto out;
 
     {
@@ -2621,10 +2610,7 @@ retry:
         if ( st.orqd->b_avgload > load_max )
             load_max = st.orqd->b_avgload;
 
-        cpus_max = st.lrqd->nr_cpus;
-        i = st.orqd->nr_cpus;
-        if ( i > cpus_max )
-            cpus_max = i;
+        cpus_max = max(st.lrqd->nr_cpus, st.orqd->nr_cpus);
 
         if ( unlikely(tb_init_done) )
         {
@@ -2660,7 +2646,7 @@ retry:
          * meantime, try the process over again. This can't deadlock
          * because if it doesn't get any other rqd locks, it will simply
         * give up and return.
         */
-        st.orqd = prv->rqd + max_delta_rqi;
+        st.orqd = max_delta_rqd;
        if ( !spin_trylock(&st.orqd->lock) )
            goto retry;
 
@@ -2751,7 +2737,7 @@ csched2_unit_migrate(
     ASSERT(cpumask_test_cpu(new_cpu, &csched2_priv(ops)->initialized));
     ASSERT(cpumask_test_cpu(new_cpu, unit->cpu_hard_affinity));
 
-    trqd = c2rqd(ops, new_cpu);
+    trqd = c2rqd(new_cpu);
 
     /*
      * Do the actual movement toward new_cpu, and update vc->processor.
@@ -2815,7 +2801,7 @@ csched2_dom_cntl(
             struct csched2_unit *svc = csched2_unit(unit);
             spinlock_t *lock = unit_schedule_lock(unit);
 
-            ASSERT(svc->rqd == c2rqd(ops, sched_unit_master(unit)));
+            ASSERT(svc->rqd == c2rqd(sched_unit_master(unit)));
 
             svc->weight = sdom->weight;
             update_max_weight(svc->rqd, svc->weight, old_weight);
@@ -2898,7 +2884,7 @@ csched2_dom_cntl(
                 if ( unit->is_running )
                 {
                     unsigned int cpu = sched_unit_master(unit);
-                    struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
+                    struct csched2_runqueue_data *rqd = c2rqd(cpu);
 
                     ASSERT(curr_on_cpu(cpu) == unit);
 
@@ -3093,7 +3079,7 @@ csched2_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
     lock = unit_schedule_lock_irq(unit);
 
     /* Add unit to runqueue of initial processor */
-    runq_assign(ops, unit);
+    runq_assign(unit);
 
     unit_schedule_unlock_irq(lock, unit);
 
@@ -3126,7 +3112,7 @@ csched2_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
     /* Remove from runqueue */
     lock = unit_schedule_lock_irq(unit);
 
-    runq_deassign(ops, unit);
+    runq_deassign(unit);
 
     unit_schedule_unlock_irq(lock, unit);
 
@@ -3140,7 +3126,7 @@ csched2_runtime(const struct scheduler *ops, int cpu,
 {
     s_time_t time, min_time;
     int rt_credit; /* Proposed runtime measured in credits */
-    struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
+    struct csched2_runqueue_data *rqd = c2rqd(cpu);
     struct list_head *runq = &rqd->runq;
     const struct csched2_private *prv = csched2_priv(ops);
 
@@ -3437,7 +3423,7 @@ static void csched2_schedule(
 
     BUG_ON(!cpumask_test_cpu(sched_cpu, &csched2_priv(ops)->initialized));
 
-    rqd = c2rqd(ops, sched_cpu);
+    rqd = c2rqd(sched_cpu);
     BUG_ON(!cpumask_test_cpu(sched_cpu, &rqd->active));
 
     ASSERT(spin_is_locked(get_sched_res(sched_cpu)->schedule_lock));
@@ -3551,7 +3537,7 @@ static void csched2_schedule(
          */
         if ( skipped_units == 0 && snext->credit <= CSCHED2_CREDIT_RESET )
         {
-            reset_credit(ops, sched_cpu, now, snext);
+            reset_credit(sched_cpu, now, snext);
             balance_load(ops, sched_cpu, now);
         }
 
@@ -3650,7 +3636,8 @@ csched2_dump(const struct scheduler *ops)
     struct list_head *iter_sdom;
     struct csched2_private *prv = csched2_priv(ops);
     unsigned long flags;
-    unsigned int i, j, loop;
+    unsigned int j, loop;
+    struct csched2_runqueue_data *rqd;
 
     /*
      * We need the private scheduler lock as we access global
@@ -3660,13 +3647,13 @@ csched2_dump(const struct scheduler *ops)
 
     printk("Active queues: %d\n"
            "\tdefault-weight     = %d\n",
-           cpumask_weight(&prv->active_queues),
+           prv->active_queues,
            CSCHED2_DEFAULT_WEIGHT);
-    for_each_cpu(i, &prv->active_queues)
+    list_for_each_entry ( rqd, &prv->rql, rql )
     {
         s_time_t fraction;
 
-        fraction = (prv->rqd[i].avgload * 100) >> prv->load_precision_shift;
+        fraction = (rqd->avgload * 100) >> prv->load_precision_shift;
 
         printk("Runqueue %d:\n"
                "\tncpus              = %u\n"
@@ -3675,21 +3662,21 @@ csched2_dump(const struct scheduler *ops)
                "\tpick_bias          = %u\n"
                "\tinstload           = %d\n"
                "\taveload            = %"PRI_stime" (~%"PRI_stime"%%)\n",
-               i,
-               prv->rqd[i].nr_cpus,
-               CPUMASK_PR(&prv->rqd[i].active),
-               prv->rqd[i].max_weight,
-               prv->rqd[i].pick_bias,
-               prv->rqd[i].load,
-               prv->rqd[i].avgload,
+               rqd->id,
+               rqd->nr_cpus,
+               CPUMASK_PR(&rqd->active),
+               rqd->max_weight,
+               rqd->pick_bias,
+               rqd->load,
+               rqd->avgload,
                fraction);
 
         printk("\tidlers: %*pb\n"
               "\ttickled: %*pb\n"
               "\tfully idle cores: %*pb\n",
-               CPUMASK_PR(&prv->rqd[i].idle),
-               CPUMASK_PR(&prv->rqd[i].tickled),
-               CPUMASK_PR(&prv->rqd[i].smt_idle));
+               CPUMASK_PR(&rqd->idle),
+               CPUMASK_PR(&rqd->tickled),
+               CPUMASK_PR(&rqd->smt_idle));
     }
 
     printk("Domain info:\n");
@@ -3721,16 +3708,15 @@ csched2_dump(const struct scheduler *ops)
         }
     }
 
-    for_each_cpu(i, &prv->active_queues)
+    list_for_each_entry ( rqd, &prv->rql, rql )
     {
-        struct csched2_runqueue_data *rqd = prv->rqd + i;
         struct list_head *iter, *runq = &rqd->runq;
         int loop = 0;
 
         /* We need the lock to scan the runqueue. */
         spin_lock(&rqd->lock);
 
-        printk("Runqueue %d:\n", i);
+        printk("Runqueue %d:\n", rqd->id);
 
         for_each_cpu(j, &rqd->active)
             dump_pcpu(ops, j);
@@ -3755,20 +3741,28 @@ csched2_dump(const struct scheduler *ops)
 static void *
 csched2_alloc_pdata(const struct scheduler *ops, int cpu)
 {
+    struct csched2_private *prv = csched2_priv(ops);
     struct csched2_pcpu *spc;
+    struct csched2_runqueue_data *rqd;
 
     spc = xzalloc(struct csched2_pcpu);
     if ( spc == NULL )
         return ERR_PTR(-ENOMEM);
 
-    /* Not in any runqueue yet */
-    spc->runq_id = -1;
+    rqd = cpu_add_to_runqueue(prv, cpu);
+    if ( IS_ERR(rqd) )
+    {
+        xfree(spc);
+        return rqd;
+    }
+
+    spc->rqd = rqd;
 
     return spc;
 }
 
 /* Returns the ID of the runqueue the cpu is assigned to. */
-static unsigned
+static struct csched2_runqueue_data *
 init_pdata(struct csched2_private *prv, struct csched2_pcpu *spc,
            unsigned int cpu)
 {
@@ -3778,18 +3772,23 @@ init_pdata(struct csched2_private *prv, struct csched2_pcpu *spc,
     ASSERT(rw_is_write_locked(&prv->lock));
     ASSERT(!cpumask_test_cpu(cpu, &prv->initialized));
     /* CPU data needs to be allocated, but still uninitialized. */
-    ASSERT(spc && spc->runq_id == -1);
+    ASSERT(spc);
 
-    /* Figure out which runqueue to put it in */
-    spc->runq_id = cpu_to_runqueue(prv, cpu);
+    rqd = spc->rqd;
 
-    rqd = prv->rqd + spc->runq_id;
+    ASSERT(rqd && !cpumask_test_cpu(cpu, &spc->rqd->active));
 
-    printk(XENLOG_INFO "Adding cpu %d to runqueue %d\n", cpu, spc->runq_id);
-    if ( !cpumask_test_cpu(spc->runq_id, &prv->active_queues) )
+    printk(XENLOG_INFO "Adding cpu %d to runqueue %d\n", cpu, rqd->id);
+    if ( !rqd->nr_cpus )
     {
         printk(XENLOG_INFO " First cpu on runqueue, activating\n");
-        activate_runqueue(prv, spc->runq_id);
+
+        BUG_ON(!cpumask_empty(&rqd->active));
+        rqd->max_weight = 1;
+        INIT_LIST_HEAD(&rqd->svc);
+        INIT_LIST_HEAD(&rqd->runq);
+        spin_lock_init(&rqd->lock);
+        prv->active_queues++;
     }
 
     __cpumask_set_cpu(cpu, &spc->sibling_mask);
@@ -3813,7 +3812,7 @@ init_pdata(struct csched2_private *prv, struct csched2_pcpu *spc,
     if ( rqd->nr_cpus == 1 )
         rqd->pick_bias = cpu;
 
-    return spc->runq_id;
+    return rqd;
 }
 
 /* Change the scheduler of cpu to us (Credit2). */
@@ -3823,7 +3822,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 {
     struct csched2_private *prv = csched2_priv(new_ops);
     struct csched2_unit *svc = vdata;
-    unsigned rqi;
+    struct csched2_runqueue_data *rqd;
 
     ASSERT(pdata && svc && is_idle_unit(svc->unit));
 
@@ -3840,7 +3839,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 
     sched_idle_unit(cpu)->priv = vdata;
 
-    rqi = init_pdata(prv, pdata, cpu);
+    rqd = init_pdata(prv, pdata, cpu);
 
     /*
      * Now that we know what runqueue we'll go in, double check what's said
@@ -3848,11 +3847,11 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
      * this scheduler, and so it's safe to have taken it /before/ our
      * private global lock.
      */
-    ASSERT(get_sched_res(cpu)->schedule_lock != &prv->rqd[rqi].lock);
+    ASSERT(get_sched_res(cpu)->schedule_lock != &rqd->lock);
 
     write_unlock(&prv->lock);
 
-    return &prv->rqd[rqi].lock;
+    return &rqd->lock;
 }
 
 static void
@@ -3866,10 +3865,6 @@ csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 
     write_lock_irqsave(&prv->lock, flags);
 
-    /*
-     * alloc_pdata is not implemented, so pcpu must be NULL. On the other
-     * hand, init_pdata must have been called for this pCPU.
-     */
     /*
      * Scheduler specific data for this pCPU must still be there and and be
      * valid. In fact, if we are here:
@@ -3878,20 +3873,21 @@ csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
      * 2. init_pdata must have been called on this cpu, and deinit_pdata
      *    (us!) must not have been called on it already.
      */
-    ASSERT(spc && spc->runq_id != -1);
+    ASSERT(spc && spc->rqd);
     ASSERT(cpumask_test_cpu(cpu, &prv->initialized));
 
     /* Find the old runqueue and remove this cpu from it */
-    rqd = prv->rqd + spc->runq_id;
+    rqd = spc->rqd;
 
     /* No need to save IRQs here, they're already disabled */
     spin_lock(&rqd->lock);
 
-    printk(XENLOG_INFO "Removing cpu %d from runqueue %d\n", cpu, spc->runq_id);
+    printk(XENLOG_INFO "Removing cpu %d from runqueue %d\n", cpu, rqd->id);
 
     __cpumask_clear_cpu(cpu, &rqd->idle);
     __cpumask_clear_cpu(cpu, &rqd->smt_idle);
     __cpumask_clear_cpu(cpu, &rqd->active);
+    __cpumask_clear_cpu(cpu, &rqd->tickled);
 
     for_each_cpu ( rcpu, &rqd->active )
         __cpumask_clear_cpu(cpu, &csched2_pcpu(rcpu)->sibling_mask);
@@ -3902,13 +3898,13 @@ csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
     if ( rqd->nr_cpus == 0 )
     {
         printk(XENLOG_INFO " No cpus left on runqueue, disabling\n");
-        deactivate_runqueue(prv, spc->runq_id);
+
+        BUG_ON(!cpumask_empty(&rqd->active));
+        prv->active_queues--;
     }
     else if ( rqd->pick_bias == cpu )
         rqd->pick_bias = cpumask_first(&rqd->active);
 
-    spc->runq_id = -1;
-
     spin_unlock(&rqd->lock);
 
     __cpumask_clear_cpu(cpu, &prv->initialized);
@@ -3921,18 +3917,29 @@ csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 static void
 csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 {
+    struct csched2_private *prv = csched2_priv(ops);
     struct csched2_pcpu *spc = pcpu;
+    struct csched2_runqueue_data *rqd;
+    unsigned long flags;
 
-    /*
-     * pcpu either points to a valid struct csched2_pcpu, or is NULL (if
-     * CPU bringup failed, and we're beeing called from CPU_UP_CANCELLED).
-     * xfree() does not really mind, but we want to be sure that either
-     * init_pdata has never been called, or deinit_pdata has been called
-     * already.
-     */
-    ASSERT(!pcpu || spc->runq_id == -1);
-    ASSERT(!cpumask_test_cpu(cpu, &csched2_priv(ops)->initialized));
+    if ( !spc )
+        return;
+
+    write_lock_irqsave(&prv->lock, flags);
+
+    rqd = spc->rqd;
+    ASSERT(rqd && rqd->refcnt);
+    ASSERT(!cpumask_test_cpu(cpu, &prv->initialized));
+
+    rqd->refcnt--;
+    if ( !rqd->refcnt )
+        list_del(&rqd->rql);
+    else
+        rqd = NULL;
+
+    write_unlock_irqrestore(&prv->lock, flags);
 
+    xfree(rqd);
     xfree(pcpu);
 }
 
@@ -3966,7 +3973,6 @@ csched2_global_init(void)
 static int
 csched2_init(struct scheduler *ops)
 {
-    int i;
     struct csched2_private *prv;
 
     printk("Initializing Credit2 scheduler\n");
@@ -3999,18 +4005,9 @@ csched2_init(struct scheduler *ops)
     ops->sched_data = prv;
 
     rwlock_init(&prv->lock);
+    INIT_LIST_HEAD(&prv->rql);
     INIT_LIST_HEAD(&prv->sdom);
 
-    /* Allocate all runqueues and mark them as un-initialized */
-    prv->rqd = xzalloc_array(struct csched2_runqueue_data, nr_cpu_ids);
-    if ( !prv->rqd )
-    {
-        xfree(prv);
-        return -ENOMEM;
-    }
-    for ( i = 0; i < nr_cpu_ids; i++ )
-        prv->rqd[i].id = -1;
-
     /* initialize ratelimit */
     prv->ratelimit_us = sched_ratelimit_us;
 
@@ -4028,8 +4025,6 @@ csched2_deinit(struct scheduler *ops)
 
     prv = csched2_priv(ops);
     ops->sched_data = NULL;
-    if ( prv )
-        xfree(prv->rqd);
     xfree(prv);
 }
 
-- 
2.16.4

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel