From nobody Fri Mar 29 01:59:43 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org ARC-Seal: i=1; a=rsa-sha256; t=1584576785; cv=none; d=zohomail.com; s=zohoarc; b=SOnLaInyj6wy57XTzFxw3mHTrtY9BgPEVul9IGC1JHCokf/Ftw2aFwft0oJUFdSTQ7AqVg35wlfzEXIWbCilS5EyWWN87okx0oKidzVEaleIlNJzcYNPssQ7f5RMqol3LOLunH3pAEXtdQQ20jhfrczc7SpphAbhymeLZZIfuPs= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1584576785; h=Content-Type:Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=9amYgPckB3w2JQTXw3M+n0jJQNEbJVRsTrH+fKaLQt4=; b=iTPxMRuU8/oAm2VHiIZX5qpu1KXQZTOMK+wYkU2HjvcRvmI0m5s55R6/zk4Els5ZXAw0YHwPP5a9AYi2LIFCw+rI4Y6j2pfaqXf1CE3GC03tl/pUoTrjF4pTj8bH6p3qMch5/BHWM3Zi4gbJQcVmzQM9nJ3FkPCR7ZyQDSqJnP4= ARC-Authentication-Results: i=1; mx.zohomail.com; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1584576785723679.5662272536159; Wed, 18 Mar 2020 17:13:05 -0700 (PDT) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1jEin1-0006hX-IJ; Thu, 19 Mar 2020 00:11:59 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1jEimz-0006gi-Mq for xen-devel@lists.xenproject.org; Thu, 19 Mar 2020 00:11:57 +0000 Received: from mail-wm1-f66.google.com (unknown [209.85.128.66]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 3e0c1124-6976-11ea-bba2-12813bfff9fa; Thu, 19 Mar 2020 00:11:56 +0000 (UTC) Received: by mail-wm1-f66.google.com with SMTP id c187so175142wme.1 for ; Wed, 18 Mar 2020 17:11:56 -0700 (PDT) Received: from [192.168.0.36] (87.78.186.89.cust.ip.kpnqwest.it. [89.186.78.87]) by smtp.gmail.com with ESMTPSA id k3sm605947wmf.16.2020.03.18.17.11.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 18 Mar 2020 17:11:54 -0700 (PDT) X-Inumbo-ID: 3e0c1124-6976-11ea-bba2-12813bfff9fa X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=y1kRVxDY+nl7SvNgqvrXz449PSZNYDfGzpNCJISLgjk=; b=Ig+2538wIaZo/Z8TkUbwt0Zt7AuiP98l0F+tBK8TRFTsQILMjLbPRX5q6ic44xpa8z QhLopQIgT6Kop/He/SPgrsl3KsgZYRoOfBdOBJ9DlZxJX1Z3ajvrsty/hboiq6KchsMP BxhOyOnDzh70hTQaJBFrxTwOqV74jy/7nfHVNhEnYVMw32RMQum5xj7plge6YJaB4Qq+ 5PzjazEJMNDCom2efZ4njiIpLqlwSPZbhqL0cpTNlBri7hz2XARSbaoY71wjjI8W3Jgk NOqGnqg9fu1y0WvQhvoJNj1hqJNqQrJfoZgXJdfGCItJyFad1m/7AjHE0/gCI3ipRAW/ IqlA== X-Gm-Message-State: ANhLgQ0L2L+K3yOJAmWQN9cmDacCtud/GwnTSOAMsnO+ahxoiEEPgW7P W/TX0hP1bE87XTnFDF0/go0= X-Google-Smtp-Source: ADFU+vtV5NRKGFQMtR74+Ce8BjPV7ZH0A/WuSpC/IdBwpZtCfBu7+QN7fDIvcH/jq9bIREtAhzI7xQ== X-Received: by 2002:a1c:bb86:: with SMTP id l128mr264938wmf.41.1584576715144; Wed, 18 Mar 2020 17:11:55 -0700 (PDT) From: Dario Faggioli To: xen-devel@lists.xenproject.org Date: Thu, 19 Mar 2020 01:11:53 +0100 Message-ID: <158457671301.11355.1086453211878144633.stgit@Palanthas> In-Reply-To: <158457508246.11355.6457403441669388939.stgit@Palanthas> References: <158457508246.11355.6457403441669388939.stgit@Palanthas> User-Agent: StGit/0.21 MIME-Version: 1.0 Subject: [Xen-devel] [PATCH v2 1/2] xen: credit2: avoid vCPUs to ever reach lower credits than idle X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , Charles Arnold , Tomas Mozes , Glen , George Dunlap , Jan Beulich , Sarah Newman Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" There have been report of stalls of guest vCPUs, when Credit2 was used. It seemed like these vCPUs were not getting scheduled for very long time, even under light load conditions (e.g., during dom0 boot). Investigations led to the discovery that --although rarely-- it can happen that a vCPU manages to run for very long timeslices. In Credit2, this means that, when runtime accounting happens, the vCPU will lose a large quantity of credits. This in turn may lead to the vCPU having less credits than the idle vCPUs (-2^30). At this point, the scheduler will pick the idle vCPU, instead of the ready to run vCPU, for a few "epochs", which often times is enough for the guest kernel to think the vCPU is not responding and crashing. An example of this situation is shown here. In fact, we can see d0v1 sitting in the runqueue while all the CPUs are idle, as it has -1254238270 credits, which is smaller than -2^30 =3D =E2=88=921073741824: (XEN) Runqueue 0: (XEN) ncpus =3D 28 (XEN) cpus =3D 0-27 (XEN) max_weight =3D 256 (XEN) pick_bias =3D 22 (XEN) instload =3D 1 (XEN) aveload =3D 293391 (~111%) (XEN) idlers: 00,00000000,00000000,00000000,00000000,00000000,0fffffff (XEN) tickled: 00,00000000,00000000,00000000,00000000,00000000,000000= 00 (XEN) fully idle cores: 00,00000000,00000000,00000000,00000000,000000= 00,0fffffff [...] (XEN) Runqueue 0: (XEN) CPU[00] runq=3D0, sibling=3D00,..., core=3D00,... (XEN) CPU[01] runq=3D0, sibling=3D00,..., core=3D00,... [...] (XEN) CPU[26] runq=3D0, sibling=3D00,..., core=3D00,... (XEN) CPU[27] runq=3D0, sibling=3D00,..., core=3D00,... (XEN) RUNQ: (XEN) 0: [0.1] flags=3D0 cpu=3D5 credit=3D-1254238270 [w=3D256] loa= d=3D262144 (~100%) We certainly don't want, under any circumstance, this to happen. Let's, therefore, define a minimum amount of credits a vCPU can have. During accounting, we make sure that, for however long the vCPU has run, it will never get to have less than such minimum amount of credits. Then, we set the credits of the idle vCPU to an even smaller value. NOTE: investigations have been done about _how_ it is possible for a vCPU to execute for so much time that its credits becomes so low. While still not completely clear, there are evidence that: - it only happens very rarely, - it appears to be both machine and workload specific, - it does not look to be a Credit2 (e.g., as it happens when running with Credit1 as well) issue, or a scheduler issue. This patch makes Credit2 more robust to events like this, whatever the cause is, and should hence be backported (as far as possible). Reported-by: Glen Reported-by: Tomas Mozes Signed-off-by: Dario Faggioli Reviewed-by: George Dunlap --- Cc: George Dunlap Cc: Juergen Gross Cc: Jan Beulich Cc: Charles Arnold Cc: Sarah Newman --- Changes from v1: - different approach. Instead than using INT_MIN for idle vCPUs' credits, limit the minimum number of credits regular vCPUs can have. --- I will provide the backports myself, at least for 4.13 and 4.12.x (and feel free to ask for more). --- For Sarah, looking back at the various threads, I am not quite sure whether you also experienced the issue and reported it. If yes, I'm happy to add a "Reported-by:" line about you too (or, if this is fine to go in, for this to be done while committing, if possible). --- tools/xentrace/formats | 2 +- tools/xentrace/xenalyze.c | 5 ++-- xen/common/sched/credit2.c | 53 +++++++++++++++++++++++-----------------= ---- 3 files changed, 31 insertions(+), 29 deletions(-) diff --git a/tools/xentrace/formats b/tools/xentrace/formats index d6e7e3f800..8f126f65f1 100644 --- a/tools/xentrace/formats +++ b/tools/xentrace/formats @@ -55,7 +55,7 @@ 0x00022204 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:credit_add 0x00022205 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:tickle_check [ d= om:vcpu =3D 0x%(1)08x, credit =3D %(2)d, score =3D %(3)d ] 0x00022206 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:tickle [ c= pu =3D %(1)d ] -0x00022207 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:credit_reset [ d= om:vcpu =3D 0x%(1)08x, cr_start =3D %(2)d, cr_end =3D %(3)d, mult =3D %(4)d= ] +0x00022207 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:credit_reset [ d= om:vcpu =3D 0x%(1)08x, cr_start =3D %(2)d, cr_end =3D %(3)d ] 0x00022208 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:sched_tasklet 0x00022209 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:update_load 0x0002220a CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:runq_assign [ d= om:vcpu =3D 0x%(1)08x, rq_id =3D %(2)d ] diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c index aa894673ad..d3c8368e9d 100644 --- a/tools/xentrace/xenalyze.c +++ b/tools/xentrace/xenalyze.c @@ -7718,13 +7718,12 @@ void sched_process(struct pcpu_info *p) struct { unsigned int vcpuid:16, domid:16; int credit_start, credit_end; - unsigned int multiplier; } *r =3D (typeof(r))ri->d; =20 printf(" %s csched2:reset_credits d%uv%u, " - "credit_start =3D %d, credit_end =3D %d, mult =3D %= u\n", + "credit_start =3D %d, credit_end =3D %d\n", ri->dump_header, r->domid, r->vcpuid, - r->credit_start, r->credit_end, r->multiplier); + r->credit_start, r->credit_end); } break; case TRC_SCHED_CLASS_EVT(CSCHED2, 8): /* SCHED_TASKLET */ diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c index c7241944a8..cbf9ce2b97 100644 --- a/xen/common/sched/credit2.c +++ b/xen/common/sched/credit2.c @@ -229,12 +229,22 @@ * before a reset event. */ #define CSCHED2_CREDIT_INIT MILLISECS(10) +/* + * Minimum amount of credits VMs can have. Ideally, no VM would get + * close to this (unless a vCPU manages to execute for really long + * time uninterrupted). In case it happens, it makes no sense to + * track even deeper undershoots. + * + * NOTE: If making this smaller than -CSCHED2_CREDIT_INIT, adjust + * reset_credit() accordingly. + */ +#define CSCHED2_CREDIT_MIN (-CSCHED2_CREDIT_INIT) /* * Amount of credit the idle units have. It never changes, as idle * units does not consume credits, and it must be lower than whatever * amount of credit 'regular' unit would end up with. */ -#define CSCHED2_IDLE_CREDIT (-(1U<<30)) +#define CSCHED2_IDLE_CREDIT (CSCHED2_CREDIT_MIN-1) /* * Carryover: How much "extra" credit may be carried over after * a reset. @@ -781,10 +791,15 @@ static int get_fallback_cpu(struct csched2_unit *svc) static void t2c_update(const struct csched2_runqueue_data *rqd, s_time_t t= ime, struct csched2_unit *svc) { - uint64_t val =3D time * rqd->max_weight + svc->residual; + int64_t val =3D time * rqd->max_weight + svc->residual; =20 svc->residual =3D do_div(val, svc->weight); - svc->credit -=3D val; + /* Getting to lower credit than CSCHED2_CREDIT_MIN makes no sense. */ + val =3D svc->credit - val; + if ( unlikely(val < CSCHED2_CREDIT_MIN) ) + svc->credit =3D CSCHED2_CREDIT_MIN; + else + svc->credit =3D val; } =20 static s_time_t c2t(const struct csched2_runqueue_data *rqd, s_time_t cred= it, @@ -1616,28 +1631,25 @@ static void reset_credit(int cpu, s_time_t now, str= uct csched2_unit *snext) { struct csched2_runqueue_data *rqd =3D c2rqd(cpu); struct list_head *iter; - int m; + int reset =3D CSCHED2_CREDIT_INIT; =20 /* * Under normal circumstances, snext->credit should never be less * than -CSCHED2_MIN_TIMER. However, under some circumstances, an * unit with low credits may be allowed to run long enough that - * its credits are actually less than -CSCHED2_CREDIT_INIT. + * its credits are actually much lower than that. * (Instances have been observed, for example, where an unit with * 200us of credit was allowed to run for 11ms, giving it -10.8ms * of credit. Thus it was still negative even after the reset.) * * If this is the case for snext, we simply want to keep moving - * everyone up until it is in the black again. This fair because - * none of the other units want to run at the moment. - * - * Rather than looping, however, we just calculate a multiplier, - * avoiding an integer division and multiplication in the common - * case. + * everyone up until it is in the black again. This means that, + * since CSCHED2_CREDIT_MIN is -CSCHED2_CREDIT_INIT, we need to + * actually add 2*CSCHED2_CREDIT_INIT. */ - m =3D 1; - if ( snext->credit < -CSCHED2_CREDIT_INIT ) - m +=3D (-snext->credit) / CSCHED2_CREDIT_INIT; + ASSERT(snext->credit >=3D CSCHED2_CREDIT_MIN); + if ( unlikely(snext->credit =3D=3D CSCHED2_CREDIT_MIN) ) + reset +=3D CSCHED2_CREDIT_INIT; =20 list_for_each( iter, &rqd->svc ) { @@ -1668,15 +1680,7 @@ static void reset_credit(int cpu, s_time_t now, stru= ct csched2_unit *snext) } =20 start_credit =3D svc->credit; - - /* - * Add INIT * m, avoiding integer multiplication in the common cas= e. - */ - if ( likely(m=3D=3D1) ) - svc->credit +=3D CSCHED2_CREDIT_INIT; - else - svc->credit +=3D m * CSCHED2_CREDIT_INIT; - + svc->credit +=3D reset; /* "Clip" credits to max carryover */ if ( svc->credit > CSCHED2_CREDIT_INIT + CSCHED2_CARRYOVER_MAX ) svc->credit =3D CSCHED2_CREDIT_INIT + CSCHED2_CARRYOVER_MAX; @@ -1688,19 +1692,18 @@ static void reset_credit(int cpu, s_time_t now, str= uct csched2_unit *snext) struct { unsigned unit:16, dom:16; int credit_start, credit_end; - unsigned multiplier; } d; d.dom =3D svc->unit->domain->domain_id; d.unit =3D svc->unit->unit_id; d.credit_start =3D start_credit; d.credit_end =3D svc->credit; - d.multiplier =3D m; __trace_var(TRC_CSCHED2_CREDIT_RESET, 1, sizeof(d), (unsigned char *)&d); } } =20 + ASSERT(snext->credit > 0); SCHED_STAT_CRANK(credit_reset); =20 /* No need to resort runqueue, as everyone's order should be the same.= */ _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel From nobody Fri Mar 29 01:59:43 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org ARC-Seal: i=1; a=rsa-sha256; t=1584576784; cv=none; d=zohomail.com; s=zohoarc; b=bGnHkk9mvh9pkUYWIlpSBqaVKw+/eSMg3Rn1zi9asT21ct6+ns5y0gBEjnicSMFUmR9YQ6KInGg8bPT0pn5op+nJZjCl5uLOO2wvl2zIPcZGnRq8ZIYBSPDyOMIxIkjdoWrsjN1L73lZVghJtEoC/mdwPoyk8X5Qfd/Ux0INPYM= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1584576784; h=Content-Type:Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=pJUF5qZCzHhyicQKF7rxXVhjh/4d8AVARKIVdG3Q8bU=; b=Z93wmewAKagRsMolgRescxOzkt2JS54F4TeIgONlE8q/PN8FXQSjs3bGje6gbxeHiiN0RO5AjivVkyUFSM1pfb72af8U/z+yqfbi0OXouKIJdUfHQXCyAJTva2Iq2+mLP3kMh1Iatv4g6yT8B0Fn1a7I2oVYN0+EYpwGmwoo1SM= ARC-Authentication-Results: i=1; mx.zohomail.com; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1584576784452583.6417616493641; Wed, 18 Mar 2020 17:13:04 -0700 (PDT) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1jEin6-0006iN-RM; Thu, 19 Mar 2020 00:12:04 +0000 Received: from us1-rack-iad1.inumbo.com ([172.99.69.81]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1jEin5-0006iA-Kh for xen-devel@lists.xenproject.org; Thu, 19 Mar 2020 00:12:03 +0000 Received: from mail-wr1-f66.google.com (unknown [209.85.221.66]) by us1-rack-iad1.inumbo.com (Halon) with ESMTPS id 420207d4-6976-11ea-a6c1-bc764e2007e4; Thu, 19 Mar 2020 00:12:02 +0000 (UTC) Received: by mail-wr1-f66.google.com with SMTP id v11so598594wrm.9 for ; Wed, 18 Mar 2020 17:12:02 -0700 (PDT) Received: from [192.168.0.35] (87.78.186.89.cust.ip.kpnqwest.it. [89.186.78.87]) by smtp.gmail.com with ESMTPSA id 9sm568176wmx.32.2020.03.18.17.12.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 18 Mar 2020 17:12:01 -0700 (PDT) X-Inumbo-ID: 420207d4-6976-11ea-a6c1-bc764e2007e4 X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=6LVaQZsild5ol32UwnzGQmPz19U6z3oNdRtfhkD/Gzs=; b=Oh2hgINdkV9Gtpf2QNp56K77BBr8bJJYw5GL0xVUmWkOCiTZMZ82o2GLSn5eVlv3u+ NjZoKSYGdWiwRAzLzwJCWoCXbw1lBdYPULypWqgoHK8GibLqDZ0EsEVd6iKfTOTnP6lw YNJcss+y/tSC66TqZsfD83JX+MEwGSKuulBGLev3pC5mg7COWrkU7nJDhzmazZL0TVi4 m/urrkau3Y0Ql4wsk9+BfBIhAWjLsbA2k/OJqfHQBVzDz7oxHBej5JRbEErLKClwcigy HhU7P/HT+aGcjMCeaLXR5EipFE59RtOLgbEzeVrb19JmjrBmJlqEdMXaaAfVANDTqGwR JggQ== X-Gm-Message-State: ANhLgQ2j9qEmA3NDk90BL7R9mcPmeBLe67XGUd08xtM2aI38lxpZc1ig rlSnQfPIPVS3sAuguMfiIkZLcC2QDOQ= X-Google-Smtp-Source: ADFU+vuVyNS/z7cEzbfe2PNFKvWjw3T9atmm6oWG989S58iPrh9LWyJ2gLKZFxXvciZxYVtKeAXvvA== X-Received: by 2002:a5d:4c87:: with SMTP id z7mr538933wrs.39.1584576721826; Wed, 18 Mar 2020 17:12:01 -0700 (PDT) From: Dario Faggioli To: xen-devel@lists.xenproject.org Date: Thu, 19 Mar 2020 01:12:00 +0100 Message-ID: <158457672023.11355.16720240521867328301.stgit@Palanthas> In-Reply-To: <158457508246.11355.6457403441669388939.stgit@Palanthas> References: <158457508246.11355.6457403441669388939.stgit@Palanthas> User-Agent: StGit/0.21 MIME-Version: 1.0 Subject: [Xen-devel] [PATCH v2 2/2] xen: credit2: fix credit reset happening too few times X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , Charles Arnold , Jan Beulich , Glen , George Dunlap , Tomas Mozes , Sarah Newman Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" There is a bug in commit 5e4b4199667b9 ("xen: credit2: only reset credit on reset condition"). In fact, the aim of that commit was to make sure that we do not perform too many credit reset operations (which are not super cheap, and in an hot-path). But the check used to determine whether a reset is necessary was the wrong one. In fact, knowing just that some vCPUs have been skipped, while traversing the runqueue (in runq_candidate()), is not enough. We need to check explicitly whether the first vCPU in the runqueue has a negative amount of credit. Since a trace record is changed, this patch updates xentrace format file and xenalyze as well This should be backported. Signed-off-by: Dario Faggioli Acked-by: George Dunlap --- Cc: George Dunlap Cc: Juergen Gross Cc: Jan Beulich Cc: Charles Arnold Cc: Glen Cc: Tomas Mozes Cc: Sarah Newman --- About the Credit2 stall issue reported recently, and mentioned in patch 1 of this series. This second patch, alone, was already mitigating the issue quite substantially. Still, the proper fix for the issue itself is patch 1, while this is a fix for a bug in the code, introduced with a previous change, which happens to help to cure the sympthoms of the problem at hand. --- tools/xentrace/formats | 2 +- tools/xentrace/xenalyze.c | 8 +++----- xen/common/sched/credit2.c | 30 +++++++++++++----------------- 3 files changed, 17 insertions(+), 23 deletions(-) diff --git a/tools/xentrace/formats b/tools/xentrace/formats index 8f126f65f1..deac4d8598 100644 --- a/tools/xentrace/formats +++ b/tools/xentrace/formats @@ -67,7 +67,7 @@ 0x00022210 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:load_check [ l= rq_id[16]:orq_id[16] =3D 0x%(1)08x, delta =3D %(2)d ] 0x00022211 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:load_balance [ l= _bavgload =3D 0x%(2)08x%(1)08x, o_bavgload =3D 0x%(4)08x%(3)08x, lrq_id[16]= :orq_id[16] =3D 0x%(5)08x ] 0x00022212 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:pick_cpu [ b= _avgload =3D 0x%(2)08x%(1)08x, dom:vcpu =3D 0x%(3)08x, rq_id[16]:new_cpu[16= ] =3D %(4)d ] -0x00022213 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:runq_candidate [ d= om:vcpu =3D 0x%(1)08x, credit =3D %(4)d, skipped_vcpus =3D %(3)d, tickled_c= pu =3D %(2)d ] +0x00022213 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:runq_candidate [ d= om:vcpu =3D 0x%(1)08x, credit =3D %(3)d, tickled_cpu =3D %(2)d ] 0x00022214 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:schedule [ r= q:cpu =3D 0x%(1)08x, tasklet[8]:idle[8]:smt_idle[8]:tickled[8] =3D %(2)08x ] 0x00022215 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:ratelimit [ d= om:vcpu =3D 0x%(1)08x, runtime =3D %(2)d ] 0x00022216 CPU%(cpu)d %(tsc)d (+%(reltsc)8d) csched2:runq_cand_chk [ d= om:vcpu =3D 0x%(1)08x ] diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c index d3c8368e9d..b7f4e2bea8 100644 --- a/tools/xentrace/xenalyze.c +++ b/tools/xentrace/xenalyze.c @@ -7855,14 +7855,12 @@ void sched_process(struct pcpu_info *p) if (opt.dump_all) { struct { unsigned vcpuid:16, domid:16; - unsigned tickled_cpu, skipped; + unsigned tickled_cpu; int credit; } *r =3D (typeof(r))ri->d; =20 - printf(" %s csched2:runq_candidate d%uv%u, credit =3D %d, " - "%u vcpus skipped, ", - ri->dump_header, r->domid, r->vcpuid, - r->credit, r->skipped); + printf(" %s csched2:runq_candidate d%uv%u, credit =3D %d, = ", + ri->dump_header, r->domid, r->vcpuid, r->credit); if (r->tickled_cpu =3D=3D (unsigned)-1) printf("no cpu was tickled\n"); else diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c index cbf9ce2b97..34f05c3e2a 100644 --- a/xen/common/sched/credit2.c +++ b/xen/common/sched/credit2.c @@ -3224,8 +3224,7 @@ csched2_runtime(const struct scheduler *ops, int cpu, static struct csched2_unit * runq_candidate(struct csched2_runqueue_data *rqd, struct csched2_unit *scurr, - int cpu, s_time_t now, - unsigned int *skipped) + int cpu, s_time_t now) { struct list_head *iter, *temp; const struct sched_resource *sr =3D get_sched_res(cpu); @@ -3233,8 +3232,6 @@ runq_candidate(struct csched2_runqueue_data *rqd, struct csched2_private *prv =3D csched2_priv(sr->scheduler); bool yield =3D false, soft_aff_preempt =3D false; =20 - *skipped =3D 0; - if ( unlikely(is_idle_unit(scurr->unit)) ) { snext =3D scurr; @@ -3328,12 +3325,9 @@ runq_candidate(struct csched2_runqueue_data *rqd, (unsigned char *)&d); } =20 - /* Only consider units that are allowed to run on this processor. = */ + /* Only consider vcpus that are allowed to run on this processor. = */ if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) ) - { - (*skipped)++; continue; - } =20 /* * If an unit is meant to be picked up by another processor, and s= uch @@ -3342,7 +3336,6 @@ runq_candidate(struct csched2_runqueue_data *rqd, if ( svc->tickled_cpu !=3D -1 && svc->tickled_cpu !=3D cpu && cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) ) { - (*skipped)++; SCHED_STAT_CRANK(deferred_to_tickled_cpu); continue; } @@ -3354,7 +3347,6 @@ runq_candidate(struct csched2_runqueue_data *rqd, if ( sched_unit_master(svc->unit) !=3D cpu && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit ) { - (*skipped)++; SCHED_STAT_CRANK(migrate_resisted); continue; } @@ -3378,14 +3370,13 @@ runq_candidate(struct csched2_runqueue_data *rqd, { struct { unsigned unit:16, dom:16; - unsigned tickled_cpu, skipped; + unsigned tickled_cpu; int credit; } d; d.dom =3D snext->unit->domain->domain_id; d.unit =3D snext->unit->unit_id; d.credit =3D snext->credit; d.tickled_cpu =3D snext->tickled_cpu; - d.skipped =3D *skipped; __trace_var(TRC_CSCHED2_RUNQ_CANDIDATE, 1, sizeof(d), (unsigned char *)&d); @@ -3417,7 +3408,6 @@ static void csched2_schedule( struct csched2_runqueue_data *rqd; struct csched2_unit * const scurr =3D csched2_unit(currunit); struct csched2_unit *snext =3D NULL; - unsigned int skipped_units =3D 0; bool tickled; bool migrated =3D false; =20 @@ -3495,7 +3485,7 @@ static void csched2_schedule( snext =3D csched2_unit(sched_idle_unit(sched_cpu)); } else - snext =3D runq_candidate(rqd, scurr, sched_cpu, now, &skipped_unit= s); + snext =3D runq_candidate(rqd, scurr, sched_cpu, now); =20 /* If switching from a non-idle runnable unit, put it * back on the runqueue. */ @@ -3507,6 +3497,8 @@ static void csched2_schedule( /* Accounting for non-idle tasks */ if ( !is_idle_unit(snext->unit) ) { + int top_credit; + /* If switching, remove this from the runqueue and mark it schedul= ed */ if ( snext !=3D scurr ) { @@ -3534,11 +3526,15 @@ static void csched2_schedule( * 2) no other unit with higher credits wants to run. * * Here, where we want to check for reset, we need to make sure the - * proper unit is being used. In fact, runqueue_candidate() may ha= ve - * not returned the first unit in the runqueue, for various reasons + * proper unit is being used. In fact, runq_candidate() may have n= ot + * returned the first unit in the runqueue, for various reasons * (e.g., affinity). Only trigger a reset when it does. */ - if ( skipped_units =3D=3D 0 && snext->credit <=3D CSCHED2_CREDIT_R= ESET ) + if ( list_empty(&rqd->runq) ) + top_credit =3D snext->credit; + else + top_credit =3D max(snext->credit, runq_elem(rqd->runq.next)->c= redit); + if ( top_credit <=3D CSCHED2_CREDIT_RESET ) { reset_credit(sched_cpu, now, snext); balance_load(ops, sched_cpu, now); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel