From: "Emilio G. Cota"
To: qemu-devel@nongnu.org
Cc: Richard Henderson
Date: Wed, 19 Jul 2017 23:09:23 -0400
Subject: [Qemu-devel] [PATCH v3 37/43] tcg: distribute profiling counters across TCGContext's
Message-Id: <1500520169-23367-38-git-send-email-cota@braap.org>
In-Reply-To: <1500520169-23367-1-git-send-email-cota@braap.org>
References: <1500520169-23367-1-git-send-email-cota@braap.org>
X-Mailer: git-send-email 2.7.4

This is groundwork for supporting multiple TCG contexts.

To avoid scalability issues when profiling info is enabled, this patch
makes the profiling info counters distributed via the following changes:

1) Consolidate profile info into its own struct, TCGProfile, which
   TCGContext also includes. Note that tcg_table_op_count is brought
   into TCGProfile after dropping the tcg_ prefix.
2) Iterate over the TCG contexts in the system to obtain the total counts.

This change also requires updating the accessors to TCGProfile fields to use
atomic_read/set whenever there may be conflicting accesses (as defined in C11)
to them.

Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 tcg/tcg.h                 |  38 +++++++++-------
 accel/tcg/translate-all.c |  23 +++++-----
 tcg/tcg.c                 | 110 ++++++++++++++++++++++++++++++++++++++--------
 3 files changed, 126 insertions(+), 45 deletions(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index f83f9b0..3611141 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -641,6 +641,26 @@ QEMU_BUILD_BUG_ON(OPPARAM_BUF_SIZE > (1 << 14));
 /* Make sure that we don't overflow 64 bits without noticing.  */
 QEMU_BUILD_BUG_ON(sizeof(TCGOp) > 8);
 
+typedef struct TCGProfile {
+    int64_t tb_count1;
+    int64_t tb_count;
+    int64_t op_count; /* total insn count */
+    int op_count_max; /* max insn per TB */
+    int64_t temp_count;
+    int temp_count_max;
+    int64_t del_op_count;
+    int64_t code_in_len;
+    int64_t code_out_len;
+    int64_t search_out_len;
+    int64_t interm_time;
+    int64_t code_time;
+    int64_t la_time;
+    int64_t opt_time;
+    int64_t restore_count;
+    int64_t restore_time;
+    int64_t table_op_count[NB_OPS];
+} TCGProfile;
+
 struct TCGContext {
     uint8_t *pool_cur, *pool_end;
     TCGPool *pool_first, *pool_current, *pool_first_large;
@@ -665,23 +685,7 @@ struct TCGContext {
     tcg_insn_unit *code_ptr;
 
 #ifdef CONFIG_PROFILER
-    /* profiling info */
-    int64_t tb_count1;
-    int64_t tb_count;
-    int64_t op_count; /* total insn count */
-    int op_count_max; /* max insn per TB */
-    int64_t temp_count;
-    int temp_count_max;
-    int64_t del_op_count;
-    int64_t code_in_len;
-    int64_t code_out_len;
-    int64_t search_out_len;
-    int64_t interm_time;
-    int64_t code_time;
-    int64_t la_time;
-    int64_t opt_time;
-    int64_t restore_count;
-    int64_t restore_time;
+    TCGProfile prof;
 #endif
 
 #ifdef CONFIG_DEBUG_TCG
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index e6ee4e3..36b17ac 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -312,6 +312,7 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
     uint8_t *p = tb->tc.search;
     int i, j, num_insns = tb->icount;
 #ifdef CONFIG_PROFILER
+    TCGProfile *prof = &tcg_ctx->prof;
     int64_t ti = profile_getclock();
 #endif
 
@@ -346,8 +347,9 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
     restore_state_to_opc(env, tb, data);
 
 #ifdef CONFIG_PROFILER
-    tcg_ctx->restore_time += profile_getclock() - ti;
-    tcg_ctx->restore_count++;
+    atomic_set(&prof->restore_time,
+               prof->restore_time + profile_getclock() - ti);
+    atomic_set(&prof->restore_count, prof->restore_count + 1);
 #endif
     return 0;
 }
@@ -1302,6 +1304,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tcg_insn_unit *gen_code_buf;
     int gen_code_size, search_size;
 #ifdef CONFIG_PROFILER
+    TCGProfile *prof = &tcg_ctx->prof;
     int64_t ti;
 #endif
     assert_memory_lock();
@@ -1332,8 +1335,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tcg_ctx->cf_parallel = !!(cflags & CF_PARALLEL);
 
 #ifdef CONFIG_PROFILER
-    tcg_ctx->tb_count1++; /* includes aborted translations because of
-                       exceptions */
+    /* includes aborted translations because of exceptions */
+    atomic_set(&prof->tb_count1, prof->tb_count1 + 1);
     ti = profile_getclock();
 #endif
 
@@ -1358,8 +1361,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 #endif
 
 #ifdef CONFIG_PROFILER
-    tcg_ctx->tb_count++;
-    tcg_ctx->interm_time += profile_getclock() - ti;
+    atomic_set(&prof->tb_count, prof->tb_count + 1);
+    atomic_set(&prof->interm_time, prof->interm_time + profile_getclock() - ti);
     ti = profile_getclock();
 #endif
 
@@ -1379,10 +1382,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tb->tc.size = gen_code_size;
 
 #ifdef CONFIG_PROFILER
-    tcg_ctx->code_time += profile_getclock() - ti;
-    tcg_ctx->code_in_len += tb->size;
-    tcg_ctx->code_out_len += gen_code_size;
-    tcg_ctx->search_out_len += search_size;
+    atomic_set(&prof->code_time, prof->code_time + profile_getclock() - ti);
+    atomic_set(&prof->code_in_len, prof->code_in_len + tb->size);
+    atomic_set(&prof->code_out_len, prof->code_out_len + gen_code_size);
+    atomic_set(&prof->search_out_len, prof->search_out_len + search_size);
 #endif
 
 #ifdef DEBUG_DISAS
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 2217314..0ddd0dc 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1344,7 +1344,7 @@ void tcg_op_remove(TCGContext *s, TCGOp *op)
     memset(op, 0, sizeof(*op));
 
 #ifdef CONFIG_PROFILER
-    s->del_op_count++;
+    atomic_set(&s->prof.del_op_count, s->prof.del_op_count + 1);
 #endif
 }
 
@@ -2515,15 +2515,79 @@ static void tcg_reg_alloc_call(TCGContext *s, int nb_oargs, int nb_iargs,
 
 #ifdef CONFIG_PROFILER
 
-static int64_t tcg_table_op_count[NB_OPS];
+/* avoid copy/paste errors */
+#define PROF_ADD(to, from, field)                       \
+    do {                                                \
+        (to)->field += atomic_read(&((from)->field));   \
+    } while (0)
+
+#define PROF_MAX(to, from, field)                                       \
+    do {                                                                \
+        typeof((from)->field) val__ = atomic_read(&((from)->field));    \
+        if (val__ > (to)->field) {                                      \
+            (to)->field = val__;                                        \
+        }                                                               \
+    } while (0)
+
+/* Pass in a zero'ed @prof */
+static inline
+void tcg_profile_snapshot(TCGProfile *prof, bool counters, bool table)
+{
+    unsigned int i;
+
+    for (i = 0; i < n_tcg_ctxs; i++) {
+        const TCGProfile *orig = &tcg_ctxs[i]->prof;
+
+        if (counters) {
+            PROF_ADD(prof, orig, tb_count1);
+            PROF_ADD(prof, orig, tb_count);
+            PROF_ADD(prof, orig, op_count);
+            PROF_MAX(prof, orig, op_count_max);
+            PROF_ADD(prof, orig, temp_count);
+            PROF_MAX(prof, orig, temp_count_max);
+            PROF_ADD(prof, orig, del_op_count);
+            PROF_ADD(prof, orig, code_in_len);
+            PROF_ADD(prof, orig, code_out_len);
+            PROF_ADD(prof, orig, search_out_len);
+            PROF_ADD(prof, orig, interm_time);
+            PROF_ADD(prof, orig, code_time);
+            PROF_ADD(prof, orig, la_time);
+            PROF_ADD(prof, orig, opt_time);
+            PROF_ADD(prof, orig, restore_count);
+            PROF_ADD(prof, orig, restore_time);
+        }
+        if (table) {
+            int i;
+
+            for (i = 0; i < NB_OPS; i++) {
+                PROF_ADD(prof, orig, table_op_count[i]);
+            }
+        }
+    }
+}
+
+#undef PROF_ADD
+#undef PROF_MAX
+
+static void tcg_profile_snapshot_counters(TCGProfile *prof)
+{
+    tcg_profile_snapshot(prof, true, false);
+}
+
+static void tcg_profile_snapshot_table(TCGProfile *prof)
+{
+    tcg_profile_snapshot(prof, false, true);
+}
 
 void tcg_dump_op_count(FILE *f, fprintf_function cpu_fprintf)
 {
+    TCGProfile prof = {};
     int i;
 
+    tcg_profile_snapshot_table(&prof);
     for (i = 0; i < NB_OPS; i++) {
         cpu_fprintf(f, "%s %" PRId64 "\n", tcg_op_defs[i].name,
-                    tcg_table_op_count[i]);
+                    prof.table_op_count[i]);
     }
 }
 #else
@@ -2536,6 +2600,9 @@ void tcg_dump_op_count(FILE *f, fprintf_function cpu_fprintf)
 
 int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 {
+#ifdef CONFIG_PROFILER
+    TCGProfile *prof = &s->prof;
+#endif
     int i, oi, oi_next, num_insns;
 
 #ifdef CONFIG_PROFILER
@@ -2543,15 +2610,15 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
         int n;
 
         n = s->gen_op_buf[0].prev + 1;
-        s->op_count += n;
-        if (n > s->op_count_max) {
-            s->op_count_max = n;
+        atomic_set(&prof->op_count, prof->op_count + n);
+        if (n > prof->op_count_max) {
+            atomic_set(&prof->op_count_max, n);
         }
 
         n = s->nb_temps;
-        s->temp_count += n;
-        if (n > s->temp_count_max) {
-            s->temp_count_max = n;
+        atomic_set(&prof->temp_count, prof->temp_count + n);
+        if (n > prof->temp_count_max) {
+            atomic_set(&prof->temp_count_max, n);
         }
     }
 #endif
@@ -2568,7 +2635,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 #endif
 
 #ifdef CONFIG_PROFILER
-    s->opt_time -= profile_getclock();
+    atomic_set(&prof->opt_time, prof->opt_time - profile_getclock());
 #endif
 
 #ifdef USE_TCG_OPTIMIZATIONS
@@ -2576,8 +2643,8 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 #endif
 
 #ifdef CONFIG_PROFILER
-    s->opt_time += profile_getclock();
-    s->la_time -= profile_getclock();
+    atomic_set(&prof->opt_time, prof->opt_time + profile_getclock());
+    atomic_set(&prof->la_time, prof->la_time - profile_getclock());
 #endif
 
     {
@@ -2605,7 +2672,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     }
 
 #ifdef CONFIG_PROFILER
-    s->la_time += profile_getclock();
+    atomic_set(&prof->la_time, prof->la_time + profile_getclock());
 #endif
 
 #ifdef DEBUG_DISAS
@@ -2636,7 +2703,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 
         oi_next = op->next;
 #ifdef CONFIG_PROFILER
-        tcg_table_op_count[opc]++;
+        atomic_set(&prof->table_op_count[opc], prof->table_op_count[opc] + 1);
 #endif
 
         switch (opc) {
@@ -2712,10 +2779,17 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 #ifdef CONFIG_PROFILER
 void tcg_dump_info(FILE *f, fprintf_function cpu_fprintf)
 {
-    TCGContext *s = tcg_ctx;
-    int64_t tb_count = s->tb_count;
-    int64_t tb_div_count = tb_count ? tb_count : 1;
-    int64_t tot = s->interm_time + s->code_time;
+    TCGProfile prof = {};
+    const TCGProfile *s;
+    int64_t tb_count;
+    int64_t tb_div_count;
+    int64_t tot;
+
+    tcg_profile_snapshot_counters(&prof);
+    s = &prof;
+    tb_count = s->tb_count;
+    tb_div_count = tb_count ? tb_count : 1;
+    tot = s->interm_time + s->code_time;
 
     cpu_fprintf(f, "JIT cycles          %" PRId64 " (%0.3f s at 2.4 GHz)\n",
                 tot, tot / 2.4e9);
-- 
2.7.4
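
Illustrative note, not part of the patch: the scheme described in the commit
message (per-context counters written only by their owner, totals computed by
iterating over all contexts) can be sketched in standalone C roughly as below.
This is a minimal sketch under stated assumptions. Profile, ProfileSnapshot,
N_CTXS, ctxs, profile_record_tb and profile_snapshot are hypothetical names
made up for the example, and C11 <stdatomic.h> relaxed loads and stores stand
in for QEMU's atomic_read/atomic_set, which the sketch does not use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define N_CTXS 4 /* hypothetical number of contexts */

/* Hypothetical, cut-down stand-in for TCGProfile. */
typedef struct Profile {
    _Atomic int64_t tb_count;  /* summed across contexts by the reader */
    _Atomic int op_count_max;  /* maximum taken across contexts */
} Profile;

/* Plain (non-atomic) copy that a reader fills in; pass it in zeroed. */
typedef struct ProfileSnapshot {
    int64_t tb_count;
    int op_count_max;
} ProfileSnapshot;

static Profile ctxs[N_CTXS];

/*
 * Writer side. Assumption: each Profile is updated by one thread only,
 * so a relaxed load followed by a relaxed store is enough and no atomic
 * read-modify-write is needed. Concurrent readers may see a slightly
 * stale value, but never a torn one.
 */
static void profile_record_tb(Profile *p, int n_ops)
{
    int64_t cnt;

    assert(n_ops >= 0);
    cnt = atomic_load_explicit(&p->tb_count, memory_order_relaxed);
    atomic_store_explicit(&p->tb_count, cnt + 1, memory_order_relaxed);
    if (n_ops > atomic_load_explicit(&p->op_count_max, memory_order_relaxed)) {
        atomic_store_explicit(&p->op_count_max, n_ops, memory_order_relaxed);
    }
}

/* Reader side: sum the counters and take the maximum over all contexts. */
static void profile_snapshot(ProfileSnapshot *out)
{
    unsigned int i;

    for (i = 0; i < N_CTXS; i++) {
        int max = atomic_load_explicit(&ctxs[i].op_count_max,
                                       memory_order_relaxed);

        out->tb_count += atomic_load_explicit(&ctxs[i].tb_count,
                                              memory_order_relaxed);
        if (max > out->op_count_max) {
            out->op_count_max = max;
        }
    }
}
```

With this sketch, recording translations of 10 and 25 ops on ctxs[0] and one
of 7 ops on ctxs[2], then calling profile_snapshot() on a zeroed
ProfileSnapshot, gives tb_count == 3 and op_count_max == 25. The hot path
never touches a shared cache line, which is the scalability point the commit
message makes; the cost moves to the (rare) reader that walks all contexts.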