From nobody Mon Feb 9 14:50:43 2026 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 150023729973570.34547912360324; Sun, 16 Jul 2017 13:34:59 -0700 (PDT) Received: from localhost ([::1]:46834 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dWqFm-0000yz-7T for importer@patchew.org; Sun, 16 Jul 2017 16:34:58 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46428) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dWpmg-0001EJ-VO for qemu-devel@nongnu.org; Sun, 16 Jul 2017 16:05:00 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dWpmZ-0008R8-Ru for qemu-devel@nongnu.org; Sun, 16 Jul 2017 16:04:54 -0400 Received: from out1-smtp.messagingengine.com ([66.111.4.25]:46747) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1dWpmZ-0008QJ-Jb for qemu-devel@nongnu.org; Sun, 16 Jul 2017 16:04:47 -0400 Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id 3618920A3C; Sun, 16 Jul 2017 16:04:47 -0400 (EDT) Received: from frontend2 ([10.202.2.161]) by compute4.internal (MEProxy); Sun, 16 Jul 2017 16:04:47 -0400 Received: from localhost (flamenco.cs.columbia.edu [128.59.20.216]) by mail.messagingengine.com (Postfix) with ESMTPA id EE40124606; Sun, 16 Jul 2017 16:04:46 -0400 (EDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=braap.org; h=cc :date:from:in-reply-to:message-id:references:subject:to :x-me-sender:x-me-sender:x-sasl-enc:x-sasl-enc; s=mesmtp; bh=hAJ KmjYdq3rIkx8R3r/4C3ftFuNBYF7Whdd9/7PfxDQ=; b=tR70ZeGJsdsd+pZipFD bKZitNcBA/jHX9Y3YiiWy1Zn9r26O5trED6wmluCOVN2fzeZIgMu0qJnYH7oxMZY a0bZfQoysO/oUWFHdnRpmtlxBqq8f04Y9t87M7ZByHPwywvvVuYN6Lk1GalfG5J9 ZbiDh8P/IL6I2fmVkvnjFs5U= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:date:from:in-reply-to:message-id :references:subject:to:x-me-sender:x-me-sender:x-sasl-enc :x-sasl-enc; s=fm1; bh=hAJKmjYdq3rIkx8R3r/4C3ftFuNBYF7Whdd9/7Pfx DQ=; b=sgRZCQxCJ3vmymTlovkSLNpfceoLjDXjaEB9mYComHotJvb0qyMjLUktq UCIWQu7bkw7qMYZhqV0YLC1XIYZPyMbKXQgfbvayrrRl/6YRspJUJVZ1temKOtkO d9NGfY97LlUbBkPkcGpp+ie9dmEqR9eqff+0kIsquY2wnXmPt1E4XKaDjUah6Kjv 4zKvKCKnHDMHWyuFEbDTA0n/E9PD3+uAxKWdLZc1WpTt/U9zO6sdybUeSseF8rhG 2H+95uM8qsj1+RsAA0qzGvNCDgBcGPZVXOSf5W7boIfTGVYiLXj6g3cYVrkyonIN dQFUvPZXjBuZsyVlWsbu8WthGZ/Fg== X-ME-Sender: X-Sasl-enc: svz0emxzh3AE9bCDaoP/TDVTY8PUDosMy0lPcBiHQ6PD 1500235486 From: "Emilio G. Cota" To: qemu-devel@nongnu.org Date: Sun, 16 Jul 2017 16:04:21 -0400 Message-Id: <1500235468-15341-39-git-send-email-cota@braap.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1500235468-15341-1-git-send-email-cota@braap.org> References: <1500235468-15341-1-git-send-email-cota@braap.org> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.111.4.25 Subject: [Qemu-devel] [PATCH v2 38/45] tcg: distribute profiling counters across TCGContext's X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Richard Henderson Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" This is groundwork for supporting multiple TCG contexts. To avoid scalability issues when profiling info is enabled, this patch makes the profiling info counters distributed via the following changes: 1) Consolidate profile info into its own struct, TCGProfile, which TCGContext also includes. Note that tcg_table_op_count is brought into TCGProfile after dropping the tcg_ prefix. 2) Iterate over the TCG contexts in the system to obtain the total counts. This change also requires updating the accessors to TCGProfile fields to use atomic_read/set whenever there may be conflicting accesses (as defined in C11) to them. Signed-off-by: Emilio G. Cota Reviewed-by: Richard Henderson --- tcg/tcg.h | 38 +++++++++------- accel/tcg/translate-all.c | 23 +++++----- tcg/tcg.c | 110 ++++++++++++++++++++++++++++++++++++++----= ---- 3 files changed, 126 insertions(+), 45 deletions(-) diff --git a/tcg/tcg.h b/tcg/tcg.h index 175d4de..9d17584 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -649,6 +649,26 @@ struct tcg_temp_info { tcg_target_ulong mask; }; =20 +typedef struct TCGProfile { + int64_t tb_count1; + int64_t tb_count; + int64_t op_count; /* total insn count */ + int op_count_max; /* max insn per TB */ + int64_t temp_count; + int temp_count_max; + int64_t del_op_count; + int64_t code_in_len; + int64_t code_out_len; + int64_t search_out_len; + int64_t interm_time; + int64_t code_time; + int64_t la_time; + int64_t opt_time; + int64_t restore_count; + int64_t restore_time; + int64_t table_op_count[NB_OPS]; +} TCGProfile; + struct TCGContext { uint8_t *pool_cur, *pool_end; TCGPool *pool_first, *pool_current, *pool_first_large; @@ -673,23 +693,7 @@ struct TCGContext { tcg_insn_unit *code_ptr; =20 #ifdef CONFIG_PROFILER - /* profiling info */ - int64_t tb_count1; - int64_t tb_count; - int64_t op_count; /* total insn count */ - int op_count_max; /* max insn per TB */ - int64_t temp_count; - int temp_count_max; - int64_t del_op_count; - int64_t code_in_len; - int64_t code_out_len; - int64_t search_out_len; - int64_t interm_time; - int64_t code_time; - int64_t la_time; - int64_t opt_time; - int64_t restore_count; - int64_t restore_time; + TCGProfile prof; #endif =20 #ifdef CONFIG_DEBUG_TCG diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c index 961e357..fd3e4a0 100644 --- a/accel/tcg/translate-all.c +++ b/accel/tcg/translate-all.c @@ -312,6 +312,7 @@ static int cpu_restore_state_from_tb(CPUState *cpu, Tra= nslationBlock *tb, uint8_t *p =3D tb->tc.search; int i, j, num_insns =3D tb->icount; #ifdef CONFIG_PROFILER + TCGProfile *prof =3D &tcg_ctx->prof; int64_t ti =3D profile_getclock(); #endif =20 @@ -346,8 +347,9 @@ static int cpu_restore_state_from_tb(CPUState *cpu, Tra= nslationBlock *tb, restore_state_to_opc(env, tb, data); =20 #ifdef CONFIG_PROFILER - tcg_ctx->restore_time +=3D profile_getclock() - ti; - tcg_ctx->restore_count++; + atomic_set(&prof->restore_time, + prof->restore_time + profile_getclock() - ti); + atomic_set(&prof->restore_count, prof->restore_count + 1); #endif return 0; } @@ -1306,6 +1308,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu, tcg_insn_unit *gen_code_buf; int gen_code_size, search_size; #ifdef CONFIG_PROFILER + TCGProfile *prof =3D &tcg_ctx->prof; int64_t ti; #endif assert_memory_lock(); @@ -1336,8 +1339,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu, tcg_ctx->cf_parallel =3D !!(cflags & CF_PARALLEL); =20 #ifdef CONFIG_PROFILER - tcg_ctx->tb_count1++; /* includes aborted translations because of - exceptions */ + /* includes aborted translations because of exceptions */ + atomic_set(&prof->tb_count1, prof->tb_count1 + 1); ti =3D profile_getclock(); #endif =20 @@ -1362,8 +1365,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu, #endif =20 #ifdef CONFIG_PROFILER - tcg_ctx->tb_count++; - tcg_ctx->interm_time +=3D profile_getclock() - ti; + atomic_set(&prof->tb_count, prof->tb_count + 1); + atomic_set(&prof->interm_time, prof->interm_time + profile_getclock() = - ti); ti =3D profile_getclock(); #endif =20 @@ -1383,10 +1386,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu, tb->tc.size =3D gen_code_size; =20 #ifdef CONFIG_PROFILER - tcg_ctx->code_time +=3D profile_getclock() - ti; - tcg_ctx->code_in_len +=3D tb->size; - tcg_ctx->code_out_len +=3D gen_code_size; - tcg_ctx->search_out_len +=3D search_size; + atomic_set(&prof->code_time, prof->code_time + profile_getclock() - ti= ); + atomic_set(&prof->code_in_len, prof->code_in_len + tb->size); + atomic_set(&prof->code_out_len, prof->code_out_len + gen_code_size); + atomic_set(&prof->search_out_len, prof->search_out_len + search_size); #endif =20 #ifdef DEBUG_DISAS diff --git a/tcg/tcg.c b/tcg/tcg.c index 8094278..5afb80a 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -1350,7 +1350,7 @@ void tcg_op_remove(TCGContext *s, TCGOp *op) memset(op, 0, sizeof(*op)); =20 #ifdef CONFIG_PROFILER - s->del_op_count++; + atomic_set(&s->prof.del_op_count, s->prof.del_op_count + 1); #endif } =20 @@ -2521,15 +2521,79 @@ static void tcg_reg_alloc_call(TCGContext *s, int n= b_oargs, int nb_iargs, =20 #ifdef CONFIG_PROFILER =20 -static int64_t tcg_table_op_count[NB_OPS]; +/* avoid copy/paste errors */ +#define PROF_ADD(to, from, field) \ + do { \ + (to)->field +=3D atomic_read(&((from)->field)); \ + } while (0) + +#define PROF_ADD_MAX(to, from, field) \ + do { \ + typeof((from)->field) val__ =3D atomic_read(&((from)->field)); \ + if (val__ > (to)->field) { \ + (to)->field =3D val__; \ + } \ + } while (0) + +/* Pass in a zero'ed @prof */ +static inline +void tcg_profile_snapshot(TCGProfile *prof, bool counters, bool table) +{ + unsigned int i; + + for (i =3D 0; i < n_tcg_ctxs; i++) { + const TCGProfile *orig =3D &tcg_ctxs[i]->prof; + + if (counters) { + PROF_ADD(prof, orig, tb_count1); + PROF_ADD(prof, orig, tb_count); + PROF_ADD(prof, orig, op_count); + PROF_ADD_MAX(prof, orig, op_count_max); + PROF_ADD(prof, orig, temp_count); + PROF_ADD_MAX(prof, orig, temp_count_max); + PROF_ADD(prof, orig, del_op_count); + PROF_ADD(prof, orig, code_in_len); + PROF_ADD(prof, orig, code_out_len); + PROF_ADD(prof, orig, search_out_len); + PROF_ADD(prof, orig, interm_time); + PROF_ADD(prof, orig, code_time); + PROF_ADD(prof, orig, la_time); + PROF_ADD(prof, orig, opt_time); + PROF_ADD(prof, orig, restore_count); + PROF_ADD(prof, orig, restore_time); + } + if (table) { + int i; + + for (i =3D 0; i < NB_OPS; i++) { + PROF_ADD(prof, orig, table_op_count[i]); + } + } + } +} + +#undef PROF_ADD +#undef PROF_ADD_MAX + +static void tcg_profile_snapshot_counters(TCGProfile *prof) +{ + tcg_profile_snapshot(prof, true, false); +} + +static void tcg_profile_snapshot_table(TCGProfile *prof) +{ + tcg_profile_snapshot(prof, false, true); +} =20 void tcg_dump_op_count(FILE *f, fprintf_function cpu_fprintf) { + TCGProfile prof =3D {}; int i; =20 + tcg_profile_snapshot_table(&prof); for (i =3D 0; i < NB_OPS; i++) { cpu_fprintf(f, "%s %" PRId64 "\n", tcg_op_defs[i].name, - tcg_table_op_count[i]); + prof.table_op_count[i]); } } #else @@ -2542,6 +2606,9 @@ void tcg_dump_op_count(FILE *f, fprintf_function cpu_= fprintf) =20 int tcg_gen_code(TCGContext *s, TranslationBlock *tb) { +#ifdef CONFIG_PROFILER + TCGProfile *prof =3D &s->prof; +#endif int i, oi, oi_next, num_insns; =20 #ifdef CONFIG_PROFILER @@ -2549,15 +2616,15 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *t= b) int n; =20 n =3D s->gen_op_buf[0].prev + 1; - s->op_count +=3D n; - if (n > s->op_count_max) { - s->op_count_max =3D n; + atomic_set(&prof->op_count, prof->op_count + n); + if (n > prof->op_count_max) { + atomic_set(&prof->op_count_max, n); } =20 n =3D s->nb_temps; - s->temp_count +=3D n; - if (n > s->temp_count_max) { - s->temp_count_max =3D n; + atomic_set(&prof->temp_count, prof->temp_count + n); + if (n > prof->temp_count_max) { + atomic_set(&prof->temp_count_max, n); } } #endif @@ -2574,7 +2641,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb) #endif =20 #ifdef CONFIG_PROFILER - s->opt_time -=3D profile_getclock(); + atomic_set(&prof->opt_time, prof->opt_time - profile_getclock()); #endif =20 #ifdef USE_TCG_OPTIMIZATIONS @@ -2582,8 +2649,8 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb) #endif =20 #ifdef CONFIG_PROFILER - s->opt_time +=3D profile_getclock(); - s->la_time -=3D profile_getclock(); + atomic_set(&prof->opt_time, prof->opt_time + profile_getclock()); + atomic_set(&prof->la_time, prof->la_time - profile_getclock()); #endif =20 { @@ -2611,7 +2678,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb) } =20 #ifdef CONFIG_PROFILER - s->la_time +=3D profile_getclock(); + atomic_set(&prof->la_time, prof->la_time + profile_getclock()); #endif =20 #ifdef DEBUG_DISAS @@ -2642,7 +2709,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb) =20 oi_next =3D op->next; #ifdef CONFIG_PROFILER - tcg_table_op_count[opc]++; + atomic_set(&prof->table_op_count[opc], prof->table_op_count[opc] += 1); #endif =20 switch (opc) { @@ -2718,10 +2785,17 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *t= b) #ifdef CONFIG_PROFILER void tcg_dump_info(FILE *f, fprintf_function cpu_fprintf) { - TCGContext *s =3D tcg_ctx; - int64_t tb_count =3D s->tb_count; - int64_t tb_div_count =3D tb_count ? tb_count : 1; - int64_t tot =3D s->interm_time + s->code_time; + TCGProfile prof =3D {}; + const TCGProfile *s; + int64_t tb_count; + int64_t tb_div_count; + int64_t tot; + + tcg_profile_snapshot_counters(&prof); + s =3D &prof; + tb_count =3D s->tb_count; + tb_div_count =3D tb_count ? tb_count : 1; + tot =3D s->interm_time + s->code_time; =20 cpu_fprintf(f, "JIT cycles %" PRId64 " (%0.3f s at 2.4 GHz)\n= ", tot, tot / 2.4e9); --=20 2.7.4