From: Max Filippov
To: qemu-devel@nongnu.org
Cc: Max Filippov, Richard Henderson
Date: Thu, 14 Feb 2019 14:59:59 -0800
Message-Id: <20190214230000.24894-13-jcmvbkbc@gmail.com>
In-Reply-To: <20190214230000.24894-1-jcmvbkbc@gmail.com>
References: <20190214230000.24894-1-jcmvbkbc@gmail.com>
Subject: [Qemu-devel] [PATCH 12/13] target/xtensa: break circular register dependencies

Currently topological opcode sorting stops at the first detected dependency
loop. Introduce struct opcode_arg_copy that describes a temporary register
copy. Scan remaining opcodes searching for dependencies that can be broken,
break them by introducing temporary register copies and record them in an
array. In case of success create local temporaries and initialize them
with current register values. Share a single temporary copy between all
register users. Delete temporaries after translation.

Signed-off-by: Max Filippov
---
 target/xtensa/translate.c | 127 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 123 insertions(+), 4 deletions(-)

diff --git a/target/xtensa/translate.c b/target/xtensa/translate.c
index 276b435ce81e..8bc272d05b4b 100644
--- a/target/xtensa/translate.c
+++ b/target/xtensa/translate.c
@@ -935,6 +935,12 @@ static int gen_postprocess(DisasContext *dc, int slot)
     return slot;
 }
 
+struct opcode_arg_copy {
+    uint32_t resource;
+    void *temp;
+    OpcodeArg *arg;
+};
+
 struct opcode_arg_info {
     uint32_t resource;
     int index;
@@ -961,6 +967,11 @@ static uint32_t encode_resource(enum resource_type r, unsigned g, unsigned n)
     return (r << 24) | (g << 16) | n;
 }
 
+static enum resource_type get_resource_type(uint32_t resource)
+{
+    return resource >> 24;
+}
+
 /*
  * a depends on b if b must be executed before a,
  * because a's side effects will destroy b's inputs.
@@ -987,6 +998,49 @@ static bool op_depends_on(const struct slot_prop *a,
 }
 
 /*
+ * Try to break a dependency on b, append temporary register copy records
+ * to the end of copy and update n_copy in case of success.
+ * This is not always possible: e.g. control flow must always be the last,
+ * load/store must be first and state dependencies are not supported yet.
+ */
+static bool break_dependency(struct slot_prop *a,
+                             struct slot_prop *b,
+                             struct opcode_arg_copy *copy,
+                             unsigned *n_copy)
+{
+    unsigned i = 0;
+    unsigned j = 0;
+    unsigned n = *n_copy;
+    bool rv = false;
+
+    if (a->op_flags & XTENSA_OP_CONTROL_FLOW) {
+        return false;
+    }
+    while (i < a->n_out && j < b->n_in) {
+        if (a->out[i].resource < b->in[j].resource) {
+            ++i;
+        } else if (a->out[i].resource > b->in[j].resource) {
+            ++j;
+        } else {
+            int index = b->in[j].index;
+
+            if (get_resource_type(a->out[i].resource) != RES_REGFILE ||
+                index < 0) {
+                return false;
+            }
+            copy[n].resource = b->in[j].resource;
+            copy[n].arg = b->arg + index;
+            ++n;
+            ++i;
+            ++j;
+            rv = true;
+        }
+    }
+    *n_copy = n;
+    return rv;
+}
+
+/*
  * Calculate evaluation order for slot opcodes.
  * Build opcode order graph and output its nodes in topological sort order.
  * An edge a -> b in the graph means that opcode a must be followed by
@@ -994,7 +1048,9 @@ static bool op_depends_on(const struct slot_prop *a,
  */
 static bool tsort(struct slot_prop *slot,
                   struct slot_prop *sorted[],
-                  unsigned n)
+                  unsigned n,
+                  struct opcode_arg_copy *copy,
+                  unsigned *n_copy)
 {
     struct tsnode {
         unsigned n_in_edge;
@@ -1007,7 +1063,8 @@ static bool tsort(struct slot_prop *slot,
     unsigned n_in = 0;
     unsigned n_out = 0;
     unsigned n_edge = 0;
-    unsigned in_idx;
+    unsigned in_idx = 0;
+    unsigned node_idx = 0;
 
     for (i = 0; i < n; ++i) {
         node[i].n_in_edge = 0;
@@ -1035,7 +1092,8 @@ static bool tsort(struct slot_prop *slot,
         }
     }
 
-    for (in_idx = 0; in_idx < n_in; ++in_idx) {
+again:
+    for (; in_idx < n_in; ++in_idx) {
         i = in[in_idx];
         sorted[n_out] = slot + i;
         ++n_out;
@@ -1047,6 +1105,29 @@ static bool tsort(struct slot_prop *slot,
             }
         }
     }
+    if (n_edge) {
+        for (; node_idx < n; ++node_idx) {
+            struct tsnode *cnode = node + node_idx;
+
+            if (cnode->n_in_edge) {
+                for (j = 0; j < cnode->n_out_edge; ++j) {
+                    unsigned k = cnode->out_edge[j];
+
+                    if (break_dependency(slot + k, slot + node_idx,
+                                         copy, n_copy) &&
+                        --node[k].n_in_edge == 0) {
+                        in[n_in] = k;
+                        ++n_in;
+                        --n_edge;
+                        cnode->out_edge[j] =
+                            cnode->out_edge[cnode->n_out_edge - 1];
+                        --cnode->n_out_edge;
+                        goto again;
+                    }
+                }
+            }
+        }
+    }
     return n_edge == 0;
 }
 
@@ -1084,6 +1165,15 @@ static int resource_compare(const void *a, const void *b)
         -1 : (pa->resource > pb->resource ? 1 : 0);
 }
 
+static int arg_copy_compare(const void *a, const void *b)
+{
+    const struct opcode_arg_copy *pa = a;
+    const struct opcode_arg_copy *pb = b;
+
+    return pa->resource < pb->resource ?
+        -1 : (pa->resource > pb->resource ? 1 : 0);
+}
+
 static void disas_xtensa_insn(CPUXtensaState *env, DisasContext *dc)
 {
     xtensa_isa isa = dc->config->isa;
@@ -1095,6 +1185,8 @@ static void disas_xtensa_insn(CPUXtensaState *env, DisasContext *dc)
     uint32_t op_flags = 0;
     struct slot_prop slot_prop[MAX_INSN_SLOTS];
     struct slot_prop *ordered[MAX_INSN_SLOTS];
+    struct opcode_arg_copy arg_copy[MAX_INSN_SLOTS * MAX_OPCODE_ARGS];
+    unsigned n_arg_copy = 0;
     uint32_t debug_cause = 0;
     uint32_t windowed_register = 0;
     uint32_t coprocessor = 0;
@@ -1249,7 +1341,7 @@ static void disas_xtensa_insn(CPUXtensaState *env, DisasContext *dc)
     }
 
     if (slots > 1) {
-        if (!tsort(slot_prop, ordered, slots)) {
+        if (!tsort(slot_prop, ordered, slots, arg_copy, &n_arg_copy)) {
             qemu_log_mask(LOG_UNIMP,
                           "Circular resource dependencies (pc = %08x)\n",
                           dc->pc);
@@ -1297,6 +1389,29 @@ static void disas_xtensa_insn(CPUXtensaState *env, DisasContext *dc)
         return;
     }
 
+    if (n_arg_copy) {
+        uint32_t resource;
+        void *temp;
+        unsigned j;
+
+        qsort(arg_copy, n_arg_copy, sizeof(*arg_copy), arg_copy_compare);
+        for (i = j = 0; i < n_arg_copy; ++i) {
+            if (i == 0 || arg_copy[i].resource != resource) {
+                resource = arg_copy[i].resource;
+                temp = tcg_temp_local_new();
+                tcg_gen_mov_i32(temp, arg_copy[i].arg->in);
+                arg_copy[i].temp = temp;
+
+                if (i != j) {
+                    arg_copy[j] = arg_copy[i];
+                }
+                ++j;
+            }
+            arg_copy[i].arg->in = temp;
+        }
+        n_arg_copy = j;
+    }
+
     if (op_flags & XTENSA_OP_DIVIDE_BY_ZERO) {
         for (slot = 0; slot < slots; ++slot) {
             if (slot_prop[slot].ops->op_flags & XTENSA_OP_DIVIDE_BY_ZERO) {
@@ -1314,6 +1429,10 @@ static void disas_xtensa_insn(CPUXtensaState *env, DisasContext *dc)
         ops->translate(dc, pslot->arg, ops->par);
     }
 
+    for (i = 0; i < n_arg_copy; ++i) {
+        tcg_temp_free(arg_copy[i].temp);
+    }
+
     if (dc->base.is_jmp == DISAS_NEXT) {
         gen_postprocess(dc, 0);
         dc->op_flags = 0;
-- 
2.11.0
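
The idea in the commit message can be illustrated outside of QEMU with a
minimal standalone sketch. It is not part of the patch and does not use TCG:
struct toy_op, toy_run, toy_swap_naive and toy_swap_with_copy are hypothetical
names, and a plain C variable stands in for the local temporary. The bundle
modelled here is { mov r0, r1 ; mov r1, r0 }, where each op overwrites the
other's input, so no sequential order is correct until one input is redirected
to a copy taken before either op runs.

```c
#include <assert.h>
#include <stdint.h>

enum { TOY_NREGS = 4 };

/* Hypothetical toy op: copy one register into another ("mov dst, src"). */
struct toy_op {
    unsigned dst;        /* index of the register the op writes */
    const uint32_t *in;  /* location the op reads its input from */
};

/* Execute the ops one after another, as sequentially emitted code would. */
static void toy_run(uint32_t *regs, const struct toy_op *op, unsigned n)
{
    for (unsigned i = 0; i < n; ++i) {
        assert(op[i].dst < TOY_NREGS);
        regs[op[i].dst] = *op[i].in;
    }
}

/*
 * Circular dependency left in place: the second op reads r0 after the
 * first op has already overwritten it.
 */
static void toy_swap_naive(uint32_t *regs)
{
    struct toy_op op[2] = {
        { 0, &regs[1] },
        { 1, &regs[0] },
    };

    toy_run(regs, op, 2);
}

/*
 * Dependency broken: r0 is saved into a temporary before any op runs and
 * the second op's input is redirected to that temporary.
 */
static void toy_swap_with_copy(uint32_t *regs)
{
    uint32_t temp = regs[0];   /* temporary register copy */
    struct toy_op op[2] = {
        { 0, &regs[1] },
        { 1, &temp },          /* reads the saved value instead of r0 */
    };

    toy_run(regs, op, 2);
}
```

With r0 = 0x11 and r1 = 0x22 on entry, toy_swap_naive leaves both registers
holding 0x22, while toy_swap_with_copy yields r0 = 0x22 and r1 = 0x11, which
is the result expected when both ops read their inputs before either writes.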