From: Aleksandar Markovic
To: qemu-devel@nongnu.org
Cc: peter.maydell@linaro.org, amarkovic@wavecomp.com
Date: Sat, 1 Jun 2019 20:30:46 +0200
Message-Id: <1559413846-4402-9-git-send-email-aleksandar.markovic@rt-rk.com>
In-Reply-To: <1559413846-4402-1-git-send-email-aleksandar.markovic@rt-rk.com>
References: <1559413846-4402-1-git-send-email-aleksandar.markovic@rt-rk.com>
Subject: [Qemu-devel] [PULL 8/8] target/mips: Improve performance of certain MSA instructions

From: Mateja Marjanovic

Eliminate loops for better performance.

Following MSA instructions from "UNOP" group are affected:

  - NLZC.
  - NLOC.
  - PCNT.

Following MSA instructions from "BINOP" group are affected:

  - ADD_A.
  - ADDS_A.
  - ADDS_S.
  - ADDS_U.
  - ADDV.
  - ASUB_S.
  - ASUB_U.
  - AVE_S.
  - AVE_U.
  - AVER_S.
  - AVER_U.
  - BCLR.
  - BNEG.
  - BSET.
  - CEQ.
  - CLE_S.
  - CLE_U.
  - CLT_S.
  - CLT_U.
  - DIV_S.
  - DIV_U.
  - DOTP_S.
  - DOTP_U.
  - HADD_S.
  - HADD_U.
  - HSUB_S.
  - HSUB_U.
  - MAX_A.
  - MAX_S.
  - MAX_U.
  - MIN_A.
  - MIN_S.
  - MIN_U.
  - MOD_S.
  - MOD_U.
  - MUL_Q.
  - MULR_Q.
  - MULV.
  - SLL.
  - SRA.
  - SRAR.
  - SRL.
  - SRLR.
  - SUBS_S.
  - SUBS_U.
  - SUBSUS_U.
  - SUBSUU_S.
  - SUBV.

Following MSA instructions from "TEROP" group are affected:

  - BINSL.
  - BINSR.
  - DPADD_S.
  - DPADD_U.
  - DPSUB_S.
  - DPSUB_U.
  - MADD_Q.
  - MADDR_Q.
  - MADDV.
  - MSUB_Q.
  - MSUBR_Q.
  - MSUBV.

Additionally, following MSA instructions are also affected:

  - ILVL.
  - ILVR.
  - ILVEV.
  - ILVOD.
  - PCKEV.
  - PCKOD.

Signed-off-by: Mateja Marjanovic
Signed-off-by: Aleksandar Markovic
Reviewed-by: Aleksandar Markovic
Message-Id: <1551718283-4487-2-git-send-email-mateja.marjanovic@rt-rk.com>
---
 target/mips/msa_helper.c | 542 +++++++++++++++++++++++++++++++++++++----------
 1 file changed, 433 insertions(+), 109 deletions(-)

diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c
index ee1b1fa..f6e16c2 100644
--- a/target/mips/msa_helper.c
+++ b/target/mips/msa_helper.c
@@ -805,28 +805,45 @@ void helper_msa_ ## func ## _df(CPUMIPSState *env, uint32_t df, \
     wr_t *pwd = &(env->active_fpu.fpr[wd].wr); \
     wr_t *pws = &(env->active_fpu.fpr[ws].wr); \
     wr_t *pwt = &(env->active_fpu.fpr[wt].wr); \
-    uint32_t i; \
     \
     switch (df) { \
     case DF_BYTE: \
-        for (i = 0; i < DF_ELEMENTS(DF_BYTE); i++) { \
-            pwd->b[i] = msa_ ## func ## _df(df, pws->b[i], pwt->b[i]); \
-        } \
+        pwd->b[0] = msa_ ## func ## _df(df, pws->b[0], pwt->b[0]); \
+        pwd->b[1] = msa_ ## func ## _df(df, pws->b[1], pwt->b[1]); \
+        pwd->b[2] = msa_ ## func ## _df(df, pws->b[2], pwt->b[2]); \
+        pwd->b[3] = msa_ ## func ## _df(df, pws->b[3], pwt->b[3]); \
+        pwd->b[4] = msa_ ## func ## _df(df, pws->b[4], pwt->b[4]); \
+        pwd->b[5] = msa_ ## func ## _df(df, pws->b[5], pwt->b[5]); \
+        pwd->b[6] = msa_ ## func ## _df(df, pws->b[6], pwt->b[6]); \
+        pwd->b[7] = msa_ ## func ## _df(df, pws->b[7], pwt->b[7]); \
+        pwd->b[8] = msa_ ## func ## _df(df, pws->b[8], pwt->b[8]); \
+        pwd->b[9] = msa_ ## func ## _df(df, pws->b[9], pwt->b[9]); \
+        pwd->b[10] = msa_ ## func ## _df(df, pws->b[10], pwt->b[10]); \
+        pwd->b[11] = msa_ ## func ## _df(df, pws->b[11], pwt->b[11]); \
+        pwd->b[12] = msa_ ## func ## _df(df, pws->b[12], pwt->b[12]); \
+        pwd->b[13] = msa_ ## func ## _df(df, pws->b[13], pwt->b[13]); \
+        pwd->b[14] = msa_ ## func ## _df(df, pws->b[14], pwt->b[14]); \
+        pwd->b[15] = msa_ ## func ## _df(df, pws->b[15], pwt->b[15]); \
         break; \
     case DF_HALF: \
-        for (i = 0; i < DF_ELEMENTS(DF_HALF); i++) { \
-            pwd->h[i] = msa_ ## func ## _df(df, pws->h[i], pwt->h[i]); \
-        } \
+        pwd->h[0] = msa_ ## func ## _df(df, pws->h[0], pwt->h[0]); \
+        pwd->h[1] = msa_ ## func ## _df(df, pws->h[1], pwt->h[1]); \
+        pwd->h[2] = msa_ ## func ## _df(df, pws->h[2], pwt->h[2]); \
+        pwd->h[3] = msa_ ## func ## _df(df, pws->h[3], pwt->h[3]); \
+        pwd->h[4] = msa_ ## func ## _df(df, pws->h[4], pwt->h[4]); \
+        pwd->h[5] = msa_ ## func ## _df(df, pws->h[5], pwt->h[5]); \
+        pwd->h[6] = msa_ ## func ## _df(df, pws->h[6], pwt->h[6]); \
+        pwd->h[7] = msa_ ## func ## _df(df, pws->h[7], pwt->h[7]); \
         break; \
     case DF_WORD: \
-        for (i = 0; i < DF_ELEMENTS(DF_WORD); i++) { \
-            pwd->w[i] = msa_ ## func ## _df(df, pws->w[i], pwt->w[i]); \
-        } \
+        pwd->w[0] = msa_ ## func ## _df(df, pws->w[0], pwt->w[0]); \
+        pwd->w[1] = msa_ ## func ## _df(df, pws->w[1], pwt->w[1]); \
+        pwd->w[2] = msa_ ## func ## _df(df, pws->w[2], pwt->w[2]); \
+        pwd->w[3] = msa_ ## func ## _df(df, pws->w[3], pwt->w[3]); \
         break; \
     case DF_DOUBLE: \
-        for (i = 0; i < DF_ELEMENTS(DF_DOUBLE); i++) { \
-            pwd->d[i] = msa_ ## func ## _df(df, pws->d[i], pwt->d[i]); \
-        } \
+        pwd->d[0] = msa_ ## func ## _df(df, pws->d[0], pwt->d[0]); \
+        pwd->d[1] = msa_ ## func ## _df(df, pws->d[1], pwt->d[1]); \
         break; \
     default: \
         assert(0); \
@@ -1012,42 +1029,71 @@ static inline int64_t msa_msubr_q_df(uint32_t df, int64_t dest, int64_t arg1,
 }
 
 #define MSA_TEROP_DF(func) \
-void helper_msa_ ## func ## _df(CPUMIPSState *env, uint32_t df, uint32_t wd, \
-                                uint32_t ws, uint32_t wt) \
-{ \
-    wr_t *pwd = &(env->active_fpu.fpr[wd].wr); \
-    wr_t *pws = &(env->active_fpu.fpr[ws].wr); \
-    wr_t *pwt = &(env->active_fpu.fpr[wt].wr); \
-    uint32_t i; \
-    \
-    switch (df) { \
-    case DF_BYTE: \
-        for (i = 0; i < DF_ELEMENTS(DF_BYTE); i++) { \
-            pwd->b[i] = msa_ ## func ## _df(df, pwd->b[i], pws->b[i], \
-                pwt->b[i]); \
-        } \
-        break; \
-    case DF_HALF: \
-        for (i = 0; i < DF_ELEMENTS(DF_HALF); i++) { \
-            pwd->h[i] = msa_ ## func ## _df(df, pwd->h[i], pws->h[i], \
-                pwt->h[i]); \
-        } \
-        break; \
-    case DF_WORD: \
-        for (i = 0; i < DF_ELEMENTS(DF_WORD); i++) { \
-            pwd->w[i] = msa_ ## func ## _df(df, pwd->w[i], pws->w[i], \
-                pwt->w[i]); \
-        } \
-        break; \
-    case DF_DOUBLE: \
-        for (i = 0; i < DF_ELEMENTS(DF_DOUBLE); i++) { \
-            pwd->d[i] = msa_ ## func ## _df(df, pwd->d[i], pws->d[i], \
-                pwt->d[i]); \
-        } \
-        break; \
-    default: \
-        assert(0); \
-    } \
+void helper_msa_ ## func ## _df(CPUMIPSState *env, uint32_t df, uint32_t wd, \
+                                uint32_t ws, uint32_t wt) \
+{ \
+    wr_t *pwd = &(env->active_fpu.fpr[wd].wr); \
+    wr_t *pws = &(env->active_fpu.fpr[ws].wr); \
+    wr_t *pwt = &(env->active_fpu.fpr[wt].wr); \
+    \
+    switch (df) { \
+    case DF_BYTE: \
+        pwd->b[0] = msa_ ## func ## _df(df, pwd->b[0], pws->b[0], \
+            pwt->b[0]); \
+        pwd->b[1] = msa_ ## func ## _df(df, pwd->b[1], pws->b[1], \
+            pwt->b[1]); \
+        pwd->b[2] = msa_ ## func ## _df(df, pwd->b[2], pws->b[2], \
+            pwt->b[2]); \
+        pwd->b[3] = msa_ ## func ## _df(df, pwd->b[3], pws->b[3], \
+            pwt->b[3]); \
+        pwd->b[4] = msa_ ## func ## _df(df, pwd->b[4], pws->b[4], \
+            pwt->b[4]); \
+        pwd->b[5] = msa_ ## func ## _df(df, pwd->b[5], pws->b[5], \
+            pwt->b[5]); \
+        pwd->b[6] = msa_ ## func ## _df(df, pwd->b[6], pws->b[6], \
+            pwt->b[6]); \
+        pwd->b[7] = msa_ ## func ## _df(df, pwd->b[7], pws->b[7], \
+            pwt->b[7]); \
+        pwd->b[8] = msa_ ## func ## _df(df, pwd->b[8], pws->b[8], \
+            pwt->b[8]); \
+        pwd->b[9] = msa_ ## func ## _df(df, pwd->b[9], pws->b[9], \
+            pwt->b[9]); \
+        pwd->b[10] = msa_ ## func ## _df(df, pwd->b[10], pws->b[10], \
+            pwt->b[10]); \
+        pwd->b[11] = msa_ ## func ## _df(df, pwd->b[11], pws->b[11], \
+            pwt->b[11]); \
+        pwd->b[12] = msa_ ## func ## _df(df, pwd->b[12], pws->b[12], \
+            pwt->b[12]); \
+        pwd->b[13] = msa_ ## func ## _df(df, pwd->b[13], pws->b[13], \
+            pwt->b[13]); \
+        pwd->b[14] = msa_ ## func ## _df(df, pwd->b[14], pws->b[14], \
+            pwt->b[14]); \
+        pwd->b[15] = msa_ ## func ## _df(df, pwd->b[15], pws->b[15], \
+            pwt->b[15]); \
+        break; \
+    case DF_HALF: \
+        pwd->h[0] = msa_ ## func ## _df(df, pwd->h[0], pws->h[0], pwt->h[0]); \
+        pwd->h[1] = msa_ ## func ## _df(df, pwd->h[1], pws->h[1], pwt->h[1]); \
+        pwd->h[2] = msa_ ## func ## _df(df, pwd->h[2], pws->h[2], pwt->h[2]); \
+        pwd->h[3] = msa_ ## func ## _df(df, pwd->h[3], pws->h[3], pwt->h[3]); \
+        pwd->h[4] = msa_ ## func ## _df(df, pwd->h[4], pws->h[4], pwt->h[4]); \
+        pwd->h[5] = msa_ ## func ## _df(df, pwd->h[5], pws->h[5], pwt->h[5]); \
+        pwd->h[6] = msa_ ## func ## _df(df, pwd->h[6], pws->h[6], pwt->h[6]); \
+        pwd->h[7] = msa_ ## func ## _df(df, pwd->h[7], pws->h[7], pwt->h[7]); \
+        break; \
+    case DF_WORD: \
+        pwd->w[0] = msa_ ## func ## _df(df, pwd->w[0], pws->w[0], pwt->w[0]); \
+        pwd->w[1] = msa_ ## func ## _df(df, pwd->w[1], pws->w[1], pwt->w[1]); \
+        pwd->w[2] = msa_ ## func ## _df(df, pwd->w[2], pws->w[2], pwt->w[2]); \
+        pwd->w[3] = msa_ ## func ## _df(df, pwd->w[3], pws->w[3], pwt->w[3]); \
+        break; \
+    case DF_DOUBLE: \
+        pwd->d[0] = msa_ ## func ## _df(df, pwd->d[0], pws->d[0], pwt->d[0]); \
+        pwd->d[1] = msa_ ## func ## _df(df, pwd->d[1], pws->d[1], pwt->d[1]); \
+        break; \
+    default: \
+        assert(0); \
+    } \
 }
 
 MSA_TEROP_DF(maddv)
@@ -1167,53 +1213,6 @@ void helper_msa_##FUNC(CPUMIPSState *env, uint32_t df, uint32_t wd, \
 #define Rd(pwr, i) (pwr->d[i])
 #define Ld(pwr, i) (pwr->d[i + DF_ELEMENTS(DF_DOUBLE)/2])
 
-#define MSA_DO(DF) \
-    do { \
-        R##DF(pwx, i) = pwt->DF[2*i]; \
-        L##DF(pwx, i) = pws->DF[2*i]; \
-    } while (0)
-MSA_FN_DF(pckev_df)
-#undef MSA_DO
-
-#define MSA_DO(DF) \
-    do { \
-        R##DF(pwx, i) = pwt->DF[2*i+1]; \
-        L##DF(pwx, i) = pws->DF[2*i+1]; \
-    } while (0)
-MSA_FN_DF(pckod_df)
-#undef MSA_DO
-
-#define MSA_DO(DF) \
-    do { \
-        pwx->DF[2*i] = L##DF(pwt, i); \
-        pwx->DF[2*i+1] = L##DF(pws, i); \
-    } while (0)
-MSA_FN_DF(ilvl_df)
-#undef MSA_DO
-
-#define MSA_DO(DF) \
-    do { \
-        pwx->DF[2*i] = R##DF(pwt, i); \
-        pwx->DF[2*i+1] = R##DF(pws, i); \
-    } while (0)
-MSA_FN_DF(ilvr_df)
-#undef MSA_DO
-
-#define MSA_DO(DF) \
-    do { \
-        pwx->DF[2*i] = pwt->DF[2*i]; \
-        pwx->DF[2*i+1] = pws->DF[2*i]; \
-    } while (0)
-MSA_FN_DF(ilvev_df)
-#undef MSA_DO
-
-#define MSA_DO(DF) \
-    do { \
-        pwx->DF[2*i] = pwt->DF[2*i+1]; \
-        pwx->DF[2*i+1] = pws->DF[2*i+1]; \
-    } while (0)
-MSA_FN_DF(ilvod_df)
-#undef MSA_DO
 #undef MSA_LOOP_COND
 
 #define MSA_LOOP_COND(DF) \
@@ -1231,6 +1230,314 @@ MSA_FN_DF(vshf_df)
 #undef MSA_LOOP_COND
 #undef MSA_FN_DF
 
+
+void helper_msa_ilvev_df(CPUMIPSState *env, uint32_t df, uint32_t wd,
+                         uint32_t ws, uint32_t wt)
+{
+    wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
+    wr_t *pws = &(env->active_fpu.fpr[ws].wr);
+    wr_t *pwt = &(env->active_fpu.fpr[wt].wr);
+
+    switch (df) {
+    case DF_BYTE:
+        pwd->b[15] = pws->b[14];
+        pwd->b[14] = pwt->b[14];
+        pwd->b[13] = pws->b[12];
+        pwd->b[12] = pwt->b[12];
+        pwd->b[11] = pws->b[10];
+        pwd->b[10] = pwt->b[10];
+        pwd->b[9] = pws->b[8];
+        pwd->b[8] = pwt->b[8];
+        pwd->b[7] = pws->b[6];
+        pwd->b[6] = pwt->b[6];
+        pwd->b[5] = pws->b[4];
+        pwd->b[4] = pwt->b[4];
+        pwd->b[3] = pws->b[2];
+        pwd->b[2] = pwt->b[2];
+        pwd->b[1] = pws->b[0];
+        pwd->b[0] = pwt->b[0];
+        break;
+    case DF_HALF:
+        pwd->h[7] = pws->h[6];
+        pwd->h[6] = pwt->h[6];
+        pwd->h[5] = pws->h[4];
+        pwd->h[4] = pwt->h[4];
+        pwd->h[3] = pws->h[2];
+        pwd->h[2] = pwt->h[2];
+        pwd->h[1] = pws->h[0];
+        pwd->h[0] = pwt->h[0];
+        break;
+    case DF_WORD:
+        pwd->w[3] = pws->w[2];
+        pwd->w[2] = pwt->w[2];
+        pwd->w[1] = pws->w[0];
+        pwd->w[0] = pwt->w[0];
+        break;
+    case DF_DOUBLE:
+        pwd->d[1] = pws->d[0];
+        pwd->d[0] = pwt->d[0];
+        break;
+    default:
+        assert(0);
+    }
+}
+
+void helper_msa_ilvod_df(CPUMIPSState *env, uint32_t df, uint32_t wd,
+                         uint32_t ws, uint32_t wt)
+{
+    wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
+    wr_t *pws = &(env->active_fpu.fpr[ws].wr);
+    wr_t *pwt = &(env->active_fpu.fpr[wt].wr);
+
+    switch (df) {
+    case DF_BYTE:
+        pwd->b[0] = pwt->b[1];
+        pwd->b[1] = pws->b[1];
+        pwd->b[2] = pwt->b[3];
+        pwd->b[3] = pws->b[3];
+        pwd->b[4] = pwt->b[5];
+        pwd->b[5] = pws->b[5];
+        pwd->b[6] = pwt->b[7];
+        pwd->b[7] = pws->b[7];
+        pwd->b[8] = pwt->b[9];
+        pwd->b[9] = pws->b[9];
+        pwd->b[10] = pwt->b[11];
+        pwd->b[11] = pws->b[11];
+        pwd->b[12] = pwt->b[13];
+        pwd->b[13] = pws->b[13];
+        pwd->b[14] = pwt->b[15];
+        pwd->b[15] = pws->b[15];
+        break;
+    case DF_HALF:
+        pwd->h[0] = pwt->h[1];
+        pwd->h[1] = pws->h[1];
+        pwd->h[2] = pwt->h[3];
+        pwd->h[3] = pws->h[3];
+        pwd->h[4] = pwt->h[5];
+        pwd->h[5] = pws->h[5];
+        pwd->h[6] = pwt->h[7];
+        pwd->h[7] = pws->h[7];
+        break;
+    case DF_WORD:
+        pwd->w[0] = pwt->w[1];
+        pwd->w[1] = pws->w[1];
+        pwd->w[2] = pwt->w[3];
+        pwd->w[3] = pws->w[3];
+        break;
+    case DF_DOUBLE:
+        pwd->d[0] = pwt->d[1];
+        pwd->d[1] = pws->d[1];
+        break;
+    default:
+        assert(0);
+    }
+}
+
+void helper_msa_ilvl_df(CPUMIPSState *env, uint32_t df, uint32_t wd,
+                        uint32_t ws, uint32_t wt)
+{
+    wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
+    wr_t *pws = &(env->active_fpu.fpr[ws].wr);
+    wr_t *pwt = &(env->active_fpu.fpr[wt].wr);
+
+    switch (df) {
+    case DF_BYTE:
+        pwd->b[0] = pwt->b[8];
+        pwd->b[1] = pws->b[8];
+        pwd->b[2] = pwt->b[9];
+        pwd->b[3] = pws->b[9];
+        pwd->b[4] = pwt->b[10];
+        pwd->b[5] = pws->b[10];
+        pwd->b[6] = pwt->b[11];
+        pwd->b[7] = pws->b[11];
+        pwd->b[8] = pwt->b[12];
+        pwd->b[9] = pws->b[12];
+        pwd->b[10] = pwt->b[13];
+        pwd->b[11] = pws->b[13];
+        pwd->b[12] = pwt->b[14];
+        pwd->b[13] = pws->b[14];
+        pwd->b[14] = pwt->b[15];
+        pwd->b[15] = pws->b[15];
+        break;
+    case DF_HALF:
+        pwd->h[0] = pwt->h[4];
+        pwd->h[1] = pws->h[4];
+        pwd->h[2] = pwt->h[5];
+        pwd->h[3] = pws->h[5];
+        pwd->h[4] = pwt->h[6];
+        pwd->h[5] = pws->h[6];
+        pwd->h[6] = pwt->h[7];
+        pwd->h[7] = pws->h[7];
+        break;
+    case DF_WORD:
+        pwd->w[0] = pwt->w[2];
+        pwd->w[1] = pws->w[2];
+        pwd->w[2] = pwt->w[3];
+        pwd->w[3] = pws->w[3];
+        break;
+    case DF_DOUBLE:
+        pwd->d[0] = pwt->d[1];
+        pwd->d[1] = pws->d[1];
+        break;
+    default:
+        assert(0);
+    }
+}
+
+void helper_msa_ilvr_df(CPUMIPSState *env, uint32_t df, uint32_t wd,
+                        uint32_t ws, uint32_t wt)
+{
+    wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
+    wr_t *pws = &(env->active_fpu.fpr[ws].wr);
+    wr_t *pwt = &(env->active_fpu.fpr[wt].wr);
+
+    switch (df) {
+    case DF_BYTE:
+        pwd->b[15] = pws->b[7];
+        pwd->b[14] = pwt->b[7];
+        pwd->b[13] = pws->b[6];
+        pwd->b[12] = pwt->b[6];
+        pwd->b[11] = pws->b[5];
+        pwd->b[10] = pwt->b[5];
+        pwd->b[9] = pws->b[4];
+        pwd->b[8] = pwt->b[4];
+        pwd->b[7] = pws->b[3];
+        pwd->b[6] = pwt->b[3];
+        pwd->b[5] = pws->b[2];
+        pwd->b[4] = pwt->b[2];
+        pwd->b[3] = pws->b[1];
+        pwd->b[2] = pwt->b[1];
+        pwd->b[1] = pws->b[0];
+        pwd->b[0] = pwt->b[0];
+        break;
+    case DF_HALF:
+        pwd->h[7] = pws->h[3];
+        pwd->h[6] = pwt->h[3];
+        pwd->h[5] = pws->h[2];
+        pwd->h[4] = pwt->h[2];
+        pwd->h[3] = pws->h[1];
+        pwd->h[2] = pwt->h[1];
+        pwd->h[1] = pws->h[0];
+        pwd->h[0] = pwt->h[0];
+        break;
+    case DF_WORD:
+        pwd->w[3] = pws->w[1];
+        pwd->w[2] = pwt->w[1];
+        pwd->w[1] = pws->w[0];
+        pwd->w[0] = pwt->w[0];
+        break;
+    case DF_DOUBLE:
+        pwd->d[1] = pws->d[0];
+        pwd->d[0] = pwt->d[0];
+        break;
+    default:
+        assert(0);
+    }
+}
+
+void helper_msa_pckev_df(CPUMIPSState *env, uint32_t df, uint32_t wd,
+                         uint32_t ws, uint32_t wt)
+{
+    wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
+    wr_t *pws = &(env->active_fpu.fpr[ws].wr);
+    wr_t *pwt = &(env->active_fpu.fpr[wt].wr);
+
+    switch (df) {
+    case DF_BYTE:
+        pwd->b[15] = pws->b[14];
+        pwd->b[13] = pws->b[10];
+        pwd->b[11] = pws->b[6];
+        pwd->b[9] = pws->b[2];
+        pwd->b[7] = pwt->b[14];
+        pwd->b[5] = pwt->b[10];
+        pwd->b[3] = pwt->b[6];
+        pwd->b[1] = pwt->b[2];
+        pwd->b[14] = pws->b[12];
+        pwd->b[10] = pws->b[4];
+        pwd->b[6] = pwt->b[12];
+        pwd->b[2] = pwt->b[4];
+        pwd->b[12] = pws->b[8];
+        pwd->b[4] = pwt->b[8];
+        pwd->b[8] = pws->b[0];
+        pwd->b[0] = pwt->b[0];
+        break;
+    case DF_HALF:
+        pwd->h[7] = pws->h[6];
+        pwd->h[5] = pws->h[2];
+        pwd->h[3] = pwt->h[6];
+        pwd->h[1] = pwt->h[2];
+        pwd->h[6] = pws->h[4];
+        pwd->h[2] = pwt->h[4];
+        pwd->h[4] = pws->h[0];
+        pwd->h[0] = pwt->h[0];
+        break;
+    case DF_WORD:
+        pwd->w[3] = pws->w[2];
+        pwd->w[1] = pwt->w[2];
+        pwd->w[2] = pws->w[0];
+        pwd->w[0] = pwt->w[0];
+        break;
+    case DF_DOUBLE:
+        pwd->d[1] = pws->d[0];
+        pwd->d[0] = pwt->d[0];
+        break;
+    default:
+        assert(0);
+    }
+}
+
+void helper_msa_pckod_df(CPUMIPSState *env, uint32_t df, uint32_t wd,
+                         uint32_t ws, uint32_t wt)
+{
+    wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
+    wr_t *pws = &(env->active_fpu.fpr[ws].wr);
+    wr_t *pwt = &(env->active_fpu.fpr[wt].wr);
+
+    switch (df) {
+    case DF_BYTE:
+        pwd->b[0] = pwt->b[1];
+        pwd->b[2] = pwt->b[5];
+        pwd->b[4] = pwt->b[9];
+        pwd->b[6] = pwt->b[13];
+        pwd->b[8] = pws->b[1];
+        pwd->b[10] = pws->b[5];
+        pwd->b[12] = pws->b[9];
+        pwd->b[14] = pws->b[13];
+        pwd->b[1] = pwt->b[3];
+        pwd->b[5] = pwt->b[11];
+        pwd->b[9] = pws->b[3];
+        pwd->b[13] = pws->b[11];
+        pwd->b[3] = pwt->b[7];
+        pwd->b[11] = pws->b[7];
+        pwd->b[7] = pwt->b[15];
+        pwd->b[15] = pws->b[15];
+        break;
+    case DF_HALF:
+        pwd->h[0] = pwt->h[1];
+        pwd->h[2] = pwt->h[5];
+        pwd->h[4] = pws->h[1];
+        pwd->h[6] = pws->h[5];
+        pwd->h[1] = pwt->h[3];
+        pwd->h[5] = pws->h[3];
+        pwd->h[3] = pwt->h[7];
+        pwd->h[7] = pws->h[7];
+        break;
+    case DF_WORD:
+        pwd->w[0] = pwt->w[1];
+        pwd->w[2] = pws->w[1];
+        pwd->w[1] = pwt->w[3];
+        pwd->w[3] = pws->w[3];
+        break;
+    case DF_DOUBLE:
+        pwd->d[0] = pwt->d[1];
+        pwd->d[1] = pws->d[1];
+        break;
+    default:
+        assert(0);
+    }
+}
+
+
 void helper_msa_sldi_df(CPUMIPSState *env, uint32_t df, uint32_t wd,
                         uint32_t ws, uint32_t n)
 {
@@ -1537,28 +1844,45 @@ void helper_msa_ ## func ## _df(CPUMIPSState *env, uint32_t df, \
 { \
     wr_t *pwd = &(env->active_fpu.fpr[wd].wr); \
     wr_t *pws = &(env->active_fpu.fpr[ws].wr); \
-    uint32_t i; \
     \
     switch (df) { \
     case DF_BYTE: \
-        for (i = 0; i < DF_ELEMENTS(DF_BYTE); i++) { \
-            pwd->b[i] = msa_ ## func ## _df(df, pws->b[i]); \
-        } \
+        pwd->b[0] = msa_ ## func ## _df(df, pws->b[0]); \
+        pwd->b[1] = msa_ ## func ## _df(df, pws->b[1]); \
+        pwd->b[2] = msa_ ## func ## _df(df, pws->b[2]); \
+        pwd->b[3] = msa_ ## func ## _df(df, pws->b[3]); \
+        pwd->b[4] = msa_ ## func ## _df(df, pws->b[4]); \
+        pwd->b[5] = msa_ ## func ## _df(df, pws->b[5]); \
+        pwd->b[6] = msa_ ## func ## _df(df, pws->b[6]); \
+        pwd->b[7] = msa_ ## func ## _df(df, pws->b[7]); \
+        pwd->b[8] = msa_ ## func ## _df(df, pws->b[8]); \
+        pwd->b[9] = msa_ ## func ## _df(df, pws->b[9]); \
+        pwd->b[10] = msa_ ## func ## _df(df, pws->b[10]); \
+        pwd->b[11] = msa_ ## func ## _df(df, pws->b[11]); \
+        pwd->b[12] = msa_ ## func ## _df(df, pws->b[12]); \
+        pwd->b[13] = msa_ ## func ## _df(df, pws->b[13]); \
+        pwd->b[14] = msa_ ## func ## _df(df, pws->b[14]); \
+        pwd->b[15] = msa_ ## func ## _df(df, pws->b[15]); \
         break; \
     case DF_HALF: \
-        for (i = 0; i < DF_ELEMENTS(DF_HALF); i++) { \
-            pwd->h[i] = msa_ ## func ## _df(df, pws->h[i]); \
-        } \
+        pwd->h[0] = msa_ ## func ## _df(df, pws->h[0]); \
+        pwd->h[1] = msa_ ## func ## _df(df, pws->h[1]); \
+        pwd->h[2] = msa_ ## func ## _df(df, pws->h[2]); \
+        pwd->h[3] = msa_ ## func ## _df(df, pws->h[3]); \
+        pwd->h[4] = msa_ ## func ## _df(df, pws->h[4]); \
+        pwd->h[5] = msa_ ## func ## _df(df, pws->h[5]); \
+        pwd->h[6] = msa_ ## func ## _df(df, pws->h[6]); \
+        pwd->h[7] = msa_ ## func ## _df(df, pws->h[7]); \
         break; \
     case DF_WORD: \
-        for (i = 0; i < DF_ELEMENTS(DF_WORD); i++) { \
-            pwd->w[i] = msa_ ## func ## _df(df, pws->w[i]); \
-        } \
+        pwd->w[0] = msa_ ## func ## _df(df, pws->w[0]); \
+        pwd->w[1] = msa_ ## func ## _df(df, pws->w[1]); \
+        pwd->w[2] = msa_ ## func ## _df(df, pws->w[2]); \
+        pwd->w[3] = msa_ ## func ## _df(df, pws->w[3]); \
         break; \
     case DF_DOUBLE: \
-        for (i = 0; i < DF_ELEMENTS(DF_DOUBLE); i++) { \
-            pwd->d[i] = msa_ ## func ## _df(df, pws->d[i]); \
-        } \
+        pwd->d[0] = msa_ ## func ## _df(df, pws->d[0]); \
+        pwd->d[1] = msa_ ## func ## _df(df, pws->d[1]); \
         break; \
     default: \
         assert(0); \
-- 
2.7.4
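
Illustration (not part of the patch): a minimal standalone sketch of the loop elimination described in the commit message, using ILVEV on word elements as the example. The union demo_wr_t and all demo_* functions are hypothetical stand-ins invented for this sketch; they only mimic the b/h/w/d views of QEMU's wr_t and are not QEMU code. The loop version writes through a temporary, which is a choice made for this sketch, while the unrolled version copies the assignment order of the DF_WORD case of helper_msa_ilvev_df above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's wr_t: one 128-bit MSA register viewed
 * as 16 bytes, 8 halfwords, 4 words or 2 doublewords. */
typedef union {
    int8_t  b[16];
    int16_t h[8];
    int32_t w[4];
    int64_t d[2];
} demo_wr_t;

/* Loop form of ILVEV on words: even-indexed words of wt and ws are
 * interleaved into wd. A temporary is used so wd may alias ws or wt. */
void demo_ilvev_w_loop(demo_wr_t *pwd, const demo_wr_t *pws,
                       const demo_wr_t *pwt)
{
    demo_wr_t wx;
    uint32_t i;

    assert(pwd && pws && pwt);
    for (i = 0; i < 2; i++) {
        wx.w[2 * i]     = pwt->w[2 * i];
        wx.w[2 * i + 1] = pws->w[2 * i];
    }
    *pwd = wx;
}

/* Unrolled form, same assignment order as the DF_WORD case in the patch:
 * no loop counter and no temporary register. */
void demo_ilvev_w_unrolled(demo_wr_t *pwd, const demo_wr_t *pws,
                           const demo_wr_t *pwt)
{
    assert(pwd && pws && pwt);
    pwd->w[3] = pws->w[2];
    pwd->w[2] = pwt->w[2];
    pwd->w[1] = pws->w[0];
    pwd->w[0] = pwt->w[0];
}

/* Returns 1 if both forms produce the same register contents. */
int demo_ilvev_w_selfcheck(void)
{
    demo_wr_t s = { .w = { 10, 11, 12, 13 } };
    demo_wr_t t = { .w = { 20, 21, 22, 23 } };
    demo_wr_t a, b;

    demo_ilvev_w_loop(&a, &s, &t);
    demo_ilvev_w_unrolled(&b, &s, &t);
    return memcmp(&a, &b, sizeof(a)) == 0;
}
```

In this sketch the unrolled version reads each source element before the statement that overwrites the same index, so it also gives the expected result when the destination is the same register as ws. Whether unrolling is actually faster depends on the host compiler and optimization level; the sketch only demonstrates that the two forms compute the same result.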