From: Paolo Bonzini <pbonzini@redhat.com>
To: qemu-devel@nongnu.org
Cc: richard.henderson@linaro.org
Subject: [PATCH 05/13] target/i386: reorganize ops emitted by do_gen_rep, drop repz_opt
Date: Sun, 15 Dec 2024 10:06:04 +0100
Message-ID: <20241215090613.89588-6-pbonzini@redhat.com>
In-Reply-To: <20241215090613.89588-1-pbonzini@redhat.com>
References: <20241215090613.89588-1-pbonzini@redhat.com>

The condition for optimizing repeat instructions is more or less the
opposite of what you imagine: almost always the string instruction
was _not_ optimized and optimizing the loop relied on goto_tb.
This is obviously not great for performance, due to the cost of the
exit-to-main-loop check, but also wrong.
In fact, after expanding dc->jmp_opt and simplifying "!!x" to "x",
the condition for looping used to be:

    ((cflags & CF_NO_GOTO_TB) ||
     (flags & (HF_RF_MASK | HF_TF_MASK | HF_INHIBIT_IRQ_MASK)))
    && !(cflags & CF_USE_ICOUNT)

In other words, setting aside RF (it requires special handling for REP
instructions and it was completely missing), repeat instructions were
being optimized if the TF or inhibit IRQ flags were set. This is
certainly wrong for TF, because string instructions trap after every
execution, and probably for interrupt shadow too.

Get rid of repz_opt completely. The next patches will reintroduce the
optimization, applying it in the common case instead of the unlikely
and wrong one.

While at it, place the CX/ECX/RCX=0 case at the end of the function,
which saves a label and is clearer when reading the generated ops.
For clarity, mark the cc_op explicitly as DYNAMIC even if at the end
of the translation block; the cc_op can come from either the previous
instruction or the string instruction, and currently we rely on
a gen_update_cc_op() that is hidden in the bowels of gen_jcc() to
spill cc_op and mark it clean.
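The expanded condition above can be checked with a small standalone
sketch. It is not QEMU code: the flag names are borrowed from the
message, but the bit values are placeholders chosen for illustration,
and old_jmp_opt(), old_repz_opt() and check_old_condition() are
hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values for illustration only; NOT QEMU's real constants. */
enum { CF_NO_GOTO_TB = 1u << 0, CF_USE_ICOUNT = 1u << 1 };
enum { HF_RF_MASK = 1u << 0, HF_TF_MASK = 1u << 1, HF_INHIBIT_IRQ_MASK = 1u << 2 };

/* Mirrors the dc->jmp_opt expression quoted in the patch. */
static bool old_jmp_opt(unsigned cflags, unsigned flags)
{
    return !((cflags & CF_NO_GOTO_TB) ||
             (flags & (HF_RF_MASK | HF_TF_MASK | HF_INHIBIT_IRQ_MASK)));
}

/* Mirrors the removed dc->repz_opt expression. */
static bool old_repz_opt(unsigned cflags, unsigned flags)
{
    return !old_jmp_opt(cflags, flags) && !(cflags & CF_USE_ICOUNT);
}

/* Hypothetical self-check covering the cases the message discusses. */
static void check_old_condition(void)
{
    /* Common case: no special flags, so the loop was NOT optimized. */
    assert(!old_repz_opt(0, 0));
    /* Single-stepping (TF) or interrupt shadow: the loop WAS optimized. */
    assert(old_repz_opt(0, HF_TF_MASK));
    assert(old_repz_opt(0, HF_INHIBIT_IRQ_MASK));
    /* icount always disabled it, whatever the other flags were. */
    assert(!old_repz_opt(CF_USE_ICOUNT, HF_TF_MASK));
}
```

Calling check_old_condition() passes silently, which matches the
description: the in-TB loop was enabled only in the unusual cases.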
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/i386/tcg/translate.c | 60 ++++++++-----------------------------
 1 file changed, 13 insertions(+), 47 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 63a39d9f15a..3732d05d5f5 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -112,7 +112,6 @@ typedef struct DisasContext {
 #endif
     bool vex_w; /* used by AVX even on 32-bit processors */
     bool jmp_opt; /* use direct block chaining for direct jumps */
-    bool repz_opt; /* optimize jumps within repz instructions */
     bool cc_op_dirty;
 
     CCOp cc_op; /* current CC operation */
@@ -1205,23 +1204,6 @@ static inline void gen_jcc(DisasContext *s, int b, TCGLabel *l1)
     }
 }
 
-/* XXX: does not work with gdbstub "ice" single step - not a
-   serious problem. The caller can jump to the returned label
-   to stop the REP but, if the flags have changed, it has to call
-   gen_update_cc_op before doing so. */
-static TCGLabel *gen_jz_ecx_string(DisasContext *s)
-{
-    TCGLabel *l1 = gen_new_label();
-    TCGLabel *l2 = gen_new_label();
-
-    gen_update_cc_op(s);
-    gen_op_jnz_ecx(s, l1);
-    gen_set_label(l2);
-    gen_jmp_rel_csize(s, 0, 1);
-    gen_set_label(l1);
-    return l2;
-}
-
 static void gen_stos(DisasContext *s, MemOp ot)
 {
     gen_string_movl_A0_EDI(s);
@@ -1313,27 +1295,25 @@ static void do_gen_rep(DisasContext *s, MemOp ot,
                        void (*fn)(DisasContext *s, MemOp ot),
                        bool is_repz_nz)
 {
-    TCGLabel *l2;
-    l2 = gen_jz_ecx_string(s);
+    TCGLabel *done = gen_new_label();
+
+    gen_update_cc_op(s);
+    gen_op_jz_ecx(s, done);
+
     fn(s, ot);
     gen_op_add_reg_im(s, s->aflag, R_ECX, -1);
     if (is_repz_nz) {
         int nz = (s->prefix & PREFIX_REPNZ) ? 1 : 0;
-        gen_jcc(s, (JCC_Z << 1) | (nz ^ 1), l2);
+        gen_jcc(s, (JCC_Z << 1) | (nz ^ 1), done);
     }
-    /*
-     * A loop would cause two single step exceptions if ECX = 1
-     * before rep string_insn
-     */
-    if (s->repz_opt) {
-        gen_op_jz_ecx(s, l2);
-    }
-    /*
-     * For CMPS/SCAS there is no need to set CC_OP_DYNAMIC: only one iteration
-     * is done at a time, so the translation block ends unconditionally after
-     * this instruction and there is no control flow junction.
-     */
+
+    /* Go to the main loop but reenter the same instruction. */
     gen_jmp_rel_csize(s, -cur_insn_len(s), 0);
+
+    /* CX/ECX/RCX is zero, or REPZ/REPNZ broke the repetition. */
+    gen_set_label(done);
+    set_cc_op(s, CC_OP_DYNAMIC);
+    gen_jmp_rel_csize(s, 0, 1);
 }
 
 static void gen_repz(DisasContext *s, MemOp ot,
@@ -3664,20 +3644,6 @@ static void i386_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cpu)
     dc->cpuid_xsave_features = env->features[FEAT_XSAVE];
     dc->jmp_opt = !((cflags & CF_NO_GOTO_TB) ||
                     (flags & (HF_RF_MASK | HF_TF_MASK | HF_INHIBIT_IRQ_MASK)));
-    /*
-     * If jmp_opt, we want to handle each string instruction individually.
-     * For icount also disable repz optimization so that each iteration
-     * is accounted separately.
-     *
-     * FIXME: this is messy; it makes REP string instructions a lot less
-     * efficient than they should be and it gets in the way of correct
-     * handling of RF (interrupts or traps arriving after any iteration
-     * of a repeated string instruction but the last should set RF to 1).
-     * Perhaps it would be more efficient if REP string instructions were
-     * always at the beginning of the TB, or even their own TB? That
-     * would even allow accounting up to 64k iterations at once for icount.
-     */
-    dc->repz_opt = !dc->jmp_opt && !(cflags & CF_USE_ICOUNT);
 
     dc->T0 = tcg_temp_new();
     dc->T1 = tcg_temp_new();
-- 
2.47.1
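For readers who want to trace the control flow that the reorganized
do_gen_rep() emits, here is a rough standalone model in plain C. It is
only a sketch: ModelCPU, stos_like(), scas_like(), rep_pass() and
run_rep() are hypothetical names that do not exist in QEMU, and the
model ignores cc_op tracking, address-size handling and the actual
return to the main loop. Each call to rep_pass() stands for one entry
into the translated instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guest state for this model; none of these names exist in QEMU. */
typedef struct {
    uint64_t cx;         /* stands in for CX/ECX/RCX */
    bool zf;             /* zero flag, as CMPS/SCAS would set it */
    const uint8_t *src;  /* next byte a SCAS-like step reads */
    uint8_t al;          /* byte a SCAS-like step compares against */
    unsigned steps;      /* string-op iterations executed */
    unsigned entries;    /* times the instruction was (re)entered */
} ModelCPU;

typedef void (*StringStep)(ModelCPU *cpu);

/* A step that leaves flags alone, like STOS or MOVS. */
static void stos_like(ModelCPU *cpu)
{
    cpu->steps++;
}

/* A step that sets ZF from a byte comparison, like SCASB. */
static void scas_like(ModelCPU *cpu)
{
    cpu->zf = (*cpu->src++ == cpu->al);
    cpu->steps++;
}

/*
 * One pass through the emitted ops. Returns true for "jump back to the
 * same instruction" and false for "reached the done label".
 */
static bool rep_pass(ModelCPU *cpu, StringStep fn, bool is_repz_nz, bool repnz)
{
    cpu->entries++;
    if (cpu->cx == 0) {
        return false;            /* models gen_op_jz_ecx(s, done) */
    }
    fn(cpu);                     /* one iteration of the string op */
    cpu->cx--;                   /* models gen_op_add_reg_im(..., -1) */
    if (is_repz_nz && cpu->zf == repnz) {
        return false;            /* REPZ ends on ZF=0, REPNZ on ZF=1 */
    }
    return true;                 /* reenter the same instruction */
}

/* Keeps reentering until a pass reaches the done label. */
static void run_rep(ModelCPU *cpu, StringStep fn, bool is_repz_nz, bool repnz)
{
    while (rep_pass(cpu, fn, is_repz_nz, repnz)) {
        /* each turn of this loop models one trip via the main loop */
    }
}
```

In this model a plain REP with a count of 3 executes three iterations
but is entered four times, because the final entry only observes that
the counter is zero and falls through to the done label.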