From: "Emilio G. Cota"
To: qemu-devel@nongnu.org
Cc: Peter Maydell, Mark Cave-Ayland, Richard Henderson, Laurent Vivier,
    Paolo Bonzini, Alex Bennée, Aurelien Jarno
Date: Tue, 27 Mar 2018 01:33:53 -0400
Subject: [Qemu-devel] [PATCH v2 07/14] fpu: introduce hardfloat
Message-Id: <1522128840-498-8-git-send-email-cota@braap.org>
In-Reply-To: <1522128840-498-1-git-send-email-cota@braap.org>
References: <1522128840-498-1-git-send-email-cota@braap.org>

The appended patch paves the way for leveraging the host FPU for a subset
of guest FP operations. For most guest workloads (e.g. FP flags
are never cleared, inexact occurs often and rounding is set to the
default [to nearest]) this will yield sizable performance speedups.

The approach followed here avoids checking the FP exception flags register.
See the added comment for details.

This assumes that QEMU is running on an IEEE754-compliant FPU and
that the rounding is set to the default (to nearest). The
implementation-dependent specifics of the FPU should not matter; things
like tininess detection and snan representation are still dealt with in
soft-fp. However, this approach will break on most hosts if we compile
QEMU with flags such as -ffast-math. We control the flags so this should
be easy to enforce though.

This patch just adds some boilerplate code; subsequent patches add
operations, one per commit to ease bisection.

Signed-off-by: Emilio G. Cota
---
 fpu/softfloat.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 6803279..ffe16b2 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -82,6 +82,8 @@ this code that are retained.
 /* softfloat (and in particular the code in softfloat-specialize.h) is
  * target-dependent and needs the TARGET_* macros.
  */
+#include
+
 #include "qemu/osdep.h"
 #include "qemu/bitops.h"
 #include "fpu/softfloat.h"
@@ -105,6 +107,95 @@ this code that are retained.
 *----------------------------------------------------------------------------*/
 #include "softfloat-specialize.h"
 
+/*
+ * Hardfloat
+ *
+ * Fast emulation of guest FP instructions is challenging for two reasons.
+ * First, FP instruction semantics are similar but not identical, particularly
+ * when handling NaNs. Second, emulating at reasonable speed the guest FP
+ * exception flags is not trivial: reading the host's flags register with a
+ * feclearexcept & fetestexcept pair is slow [slightly slower than soft-fp],
+ * and trapping on every FP exception is not fast nor pleasant to work with.
+ *
+ * We address these challenges by leveraging the host FPU for a subset of the
+ * operations. To do this we follow the main idea presented in this paper:
+ *
+ * Guo, Yu-Chuan, et al. "Translating the ARM Neon and VFP instructions in a
+ * binary translator." Software: Practice and Experience 46.12 (2016):1591-1615.
+ *
+ * The idea is thus to leverage the host FPU to (1) compute FP operations
+ * and (2) identify whether FP exceptions occurred while avoiding
+ * expensive exception flag register accesses.
+ *
+ * An important optimization shown in the paper is that given that exception
+ * flags are rarely cleared by the guest, we can avoid recomputing some flags.
+ * This is particularly useful for the inexact flag, which is very frequently
+ * raised in floating-point workloads.
+ *
+ * We optimize the code further by deferring to soft-fp whenever FP exception
+ * detection might get hairy. Two examples: (1) when at least one operand is
+ * denormal/inf/NaN; (2) when operands are not guaranteed to lead to a 0 result
+ * and the result is < the minimum normal.
+ */
+#define GEN_TYPE_CONV(name, to_t, from_t)   \
+    static inline to_t name(from_t a)       \
+    {                                       \
+        to_t r = *(to_t *)&a;               \
+        return r;                           \
+    }
+
+GEN_TYPE_CONV(float32_to_float, float, float32)
+GEN_TYPE_CONV(float64_to_double, double, float64)
+GEN_TYPE_CONV(float_to_float32, float32, float)
+GEN_TYPE_CONV(double_to_float64, float64, double)
+#undef GEN_TYPE_CONV
+
+#define GEN_INPUT_FLUSH(soft_t)                                         \
+    static inline __attribute__((always_inline)) void                   \
+    soft_t ## _input_flush__nocheck(soft_t *a, float_status *s)         \
+    {                                                                   \
+        if (unlikely(soft_t ## _is_denormal(*a))) {                     \
+            *a = soft_t ## _set_sign(soft_t ## _zero,                   \
+                                     soft_t ## _is_neg(*a));            \
+            s->float_exception_flags |= float_flag_input_denormal;      \
+        }                                                               \
+    }                                                                   \
+                                                                        \
+    static inline __attribute__((always_inline)) void                   \
+    soft_t ## _input_flush1(soft_t *a, float_status *s)                 \
+    {                                                                   \
+        if (likely(!s->flush_inputs_to_zero)) {                         \
+            return;                                                     \
+        }                                                               \
+        soft_t ## _input_flush__nocheck(a, s);                          \
+    }                                                                   \
+                                                                        \
+    static inline __attribute__((always_inline)) void                   \
+    soft_t ## _input_flush2(soft_t *a, soft_t *b, float_status *s)      \
+    {                                                                   \
+        if (likely(!s->flush_inputs_to_zero)) {                         \
+            return;                                                     \
+        }                                                               \
+        soft_t ## _input_flush__nocheck(a, s);                          \
+        soft_t ## _input_flush__nocheck(b, s);                          \
+    }                                                                   \
+                                                                        \
+    static inline __attribute__((always_inline)) void                   \
+    soft_t ## _input_flush3(soft_t *a, soft_t *b, soft_t *c,            \
+                            float_status *s)                            \
+    {                                                                   \
+        if (likely(!s->flush_inputs_to_zero)) {                         \
+            return;                                                     \
+        }                                                               \
+        soft_t ## _input_flush__nocheck(a, s);                          \
+        soft_t ## _input_flush__nocheck(b, s);                          \
+        soft_t ## _input_flush__nocheck(c, s);                          \
+    }
+
+GEN_INPUT_FLUSH(float32)
+GEN_INPUT_FLUSH(float64)
+#undef GEN_INPUT_FLUSH
+
 /*----------------------------------------------------------------------------
 | Returns the fraction bits of the half-precision floating-point value `a'.
 *----------------------------------------------------------------------------*/
-- 
2.7.4
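
Illustration only, not part of the patch: the Hardfloat comment describes a
fast path that uses the host FPU and a fallback to soft-fp, and a small
standalone sketch may make that split easier to picture. The code below is a
guess at the shape for a single-precision add, not code from this series.
Every sketch_* name, the two flag constants and the status struct are
hypothetical stand-ins for float32 and float_status; the slow path is only a
placeholder marking where soft-fp would run; and memcpy is used for the bit
casts rather than the pointer cast in GEN_TYPE_CONV. It assumes an IEEE754
binary32 host float and the default (to nearest) rounding mode.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>   /* memcpy */

/* Hypothetical stand-ins for the patch's float32 and float_status. */
typedef uint32_t sketch_f32;

enum {
    SKETCH_FLAG_INEXACT  = 1 << 0,
    SKETCH_FLAG_OVERFLOW = 1 << 1,
};

typedef struct {
    unsigned flags;          /* accumulated guest exception flags */
    unsigned slow_path_hits; /* bookkeeping for this sketch only */
} sketch_status;

static_assert(sizeof(float) == sizeof(sketch_f32), "host float must be 32 bits");

/* Bit-exact conversions between the guest bit pattern and a host float. */
static inline float sketch_to_host(sketch_f32 a)
{
    float r;
    memcpy(&r, &a, sizeof(r));
    return r;
}

static inline sketch_f32 sketch_from_host(float a)
{
    sketch_f32 r;
    memcpy(&r, &a, sizeof(r));
    return r;
}

static inline uint32_t sketch_exp(sketch_f32 a)
{
    return (a >> 23) & 0xff;
}

static inline bool sketch_is_zero(sketch_f32 a)
{
    return (a & 0x7fffffffu) == 0;
}

/* True for +-0 and normal numbers; false for denormal, inf and NaN. */
static inline bool sketch_is_zero_or_normal(sketch_f32 a)
{
    uint32_t exp = sketch_exp(a);
    return exp == 0 ? sketch_is_zero(a) : exp != 0xff;
}

/*
 * Placeholder for soft-fp. A real slow path would compute the result and
 * all exception flags in software; this one only counts the fallback and
 * returns the host sum so that the sketch stays self-contained.
 */
static sketch_f32 sketch_add_slow(sketch_f32 a, sketch_f32 b, sketch_status *s)
{
    s->slow_path_hits++;
    return sketch_from_host(sketch_to_host(a) + sketch_to_host(b));
}

static sketch_f32 sketch_add(sketch_f32 a, sketch_f32 b, sketch_status *s)
{
    /*
     * Take the fast path only if inexact is already set (so it does not
     * have to be recomputed) and both operands are zero or normal.
     */
    if (!(s->flags & SKETCH_FLAG_INEXACT) ||
        !sketch_is_zero_or_normal(a) || !sketch_is_zero_or_normal(b)) {
        return sketch_add_slow(a, b, s);
    }

    sketch_f32 r = sketch_from_host(sketch_to_host(a) + sketch_to_host(b));

    if ((r & 0x7fffffffu) == 0x7f800000u) {
        /* Finite operands produced infinity: overflow, no flag register read. */
        s->flags |= SKETCH_FLAG_OVERFLOW;
    } else if (sketch_exp(r) == 0 &&
               !(sketch_is_zero(a) && sketch_is_zero(b))) {
        /* Result below the minimum normal and not a guaranteed zero. */
        return sketch_add_slow(a, b, s);
    }
    return r;
}
```

With the inexact flag already set in the status, adding 1.0f and 2.0f stays on
the host FPU, while a NaN operand, or a status whose inexact flag is clear,
goes to the placeholder slow path. The second check is deliberately
conservative: an exact cancellation such as x + (-x) also takes the slow path.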