From: "Emilio G. Cota"
To: qemu-devel@nongnu.org
Cc: Peter Maydell, Mark Cave-Ayland, Richard Henderson, Laurent Vivier,
    Paolo Bonzini, Alex Bennée, Aurelien Jarno
Date: Wed, 21 Mar 2018 16:11:42 -0400
Message-Id: <1521663109-32262-8-git-send-email-cota@braap.org>
In-Reply-To: <1521663109-32262-1-git-send-email-cota@braap.org>
Subject: [Qemu-devel] [PATCH v1 07/14] fpu: introduce hostfloat

This patch paves the way for leveraging the host FPU for a
subset of guest FP operations. For most guest workloads (e.g. those in
which the FP flags are never cleared, inexact occurs often, and rounding
is left at the default, round-to-nearest) this will yield sizable
performance speedups.

The approach followed here avoids checking the FP exception flags
register. See the comment at the top of hostfloat.c for details.

This assumes that QEMU is running on an IEEE 754-compliant FPU and that
the rounding mode is set to the default (round-to-nearest). The
implementation-dependent specifics of the FPU should not matter; things
like tininess detection and sNaN representation are still dealt with in
soft-fp. However, this approach will break on most hosts if QEMU is
compiled with flags such as -ffast-math. Since we control the compiler
flags, this should be easy to enforce.

The licensing in softfloat.h is complicated at best, so to keep things
simple I'm adding this as a separate, GPL'ed file.

This patch just adds some boilerplate code; subsequent patches add the
operations, one per commit to ease bisection.

Signed-off-by: Emilio G. Cota
---
 Makefile.target           |  2 +-
 include/fpu/hostfloat.h   | 14 +++++++
 include/fpu/softfloat.h   |  1 +
 fpu/hostfloat.c           | 96 +++++++++++++++++++++++++++++++++++++++++++++++
 target/m68k/Makefile.objs |  2 +-
 tests/fp-test/Makefile    |  2 +-
 6 files changed, 114 insertions(+), 3 deletions(-)
 create mode 100644 include/fpu/hostfloat.h
 create mode 100644 fpu/hostfloat.c

diff --git a/Makefile.target b/Makefile.target
index 6549481..efcdfb9 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -97,7 +97,7 @@ obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/tcg-op-vec.o tcg/tcg-op-gvec.o
 obj-$(CONFIG_TCG) += tcg/tcg-common.o tcg/optimize.o
 obj-$(CONFIG_TCG_INTERPRETER) += tcg/tci.o
 obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o
-obj-y += fpu/softfloat.o
+obj-y += fpu/softfloat.o fpu/hostfloat.o
 obj-y += target/$(TARGET_BASE_ARCH)/
 obj-y += disas.o
 obj-$(call notempty,$(TARGET_XML_FILES)) += gdbstub-xml.o
diff --git a/include/fpu/hostfloat.h b/include/fpu/hostfloat.h
new file mode 100644
index 0000000..b01291b
--- /dev/null
+++ b/include/fpu/hostfloat.h
@@ -0,0 +1,14 @@
+/*
+ * Copyright (C) 2018, Emilio G. Cota
+ *
+ * License: GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef HOSTFLOAT_H
+#define HOSTFLOAT_H
+
+#ifndef SOFTFLOAT_H
+#error fpu/hostfloat.h must only be included from softfloat.h
+#endif
+
+#endif /* HOSTFLOAT_H */
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 8fb44a8..8963b68 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -95,6 +95,7 @@ enum {
 };
 
 #include "fpu/softfloat-types.h"
+#include "fpu/hostfloat.h"
 
 static inline void set_float_detect_tininess(int val, float_status *status)
 {
diff --git a/fpu/hostfloat.c b/fpu/hostfloat.c
new file mode 100644
index 0000000..cab0341
--- /dev/null
+++ b/fpu/hostfloat.c
@@ -0,0 +1,96 @@
+/*
+ * hostfloat.c - FP primitives that use the host's FPU whenever possible.
+ *
+ * Copyright (C) 2018, Emilio G. Cota
+ *
+ * License: GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ * Fast emulation of guest FP instructions is challenging for two reasons.
+ * First, FP instruction semantics are similar but not identical, particularly
+ * when handling NaNs. Second, emulating at reasonable speed the guest FP
+ * exception flags is not trivial: reading the host's flags register with a
+ * feclearexcept & fetestexcept pair is slow [slightly slower than soft-fp],
+ * and trapping on every FP exception is not fast nor pleasant to work with.
+ *
+ * This module leverages the host FPU for a subset of the operations. To
+ * do this it follows the main idea presented in this paper:
+ *
+ * Guo, Yu-Chuan, et al. "Translating the ARM Neon and VFP instructions in a
+ * binary translator." Software: Practice and Experience 46.12 (2016):1591-1615.
+ *
+ * The idea is thus to leverage the host FPU to (1) compute FP operations
+ * and (2) identify whether FP exceptions occurred while avoiding
+ * expensive exception flag register accesses.
+ *
+ * An important optimization shown in the paper is that given that exception
+ * flags are rarely cleared by the guest, we can avoid recomputing some flags.
+ * This is particularly useful for the inexact flag, which is very frequently
+ * raised in floating-point workloads.
+ *
+ * We optimize the code further by deferring to soft-fp whenever FP
+ * exception detection might get hairy. Fortunately this is not common.
+ */
+#include 
+
+#include "qemu/osdep.h"
+#include "fpu/softfloat.h"
+
+#define GEN_TYPE_CONV(name, to_t, from_t) \
+    static inline to_t name(from_t a) \
+    { \
+        to_t r = *(to_t *)&a; \
+        return r; \
+    }
+
+GEN_TYPE_CONV(float32_to_float, float, float32)
+GEN_TYPE_CONV(float64_to_double, double, float64)
+GEN_TYPE_CONV(float_to_float32, float32, float)
+GEN_TYPE_CONV(double_to_float64, float64, double)
+#undef GEN_TYPE_CONV
+
+#define GEN_INPUT_FLUSH(soft_t) \
+    static inline __attribute__((always_inline)) void \
+    soft_t ## _input_flush__nocheck(soft_t *a, float_status *s) \
+    { \
+        if (unlikely(soft_t ## _is_denormal(*a))) { \
+            *a = soft_t ## _set_sign(soft_t ## _zero, \
+                                     soft_t ## _is_neg(*a)); \
+            s->float_exception_flags |= float_flag_input_denormal; \
+        } \
+    } \
+    \
+    static inline __attribute__((always_inline)) void \
+    soft_t ## _input_flush1(soft_t *a, float_status *s) \
+    { \
+        if (likely(!s->flush_inputs_to_zero)) { \
+            return; \
+        } \
+        soft_t ## _input_flush__nocheck(a, s); \
+    } \
+    \
+    static inline __attribute__((always_inline)) void \
+    soft_t ## _input_flush2(soft_t *a, soft_t *b, float_status *s) \
+    { \
+        if (likely(!s->flush_inputs_to_zero)) { \
+            return; \
+        } \
+        soft_t ## _input_flush__nocheck(a, s); \
+        soft_t ## _input_flush__nocheck(b, s); \
+    } \
+    \
+    static inline __attribute__((always_inline)) void \
+    soft_t ## _input_flush3(soft_t *a, soft_t *b, soft_t *c, \
+                            float_status *s) \
+    { \
+        if (likely(!s->flush_inputs_to_zero)) { \
+            return; \
+        } \
+        soft_t ## _input_flush__nocheck(a, s); \
+        soft_t ## _input_flush__nocheck(b, s); \
+        soft_t ## _input_flush__nocheck(c, s); \
+    }
+
+GEN_INPUT_FLUSH(float32)
+GEN_INPUT_FLUSH(float64)
+#undef GEN_INPUT_FLUSH
diff --git a/target/m68k/Makefile.objs b/target/m68k/Makefile.objs
index ac61948..2868b11 100644
--- a/target/m68k/Makefile.objs
+++ b/target/m68k/Makefile.objs
@@ -1,5 +1,5 @@
 obj-y += m68k-semi.o
 obj-y += translate.o op_helper.o helper.o cpu.o
-obj-y += fpu_helper.o softfloat.o
+obj-y += fpu_helper.o softfloat.o hostfloat.o
 obj-y += gdbstub.o
 obj-$(CONFIG_SOFTMMU) += monitor.o
diff --git a/tests/fp-test/Makefile b/tests/fp-test/Makefile
index 703434f..187cfcc 100644
--- a/tests/fp-test/Makefile
+++ b/tests/fp-test/Makefile
@@ -28,7 +28,7 @@ ibm:
 $(WHITELIST_FILES):
 	wget -nv -O $@ http://www.cs.columbia.edu/~cota/qemu/fpbench-$@
 
-fp-test$(EXESUF): fp-test.o softfloat.o
+fp-test$(EXESUF): fp-test.o softfloat.o hostfloat.o
 
 clean:
 	rm -f *.o *.d $(OBJS)
-- 
2.7.4
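GEN_INPUT_FLUSH generates, per softfloat type, an unconditional __nocheck helper plus flush1/flush2/flush3 wrappers that test flush_inputs_to_zero once before touching any operand. The following is a standalone sketch of roughly what the float32 instantiation does; it is not the patch's code. It assumes hypothetical stand-ins (f32_bits for float32's raw IEEE 754 bit pattern, fp_status for the two float_status fields involved, FLAG_INPUT_DENORMAL for float_flag_input_denormal), replaces float32_is_denormal and float32_set_sign with explicit bit masks, and omits the likely()/unlikely() and always_inline hints.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins: f32_bits holds the raw IEEE 754 single-precision
 * bit pattern, and fp_status carries only the two float_status fields the
 * flush helpers touch.
 */
typedef uint32_t f32_bits;

enum { FLAG_INPUT_DENORMAL = 1u << 0 };

typedef struct {
    bool flush_inputs_to_zero;
    unsigned exception_flags;
} fp_status;

#define F32_SIGN_MASK 0x80000000u
#define F32_EXP_MASK  0x7f800000u
#define F32_FRAC_MASK 0x007fffffu

/* Denormal: exponent field all zeros, fraction field non-zero. */
static inline bool f32_is_denormal(f32_bits a)
{
    return (a & F32_EXP_MASK) == 0 && (a & F32_FRAC_MASK) != 0;
}

/* Replace a denormal with a zero of the same sign and record the event. */
static inline void f32_input_flush_nocheck(f32_bits *a, fp_status *s)
{
    if (f32_is_denormal(*a)) {
        *a &= F32_SIGN_MASK;
        s->exception_flags |= FLAG_INPUT_DENORMAL;
    }
}

/* One-operand wrapper: a single branch when flushing is disabled. */
static inline void f32_input_flush1(f32_bits *a, fp_status *s)
{
    if (!s->flush_inputs_to_zero) {
        return;
    }
    f32_input_flush_nocheck(a, s);
}

/* Two-operand wrapper: the mode is checked once, not once per operand. */
static inline void f32_input_flush2(f32_bits *a, f32_bits *b, fp_status *s)
{
    if (!s->flush_inputs_to_zero) {
        return;
    }
    f32_input_flush_nocheck(a, s);
    f32_input_flush_nocheck(b, s);
}
```

Clearing everything but the sign bit yields a zero of the same sign, which matches what float32_set_sign(float32_zero, float32_is_neg(a)) produces in the patch. Keeping the flush_inputs_to_zero test in the wrappers means the common case costs one well-predicted branch per operation, which is presumably why the patch separates __nocheck from the flush1/2/3 helpers.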
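The hostfloat.c comment describes the key trick from Guo et al.: because guests rarely clear the exception flags, once inexact is set a host FP result can be used without reading the host flags register, as long as the remaining flags can be inferred cheaply from the inputs and the output. This patch only adds boilerplate, so what follows is not code from the series. It is a minimal C sketch of that idea for single-precision multiplication, assuming round-to-nearest-even on both guest and host and FLT_EVAL_METHOD == 0 (e.g. x86-64 or AArch64). The names guest_fp_status, guest_mul, slow_mul and the FLAG_* values are hypothetical, and slow_mul is a stand-in for soft-fp that uses an exact double-precision product rather than softfloat's algorithms.

```c
#include <float.h>
#include <stdbool.h>

/* Hypothetical sticky guest flags (illustrative, not softfloat's values). */
enum {
    FLAG_INVALID   = 1u << 0,
    FLAG_OVERFLOW  = 1u << 1,
    FLAG_UNDERFLOW = 1u << 2,
    FLAG_INEXACT   = 1u << 3,
};

typedef struct {
    unsigned flags;      /* accumulated guest exception flags */
    unsigned slow_calls; /* instrumentation: how often we fell back */
} guest_fp_status;

/* True for +-0 and normal numbers; false for subnormals, Inf and NaN. */
static bool zero_or_normal(float a)
{
    return a == 0.0f || (a >= FLT_MIN && a <= FLT_MAX) ||
           (a <= -FLT_MIN && a >= -FLT_MAX);
}

/*
 * Stand-in for the soft-fp slow path. The product of two floats is exact
 * in double, so rounding it once to float gives the correctly rounded
 * (round-to-nearest-even) result, and comparing the two reveals inexact.
 * Tininess is detected before rounding; sNaN handling is omitted.
 */
static float slow_mul(float a, float b, guest_fp_status *s)
{
    double p = (double)a * (double)b;
    float r = (float)p;

    s->slow_calls++;
    if (p != p) {                     /* NaN result */
        if (a == a && b == b) {
            s->flags |= FLAG_INVALID; /* 0 * Inf */
        }
        return r;
    }
    if ((double)r != p) {
        s->flags |= FLAG_INEXACT;
        if (r > FLT_MAX || r < -FLT_MAX) {
            s->flags |= FLAG_OVERFLOW;
        } else if (p < FLT_MIN && p > -FLT_MIN) {
            s->flags |= FLAG_UNDERFLOW;
        }
    }
    return r;
}

/*
 * Fast path: with inexact already raised, a host multiply of zero/normal
 * inputs can only newly raise overflow or underflow, so only those need
 * checking. Any result that might be tiny goes to the slow path.
 */
static float guest_mul(float a, float b, guest_fp_status *s)
{
    if ((s->flags & FLAG_INEXACT) && zero_or_normal(a) && zero_or_normal(b)) {
        float r = a * b;

        if (r > FLT_MAX || r < -FLT_MAX) {
            s->flags |= FLAG_OVERFLOW; /* inexact is already set */
            return r;
        }
        if (a != 0.0f && b != 0.0f && r <= FLT_MIN && r >= -FLT_MIN) {
            return slow_mul(a, b, s);  /* possible underflow */
        }
        return r;
    }
    return slow_mul(a, b, s);
}
```

The fast path never computes inexact at all; it relies on the flag being sticky, which is the property the paper exploits. Subnormal, infinite or NaN inputs, and results that might be tiny, are simply handed to the slow path, in line with the stated policy of deferring to soft-fp whenever exception detection gets hairy. This also illustrates why -ffast-math must be avoided: it lets the compiler assume, for example, that p != p is always false.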