From: Alex Bennée
To: peter.maydell@linaro.org, richard.henderson@linaro.org
Date: Tue, 20 Feb 2018 21:01:37 +0000
Message-Id: <20180220210137.18018-1-alex.bennee@linaro.org>
X-Mailer: git-send-email 2.15.1
Subject: [Qemu-devel] [PATCH] fpu/softfloat: use hardware sqrt if we can (EXPERIMENT!)
Cc: Peter Crosthwaite, Riku Voipio, qemu-devel@nongnu.org, Laurent Vivier, "open list:ARM", Paolo Bonzini, Alex Bennée, Aurelien Jarno, Richard Henderson

This is an attempt to save some of the cost of sqrt by using the
built-in support of the host hardware. The idea is that, assuming we
start with a valid input, we can use the hardware. If any tininess
issues occur this will trip an FPU exception, at which point we:

  - turn off cpu->use_host_fpu
  - mask the FPU exceptions
  - return to what we were doing

Once we return we should pick up the fact that there was something
weird about the operation and fall back to the pure software
implementation.

You could imagine this being extended to code generation, but instead
of returning to the code we could exit and re-generate the TB, this
time with pure software helpers rather than any support from the
hardware.

This is a sort of fix-it-up-after-the-fact approach: reading the FP
state is an expensive operation for everything, so let's only worry
about exceptions when they trip...
Signed-off-by: Alex Bennée
---
 cpus.c                        | 28 ++++++++++++++++++++++++++++
 fpu/softfloat.c               | 40 +++++++++++++++++++++++++++++++++++-----
 include/fpu/softfloat-types.h |  2 ++
 include/fpu/softfloat.h       |  4 ++++
 include/qom/cpu.h             |  1 +
 linux-user/main.c             |  8 ++++++++
 linux-user/signal.c           | 16 ++++++++++++++++
 target/arm/cpu.c              |  4 ++++
 8 files changed, 98 insertions(+), 5 deletions(-)

diff --git a/cpus.c b/cpus.c
index f298b659f4..e435f6737b 100644
--- a/cpus.c
+++ b/cpus.c
@@ -23,6 +23,7 @@
  */
 
 #include "qemu/osdep.h"
+#include
 #include "qemu/config-file.h"
 #include "cpu.h"
 #include "monitor/monitor.h"
@@ -1078,10 +1079,36 @@ static void qemu_init_sigbus(void)
 
     prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
 }
+
+static void sigfpu_handler(int n, siginfo_t *siginfo, void *ctx)
+{
+    fprintf(stderr, "%s: got %d, %p/%p\n", __func__, n, siginfo, ctx);
+
+    /* Called asynchronously in VCPU thread. */
+    g_assert(current_cpu);
+}
+
+static void qemu_init_sigfpu(void)
+{
+    struct sigaction action;
+
+    memset(&action, 0, sizeof(action));
+    action.sa_flags = SA_SIGINFO;
+    action.sa_sigaction = sigfpu_handler;
+    sigaction(SIGBUS, &action, NULL);
+
+    feenableexcept(FE_INVALID |
+                   FE_OVERFLOW |
+                   FE_UNDERFLOW |
+                   FE_INEXACT);
+}
 #else /* !CONFIG_LINUX */
 static void qemu_init_sigbus(void)
 {
 }
+static void qemu_init_sigfpu(void)
+{
+}
 #endif /* !CONFIG_LINUX */
 
 static QemuMutex qemu_global_mutex;
@@ -1827,6 +1854,7 @@ static void qemu_tcg_init_vcpu(CPUState *cpu)
     if (!tcg_region_inited) {
         tcg_region_inited = 1;
         tcg_region_init();
+        qemu_init_sigfpu();
     }
 
     if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) {
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index e7fb0d357a..ec9355af7a 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1905,10 +1905,12 @@ float64 float64_scalbn(float64 a, int n, float_status *status)
  * bits to ensure we get a correctly rounded result.
  *
  * This does mean however the calculation is slower than before,
- * especially for 64 bit floats.
+ * especially for 64 bit floats. However the caller can only do checks
+ * if they actually want to off-load to the library.
  */
 
-static FloatParts sqrt_float(FloatParts a, float_status *s, const FloatFmt *p)
+static FloatParts sqrt_float(FloatParts a, float_status *s,
+                             const FloatFmt *p, bool check_only)
 {
     uint64_t a_frac, r_frac, s_frac;
     int bit, last_bit;
@@ -1928,6 +1930,10 @@ static FloatParts sqrt_float(FloatParts a, float_status *s, const FloatFmt *p)
         return a; /* sqrt(+inf) = +inf */
     }
 
+    if (check_only) {
+        return a;
+    }
+
     assert(a.cls == float_class_normal);
 
     /* We need two overflow bits at the top. Adding room for that is a
@@ -1973,21 +1979,45 @@ static FloatParts sqrt_float(FloatParts a, float_status *s, const FloatFmt *p)
 float16 __attribute__((flatten)) float16_sqrt(float16 a, float_status *status)
 {
     FloatParts pa = float16_unpack_canonical(a, status);
-    FloatParts pr = sqrt_float(pa, status, &float16_params);
+    FloatParts pr = sqrt_float(pa, status, &float16_params, false);
     return float16_round_pack_canonical(pr, status);
 }
 
 float32 __attribute__((flatten)) float32_sqrt(float32 a, float_status *status)
 {
     FloatParts pa = float32_unpack_canonical(a, status);
-    FloatParts pr = sqrt_float(pa, status, &float32_params);
+    FloatParts pr;
+
+    if (status->use_host_fpu && *status->use_host_fpu) {
+        pr = sqrt_float(pa, status, &float32_params, true);
+        if (pr.cls == float_class_normal) {
+            float32 r = __builtin_sqrt(a);
+            if (*status->use_host_fpu) {
+                return r;
+            }
+        }
+    }
+
+    pr = sqrt_float(pa, status, &float32_params, false);
     return float32_round_pack_canonical(pr, status);
 }
 
 float64 __attribute__((flatten)) float64_sqrt(float64 a, float_status *status)
 {
     FloatParts pa = float64_unpack_canonical(a, status);
-    FloatParts pr = sqrt_float(pa, status, &float64_params);
+    FloatParts pr = sqrt_float(pa, status, &float64_params, true);
+
+    if (status->use_host_fpu && *status->use_host_fpu) {
+        pr = sqrt_float(pa, status, &float64_params, true);
+        if (pr.cls == float_class_normal) {
+            float64 r = __builtin_sqrt(a);
+            if (*status->use_host_fpu) {
+                return r;
+            }
+        }
+    }
+
+    pr = sqrt_float(pa, status, &float64_params, false);
     return float64_round_pack_canonical(pr, status);
 }
 
diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index 4e378cb612..4c32e56cad 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -174,6 +174,8 @@ typedef struct float_status {
     flag flush_inputs_to_zero;
     flag default_nan_mode;
     flag snan_bit_is_one;
+    /* can we use the host_fpu for some things? */
+    bool *use_host_fpu;
 } float_status;
 
 #endif /* SOFTFLOAT_TYPES_H */
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 9b7b5e34e2..f7ee0232a2 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -157,6 +157,10 @@ static inline flag get_default_nan_mode(float_status *status)
 {
     return status->default_nan_mode;
 }
+static inline void enable_host_fpu(bool *host_fpu_flag, float_status *status)
+{
+    status->use_host_fpu = host_fpu_flag;
+}
 
 /*----------------------------------------------------------------------------
 | Routine to raise any or all of the software IEC/IEEE floating-point
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index aff88fa16f..337ebef8b6 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -396,6 +396,7 @@ struct CPUState {
     uint32_t halted;
     uint32_t can_do_io;
     int32_t exception_index;
+    bool use_host_fpu;
 
     /* shared by kvm, hax and hvf */
     bool vcpu_dirty;
diff --git a/linux-user/main.c b/linux-user/main.c
index 7de0e02487..36b6be3b2b 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -20,6 +20,7 @@
 #include "qemu-version.h"
 #include
 #include
+#include
 
 #include "qapi/error.h"
 #include "qemu.h"
@@ -4927,6 +4928,13 @@ int main(int argc, char **argv, char **envp)
         }
         gdb_handlesig(cpu, 0);
     }
+
+    feenableexcept(FE_INVALID |
+                   FE_OVERFLOW |
+                   FE_UNDERFLOW |
+                   FE_INEXACT);
+    cpu->use_host_fpu = true;
+
     cpu_loop(env);
     /* never exits */
     return 0;
diff --git a/linux-user/signal.c b/linux-user/signal.c
index 9a380b9e31..0773d3ef18 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -20,6 +20,7 @@
 #include "qemu/bitops.h"
 #include
 #include
+#include
 
 #include "qemu.h"
 #include "qemu-common.h"
@@ -639,6 +640,21 @@ static void host_signal_handler(int host_signum, siginfo_t *info,
     ucontext_t *uc = puc;
     struct emulated_sigtable *k;
 
+    /* Catch any FPU exceptions we might get from having tried to use
+     * the host FPU to speed up some calculations
+     */
+    if (host_signum == SIGFPE && cpu->use_host_fpu) {
+        cpu->use_host_fpu = false;
+        /* sadly this gets lost on the context switch when we return */
+        fedisableexcept(FE_INVALID |
+                        FE_OVERFLOW |
+                        FE_UNDERFLOW |
+                        FE_INEXACT);
+        /* sigaddset(&uc->uc_sigmask, SIGFPE); */
+        uc->__fpregs_mem.mxcsr |= 0x1f80;
+        return;
+    }
+
     /* the CPU emulator uses some host signals to detect exceptions,
        we forward to it some signals */
     if ((host_signum == SIGSEGV || host_signum == SIGBUS)
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 1b3ae62db6..67dce53a68 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -306,6 +306,10 @@ static void arm_cpu_reset(CPUState *s)
                               &env->vfp.fp_status);
     set_float_detect_tininess(float_tininess_before_rounding,
                               &env->vfp.standard_fp_status);
+
+    enable_host_fpu(&s->use_host_fpu, &env->vfp.fp_status);
+    enable_host_fpu(&s->use_host_fpu, &env->vfp.standard_fp_status);
+
 #ifndef CONFIG_USER_ONLY
     if (kvm_enabled()) {
         kvm_arm_reset_vcpu(cpu);
-- 
2.15.1
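
For reference, a small sketch of the input pre-check and type handling
that the float32_sqrt hunk implies. It assumes the guest float32 is
carried as a raw 32-bit IEEE 754 bit pattern (an assumption about the
softfloat types, not something this diff shows), in which case the bits
would need reinterpreting before being handed to a host sqrt.
guest_f32, guest_f32_to_host(), host_to_guest_f32() and
host_sqrt_candidate() are hypothetical names, and the candidate test
only approximates the early exits of sqrt_float() in check_only mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: the guest value is a raw IEEE 754 single-precision pattern. */
typedef uint32_t guest_f32;

static_assert(sizeof(float) == sizeof(guest_f32),
              "host float must be 32 bits wide");

/* Reinterpret the bits as a host float; memcpy avoids aliasing problems. */
static inline float guest_f32_to_host(guest_f32 bits)
{
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}

/* Reinterpret a host float as the raw guest bit pattern. */
static inline guest_f32 host_to_guest_f32(float f)
{
    guest_f32 bits;
    memcpy(&bits, &f, sizeof(bits));
    return bits;
}

/*
 * True only for positive, non-zero, finite inputs: the cases where a
 * host sqrt cannot produce NaN, zero or infinity results that need
 * guest-specific handling.
 */
static inline bool host_sqrt_candidate(guest_f32 bits)
{
    uint32_t sign = bits >> 31;
    uint32_t exp = (bits >> 23) & 0xff;
    uint32_t frac = bits & 0x7fffff;

    if (exp == 0xff) {
        return false;               /* infinity or NaN */
    }
    if (exp == 0 && frac == 0) {
        return false;               /* +0 or -0 */
    }
    return sign == 0;               /* negative inputs are invalid */
}
```

A caller would test host_sqrt_candidate() first, convert with
guest_f32_to_host(), take the host square root, and convert the result
back with host_to_guest_f32().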