To: qemu-devel@nongnu.org
Cc: ale@rev.ng, ltaylorsimpson@gmail.com, bcain@quicinc.com, richard.henderson@linaro.org, philmd@linaro.org, alex.bennee@linaro.org
Subject: [RFC PATCH v1 32/43] helper-to-tcg: Add README
Date: Thu, 21 Nov 2024 02:49:36 +0100
Message-ID: <20241121014947.18666-33-anjo@rev.ng>
In-Reply-To: <20241121014947.18666-1-anjo@rev.ng>
References: <20241121014947.18666-1-anjo@rev.ng>
From: Anton Johansson
Reply-to: Anton Johansson

Signed-off-by: Anton Johansson
---
 subprojects/helper-to-tcg/README.md | 265 ++++++++++++++++++++++++++++
 1 file changed, 265 insertions(+)
 create mode 100644 subprojects/helper-to-tcg/README.md

diff --git a/subprojects/helper-to-tcg/README.md b/subprojects/helper-to-tcg/README.md
new file mode 100644
index 0000000000..8d1304ef4f
--- /dev/null
+++ b/subprojects/helper-to-tcg/README.md
@@ -0,0 +1,265 @@
+# helper-to-tcg
+
+`helper-to-tcg` is a standalone LLVM IR to TCG translator, with the goal of simplifying the implementation of complicated instructions in TCG. Instruction semantics can be specified either directly in LLVM IR or any language that can be compiled to it (C, C++, ...). However, the tool is tailored towards QEMU helper functions written in C.
+
+Internally, `helper-to-tcg` consists of a mix of custom and built-in transformation and analysis passes that are applied to the input LLVM IR sequentially.
+The pipeline of passes is laid out as follows
+```
+           +---------------+    +-----+    +---------------+    +------------+
+LLVM IR -> | PrepareForOpt | -> | -Os | -> | PrepareForTcg | -> | TcgGenPass | -> TCG
+           +---------------+    +-----+    +---------------+    +------------+
+```
+where the custom passes perform:
+* `PrepareForOpt` - Early culling of unneeded functions, mapping of function annotations, and removal of `noinline` added by `-O0`.
+* `PrepareForTcg` - Post-optimization pass that tries to get the IR as close to Tinycode as possible; the goal is to move complexity out of the backend.
+* `TcgGenPass` - Backend pass that allocates TCG variables to LLVM values, and emits final TCG C code.
+
+As for LLVM optimization, testing showed that `-Os` strikes a good balance between unrolling and vectorization. More aggressive optimization levels would often unroll loops rather than compact them with loop vectorization.
+
+## Project Structure
+
+* `get-llvm-ir.py` - Helper script to convert a QEMU .c file to LLVM IR by getting compile flags from `compile_commands.json`.
+* `pipeline` - Implementation of the pipeline orchestrating LLVM passes and handling input.
+* `passes` - Implementation of custom LLVM passes (`PrepareForOpt`, `PrepareForTcg`, `TcgGenPass`).
+* `include` - Headers shared between `passes` and `pipeline`.
+* `tests` - Simple end-to-end tests of C functions we expect to be able to translate. A test fails if any function fails to translate; the output is not verified.
+
+## Example Translations
+
+`helper-to-tcg` is able to deal with a wide variety of helper functions. The following code snippet contains two examples from the Hexagon architecture, implementing the semantics of a predicated `and` instruction (`A2_pandt`) and a vectorized signed saturated 2-element scalar product (`V6_vdmpyhvsat`).
+
+```c
+int32_t HELPER(A2_pandt)(CPUHexagonState *env, int32_t RdV,
+                         int32_t PuV, int32_t RsV, int32_t RtV)
+{
+    if(fLSBOLD(PuV)) {
+        RdV=RsV&RtV;
+    } else {
+        CANCEL;
+    }
+    return RdV;
+}
+
+void HELPER(V6_vdmpyhvsat)(CPUHexagonState *env,
+                           void * restrict VdV_void,
+                           void * restrict VuV_void,
+                           void * restrict VvV_void)
+{
+    fVFOREACH(32, i) {
+        size8s_t accum = fMPY16SS(fGETHALF(0,VuV.w[i]),fGETHALF(0, VvV.w[i]));
+        accum += fMPY16SS(fGETHALF(1,VuV.w[i]),fGETHALF(1, VvV.w[i]));
+        VdV.w[i] = fVSATW(accum);
+    }
+}
+```
+For the above snippet, `helper-to-tcg` produces the following TCG
+```c
+void emit_A2_pandt(TCGv_i32 temp0, TCGv_env env, TCGv_i32 temp4,
+                   TCGv_i32 temp8, TCGv_i32 temp7, TCGv_i32 temp6) {
+    TCGv_i32 temp2 = tcg_temp_new_i32();
+    tcg_gen_andi_i32(temp2, temp8, 1);
+    TCGv_i32 temp5 = tcg_temp_new_i32();
+    tcg_gen_and_i32(temp5, temp6, temp7);
+    tcg_gen_movcond_i32(TCG_COND_EQ, temp0, temp2, tcg_constant_i32(0), temp4, temp5);
+}
+
+void emit_V6_vdmpyhvsat(TCGv_env env, intptr_t vec3,
+                        intptr_t vec7, intptr_t vec6) {
+    VectorMem mem = {0};
+    intptr_t vec0 = temp_new_gvec(&mem, 128);
+    tcg_gen_gvec_shli(MO_32, vec0, vec7, 16, 128, 128);
+    intptr_t vec5 = temp_new_gvec(&mem, 128);
+    tcg_gen_gvec_sari(MO_32, vec5, vec0, 16, 128, 128);
+    intptr_t vec1 = temp_new_gvec(&mem, 128);
+    tcg_gen_gvec_shli(MO_32, vec1, vec6, 16, 128, 128);
+    tcg_gen_gvec_sari(MO_32, vec1, vec1, 16, 128, 128);
+    tcg_gen_gvec_mul(MO_32, vec1, vec1, vec5, 128, 128);
+    intptr_t vec2 = temp_new_gvec(&mem, 128);
+    tcg_gen_gvec_sari(MO_32, vec2, vec7, 16, 128, 128);
+    tcg_gen_gvec_sari(MO_32, vec0, vec6, 16, 128, 128);
+    tcg_gen_gvec_mul(MO_32, vec2, vec0, vec2, 128, 128);
+    tcg_gen_gvec_ssadd(MO_32, vec3, vec1, vec2, 128, 128);
+}
+```
+
+In the first case, the predicated `and` instruction was made branchless by using a conditional move, and in the latter case the inner loop of the vectorized scalar product could be converted to a few vectorized shifts
+and multiplications, followed by a vectorized signed saturated addition.
+
+## Usage
+
+Building `helper-to-tcg` produces a binary implementing the pipeline outlined above, going from LLVM IR to TCG.
+
+### Specifying Functions to Translate
+
+Unless `--translate-all-helpers` is specified, the default behaviour of `helper-to-tcg` is to only translate functions annotated via a special `"helper-to-tcg"` annotation. Functions called by annotated functions will also be translated; see the following example:
+
+```c
+// Function will be translated, annotation provided
+__attribute__((annotate ("helper-to-tcg")))
+int f(int a, int b) {
+    return 2 * g(a, b);
+}
+
+// Function will be translated, called by annotated `f()` function
+int g(int a, int b) {
+    ...
+}
+
+// Function will not be translated
+int h(int a, int b) {
+    ...
+}
+```
+
+### Immediate and Vector Arguments
+
+Function annotations are in some cases used to provide extra information to `helper-to-tcg` not otherwise present in the IR. For example, an annotation can state whether an integer argument should be treated as an immediate rather than a register, or whether a pointer argument should be treated as a `gvec` vector (an offset into `CPUArchState`). For instance:
+```c
+__attribute__((annotate ("helper-to-tcg")))
+__attribute__((annotate ("immediate: 1")))
+int f(int a, int i) {
+    ...
+}
+
+__attribute__((annotate ("helper-to-tcg")))
+__attribute__((annotate ("ptr-to-offset: 0, 1")))
+void g(void * restrict a, void * restrict b) {
+    ...
+}
+```
+where `"immediate: 1"` tells `helper-to-tcg` that the argument with index `1` should be treated as an immediate (multiple arguments are specified through a comma-separated list). Similarly, `"ptr-to-offset: 0, 1"` indicates that the arguments with index 0 and 1 should be treated as offsets from `CPUArchState` (given as `intptr_t`), rather than actual pointer arguments.
+For the above code, `helper-to-tcg` emits
+```c
+void emit_f(TCGv_i32 res, TCGv_i32 a, int i) {
+    ...
+}
+
+void emit_g(intptr_t a, intptr_t b) {
+    ...
+}
+```
+
+### Loads and Stores
+
+Translating loads and stores is slightly trickier, as some QEMU-specific assumptions are made. Loads and stores in the input are assumed to go through the `cpu_[st|ld]*()` functions defined in `exec/cpu_ldst.h` that a helper function would use.
+
+If using standalone input functions (not QEMU helper functions), loads and stores are still represented by `cpu_[st|ld]*()`, which need to be declared. Consider:
+```c
+/* Opaque CPU state type, will be mapped to tcg_env */
+struct CPUArchState;
+typedef struct CPUArchState CPUArchState;
+
+/* Prototype of QEMU helper guest load/store functions, see exec/cpu_ldst.h */
+uint32_t cpu_ldub_data(CPUArchState *, uint32_t ptr);
+void cpu_stb_data(CPUArchState *, uint32_t ptr, uint32_t data);
+
+uint32_t helper_ld8(CPUArchState *env, uint32_t addr) {
+    return cpu_ldub_data(env, addr);
+}
+
+void helper_st8(CPUArchState *env, uint32_t addr, uint32_t data) {
+    cpu_stb_data(env, addr, data);
+}
+```
+These functions implement 8-bit load and store instructions, and will be translated to the following TCG.
+```c
+void emit_ld8(TCGv_i32 temp0, TCGv_env env, TCGv_i32 temp1) {
+    tcg_gen_qemu_ld_i32(temp0, temp1, tb_mmu_index(tcg_ctx->gen_tb->flags), MO_UB);
+}
+
+void emit_st8(TCGv_env env, TCGv_i32 temp0, TCGv_i32 temp1) {
+    tcg_gen_qemu_st_i32(temp1, temp0, tb_mmu_index(tcg_ctx->gen_tb->flags), MO_UB);
+}
+```
+Note that the emitted code assumes the definition of a `tb_mmu_index()` function to retrieve the current CPU MMU index; the name of this function can be configured via the `--mmu-index-function` flag.
+
+### Mapping CPU State
+
+In QEMU, commonly accessed fields in the `CPUArchState` are often mapped to global `TCGv*` variables representing that piece of CPU state in TCG.
+When translating helper functions (or other C functions), a method of specifying which fields in the CPU state should be mapped to which globals is needed. To this end, a declarative approach is taken, where mappings between CPU state and globals can be consumed both by `helper-to-tcg` and by QEMU at runtime for instantiating the `TCGv` globals themselves.
+
+Users must define this mapping via a global `cpu_tcg_mapping[]` array, as can be seen in the following example where `mapped_field` of `CPUArchState` is mapped to the global `tcg_field`. For more complicated examples see the tests in `tests/cpustate.c`.
+```c
+#include <stdint.h>
+#include "tcg/tcg-global-mappings.h"
+
+/* Define a CPU state with some different fields */
+
+typedef struct CPUArchState {
+    uint32_t mapped_field;
+    uint32_t unmapped_field;
+} CPUArchState;
+
+/* Dummy struct, in QEMU this would correspond to TCGv_i32 in tcg.h */
+typedef struct TCGv_i32 {} TCGv_i32;
+
+/* Global TCGv representing CPU state */
+TCGv_i32 tcg_field;
+
+/*
+ * Finally provide a mapping of CPUArchState to TCG globals we care about, here
+ * we map mapped_field to tcg_field
+ */
+cpu_tcg_mapping mappings[] = {
+    CPU_TCG_MAP(CPUArchState, tcg_field, mapped_field, NULL),
+};
+
+uint32_t helper_mapped(CPUArchState *env) {
+    return env->mapped_field;
+}
+
+uint32_t helper_unmapped(CPUArchState *env) {
+    return env->unmapped_field;
+}
+```
+Note that the name of the `cpu_tcg_mapping[]` array is provided via the `--tcg-global-mappings` flag. For the above example, `helper-to-tcg` emits
+```c
+extern TCGv_i32 tcg_field;
+
+void emit_mapped(TCGv_i32 temp0, TCGv_env env) {
+    tcg_gen_mov_i32(temp0, tcg_field);
+}
+
+void emit_unmapped(TCGv_i32 temp0, TCGv_env env) {
+    TCGv_ptr ptr1 = tcg_temp_new_ptr();
+    tcg_gen_addi_ptr(ptr1, env, 128ull);
+    tcg_gen_ld_i32(temp0, ptr1, 0);
+}
+```
+where accesses in the input C code are correctly mapped to the corresponding TCG globals.
+The unmapped `CPUArchState` access turns into pointer math and a load, whereas the mapped access turns into a `mov` from a global.
+
+### Automatic Calling of Generated Code
+
+Finally, calling the generated code is as simple as including the output of `helper-to-tcg` in the project and manually calling `emit_*(...)`. However, when dealing with an existing frontend that has a lot of helper functions already in use, we simplify this process somewhat for non-vector instructions. `helper-to-tcg` can emit a dispatcher, which for the above CPU state mapping example looks like
+```c
+int helper_to_tcg_dispatcher(void *func, TCGTemp *ret_temp, int nargs, TCGTemp **args) {
+    if ((uintptr_t) func == (uintptr_t) helper_mapped) {
+        TCGv_i32 temp0 = temp_tcgv_i32(ret_temp);
+        TCGv_env env = temp_tcgv_ptr(args[0]);
+        emit_mapped(temp0, env);
+        return 1;
+    }
+    if ((uintptr_t) func == (uintptr_t) helper_unmapped) {
+        TCGv_i32 temp0 = temp_tcgv_i32(ret_temp);
+        TCGv_env env = temp_tcgv_ptr(args[0]);
+        emit_unmapped(temp0, env);
+        return 1;
+    }
+    return 0;
+}
+```
+Here `emit_mapped()` and `emit_unmapped()` are automatically called if the helper function call currently being translated, `void *func`, corresponds to either of the input helper functions. If the frontend then defines
+```c
+#ifdef CONFIG_HELPER_TO_TCG
+#define TARGET_HELPER_DISPATCHER helper_to_tcg_dispatcher
+#endif
+```
+in `cpu-param.h`, then calls to `gen_helper_mapped()`, for instance, will end up in `emit_mapped()` with no change to frontends. Additionally, dispatching from helper calls allows for easy toggling of `helper-to-tcg`, which is incredibly useful for testing purposes.
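To make the dispatch flow above concrete, here is a minimal, self-contained sketch of how a helper call site might consult the dispatcher before falling back to a regular helper call. It is not QEMU code and makes several assumptions: `TCGTemp` is a stand-in struct, the dispatcher is a hand-written mock of the generated one, and `gen_helper_call()`, `helper_other()`, `inlined_calls` and `helper_calls` are hypothetical names introduced purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for QEMU's TCGTemp; the real type lives in the TCG headers. */
typedef struct TCGTemp { int id; } TCGTemp;

/* Hypothetical counters, so the flow can be observed without a TCG backend. */
int inlined_calls;
int helper_calls;

/* Two helpers; in this sketch only helper_mapped has generated TCG. */
uint32_t helper_mapped(void *env) { (void)env; return 0; }
uint32_t helper_other(void *env) { (void)env; return 0; }

/* Mock of the generated dispatcher: returns 1 if TCG was emitted for func. */
int helper_to_tcg_dispatcher(void *func, TCGTemp *ret_temp,
                             int nargs, TCGTemp **args)
{
    (void)ret_temp; (void)nargs; (void)args;
    if ((uintptr_t)func == (uintptr_t)helper_mapped) {
        inlined_calls++; /* stands in for emit_mapped(temp0, env) */
        return 1;
    }
    return 0;
}

/* What a frontend would define in cpu-param.h to opt in. */
#define TARGET_HELPER_DISPATCHER helper_to_tcg_dispatcher

/* Hypothetical call site: try the dispatcher, else emit a helper call. */
void gen_helper_call(void *func, TCGTemp *ret, int nargs, TCGTemp **args)
{
    assert(func != NULL);
#ifdef TARGET_HELPER_DISPATCHER
    if (TARGET_HELPER_DISPATCHER(func, ret, nargs, args)) {
        return; /* TCG was emitted inline, no helper call needed */
    }
#endif
    helper_calls++; /* stands in for emitting a regular helper call */
}
```

With these definitions, a call for `helper_mapped` increments `inlined_calls`, while a call for `helper_other` falls through and increments `helper_calls`. Removing the `TARGET_HELPER_DISPATCHER` define routes every call to the fallback, which mirrors the toggling described above.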
+
+### Simple Command Usage
+
+Assume a `helpers.c` file with functions to translate. To obtain LLVM IR, run
+```bash
+$ clang helpers.c -O0 -Xclang -disable-O0-optnone -S -emit-llvm
+```
+which produces `helpers.ll` to be fed into `helper-to-tcg`
+```bash
+$ ./helper-to-tcg helpers.ll --translate-all-helpers
+```
+where `--translate-all-helpers` means "translate all functions starting with `helper_`". Finally, the above command produces `helper-to-tcg-emitted.[c|h]` with emitted TCG code.
--
2.45.2