From nobody Sun Feb  8 23:53:26 2026
To: qemu-devel@nongnu.org
Cc: ale@rev.ng, ltaylorsimpson@gmail.com, bcain@quicinc.com,
 richard.henderson@linaro.org, philmd@linaro.org, alex.bennee@linaro.org
Subject: [RFC PATCH v1 32/43] helper-to-tcg: Add README
Date: Thu, 21 Nov 2024 02:49:36 +0100
Message-ID: <20241121014947.18666-33-anjo@rev.ng>
In-Reply-To: <20241121014947.18666-1-anjo@rev.ng>
References: <20241121014947.18666-1-anjo@rev.ng>
Reply-to: Anton Johansson <anjo@rev.ng>
From: Anton Johansson via <qemu-devel@nongnu.org>

Signed-off-by: Anton Johansson <anjo@rev.ng>
---
 subprojects/helper-to-tcg/README.md | 265 ++++++++++++++++++++++++++++
 1 file changed, 265 insertions(+)
 create mode 100644 subprojects/helper-to-tcg/README.md

diff --git a/subprojects/helper-to-tcg/README.md b/subprojects/helper-to-tcg/README.md
new file mode 100644
index 0000000000..8d1304ef4f
--- /dev/null
+++ b/subprojects/helper-to-tcg/README.md
@@ -0,0 +1,265 @@
+# helper-to-tcg
+
+`helper-to-tcg` is a standalone LLVM IR to TCG translator, with the goal of simplifying the implementation of complicated instructions in TCG. Instruction semantics can be specified either directly in LLVM IR or in any language that can be compiled to it (C, C++, ...). However, the tool is tailored towards QEMU helper functions written in C.
+
+Internally, `helper-to-tcg` consists of a mix of custom and built-in transformation and analysis passes that are applied to the input LLVM IR sequentially. The pipeline of passes is laid out as follows:
+```
+           +---------------+    +-----+    +---------------+    +------------+
+LLVM IR -> | PrepareForOpt | -> | -Os | -> | PrepareForTcg | -> | TcgGenPass | -> TCG
+           +---------------+    +-----+    +---------------+    +------------+
+```
+where the custom passes perform the following:
+* `PrepareForOpt` - Early culling of unneeded functions, mapping of function annotations, and removal of the `noinline` attribute added by `-O0`.
+* `PrepareForTcg` - Post-optimization pass that tries to bring the IR as close to Tinycode as possible, with the goal of taking complexity away from the backend.
+* `TcgGenPass` - Backend pass that allocates TCG variables to LLVM values and emits the final TCG C code.
+
+As for LLVM optimization, testing has shown that `-Os` strikes a good balance between unrolling and vectorization. More aggressive optimization levels often unroll loops instead of compacting them with loop vectorization.
+
+## Project Structure
+
+* `get-llvm-ir.py` - Helper script to convert a QEMU .c file to LLVM IR, using the compile flags from `compile_commands.json`.
+* `pipeline` - Implementation of the pipeline orchestrating the LLVM passes and handling input.
+* `passes` - Implementation of the custom LLVM passes (`PrepareForOpt`, `PrepareForTcg`, `TcgGenPass`).
+* `include` - Headers shared between `passes` and `pipeline`.
+* `tests` - Simple end-to-end tests of C functions we expect to be able to translate. A test fails if any function fails to translate; the output itself is not verified.
+
+## Example Translations
+
+`helper-to-tcg` can handle a wide variety of helper functions. The following snippet contains two examples from the Hexagon architecture, implementing the semantics of a predicated AND instruction (`A2_pandt`) and a vectorized signed saturated 2-element scalar product (`V6_vdmpyhvsat`).
+
+```c
+int32_t HELPER(A2_pandt)(CPUHexagonState *env, int32_t RdV,
+                         int32_t PuV, int32_t RsV, int32_t RtV)
+{
+    if(fLSBOLD(PuV)) {
+        RdV=RsV&RtV;
+    } else {
+        CANCEL;
+    }
+    return RdV;
+}
+
+void HELPER(V6_vdmpyhvsat)(CPUHexagonState *env,
+                           void * restrict VdV_void,
+                           void * restrict VuV_void,
+                           void * restrict VvV_void)
+{
+    fVFOREACH(32, i) {
+        size8s_t accum = fMPY16SS(fGETHALF(0,VuV.w[i]),fGETHALF(0, VvV.w[i]));
+        accum += fMPY16SS(fGETHALF(1,VuV.w[i]),fGETHALF(1, VvV.w[i]));
+        VdV.w[i] = fVSATW(accum);
+    }
+}
+```
+For the above snippet, `helper-to-tcg` produces the following TCG:
+```c
+void emit_A2_pandt(TCGv_i32 temp0, TCGv_env env, TCGv_i32 temp4,
+                   TCGv_i32 temp8, TCGv_i32 temp7, TCGv_i32 temp6) {
+    TCGv_i32 temp2 = tcg_temp_new_i32();
+    tcg_gen_andi_i32(temp2, temp8, 1);
+    TCGv_i32 temp5 = tcg_temp_new_i32();
+    tcg_gen_and_i32(temp5, temp6, temp7);
+    tcg_gen_movcond_i32(TCG_COND_EQ, temp0, temp2, tcg_constant_i32(0), temp4, temp5);
+}
+
+void emit_V6_vdmpyhvsat(TCGv_env env, intptr_t vec3,
+                        intptr_t vec7, intptr_t vec6) {
+    VectorMem mem = {0};
+    intptr_t vec0 = temp_new_gvec(&mem, 128);
+    tcg_gen_gvec_shli(MO_32, vec0, vec7, 16, 128, 128);
+    intptr_t vec5 = temp_new_gvec(&mem, 128);
+    tcg_gen_gvec_sari(MO_32, vec5, vec0, 16, 128, 128);
+    intptr_t vec1 = temp_new_gvec(&mem, 128);
+    tcg_gen_gvec_shli(MO_32, vec1, vec6, 16, 128, 128);
+    tcg_gen_gvec_sari(MO_32, vec1, vec1, 16, 128, 128);
+    tcg_gen_gvec_mul(MO_32, vec1, vec1, vec5, 128, 128);
+    intptr_t vec2 = temp_new_gvec(&mem, 128);
+    tcg_gen_gvec_sari(MO_32, vec2, vec7, 16, 128, 128);
+    tcg_gen_gvec_sari(MO_32, vec0, vec6, 16, 128, 128);
+    tcg_gen_gvec_mul(MO_32, vec2, vec0, vec2, 128, 128);
+    tcg_gen_gvec_ssadd(MO_32, vec3, vec1, vec2, 128, 128);
+}
+```
+
+In the first case, the predicated AND instruction was made branchless by using a conditional move. In the latter case, the inner loop of the vectorized scalar product was converted into a few vectorized shifts and multiplications, followed by a vectorized signed saturated addition.
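+
+As a cross-check of the two translations above, the plain C below is a scalar reference model of what the emitted TCG computes. It is a sketch, not Hexagon's definition: the `ref_*`, `sext16()` and `sat32()` functions are hypothetical, `A2_pandt` is modelled as keeping the old `RdV` when the predicate's least significant bit is clear (mirroring the `movcond`), and each 32-bit lane of `V6_vdmpyhvsat` is modelled as two signed 16x16 products whose sum is saturated to 32 bits (mirroring the shift, multiply and `ssadd` sequence).
+```c
+#include <stdint.h>
+
+/* Hypothetical model of A2_pandt as a conditional select, mirroring
+ * movcond(EQ, rd, pu & 1, 0, rd_old, rs & rt) in the emitted code. */
+static int32_t ref_A2_pandt(int32_t rd_old, int32_t pu, int32_t rs, int32_t rt)
+{
+    int32_t and_val = rs & rt;
+    return (pu & 1) == 0 ? rd_old : and_val;
+}
+
+/* Sign-extend the low 16 bits of x (same effect as shli 16 + sari 16). */
+static int32_t sext16(uint32_t x)
+{
+    int32_t v = (int32_t)(x & 0xffffu);
+    return v >= 0x8000 ? v - 0x10000 : v;
+}
+
+/* Clamp a 64-bit value to the int32_t range. */
+static int32_t sat32(int64_t x)
+{
+    if (x > INT32_MAX) {
+        return INT32_MAX;
+    }
+    if (x < INT32_MIN) {
+        return INT32_MIN;
+    }
+    return (int32_t)x;
+}
+
+/* One 32-bit lane: low halves multiplied, high halves multiplied,
+ * and the two products added with signed saturation. */
+static int32_t ref_vdmpyhvsat_lane(uint32_t u, uint32_t v)
+{
+    int64_t lo = (int64_t)sext16(u) * sext16(v);
+    int64_t hi = (int64_t)sext16(u >> 16) * sext16(v >> 16);
+    return sat32(lo + hi);
+}
+
+/* Apply the lane model to all 32 words of a 128-byte vector. */
+static void ref_V6_vdmpyhvsat(int32_t vd[32], const uint32_t vu[32],
+                              const uint32_t vv[32])
+{
+    for (int i = 0; i < 32; i++) {
+        vd[i] = ref_vdmpyhvsat_lane(vu[i], vv[i]);
+    }
+}
+```
+Computing the sum in 64 bits before clamping is simply a way to avoid signed overflow in C; the emitted TCG instead relies on `tcg_gen_gvec_ssadd()` to saturate.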
+
+## Usage
+
+Building `helper-to-tcg` produces a binary implementing the pipeline outlined above, going from LLVM IR to TCG.
+
+### Specifying Functions to Translate
+
+Unless `--translate-all-helpers` is specified, `helper-to-tcg` by default only translates functions marked with a special `"helper-to-tcg"` annotation. Functions called by annotated functions are translated as well, as in the following example:
+
+```c
+// Declared up front so that f() can call it
+int g(int a, int b);
+
+// Function will be translated, annotation provided
+__attribute__((annotate ("helper-to-tcg")))
+int f(int a, int b) {
+    return 2 * g(a, b);
+}
+
+// Function will be translated, called by annotated `f()` function
+int g(int a, int b) {
+    ...
+}
+
+// Function will not be translated
+int h(int a, int b) {
+    ...
+}
+```
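+
+Since the marker is an ordinary Clang `annotate` attribute, a frontend may prefer to hide it behind a macro. The sketch below uses a hypothetical `HELPER_TO_TCG` macro (not something `helper-to-tcg` provides) and fills in trivial bodies so that it compiles; compilers other than Clang simply see an empty macro.
+```c
+/* Hypothetical convenience macro; the LLVM IR is produced with Clang,
+ * so only Clang needs to see the annotation. */
+#if defined(__clang__)
+#define HELPER_TO_TCG __attribute__((annotate("helper-to-tcg")))
+#else
+#define HELPER_TO_TCG
+#endif
+
+int g(int a, int b);
+
+/* Annotated: translated */
+HELPER_TO_TCG
+int f(int a, int b)
+{
+    return 2 * g(a, b);
+}
+
+/* Not annotated, but translated since the annotated f() calls it */
+int g(int a, int b)
+{
+    return a + b;
+}
+
+/* Neither annotated nor called by an annotated function: not translated */
+int h(int a, int b)
+{
+    return a - b;
+}
+```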
+
+### Immediate and Vector Arguments
+
+In some cases, function annotations are used to provide `helper-to-tcg` with information not otherwise present in the IR: for example, whether an integer argument should be treated as an immediate rather than a register, or whether a pointer argument should be treated as a `gvec` vector (an offset into `CPUArchState`). For instance:
+```c
+__attribute__((annotate ("helper-to-tcg")))
+__attribute__((annotate ("immediate: 1")))
+int f(int a, int i) {
+    ...
+}
+
+__attribute__((annotate ("helper-to-tcg")))
+__attribute__((annotate ("ptr-to-offset: 0, 1")))
+void g(void * restrict a, void * restrict b) {
+    ...
+}
+```
+where `"immediate: 1"` tells `helper-to-tcg` that the argument with index `1` should be treated as an immediate (multiple arguments are specified through a comma-separated list). Similarly, `"ptr-to-offset: 0, 1"` indicates that the arguments with indices 0 and 1 should be treated as offsets from `CPUArchState` (given as `intptr_t`) rather than as actual pointer arguments. For the above code, `helper-to-tcg` emits:
+```c
+void emit_f(TCGv_i32 res, TCGv_i32 a, int i) {
+    ...
+}
+
+void emit_g(intptr_t a, intptr_t b) {
+    ...
+}
+```
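+
+The annotation payload is a name followed by a comma-separated list of argument indices. Purely to illustrate that format (this is not `helper-to-tcg`'s own parser), a hypothetical `parse_arg_indices()` could turn `"ptr-to-offset: 0, 1"` into a bitmask of argument indices, assuming helpers have fewer than 32 arguments:
+```c
+#include <ctype.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+/*
+ * Hypothetical parser: if `annotation` starts with `kind` followed by ':',
+ * return a mask with bit i set for every listed argument index i.
+ * Returns 0 if the kind does not match or an index is malformed or >= 32.
+ */
+static uint32_t parse_arg_indices(const char *annotation, const char *kind)
+{
+    size_t len = strlen(kind);
+    if (strncmp(annotation, kind, len) != 0 || annotation[len] != ':') {
+        return 0;
+    }
+    uint32_t mask = 0;
+    const char *p = annotation + len + 1;
+    while (*p != '\0') {
+        /* Skip separators between indices */
+        while (isspace((unsigned char)*p) || *p == ',') {
+            p++;
+        }
+        if (*p == '\0') {
+            break;
+        }
+        char *end;
+        long idx = strtol(p, &end, 10);
+        if (end == p || idx < 0 || idx >= 32) {
+            return 0;
+        }
+        mask |= UINT32_C(1) << idx;
+        p = end;
+    }
+    return mask;
+}
+```
+With this encoding, `"immediate: 1"` yields `0x2` and `"ptr-to-offset: 0, 1"` yields `0x3`, so checking whether argument `i` is special is a single bit test.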
+
+### Loads and Stores
+
+Translating loads and stores is slightly trickier, as some QEMU-specific assumptions are made. Loads and stores in the input are assumed to go through the `cpu_[st|ld]*()` functions defined in `exec/cpu_ldst.h`, as they would in a helper function.
+
+When using standalone input functions (not QEMU helper functions), loads and stores are still represented by `cpu_[st|ld]*()` calls, and these functions need to be declared. Consider:
+```c
+/* Opaque CPU state type, will be mapped to tcg_env */
+struct CPUArchState;
+typedef struct CPUArchState CPUArchState;
+
+/* Prototype of QEMU helper guest load/store functions, see exec/cpu_ldst.h */
+uint32_t cpu_ldub_data(CPUArchState *, uint32_t ptr);
+void cpu_stb_data(CPUArchState *, uint32_t ptr, uint32_t data);
+
+uint32_t helper_ld8(CPUArchState *env, uint32_t addr) {
+    return cpu_ldub_data(env, addr);
+}
+
+void helper_st8(CPUArchState *env, uint32_t addr, uint32_t data) {
+    cpu_stb_data(env, addr, data);
+}
+```
+which implement an 8-bit load and an 8-bit store instruction. These are translated to the following TCG:
+```c
+void emit_ld8(TCGv_i32 temp0, TCGv_env env, TCGv_i32 temp1) {
+    tcg_gen_qemu_ld_i32(temp0, temp1, tb_mmu_index(tcg_ctx->gen_tb->flags), MO_UB);
+}
+
+void emit_st8(TCGv_env env, TCGv_i32 temp0, TCGv_i32 temp1) {
+    tcg_gen_qemu_st_i32(temp1, temp0, tb_mmu_index(tcg_ctx->gen_tb->flags), MO_UB);
+}
+```
+Note that the emitted code assumes the definition of a `tb_mmu_index()` function that retrieves the current CPU MMU index; the name of this function can be configured via the `--mmu-index-function` flag.
+
+### Mapping CPU State
+
+In QEMU, commonly accessed fields of `CPUArchState` are often mapped to global `TCGv*` variables representing that piece of CPU state in TCG. When translating helper functions (or other C functions), a way of specifying which fields of the CPU state should be mapped to which globals is therefore needed. To this end, a declarative approach is taken: the mappings between CPU state and globals can be consumed both by `helper-to-tcg` and by QEMU at runtime to instantiate the `TCGv` globals themselves.
+
+Users must define this mapping via a global `cpu_tcg_mapping[]` array, as in the following example, where the `mapped_field` member of `CPUArchState` is mapped to the global `tcg_field`. For more complicated examples, see the tests in `tests/cpustate.c`.
+```c
+#include <stdint.h>
+#include "tcg/tcg-global-mappings.h"
+
+/* Define a CPU state with some different fields */
+
+typedef struct CPUArchState {
+    uint32_t mapped_field;
+    uint32_t unmapped_field;
+} CPUArchState;
+
+/* Dummy struct, in QEMU this would correspond to TCGv_i32 in tcg.h */
+typedef struct TCGv_i32 {} TCGv_i32;
+
+/* Global TCGv representing CPU state */
+TCGv_i32 tcg_field;
+
+/*
+ * Finally, provide a mapping from CPUArchState fields to the TCG globals we
+ * care about; here mapped_field is mapped to tcg_field
+ */
+cpu_tcg_mapping mappings[] = {
+    CPU_TCG_MAP(CPUArchState, tcg_field, mapped_field, NULL),
+};
+
+uint32_t helper_mapped(CPUArchState *env) {
+    return env->mapped_field;
+}
+
+uint32_t helper_unmapped(CPUArchState *env) {
+    return env->unmapped_field;
+}
+```
+Note that the name of the `cpu_tcg_mapping[]` array (`mappings` above) is provided via the `--tcg-global-mappings` flag. For the above example, `helper-to-tcg` emits:
+```c
+extern TCGv_i32 tcg_field;
+
+void emit_mapped(TCGv_i32 temp0, TCGv_env env) {
+    tcg_gen_mov_i32(temp0, tcg_field);
+}
+
+void emit_unmapped(TCGv_i32 temp0, TCGv_env env) {
+    TCGv_ptr ptr1 = tcg_temp_new_ptr();
+    tcg_gen_addi_ptr(ptr1, env, 128ull);
+    tcg_gen_ld_i32(temp0, ptr1, 0);
+}
+```
+where accesses in the input C code are mapped to the corresponding TCG globals. The unmapped `CPUArchState` access turns into pointer arithmetic and a load, whereas the mapped access turns into a `mov` from a global.
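+
+Conceptually, a mapping table lets each access be classified by its offset into `CPUArchState`: if the offset and size match a mapped field, the access becomes a use of the corresponding global; otherwise it stays a load relative to `env`. The sketch below illustrates only that idea, using a hypothetical `example_mapping` type built with `offsetof`; it is not the layout of `cpu_tcg_mapping` or of the `CPU_TCG_MAP()` macro from `tcg/tcg-global-mappings.h`.
+```c
+#include <stddef.h>
+#include <stdint.h>
+
+typedef struct CPUArchState {
+    uint32_t mapped_field;
+    uint32_t unmapped_field;
+} CPUArchState;
+
+/* Hypothetical, simplified entry: which bytes of CPUArchState a TCG
+ * global (identified here only by name) represents. */
+typedef struct example_mapping {
+    const char *global_name;
+    size_t offset;
+    size_t size;
+} example_mapping;
+
+static const example_mapping example_mappings[] = {
+    { "tcg_field", offsetof(CPUArchState, mapped_field),
+      sizeof(((CPUArchState *)0)->mapped_field) },
+};
+
+/* Return the global covering an access of `size` bytes at `offset`,
+ * or NULL if the access is unmapped and must remain an env-relative load. */
+static const char *lookup_global(size_t offset, size_t size)
+{
+    size_t n = sizeof(example_mappings) / sizeof(example_mappings[0]);
+    for (size_t i = 0; i < n; i++) {
+        if (example_mappings[i].offset == offset &&
+            example_mappings[i].size == size) {
+            return example_mappings[i].global_name;
+        }
+    }
+    return NULL;
+}
+```
+Keying on offset and size rather than on field names is what makes such a table usable from LLVM IR, where struct accesses appear as pointer offsets.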
+
+### Automatic Calling of Generated Code
+
+Finally, calling the generated code is as simple as including the output of `helper-to-tcg` in the project and manually calling `emit_*(...)`. However, for an existing frontend that already uses many helper functions, this process is simplified somewhat for non-vector instructions: `helper-to-tcg` can emit a dispatcher, which for the above CPU state mapping example looks like:
+```c
+int helper_to_tcg_dispatcher(void *func, TCGTemp *ret_temp, int nargs, TCGTemp **args) {
+    if ((uintptr_t) func == (uintptr_t) helper_mapped) {
+        TCGv_i32 temp0 = temp_tcgv_i32(ret_temp);
+        TCGv_env env = temp_tcgv_ptr(args[0]);
+        emit_mapped(temp0, env);
+        return 1;
+    }
+    if ((uintptr_t) func == (uintptr_t) helper_unmapped) {
+        TCGv_i32 temp0 = temp_tcgv_i32(ret_temp);
+        TCGv_env env = temp_tcgv_ptr(args[0]);
+        emit_unmapped(temp0, env);
+        return 1;
+    }
+    return 0;
+}
+```
+Here `emit_mapped()` and `emit_unmapped()` are called automatically if the helper function whose call is currently being translated (`void *func`) is either of the input helper functions. If the frontend then defines
+```c
+#ifdef CONFIG_HELPER_TO_TCG
+#define TARGET_HELPER_DISPATCHER helper_to_tcg_dispatcher
+#endif
+```
+in `cpu-param.h`, calls to `gen_helper_mapped()`, for instance, will end up in `emit_mapped()` with no changes to the frontend. Additionally, dispatching from helper calls makes it easy to toggle `helper-to-tcg` on and off, which is incredibly useful for testing purposes.
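+
+The dispatch idea boils down to comparing the helper's address against known helpers and falling back to an ordinary helper call when nothing matches (the dispatcher returns 0). The self-contained sketch below models only that control flow; `try_dispatch()`, `gen_call()`, the stub helpers and the counters are hypothetical stand-ins, and the real hook receives `TCGTemp` arguments rather than nothing.
+```c
+/* Generic function-pointer type used to identify helpers */
+typedef void (*helper_fn)(void);
+
+static int emitted_inline; /* times generated emit_*() code was used */
+static int emitted_calls;  /* times a regular helper call was emitted */
+static int sink;
+
+/* Stub helpers with distinct bodies, so their addresses stay distinct */
+static void helper_mapped(void) { sink = 1; }
+static void helper_other(void) { sink = 2; }
+
+/* Stand-in for emit_mapped(): would emit TCG ops inline */
+static void emit_mapped_stub(void) { emitted_inline++; }
+
+/* Hypothetical dispatcher: returns 1 if it handled `func`, 0 otherwise */
+static int try_dispatch(helper_fn func)
+{
+    if (func == helper_mapped) {
+        emit_mapped_stub();
+        return 1;
+    }
+    return 0;
+}
+
+/* Hypothetical call site: try the dispatcher first, else emit a real call */
+static void gen_call(helper_fn func)
+{
+    if (!try_dispatch(func)) {
+        emitted_calls++;
+    }
+}
+```
+In this sketch, returning 0 for unknown helpers keeps the scheme incremental, since anything without generated code still becomes a regular call.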
+
+### Simple Command Usage
+
+Assuming a `helpers.c` file with functions to translate, LLVM IR can be obtained with:
+```bash
+$ clang helpers.c -O0 -Xclang -disable-O0-optnone -S -emit-llvm
+```
+which produces `helpers.ll` to be fed into `helper-to-tcg`:
+```bash
+$ ./helper-to-tcg helpers.ll --translate-all-helpers
+```
+where `--translate-all-helpers` means "translate all functions starting with `helper_`". Finally, the above command produces `helper-to-tcg-emitted.[c|h]` containing the emitted TCG code.
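+
+For reference, a hypothetical `helpers.c` that could be fed through the commands above might look as follows. With `--translate-all-helpers`, the two functions prefixed with `helper_` would be selected, while `unrelated()` lacks the prefix and is not called by either helper.
+```c
+#include <stdint.h>
+
+/* Selected by --translate-all-helpers: unsigned 8-bit saturating add */
+uint32_t helper_add_sat8(uint32_t a, uint32_t b)
+{
+    uint32_t sum = (a & 0xff) + (b & 0xff);
+    return sum > 0xff ? 0xff : sum;
+}
+
+/* Selected as well: swap the two low bytes */
+uint32_t helper_bswap16(uint32_t x)
+{
+    return ((x & 0xff) << 8) | ((x >> 8) & 0xff);
+}
+
+/* Not selected: name lacks the helper_ prefix */
+uint32_t unrelated(uint32_t x)
+{
+    return x + 1;
+}
+```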
-- 
2.45.2