From nobody Sat Apr 5 01:25:18 2025 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=fail; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=anarch128.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1739122149781338.55981672106634; Sun, 9 Feb 2025 09:29:09 -0800 (PST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1thB6E-0004W7-Ip; Sun, 09 Feb 2025 12:28:06 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1thB6D-0004VW-Lc for qemu-devel@nongnu.org; Sun, 09 Feb 2025 12:28:05 -0500 Received: from anarch128.org ([2001:4801:7825:104:be76:4eff:fe10:52ae]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1thB6A-0006Iq-Ao for qemu-devel@nongnu.org; Sun, 09 Feb 2025 12:28:04 -0500 Received: from localhost.localdomain (default-rdns.vocus.co.nz [202.150.110.104] (may be forged)) (authenticated bits=0) by anarch128.org (8.15.2/8.15.2/Debian-22+deb11u3) with ESMTPSA id 519HRdEp3380282 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO); Sun, 9 Feb 2025 17:27:49 GMT Authentication-Results: anarch128.org; auth=pass; dkim=pass (2048-bit rsa key sha256) header.d=anarch128.org header.i=@anarch128.org header.b=e3HcQepb header.a=rsa-sha256 header.s=100003; x-return-mx=pass header.domain=anarch128.org policy.is_org=yes (MX Records found: mail.anarch128.org); x-return-mx=pass smtp.domain=anarch128.org policy.is_org=yes (MX Records found: mail.anarch128.org) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=anarch128.org; s=100003; t=1739122071; bh=v2r5+aLWIDg3nxqMxTLpFoUFEZ4FxZUh0mEzUQmI0hw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=e3HcQepbtWyhFJ8jlFIzbiQZV0SQzKfWyrswls7GUSyWScBjkXQ25JgdjFZrAZMd9 dAv9lTpa+adVCD7MKRyd3BxL+6Bdtt7WRevTyKFFKi1ycvcqpkHKlI9uNCraz+8UOm E/WaCExAlIcSHf89jTIdmpVrEego0/Kf9N5X+d74dbFkPgjqUgCIlV0Ou5qgGxyDhJ 6mr5OL9pdEKaWy+3nRyEf3Bg8Ynq0Gxq1xLHIhyuNMR+KmuvJcPXOZYWz9bDCQgTYJ IcfENMIyuT1qG7z32fJIZTCijAyswhRQHk1CP88gg/DolKdSEf427juAvQe+9SIg9h /sMnnPKIxjMBA== From: Michael Clark To: qemu-devel@nongnu.org, Richard Henderson , Eduardo Habkost , Paolo Bonzini , Zhao Liu Cc: Michael Clark Subject: [PATCH v2 2/4] x86-disas: add x86-mini metadata documentation Date: Mon, 10 Feb 2025 06:26:54 +1300 Message-ID: <20250209172656.1466556-3-michael@anarch128.org> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250209172656.1466556-1-michael@anarch128.org> References: <20250209172656.1466556-1-michael@anarch128.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=2001:4801:7825:104:be76:4eff:fe10:52ae; envelope-from=michael@anarch128.org; helo=anarch128.org X-Spam_score_int: -16 X-Spam_score: -1.7 X-Spam_bar: - X-Spam_report: (-1.7 / 5.0 requ) BAYES_00=-1.9, DKIM_INVALID=0.1, DKIM_SIGNED=0.1, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: fail (Header signature does not verify) X-ZM-MESSAGEID: 1739122151341019000 Content-Type: text/plain; charset="utf-8" add detailed information on the instruction opcode encoding format for LEX/VEX/EVEX prefix, map and opcode encoding, the operand encoding format, the field order encoding format and notes on instruction synthesis for parameterized opcodes. Signed-off-by: Michael Clark --- docs/x86-metadata.txt | 303 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 303 insertions(+) create mode 100644 docs/x86-metadata.txt diff --git a/docs/x86-metadata.txt b/docs/x86-metadata.txt new file mode 100644 index 000000000000..ce96cedfe100 --- /dev/null +++ b/docs/x86-metadata.txt @@ -0,0 +1,303 @@ +x86 Instruction Set Metadata +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D + +Legacy x86 instructions have been parameterized in the instruction set +metadata using a new LEX prefix for instruction encoding with abstract wid= th +suffix codes that synthesize multiple instruction widths using combinations +of operand size prefixes and `REX.W` bits. This new LEX format makes legacy +instruction encodings consistent with VEX and EVEX encodings as well as +eliminating some redundancy in the metadata. + +There are a small number of special cases for legacy instructions which ne= ed +mode-dependent overrides for cases such as, a different opcode is used for +different modes, or the instruction has a quirk where the operand size does +not follow the default rules for instruction word and address sizes: + +- `.wx` is used to specify 64-bit instructions that default to 32-bit + operands in 64-bit mode. +- `.ww` is used to specify 64-bit instructions that default to 64-bit + operands in 64-bit mode. +- `.o16` is used to specify an instruction override specific to 16-bit mod= e. +- `.o32` is used to specify an instruction override specific to 32-bit mod= e. +- `.o64` is used to specify an instruction override specific to 64-bit mod= e. + +CSV File Format +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +The instruction set metadata in the `disas/x86-data` directory has the +following fields which map to instruction encoding tables in the Intel +Architecture Software Developer's Manual: + +- _Instruction_: opcode and operands from Opcode/Instruction column. +- _Opcode_: instruction encoding from Opcode/Instruction column. +- _Valid 64-bit_: 64-bit valid field from 64/32 bit Mode Support column. +- _Valid 32-bit_: 32-bit valid field from 64/32 bit Mode Support column. +- _Valid 16-bit_: 16-bit valid field from Compat/Legacy Mode column. +- _Feature Flags_: extension name from CPUID Feature Flag column. +- _Operand 1_: Operand 1 column from Instruction Operand Encoding table. +- _Operand 2_: Operand 2 column from Instruction Operand Encoding table. +- _Operand 3_: Operand 3 column from Instruction Operand Encoding table. +- _Operand 4_: Operand 4 column from Instruction Operand Encoding table. +- _Tuple Type_: Tuple Type column from Instruction Operand Encoding table. + +The instruction set metadata in the `data` directory is derived from +[x86-csv](https://github.com/GregoryComer/x86-csv), although it has had +extensive modifications to fix transcription errors, to revise legacy +instruction encodings to conform to the new LEX format, as well as add +missing details such as missing operands or recently added AVX-512 +instruction encodings and various other instruction set extensions. + +Table Generation +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +The appendices outline the printable form of the mnemonics used in the +generated tables to describe operands, instruction encodings and field ord= er. +The mnemonics are referenced in the instruction set metadata files which a= re +translated to enums and arrays by `scripts/x86-tablegen.py` which then map= to +the enum type and set definitions in `disas/x86.h`: + +- _enum x86_opr_ - operand encoding enum type and set attributes. +- _enum x86_enc_ - instruction encoding enum type and set attributes. +- _enum x86_ord_ - operand to instruction encoding field map set attribute= s. + +The enum values are combined together with _logical or_ combinations to +form the primary metadata tables used by the encoder and decoder library: + +- _struct x86_opc_data_ - table type for instruction opcode encodings. +- _struct x86_opr_data_ - table type for unique sets of instruction operan= ds. +- _struct x86_ord_data_ - table type for unique sets of instruction field = orders. + +***Note***: There are some differences between the mnemonics used in the +CSV metadata and the C enums. Exceptions are described in `operand_map` and +`opcode_map` within `scripts/x86_tablegen.py`. The primary differences are +in the names used in the operand columns to indicate operand field order, +otherwise a type prefix is added, dots and brackets are omitted, and forwa= rd +slashes are translated to underscores. + +Appendices +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +This section describes the mnemonics used in the primary data structures: + +- _Appendix A - Operand Encoding_ - describes instruction operands. +- _Appendix B - Operand Order_ - describes instruction field ordering. +- _Appendix C - Instruction Encoding Prefixes_ - describes encoding prefix= es. +- _Appendix D - Instruction Encoding Suffixes_ - describes encoding suffix= es. +- _Appendix E - Instruction Synthesis Notes_ - notes on prefix synthesis. + +Appendix A - Operand Encoding +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D + +This table outlines the operand mnemonics used in instruction operands +_(enum x86_opr)_. + +| operand | description = | +|:-------------------|:---------------------------------------------------= ---| +| `r` | integer register = | +| `v` | vector register = | +| `k` | mask register = | +| `seg` | segment register = | +| `creg` | control register = | +| `dreg` | debug register = | +| `bnd` | bound register = | +| `mem` | memory reference = | +| `rw` | integer register word-sized (16/32/64 bit) = | +| `ra` | integer register addr-sized (16/32/64 bit) = | +| `mw` | memory reference word-sized (16/32/64 bit) = | +| `mm` | vector register 64-bit = | +| `xmm` | vector register 128-bit = | +| `ymm` | vector register 256-bit = | +| `zmm` | vector register 512-bit = | +| `r8` | register 8-bit = | +| `r16` | register 16-bit = | +| `r32` | register 32-bit = | +| `r64` | register 64-bit = | +| `m8` | memory reference 8-bit byte = | +| `m16` | memory reference 16-bit word = | +| `m32` | memory reference 32-bit dword = | +| `m64` | memory reference 64-bit qword = | +| `m128` | memory reference 128-bit oword/xmmword = | +| `m256` | memory reference 256-bit ymmword = | +| `m512` | memory reference 512-bit zmmword = | +| `m80` | memory reference 80-bit tword/tbyte = | +| `m384` | memory reference 384-bit key locker handle = | +| `mib` | memory reference bound = | +| `m16bcst` | memory reference 16-bit word broadcast = | +| `m32bcst` | memory reference 32-bit word broadcast = | +| `m64bcst` | memory reference 64-bit word broadcast = | +| `vm32` | vector memory 32-bit = | +| `vm64` | vector memory 64-bit = | +| `{er}` | operand suffix - embedded rounding control = | +| `{k}` | operand suffix - apply mask register = | +| `{sae}` | operand suffix - suppress all execptions = | +| `{z}` | operand suffix - zero instead of merge = | +| `{rs2}` | operand suffix - register stride 2 = | +| `{rs4}` | operand suffix - register stride 4 = | +| `r/m8` | register unsized memory 8-bit = | +| `r/m16` | register unsized memory 16-bit = | +| `r/m32` | register unsized memory 32-bit = | +| `r/m64` | register unsized memory 64-bit = | +| `k/m8` | mask register memory 8-bit = | +| `k/m16` | mask register memory 16-bit = | +| `k/m32` | mask register memory 32-bit = | +| `k/m64` | mask register memory 64-bit = | +| `bnd/m64` | bound register memory 64-bit = | +| `bnd/m128` | bound register memory 128-bit = | +| `rw/mw` | register or memory 16/32/64-bit (word size) = | +| `r8/m8` | 8-bit register 8-bit memory = | +| `r?/m?` | N-bit register N-bit memory = | +| `mm/m?` | 64-bit vector N-bit memory = | +| `xmm/m?` | 128-bit vector N-bit memory = | +| `ymm/m?` | 256-bit vector N-bit memory = | +| `zmm/m?` | 512-bit vector N-bit memory = | +| `xmm/m?/m?bcst` | 128-bit vector N-bit memory N-bit broadcast = | +| `ymm/m?/m?bcst` | 256-bit vector N-bit memory N-bit broadcast = | +| `zmm/m?/m?bcst` | 512-bit vector N-bit memory N-bit broadcast = | +| `vm32x` | 32-bit vector memory in xmm = | +| `vm32y` | 32-bit vector memory in ymm = | +| `vm32z` | 32-bit vector memory in zmm = | +| `vm64x` | 64-bit vector memory in xmm = | +| `vm64y` | 64-bit vector memory in ymm = | +| `vm64z` | 64-bit vector memory in zmm = | +| `st0` | implicit register st0 = | +| `st1` | implicit register st1 = | +| `es` | implicit segment es = | +| `cs` | implicit segment cs = | +| `ss` | implicit segment ss = | +| `ds` | implicit segment ds = | +| `fs` | implicit segment fs = | +| `gs` | implicit segment gs = | +| `aw` | implicit register (ax/eax/rax) = | +| `cw` | implicit register (cx/ecx/rcx) = | +| `dw` | implicit register (dx/edx/rdx) = | +| `bw` | implicit register (bx/ebx/rbx) = | +| `pa` | implicit indirect register (ax/eax/rax) = | +| `pc` | implicit indirect register (cx/ecx/rcx) = | +| `pd` | implicit indirect register (dx/edx/rdx) = | +| `pb` | implicit indirect register (bx/ebx/rbx) = | +| `psi` | implicit indirect register (si/esi/rsi) = | +| `pdi` | implicit indirect register (di/edi/rdi) = | +| `xmm0` | implicit register xmm0 = | +| `xmm0_7` | implicit registers xmm0-xmm7 = | +| `1` | constant 1 = | +| `ib` | 8-bit immediate = | +| `iw` | 16-bit or 32-bit immediate (mode + operand size) = | +| `iwd` | 16-bit or 32-bit immediate (mode) = | +| `id` | 32-bit immediate = | +| `iq` | 64-bit immediate = | +| `rel8` | 8-bit displacement = | +| `relw` | 6-bit or 32-bit displacement (mode + operand size) = | +| `moffs` | indirect memory offset = | +| `far16/16` | 16-bit seg 16-bit far displacement = | +| `far16/32` | 16-bit seg 32-bit far displacement = | +| `memfar16/16` | indirect 16-bit seg 16-bit far displacement = | +| `memfar16/32` | indirect 16-bit seg 32-bit far displacement = | +| `memfar16/64` | indirect 16-bit seg 64-bit far displacement = | + +Appendix B - Operand Order +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D + +This table outlines the mnemonics used to map operand field order +_(enum x86_ord)_. + +| mnemonic | description = | +|:---------|:-------------------------------------------------------------= ---| +| `imm` | ib, iw, i16, i32, i64 = | +| `reg` | modrm.reg = | +| `mrm` | modrm.r/m = | +| `sib` | modrm.r/m sib = | +| `is4` | register from ib = | +| `ime` | i8, i16 (special case for CALLF/JMPF/ENTER) = | +| `vec` | VEX.vvvv = | +| `opr` | opcode +r = | +| `one` | constant 1 = | +| `rax` | constant al/ax/eax/rax = | +| `rcx` | constant cl/cx/ecx/rcx = | +| `rdx` | constant dl/dx/edx/rdx = | +| `rbx` | constant bl/bx/ebx/rbx = | +| `rsp` | constant sp/esp/rsp = | +| `rbp` | constant bp/ebp/rbp = | +| `rsi` | constant si/esi/rsi = | +| `rdi` | constant di/edi/rdi = | +| `st0` | constant st(0) = | +| `stx` | constant st(i) = | +| `seg` | constant segment = | +| `xmm0` | constant xmm0 = | +| `xmm0_7` | constant xmm0-xmm7 = | +| `mxcsr` | constant mxcsr = | +| `rflags` | constant rflags = | + +Appendix C - Instruction Encoding Prefixes +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +This table outlines the mnemonic prefixes used in instruction encodings +_(enum x86_enc)_. + +| mnemonic | description = | +|:---------|:-------------------------------------------------------------= ---| +| `lex` | legacy instruction = | +| `vex` | VEX encoded instruction = | +| `evex` | EVEX encoded instruction = | +| `.lz` | VEX encoding L=3D0 and L=3D1 is unassigned = | +| `.l0` | VEX encoding L=3D0 = | +| `.l1` | VEX encoding L=3D1 = | +| `.lig` | VEX/EVEX encoding ignores length L=3Dany = | +| `.128` | VEX/EVEX encoding uses 128-bit vector L=3D0 = | +| `.256` | VEX/EVEX encoding uses 256-bit vector L=3D1 = | +| `.512` | EVEX encoding uses 512-bit vector L=3D2 = | +| `.66` | prefix byte 66 is used for opcode mapping = | +| `.f2` | prefix byte f2 is used for opcode mapping = | +| `.f3` | prefix byte f3 is used for opcode mapping = | +| `.9b` | prefix byte 9b is used for opcode mapping (x87 only) = | +| `.0f` | map 0f is used in opcode = | +| `.0f38` | map 0f38 is used in opcode = | +| `.0f3a` | map 0f3a is used in opcode = | +| `.wn` | no register extension, fixed operand size = | +| `.wb` | register extension, fixed operand size = | +| `.wx` | REX and/or operand size extension, optional 66 or REX.W0/W1 = | +| `.ww` | REX and/or operand size extension, optional 66 and REX.WIG = | +| `.w0` | LEX/VEX/EVEX optional REX W0 with operand size used in opcode= | +| `.w1` | LEX/VEX/EVEX mandatory REX W1 with operand size used in opcod= e | +| `.wig` | VEX/EVEX encoding width ignored = | + +Appendix D - Instruction Encoding Suffixes +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +This table outlines the mnemonic suffixes used in instruction encodings +_(enum x86_enc)_. + +| mnemonic | description = | +|:---------|:-------------------------------------------------------------= ---| +| `/r` | ModRM byte = | +| `/0../7` | ModRM byte with 'r' field used for functions 0 to 7 = | +| `XX+r` | opcode byte with 3-bit register added to the opcode = | +| `XX` | opcode byte = | +| `ib` | 8-bit immediate = | +| `iw` | 16-bit or 32-bit immediate (real mode XOR operand size) = | +| `i16` | 16-bit immediate = | +| `i32` | 32-bit immediate = | +| `i64` | 64-bit immediate = | +| `o16` | encoding uses prefix 66 in 32-bit and 64-bit modes = | +| `o32` | encoding uses prefix 66 in 16-bit mode = | +| `o64` | encoding is used exclusively in 64-bit mode with REX.W=3D1 = | +| `a16` | encoding uses prefix 67 in 32-bit and 64-bit modes = | +| `a32` | encoding uses prefix 67 in 16-bit mode = | +| `a64` | encoding is used exclusively in 64-bit mode = | +| `lock` | memory operand encodings can be used with the LOCK prefix = | +| `rep` | string instructions that can be used with the REP prefix = | + +Appendix E - Instruction Synthesis Notes +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +The `.wx` and `.ww` mnemonics are used to synthesize prefix combinations: + +- `.wx` labels opcodes with _default 32-bit operand size in 64-bit mode_ + to synthesize 16/32/64-bit versions using REX and operand size prefix, + or in 16/32-bit modes synthesizes 16/32-bit versions using only the + operand size prefix. REX is used for register extension on opcodes + with `rw` or `rw/mw` operands or fixed register operands like `aw`. +- `.ww` labels opcodes with _default 64-bit operand size in 64-bit mode_ + to synthesize 16/64-bit versions using only the operand size prefix, + or in 16/32-bit modes. synthesizes 16/32-bit versions using only the + operand size prefix. REX is used for register extension on opcodes + with `rw` or `rw/mw` operands or fixed register operands like `aw`. --=20 2.43.0