From: Michael Clark
To: qemu-devel@nongnu.org, Richard Henderson, Eduardo Habkost, Paolo Bonzini, Zhao Liu
Cc: Michael Clark
Subject: [PATCH v2 3/4] x86-disas: add x86-mini metadata tablegen script
Date: Mon, 10 Feb 2025 06:26:55 +1300
Message-ID: <20250209172656.1466556-4-michael@anarch128.org>
In-Reply-To: <20250209172656.1466556-1-michael@anarch128.org>
References: <20250209172656.1466556-1-michael@anarch128.org>

The x86-mini metadata tablegen Python script reads instruction set
metadata CSV files and translates them into tables used by the
disassembler. It generates the following tables:

- x86_opc_table, which encodes the prefix, map, and opcode
- x86_opr_table, which encodes instruction operands
- x86_ord_table, which encodes operand field order
- the x86 register enum and string table
- the x86 opcode enum and string table

Signed-off-by: Michael Clark
---
 scripts/x86-tablegen.py | 693 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 693 insertions(+)
 create mode 100755 scripts/x86-tablegen.py

diff --git a/scripts/x86-tablegen.py b/scripts/x86-tablegen.py
new file mode 100755
index 000000000000..6d6a0916fb36
--- /dev/null
+++ b/scripts/x86-tablegen.py
@@ -0,0 +1,693 @@
+#!/usr/bin/env python3
+#
+# Copyright (c) 2024-2025 Michael Clark
+#
+# Permission is hereby granted, free of charge, to any person obtaining a
+# copy of this software and associated documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and to permit persons to whom the
+# Software is furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included
+# in all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+# OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+# OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+# ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+# OTHER DEALINGS IN THE SOFTWARE.
+
+import re
+import sys
+import csv
+import glob
+import string
+import argparse
+
+gpr_bh = ["ah", "ch", "dh", "bh"]
+gpr_b = ["al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil"]
+gpr_w = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
+gpr_d = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
+gpr_q = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
+seg_r = ["es", "cs", "ss", "ds", "fs", "gs", "seg6", "seg7"]
+sys_r = ["rip", "rflags","fpcsr", "mxcsr"]
+sys_n = ["none"]
+
+cc_all = [ 'EQ', 'NEQ', 'GT', 'NLE', 'GE', 'NLT', 'LT', 'NGE', 'LE', 'NGT',
+           'A', 'NBE', 'AE', 'NB', 'B', 'NAE', 'BE', 'NA' ]
+cc_signed = [ 'EQ', 'GE', 'GT', 'LE', 'LT', 'NEQ', 'NGT', 'NLE', 'NLT' ]
+cc_unsigned = [ 'EQ', 'AE', 'A', 'BE', 'B', 'NEQ', 'NA', 'NBE', 'NB' ]
+
+def gen_range(fmt,f,s,e):
+    t = []
+    for i in range(s,e):
+        t += [[i, fmt % i, f]]
+    return t
+
+def gen_list(l,f,start):
+    t = []
+    for i, s in enumerate(l):
+        t += [[i + start, s, f]]
+    return t
+
+def gen_sep():
+    return [[0, "", ""]]
+
+def reg_table():
+    t = []
+    t += gen_list(gpr_bh, "reg_bl", 4)
+    t += gen_sep()
+    t += gen_list(gpr_b, "reg_b", 0)
+    t += gen_range("r%db", "reg_b", 8, 32)
+    t += gen_sep()
+    t += gen_list(gpr_w, "reg_w", 0)
+    t += gen_range("r%dw", "reg_w", 8, 32)
+    t += gen_sep()
+    t += gen_list(gpr_d, "reg_d", 0)
+    t += gen_range("r%dd", "reg_d", 8, 32)
+    t += gen_sep()
+    t += gen_list(gpr_q, "reg_q", 0)
+    t += gen_range("r%d", "reg_q", 8, 32)
+    t += gen_sep()
+    t += gen_range("mm%d", "reg_mmx", 0, 8)
+    t += gen_sep()
+    t += gen_range("xmm%d", "reg_xmm", 0, 32)
+    t += gen_sep()
+    t += gen_range("ymm%d", "reg_ymm", 0, 32)
+    t += gen_sep()
+    t += gen_range("zmm%d", "reg_zmm", 0, 32)
+    t += gen_sep()
+    t += gen_range("k%d", "reg_kmask", 0, 8)
+    t += gen_sep()
+    t += gen_range("st(%d)", "reg_fpu", 0, 8)
+    t += gen_sep()
+    t += gen_range("bnd%d", "reg_bnd", 0, 8)
+    t += gen_sep()
+    t += gen_range("dr%d", "reg_dreg", 0, 16)
+    t += gen_sep()
+    t += gen_range("cr%d", "reg_creg", 0, 16)
+    t += gen_sep()
+    t += gen_list(seg_r, "reg_sreg", 0)
+    t += gen_sep()
+    t += gen_list(sys_r, "reg_sys", 0)
+    t += gen_sep()
+    t += gen_list(sys_n, "reg_sys", 31)
+    return t
+
+operand_map = {
+    '1' : 'one/r',
+    'RAX (r)' : 'rax/r',
+    'RAX (r, w)' : 'rax/rw',
+    'RAX (w)' : 'rax/w',
+    'RCX (r)' : 'rcx/r',
+    'RCX (r, w)' : 'rcx/rw',
+    'RCX (w)' : 'rcx/w',
+    'RDX (r)' : 'rdx/r',
+    'RDX (r, w)' : 'rdx/rw',
+    'RDX (w)' : 'rdx/w',
+    'RBX (r)' : 'rbx/r',
+    'RBX (r, w)' : 'rbx/rw',
+    'RBX (w)' : 'rbx/w',
+    'RSI (r)' : 'rsi/r',
+    'RSI (r, w)' : 'rsi/rw',
+    'RSI (w)' : 'rsi/w',
+    'RDI (r)' : 'rdi/r',
+    'RDI (r, w)' : 'rdi/rw',
+    'RDI (w)' : 'rdi/w',
+    'ST0 (r)' : 'st0/r',
+    'ST0 (r, w)' : 'st0/rw',
+    'ST0 (w)' : 'st0/w',
+    'STX (r)' : 'stx/r',
+    'STX (r, w)' : 'stx/rw',
+    'STX (w)' : 'stx/w',
+    'SEG (r)' : 'seg/r',
+    'SEG (r, w)' : 'seg/rw',
+    'SEG (w)' : 'seg/w',
+    'RSP (r, w, i)' : 'rsp/rwi',
+    'RBP (r, w, i)' : 'rbp/rwi',
+    'MXCSR (r, i)' : 'mxcsr/ri',
+    'MXCSR (w, i)' : 'mxcsr/wi',
+    'RFLAGS (r, i)' : 'rflags/ri',
+    'RFLAGS (w, i)' : 'rflags/wi',
+    'ModRM:reg (r)' : 'reg/r',
+    'ModRM:reg (r, w)' : 'reg/rw',
+    'ModRM:reg (w)' : 'reg/w',
+    'ModRM:r/m (r)' : 'mrm/r',
+    'ModRM:r/m (r, w)' : 'mrm/rw',
+    'ModRM:r/m (w)' : 'mrm/w',
+    'ModRM:r/m (r, ModRM:[7:6] must be 11b)' : 'mrm/r',
+    'ModRM:r/m (r, ModRM:[7:6] must not be 11b)' : 'mrm/r',
+    'ModRM:r/m (w, ModRM:[7:6] must not be 11b)' : 'mrm/w',
+    'ModRM:r/m (r, w, ModRM:[7:6] must not be 11b)' : 'mrm/rw',
+    'BaseReg (r): VSIB:base, VectorReg (r): VSIB:index' : 'sib/r',
+    'SIB.base (r): Address of pointer SIB.index (r)' : 'sib/r',
+    'EVEX.vvvv (r)' : 'vec/r',
+    'EVEX.vvvv (w)' : 'vec/w',
+    'VEX.vvvv (r)' : 'vec/r',
+    'VEX.vvvv (r, w)' : 'vec/rw',
+    'VEX.vvvv (w)' : 'vec/w',
+    'ib' : 'imm',
+    'iw' : 'imm',
+    'iwd' : 'imm',
+    'i16' : 'imm',
+    'i32' : 'imm',
+    'i64' : 'imm',
+    'imm' : 'imm',
+    'ime' : 'ime',
+    'ib[3:0]' : 'imm',
+    'ib[7:4]' : 'is4/r',
+    'Implicit XMM0 (r)' : 'xmm0/r',
+    'Implicit XMM0-7 (r, w)' : 'xmm0_7/rw',
+    'opcode +r (r)' : 'opr/r',
+    'opcode +r (r, w)' : 'opr/rw',
+    'opcode +r (w)' : 'opr/w',
+    'NA' : None,
+    '' : None
+}
+
+opcode_map = {
+    '' : 'reg_xmm0',
+    '': 'reg_xmm0_7',
+    '1' : '1',
+    'm' : 'mem',
+    'al' : 'reg_al',
+    'cl' : 'reg_cl',
+    'ah' : 'reg_ah',
+    'aw' : 'reg_aw',
+    'cw' : 'reg_cw',
+    'dw' : 'reg_dw',
+    'bw' : 'reg_bw',
+    'ax' : 'reg_ax',
+    'cx' : 'reg_cx',
+    'dx' : 'reg_dx',
+    'bx' : 'reg_bx',
+    'eax' : 'reg_eax',
+    'ecx' : 'reg_ecx',
+    'edx' : 'reg_edx',
+    'ebx' : 'reg_ebx',
+    'rax' : 'reg_rax',
+    'rcx' : 'reg_rcx',
+    'rdx' : 'reg_rdx',
+    'rbx' : 'reg_rbx',
+    'si' : 'reg_si',
+    'di' : 'reg_di',
+    'pa' : 'reg_pa',
+    'pc' : 'reg_pc',
+    'pd' : 'reg_pd',
+    'pb' : 'reg_pb',
+    'psi' : 'reg_psi',
+    'pdi' : 'reg_pdi',
+    'cs' : 'seg_cs',
+    'ds' : 'seg_ds',
+    'ss' : 'seg_ss',
+    'es' : 'seg_es',
+    'fs' : 'seg_fs',
+    'gs' : 'seg_gs',
+    'sreg' : 'seg',
+    'dr0-dr7' : 'dreg',
+    'cr0-cr15': 'creg',
+    'cr8' : 'creg8',
+    'st(0)' : 'reg_st0',
+    'st(1)' : 'reg_st1',
+    'st(i)' : 'st'
+}
+
+def x86_mode(row):
+    l = list()
+    if row['Valid 64-bit'] == 'Valid':
+        l.append('64')
+    if row['Valid 32-bit'] == 'Valid':
+        l.append('32')
+    if row['Valid 16-bit'] == 'Valid':
+        l.append('16')
+    return "/".join(l)
+
+def x86_operand(opcode,row):
+    l = list()
+    opcode = opcode.split(' ')[0]
+    operand1 = operand_map[row['Operand 1']]
+    operand2 = operand_map[row['Operand 2']]
+    operand3 = operand_map[row['Operand 3']]
+    operand4 = operand_map[row['Operand 4']]
+    if operand1:
+        l += [operand1]
+    if operand2:
+        l += [operand2]
+    if operand3:
+        l += [operand3]
+    if operand4:
+        l += [operand4]
+    return ",".join(l)
+
+def cleanup_oprs(args):
+    args = list(map(lambda x : x.lstrip().rstrip(), args.split(",")))
+    args = list(map(lambda x : x.replace('&',':'), args))
+    args = list(map(lambda x : x.replace('{k1}','{k}'), args))
+    args = list(map(lambda x : x.lower(), args))
+    args = list(map(lambda x : opcode_map[x] if x in opcode_map else x, args))
+    for reg in ('r32', 'r64'):
+        for suffix in ('a', 'b'):
+            args = list(map(lambda x : x.replace(reg + suffix, reg), args))
+    args = list(map(lambda x : 'rw' if x == 'r' else x, args))
+    args = list(map(lambda x : 'rw/mw' if x == 'r/m' else x, args))
+    for reg in ('k', 'bnd', 'mm', 'xmm', 'ymm', 'zmm'):
+        for i in range(0,5):
+            args = list(map(lambda x : x.replace(reg + str(i), reg) \
+                if x.find(reg) == 0 else x, args))
+    args = list(map(lambda x : x.replace(' ', ''), args))
+    return args
+
+def split_opcode(opcode):
+    space_idx = opcode.find(' ')
+    if space_idx == -1:
+        return (opcode,list())
+    else:
+        return (opcode[:space_idx], cleanup_oprs(opcode[space_idx:]))
+
+def cleanup_opcode(opcode):
+    op, args = split_opcode(opcode)
+    if len(args) > 0:
+        return "%s %s " % (op, ",".join(args))
+    else:
+        return op
+
+def cleanup_encoding(enc):
+    enc = enc.lower()
+    enc = enc.replace('/is4', 'ib')
+    enc = enc.replace('0f 38', '0f38')
+    enc = enc.replace('0f 3a', '0f3a')
+    enc = enc.replace('  ', ' ')
+    return enc
+
+def translate_modes(modes):
+    modelist = []
+    if modes == '':
+        return '0'
+    for m in modes.split('/'):
+        modelist += ['x86_modes_%s' % m]
+    return "|".join(modelist)
+
+# add 9b, del rex rex.w
+def translate_encoding(enc):
+    prefixes = [ 'hex', 'lex', 'vex', 'evex' ]
+    r_suffixes = [ 'rep', 'lock', 'norexb' ]
+    s_suffixes = [ 'o16', 'o32', 'o64', 'a16', 'a32', 'a64' ]
+    pbytes = [ '66', '9b', 'f2', 'f3' ]
+    maps = { '0f', '0f38', '0f3a', 'map4', 'map5', 'map6' }
+    widths = { 'w0', 'w1', 'wig', 'wb', 'wn', 'ws', 'wx', 'ww' }
+    lengths = { 'lig', 'lz', 'l0', 'l1', '128', '256', '512' }
+    flags = { 'nds', 'ndd', 'dds' }
+    imm = { 'ib', 'iw', 'iwd', 'i16', 'i32', 'i64' }
+    mods = { '/r', '/0', '/1', '/2', '/3', '/4', '/5', '/6', '/7' }
+    pl = []
+    opc = ['0x00','0x00']
+    opm = ['0x00','0x00']
+    oplen = 0
+    has_imm, has_pfx, has_pbyte, has_map = False, False, False, False
+    comps = enc.split(" ")
+    for el in comps:
+        is_hex = all(c in string.hexdigits for c in el[0:2])
+        p = None
+        for sel in prefixes:
+            if el.find(sel) == 0 and ( p == None or len(sel) > len(p) ):
+                p = sel
+        if p:
+            pl += ['x86_enc_t_%s' % p.replace('.', '_')]
+            el = el[len(p):]
+            vp, vm, vw, vl, vf = None, None, None, None, None
+            for sel in el.split('.'):
+                if sel == '':
+                    pass
+                elif sel in pbytes:
+                    vp = 'x86_enc_p_%s' % sel
+                elif sel in maps:
+                    vm = 'x86_enc_m_%s' % sel
+                elif sel in widths:
+                    vw = 'x86_enc_w_%s' % sel
+                elif sel in lengths:
+                    vl = 'x86_enc_l_%s' % sel
+                elif sel in flags:
+                    vf = 'x86_enc_f_%s' % sel
+                else:
+                    raise Exception("unknown element '%s' for encoding"
+                                    " '%s'" % (sel, enc))
+            if vp:
+                pl += [vp]
+            if vm:
+                pl += [vm]
+            if vw:
+                pl += [vw]
+            if vl:
+                pl += [vl]
+            if vf:
+                pl += [vf]
+        if p == 'vex' or p == 'evex' or p == 'lex':
+            has_pfx = True
+        elif el in maps and len(comps) > 1 and not (has_map or has_pfx):
+            pl += ['x86_enc_m_%s' % el]
+            has_map = True
+        elif el in r_suffixes:
+            pl += ['x86_enc_r_%s' % el]
+        elif el in s_suffixes:
+            pl += ['x86_enc_s_%s' % el]
+        elif el in imm:
+            if has_imm:
+                # additional immediate used by CALLF/JMPF/ENTER
+                if el in { 'ib', 'i16' }:
+                    pl += ['x86_enc_j_%s' % el]
+                else:
+                    raise Exception("illegal immediate '%s' for encoding"
+                                    " '%s'" % (el, enc))
+            else:
+                pl += ['x86_enc_i_%s' % el]
+                has_imm = True
+        elif el in mods:
+            if oplen == 2:
+                raise Exception("opcode '%s' limit exceeded for encoding"
+                                " '%s'" % (el, enc))
+            pl += ['x86_enc_f_modrm_r' if el == '/r' else 'x86_enc_f_modrm_n']
+            if el != '/r':
+                opc[oplen] = '0x{:02x}'.format(int(el[1]) << 3)
+                opm[oplen] = '0x38'
+                oplen += 1
+        elif len(el) == 2 and is_hex:
+            if oplen == 2:
+                raise Exception("opcode '%s' limit exceeded for encoding"
+                                " '%s'" % (el, enc))
+            if oplen == 1:
+                pl += ['x86_enc_f_opcode']
+            opc[oplen] = '0x%s' % el[0:2]
+            opm[oplen] = '0xff'
+            oplen += 1
+        elif len(el) == 4 and is_hex and el[2:4] == '+r':
+            if oplen == 2:
+                raise Exception("opcode '%s' limit exceeded for encoding "
+                                "'%s'" % (el, enc))
+            pl += ['x86_enc_%s_opcode_r' % ('o' if oplen == 0 else 'f')]
+            opc[oplen] = '0x%s' % el[0:2]
+            opm[oplen] = '0xf8'
+            oplen += 1
+        else:
+            raise Exception("unknown element '%s' for encoding "
+                            "'%s'" % (el, enc))
+    return "|".join(pl), opc, opm
+
+def translate_operands(operands):
+    oprlist = []
+    typpat = re.compile('([if])(\\d+)x(\\d+)')
+    for i,arg0 in enumerate(operands):
+        flags = arg0.split('{')
+        argcomps = []
+        for j,arg1 in enumerate(flags):
+            cp = arg1.find('}')
+            if cp == -1:
+                arg1 = arg1.replace(':','_')
+                argp = []
+                for arg2 in arg1.split('/'):
+                    m = typpat.match(arg2)
+                    if m:
+                        argcomps += ['x86_opr_' + arg2]
+                    else:
+                        argp += [arg2]
+                argcomps += ['x86_opr_' + '_'.join(argp)]
+            else:
+                arg1 = arg1.replace('}','')
+                argcomps += ['x86_opr_flag_' + arg1]
+        oprlist += ["|".join(argcomps)]
+    return oprlist
+
+def translate_order(order):
+    ol = []
+    if order:
+        for o in order.split(','):
+            o = o.replace(':','_')
+            ol.append("|".join(map(lambda x: 'x86_ord_' + x, o.split('/'))))
+    return ol
+
+def print_insn(x86_insn):
+    for row in x86_insn:
+        opcode, enc, modes, ext, order, tt, desc = row
+        opcode = opcode.replace('reg_','')
+        print("| %-53s | %-31s | %-23s | %-8s |" % \
+              (opcode, enc, order, modes))
+
+def opcode_list(x86_insn):
+    ops = set()
+    for row in x86_insn:
+        opcode, enc, modes, ext, order, tt, desc = row
+        op, opr = split_opcode(opcode)
+        ops.add(op)
+    return ['NIL'] + sorted(ops)
+
+def operand_list(x86_insn):
+    oprset = set()
+    for idx, row in enumerate(x86_insn):
+        opcode, enc, modes, ext, order, tt, desc = row
+        op, opr = split_opcode(opcode)
+        oprset.add(tuple(translate_operands(opr)))
+    return sorted(oprset)
+
+def order_list(x86_insn):
+    ordset = set()
+    for idx, row in enumerate(x86_insn):
+        opcode, enc, modes, ext, order, tt, desc = row
+        ordset.add(tuple(translate_order(order)))
+    return sorted(ordset)
+
+opcode_enums_template = """/* generated source */
+enum x86_reg\n{%s};
+enum x86_op\n{%s};"""
+
+opcode_table_template = """/* generated source */
+const size_t x86_opc_table_size = %d;
+const size_t x86_opr_table_size = %d;
+const size_t x86_ord_table_size = %d;
+const size_t x86_op_names_size = %d;
+const x86_opc_data x86_opc_table[] =\n{
+ { x86_op_NIL, 0, 0, 0, 0, { { 0, 0 } }, { { 0, 0 } } },%s};
+const x86_opr_data x86_opr_table[] =\n{%s};
+const x86_ord_data x86_ord_table[] =\n{%s};
+const char* x86_op_names[] =\n{%s};
+const char* x86_reg_names[512] =\n{%s};"""
+
+def print_opcode_enums(x86_reg, x86_insn):
+    regstr, opstr = '\n', '\n'
+    for i,s,f in x86_reg:
+        n = s.replace('(','').replace(')','')
+        regstr += '\n' if len(s) == 0 else \
+            ' %-10s = %s,\n' % ('x86_%s' % n, 'x86_%s | %d' % (f, i))
+    for op in opcode_list(x86_insn):
+        opstr += ' x86_op_%s,\n' % op
+    print(opcode_enums_template % (regstr, opstr))
+
+def print_opcode_tables(x86_reg, x86_insn):
+    oplist = opcode_list(x86_insn)
+    oprlist = operand_list(x86_insn)
+    ordlist = order_list(x86_insn)
+    oprmap = {v: i for i, v in enumerate(oprlist)}
+    ordmap = {v: i for i, v in enumerate(ordlist)}
+    opcstr, oprstr, ordstr, opsstr, regstr = '\n', '\n', '\n', '\n', '\n'
+    for idx, row in enumerate(x86_insn):
+        opcode, enc, modes, ext, order, tt, desc = row
+        op, opr = split_opcode(opcode)
+        oprl = translate_operands(opr)
+        ordl = translate_order(order)
+        oprc = oprmap[tuple(oprl)]
+        ordc = ordmap[tuple(ordl)]
+        modes = translate_modes(modes)
+        enc, opc, opm = translate_encoding(enc)
+        opcstr += ' { %s, %s, %d, %d, %s, { %s }, { %s } },\n' % \
+            ('x86_op_%s' % op, modes, oprc, ordc, enc,
+            '{ %s, %s }' % (opc[0], opc[1]),
+            '{ %s, %s }' % (opm[0], opm[1]))
+    for x in oprlist:
+        oprstr += ' { { %s } },\n' % (", ".join(['0'] if not x else x))
+    for x in ordlist:
+        ordstr += ' { { %s } },\n' % (", ".join(['0'] if not x else x))
+    for op in oplist:
+        opsstr += ' "' + op.lower() + '",\n'
+    for i,s,f in x86_reg:
+        n = s.replace('(','').replace(')','')
+        regstr += '\n' if len(s) == 0 else \
+            ' %-12s = \"%s\",\n' % ('[x86_%s]' % n, s)
+    print(opcode_table_template % (
+        len(x86_insn) + 1, len(oprlist), len(ordlist), len(oplist),
+        opcstr, oprstr, ordstr, opsstr, regstr)
+    )
+
+def read_data(files):
+    data = []
+    if not isinstance(files, list):
+        files = glob.glob(files)
+    for csvpath in files:
+        with open(csvpath, encoding='utf-8-sig', newline='') as file:
+            reader = csv.DictReader(file, delimiter=',', quotechar='"')
+            for row in reader:
+                data += [row]
+    data.sort(key=lambda x: (
+        x['Instruction'].split(' ')[0],
+        x['Opcode'].split(' ')[1],
+        x['Instruction']))
+    insn = []
+    for row in data:
+        opcode = cleanup_opcode(row['Instruction'])
+        enc = cleanup_encoding(row['Opcode'])
+        modes = x86_mode(row)
+        ext = row['Feature Flags']
+        order = x86_operand(opcode,row)
+        tt = row['Tuple Type']
+        desc = row['Description']
+        insn += [[opcode, enc, modes, ext, order, tt, desc]]
+    return insn
+
+def parse_table(rows):
+    rows = [row.strip() for row in rows]
+    data = []
+    obj = []
+    h1 = None
+    begun = False
+    for row in rows:
+        if row.startswith("#"):
+            space = row.index(' ')
+            hashes = row[:space]
+            heading = row[space+1:]
+            depth = hashes.count('#')
+            if not h1:
+                h1 = heading
+            obj.append(TableSection(heading, depth))
+        elif row.startswith("|"):
+            cells = row.split('|')
+            cells = [cell.strip() for cell in cells if cell.strip()]
+            if cells[0].startswith("-") or cells[0].startswith(":"):
+                begun = True
+                obj.append(TableHeader())
+            elif begun:
+                data.append(cells)
+                obj.append(TableData(cells))
+        else:
+            begun = False
+            obj.append(TableText(row))
+    return { 'title': h1, 'data': data, 'obj': obj }
+
+def read_file(file_path):
+    try:
+        with open(file_path, 'r') as file:
+            lines = file.readlines()
+            lines = [line.strip() for line in lines]
+            return lines
+    except FileNotFoundError:
+        print(f"File not found: {file_path}")
+        return []
+
+def make_map(x86_insn):
+    insn_map = dict()
+    for row in x86_insn:
+        opcode, enc, modes, ext, order, tt, desc = row
+        op, opr = split_opcode(opcode)
+        if op not in insn_map:
+            insn_map[op] = list()
+        insn_map[op].append(row)
+    return insn_map
+
+#
+# description table types
+#
+class TableText():
+    def __init__(self,text):
+        self.text = text
+class TableSection():
+    def __init__(self,heading,depth):
+        self.heading = heading
+        self.depth = depth
+class TableHeader():
+    def __init__(self):
+        return
+class TableData():
+    def __init__(self,cells):
+        self.cells = cells
+
+#
+# print descriptions with instructions
+#
+def table_text_insn(self,x86_desc):
+    return ""
+def table_section_insn(self,x86_desc):
+    return "%s %s\n\n" % ("#" * self.depth, self.heading)
+def table_header_insn(self,x86_desc):
+    return ""
+def table_data_insn(self,x86_desc):
+    insn_map = x86_desc['insn']
+    insn, desc = self.cells
+    text = ""
+    insn_list = []
+    o = insn.find("cc")
+    if insn.startswith("v"):
+        insn_list.append(insn[1:])
+        insn_list.append(insn.upper())
+    elif o >= 0:
+        for cc in cc_all:
+            new_insn = '%s%s%s' % (insn[0:o], cc, insn[o+2:])
+            insn_list.append(new_insn)
+    else:
+        insn_list.append(insn)
+    if len(insn_list) > 0:
+        text += "\n"
+        text += "| %-51s | %-29s | %-23s | %-8s |\n" % \
+            ("opcode", "encoding", "order", "modes")
+        text += "|:%-51s-|:%-29s-|:%-23s-|:%-8s-|\n" % \
+            ("-"*51, "-"*29, "-"*23, "-"*8)
+        for insn_name in insn_list:
+            if insn_name in insn_map:
+                for row in insn_map[insn_name]:
+                    opcode, enc, modes, ext, order, tt, _ = row
+                    opcode = opcode.replace('reg_','')
+                    text += "| %-51s | %-29s | %-23s | %-8s |\n" % \
+                        (opcode, enc, order, modes)
+        text += "|:%-51s-|:%-29s-|:%-23s-|:%-8s-|\n" % \
+            ("-"*51, "-"*29, "-"*23, "-"*8)
+        text += "\n\n"
+    return "%s %s\n%s" % ("[%s]" % insn, "# %s" % desc, text)
+
+table_insn = {
+    TableText: table_text_insn, TableSection: table_section_insn,
+    TableHeader: table_header_insn, TableData: table_data_insn,
+}
+
+def print_fancy_insn(x86_desc):
+    for obj in x86_desc['tab']['obj']:
+        print(table_insn[type(obj)](obj, x86_desc), end="")
+
+parser = argparse.ArgumentParser(description='x86 table generator')
+parser.add_argument('files',
+                    default='data/*.csv', nargs='*',
+                    help='x86 csv metadata')
+parser.add_argument('--print-insn',
+                    default=False, action='store_true',
+                    help='print instructions')
+parser.add_argument('--print-fancy-insn',
+                    default=False, action='store_true',
+                    help='print fancy instructions')
+parser.add_argument('--print-opcode-enums',
+                    default=False, action='store_true',
+                    help='print register and opcode enums')
+parser.add_argument('--print-opcode-tables',
+                    default=False, action='store_true',
+                    help='print opcode, operand, order and name tables')
+parser.add_argument('--output-file', type=argparse.FileType('w'),
+                    help="filename to write output to")
+args = parser.parse_args()
+
+x86_reg = reg_table()
+x86_insn = read_data(args.files)
+
+if args.output_file:
+    sys.stdout = args.output_file
+if args.print_insn:
+    print_insn(x86_insn)
+if args.print_opcode_enums:
+    print_opcode_enums(x86_reg, x86_insn)
+if args.print_opcode_tables:
+    print_opcode_tables(x86_reg, x86_insn)
-- 
2.43.0
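A reviewer-side sketch, not part of this patch: translate_encoding() stores each opcode byte as an opcode/mask pair (mask 0xff for a literal byte, 0xf8 for an 'xx+r' byte, and 0x38 with the digit shifted into ModRM.reg for '/digit'). The decoder that consumes x86_opc_table is not included here, so the matching rule below, (byte & mask) == opcode, is an assumption, and opcode_and_mask() and matches() are hypothetical helpers. Unlike the script, they return integers instead of '0x..' strings and only understand plain hex bytes, 'xx+r', '/digit' and '/r' (no prefixes, maps or immediates).

```python
# Hypothetical sketch (not part of this patch): re-derive the opcode/mask
# byte pairs that translate_encoding() emits for the opcode part of an
# encoding string, then match raw bytes against them. Assumes a decoder
# accepts a byte when (byte & mask) == opcode.

def opcode_and_mask(enc):
    """Return ([opc0, opc1], [opm0, opm1]) for tokens such as 'ff /2' or '50+r'."""
    opc, opm = [0x00, 0x00], [0x00, 0x00]
    n = 0
    for tok in enc.split():
        if tok == '/r':
            continue  # ModRM.reg is an operand here, so no opcode bits are fixed
        if n == 2:
            raise ValueError("opcode limit exceeded for encoding '%s'" % enc)
        if len(tok) == 2 and tok[0] == '/' and tok[1] in '01234567':
            opc[n], opm[n] = int(tok[1]) << 3, 0x38  # /digit lives in ModRM.reg
        elif len(tok) == 4 and tok.endswith('+r'):
            opc[n], opm[n] = int(tok[:2], 16), 0xf8  # low 3 bits pick a register
        elif len(tok) == 2:
            opc[n], opm[n] = int(tok, 16), 0xff      # byte must match exactly
        else:
            raise ValueError("unknown element '%s' for encoding '%s'" % (tok, enc))
        n += 1
    return opc, opm


def matches(code, opc, opm):
    """True if the leading bytes of `code` fit the (opc, opm) pattern."""
    return all((b & m) == o for b, o, m in zip(code, opc, opm))


# PUSH r64 is '50+r': 0x53 (push rbx) matches, 0x58 (pop rax) does not.
opc, opm = opcode_and_mask('50+r')
assert (opc, opm) == ([0x50, 0x00], [0xf8, 0x00])
assert matches(bytes([0x53]), opc, opm)
assert not matches(bytes([0x58]), opc, opm)

# CALL r/m64 is 'ff /2': ModRM 0xd0 has reg=2; ModRM 0xc0 has reg=0 (INC).
opc, opm = opcode_and_mask('ff /2')
assert (opc, opm) == ([0xff, 0x10], [0xff, 0x38])
assert matches(bytes([0xff, 0xd0]), opc, opm)
assert not matches(bytes([0xff, 0xc0]), opc, opm)
```

With a mask, one table row can cover all eight registers encoded in the low bits of a '+r' opcode, or a whole '/digit' group across every ModRM mod and r/m value, instead of one row per concrete byte.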