From: Magnus Kulke
To: qemu-devel@nongnu.org
Cc: kvm@vger.kernel.org, Wei Liu, Richard Henderson, Marcelo Tosatti,
 Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini, Zhao Liu,
 Philippe Mathieu-Daudé, Cédric Le Goater, Magnus Kulke, Magnus Kulke,
 "Michael S. Tsirkin"
Subject: [RFC 28/32] target/i386: add de/compaction to xsave_helper
Date: Mon, 23 Mar 2026 14:58:08 +0100
Message-Id: <20260323135812.383509-29-magnuskulke@linux.microsoft.com>
In-Reply-To: <20260323135812.383509-1-magnuskulke@linux.microsoft.com>
References: <20260323135812.383509-1-magnuskulke@linux.microsoft.com>

Hyper-V uses XSAVES, which stores extended state in compacted format with
components packed contiguously, while QEMU's internal XSAVE representation
uses the standard format, in which each component is placed at a fixed
offset. We therefore add two conversion functions to the xsave helper so
that XSAVE state can round-trip through a migration.

- decompact_xsave_area(): converts compacted format to standard.
  XSTATE_BV is masked to host XCR0 since IA32_XSS is managed
  by the hypervisor.
- compact_xsave_area(): converts standard format back to compacted
  format. XCOMP_BV is set from the host's CPUID 0xD.0 rather than the
  guest's XCR0, as this is what the hypervisor expects.

Both functions use the host's CPUID leaf 0xD subleaves to determine
component sizes, offsets, and alignment requirements.

There are situations when the host advertises features that we want to
disable for the guest, e.g. AMX TILE. In this case we cannot rely on the
host's xcr0, but instead we use the feature mask that has been generated
as part of the CPU realization process (x86_cpu_expand_features).

Signed-off-by: Magnus Kulke
---
 target/i386/cpu.h          |   2 +
 target/i386/xsave_helper.c | 255 +++++++++++++++++++++++++++++++++++++
 2 files changed, 257 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 4ad4a35ce9..cd5d5a5369 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -3033,6 +3033,8 @@ void x86_cpu_xrstor_all_areas(X86CPU *cpu, const void *buf, uint32_t buflen);
 void x86_cpu_xsave_all_areas(X86CPU *cpu, void *buf, uint32_t buflen);
 uint32_t xsave_area_size(uint64_t mask, bool compacted);
 void x86_update_hflags(CPUX86State* env);
+int decompact_xsave_area(const void *buf, size_t buflen, CPUX86State *env);
+int compact_xsave_area(CPUX86State *env, void *buf, size_t buflen);
 
 static inline bool hyperv_feat_enabled(X86CPU *cpu, int feat)
 {
diff --git a/target/i386/xsave_helper.c b/target/i386/xsave_helper.c
index bab2258732..2272b83f5f 100644
--- a/target/i386/xsave_helper.c
+++ b/target/i386/xsave_helper.c
@@ -3,6 +3,7 @@
  * See the COPYING file in the top-level directory.
  */
 #include "qemu/osdep.h"
+#include "qemu/error-report.h"
 
 #include "cpu.h"
 
@@ -293,3 +294,257 @@ void x86_cpu_xrstor_all_areas(X86CPU *cpu, const void *buf, uint32_t buflen)
 }
 #endif
 }
+
+#define XSTATE_BV_IN_HDR  offsetof(X86XSaveHeader, xstate_bv)
+#define XCOMP_BV_IN_HDR   offsetof(X86XSaveHeader, xcomp_bv)
+
+typedef struct X86XSaveAreaView {
+    /* 512 bytes */
+    X86LegacyXSaveArea legacy;
+    /* 64 bytes */
+    X86XSaveHeader     header;
+    /* ...followed by individual xsave areas */
+} X86XSaveAreaView;
+
+#define XSAVE_XSTATE_BV_OFFSET  offsetof(X86XSaveAreaView, header.xstate_bv)
+#define XSAVE_XCOMP_BV_OFFSET   offsetof(X86XSaveAreaView, header.xcomp_bv)
+#define XSAVE_EXT_OFFSET        (sizeof(X86LegacyXSaveArea) + \
+                                 sizeof(X86XSaveHeader))
+
+/**
+ * decompact_xsave_area - Convert compacted XSAVE format to standard format
+ * @buf: Source buffer containing compacted XSAVE data
+ * @buflen: Size of source buffer
+ * @env: CPU state where the standard format buffer will be written to
+ *
+ * Accelerator backends like MSHV might return XSAVE state in compacted format
+ * (XSAVEC). The state components are packed contiguously without gaps.
+ * The XSAVE QEMU buffers are in standard format where each component has a
+ * fixed offset.
+ *
+ * Returns: 0 on success, negative errno on failure
+ */
+int decompact_xsave_area(const void *buf, size_t buflen, CPUX86State *env)
+{
+    uint64_t compacted_xstate_bv, compacted_xcomp_bv, compacted_layout_bv;
+    uint64_t xsave_offset, *xcomp_bv;
+    size_t i;
+    uint32_t eax, ebx, ecx, edx;
+    uint32_t size, dst_off;
+    bool align64;
+    uint64_t guest_xcr0, *xstate_bv;
+
+    compacted_xstate_bv = *(uint64_t *)(buf + XSAVE_XSTATE_BV_OFFSET);
+    compacted_xcomp_bv = *(uint64_t *)(buf + XSAVE_XCOMP_BV_OFFSET);
+
+    /* This function only handles compacted format (bit 63 set) */
+    assert((compacted_xcomp_bv >> 63) & 1);
+
+    /* Low bits of XCOMP_BV describe which components are in the layout */
+    compacted_layout_bv = compacted_xcomp_bv & ~(1ULL << 63);
+
+    /* Zero out buffer, then copy legacy region (FP + SSE) and header as-is */
+    memset(env->xsave_buf, 0, env->xsave_buf_len);
+    memcpy(env->xsave_buf, buf, XSAVE_EXT_OFFSET);
+
+    /*
+     * We mask XSTATE_BV with the guest's supported XCR0 because:
+     * 1. Supervisor state (IA32_XSS) is hypervisor-managed, we don't use
+     *    this state for migration.
+     * 2. Features disabled at partition creation (e.g. AMX) must be excluded
+     */
+    guest_xcr0 = ((uint64_t)env->features[FEAT_XSAVE_XCR0_HI] << 32) |
+                 env->features[FEAT_XSAVE_XCR0_LO];
+    xstate_bv = (uint64_t *)(env->xsave_buf + XSAVE_XSTATE_BV_OFFSET);
+    *xstate_bv &= guest_xcr0;
+
+    /* Clear bit 63 - output is standard format, not compacted */
+    xcomp_bv = (uint64_t *)(env->xsave_buf + XSAVE_XCOMP_BV_OFFSET);
+    *xcomp_bv = *xcomp_bv & ~(1ULL << 63);
+
+    /*
+     * Process each extended state component in the compacted layout.
+     * Components 0 and 1 (FP and SSE) are in the legacy region, so we
+     * start at component 2. For each component:
+     * - Calculate its offset in the compacted source (contiguous layout)
+     * - Get its fixed offset in the standard destination from CPUID
+     * - Copy if the component has non-init state (bit set in XSTATE_BV)
+     */
+    xsave_offset = XSAVE_EXT_OFFSET;
+    for (i = 2; i < 63; i++) {
+        if (((compacted_layout_bv >> i) & 1) == 0) {
+            continue;
+        }
+
+        /* Query guest CPUID for this component's size and standard offset */
+        cpu_x86_cpuid(env, 0xD, i, &eax, &ebx, &ecx, &edx);
+
+        size = eax;
+        dst_off = ebx;
+        align64 = (ecx & (1u << 1)) != 0;
+
+        /* Component is in the layout but unknown to the guest CPUID model */
+        if (size == 0) {
+            /*
+             * The hypervisor might expose a component that has no
+             * representation in the guest CPUID model. We query the host to
+             * retrieve the size of the component, so we can skip over it.
+             */
+            host_cpuid(0xD, i, &eax, &ebx, &ecx, &edx);
+            size = eax;
+            align64 = (ecx & (1u << 1)) != 0;
+            if (size == 0) {
+                error_report("xsave component %zu: size unknown to both "
+                             "guest and host CPUID", i);
+                return -EINVAL;
+            }
+
+            if (align64) {
+                xsave_offset = QEMU_ALIGN_UP(xsave_offset, 64);
+            }
+
+            if (xsave_offset + size > buflen) {
+                error_report("xsave component %zu overruns source buffer: "
+                             "offset=%zu size=%u buflen=%zu",
+                             i, xsave_offset, size, buflen);
+                return -E2BIG;
+            }
+
+            xsave_offset += size;
+            continue;
+        }
+
+        if (align64) {
+            xsave_offset = QEMU_ALIGN_UP(xsave_offset, 64);
+        }
+
+        if ((xsave_offset + size) > buflen) {
+            error_report("xsave component %zu overruns source buffer: "
+                         "offset=%zu size=%u buflen=%zu",
+                         i, xsave_offset, size, buflen);
+            return -E2BIG;
+        }
+
+        if ((dst_off + size) > env->xsave_buf_len) {
+            error_report("xsave component %zu overruns destination buffer: "
+                         "offset=%u size=%u buflen=%zu",
+                         i, dst_off, size, (size_t)env->xsave_buf_len);
+            return -E2BIG;
+        }
+
+        /* Copy components marked present in XSTATE_BV to guest model */
+        if (((compacted_xstate_bv >> i) & 1) != 0) {
+            memcpy(env->xsave_buf + dst_off, buf + xsave_offset, size);
+        }
+
+        xsave_offset += size;
+    }
+
+    return 0;
+}
+
+/**
+ * compact_xsave_area - Convert standard XSAVE format to compacted format
+ * @env: CPU state containing the standard format XSAVE buffer
+ * @buf: Destination buffer for compacted XSAVE data (to send to hypervisor)
+ * @buflen: Size of destination buffer
+ *
+ * Accelerator backends like MSHV might expect XSAVE state in compacted format
+ * (XSAVEC). The state components are packed contiguously without gaps.
+ * The XSAVE QEMU buffers are in standard format where each component has a
+ * fixed offset.
+ *
+ * This function converts from standard to compacted format. It accepts a
+ * pre-allocated destination buffer; it is the responsibility of the
+ * caller to ensure the buffer is big enough.
+ *
+ * Returns: total size of compacted XSAVE data written to @buf
+ */
+int compact_xsave_area(CPUX86State *env, void *buf, size_t buflen)
+{
+    uint64_t *xcomp_bv;
+    size_t i;
+    uint32_t eax, ebx, ecx, edx;
+    uint32_t size, src_off;
+    bool align64;
+    size_t compact_offset;
+    uint64_t host_xcr0_mask, guest_xcr0;
+
+    /* Zero out buffer, then copy legacy region (FP + SSE) and header as-is */
+    memset(buf, 0, buflen);
+    memcpy(buf, env->xsave_buf, XSAVE_EXT_OFFSET);
+
+    /*
+     * Set XCOMP_BV to indicate compacted format (bit 63) and which
+     * components are in the layout.
+     *
+     * We must explicitly set XCOMP_BV because x86_cpu_xsave_all_areas()
+     * produces standard format with XCOMP_BV=0 (buffer is zeroed and only
+     * XSTATE_BV is set in the header).
+     *
+     * XCOMP_BV must reflect the partition's XSAVE capability, not the
+     * guest's current XCR0 (env->xcr0). These differ because:
+     * - A guest's XCR0 is what the guest OS has enabled via XSETBV
+     * - The partition's XCR0 mask is the hypervisor's save/restore capability
+     *
+     * The hypervisor uses XSAVES which saves based on its capability, so the
+     * XCOMP_BV value in the buffer we send back must match that capability.
+     *
+     * We intersect the host XCR0 with the guest's supported XCR0 features
+     * (FEAT_XSAVE_XCR0_*) so that features disabled at partition creation
+     * (e.g. AMX) are excluded from the compacted layout.
+     */
+    host_cpuid(0xD, 0, &eax, &ebx, &ecx, &edx);
+    host_xcr0_mask = ((uint64_t)edx << 32) | eax;
+    guest_xcr0 = ((uint64_t)env->features[FEAT_XSAVE_XCR0_HI] << 32) |
+                 env->features[FEAT_XSAVE_XCR0_LO];
+    host_xcr0_mask &= guest_xcr0;
+    xcomp_bv = buf + XSAVE_XCOMP_BV_OFFSET;
+    *xcomp_bv = host_xcr0_mask | (1ULL << 63);
+
+    /*
+     * Process each extended state component in the host's XCR0.
+     * The compacted layout must match XCOMP_BV (host capability).
+     *
+     * For each component:
+     * - Get its size and standard offset from host CPUID
+     * - Apply 64-byte alignment if required
+     * - Copy data only if guest has this component (bit set in env->xcr0)
+     * - Always advance offset to maintain correct layout
+     */
+    compact_offset = XSAVE_EXT_OFFSET;
+    for (i = 2; i < 63; i++) {
+        if (!((host_xcr0_mask >> i) & 1)) {
+            continue;
+        }
+
+        /* Query host CPUID for this component's size and standard offset */
+        host_cpuid(0xD, i, &eax, &ebx, &ecx, &edx);
+        size = eax;
+        src_off = ebx;
+        align64 = (ecx >> 1) & 1;
+
+        if (size == 0) {
+            /* Component in host xcr0 but unknown - shouldn't happen */
+            continue;
+        }
+
+        /* Apply 64-byte alignment if required by this component */
+        if (align64) {
+            compact_offset = QEMU_ALIGN_UP(compact_offset, 64);
+        }
+
+        /*
+         * Only copy data if guest has this component enabled in XCR0.
+         * Otherwise the component remains zeroed (init state), but we
+         * still advance the offset to maintain the correct layout.
+         */
+        if ((env->xcr0 >> i) & 1) {
+            memcpy(buf + compact_offset, env->xsave_buf + src_off, size);
+        }
+
+        compact_offset += size;
+    }
+
+    return compact_offset;
+}
-- 
2.34.1
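
Note for reviewers: the offset arithmetic shared by both conversion
functions can be sketched outside of QEMU as below. This is only an
illustration under stated assumptions, not part of the patch. The names
comp_info, EXT_OFFSET, align_up_64 and compacted_offsets are hypothetical,
and the per-component table stands in for what the patch reads from CPUID
leaf 0xD (EAX = size, ECX bit 1 = 64-byte alignment).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CPUID.(EAX=0xD, ECX=i): EAX is the component
 * size, ECX bit 1 requests 64-byte alignment in compacted format. */
typedef struct {
    uint32_t size;
    int align64;
} comp_info;

/* 512-byte legacy region plus 64-byte XSAVE header. */
#define EXT_OFFSET ((size_t)576)

static size_t align_up_64(size_t v)
{
    return (v + 63) & ~(size_t)63;
}

/*
 * Walk components 2..62 set in layout_bv and record where each one starts
 * in a compacted buffer. offsets must have at least n_info entries.
 * Returns the total compacted size, or 0 if a component in the layout has
 * no table entry or an unknown (zero) size.
 */
static size_t compacted_offsets(uint64_t layout_bv, const comp_info *info,
                                size_t n_info, size_t *offsets)
{
    size_t off = EXT_OFFSET;
    size_t i;

    assert(info != NULL && offsets != NULL);

    for (i = 2; i < 63; i++) {
        if (!((layout_bv >> i) & 1)) {
            continue;
        }
        if (i >= n_info || info[i].size == 0) {
            return 0;
        }
        if (info[i].align64) {
            off = align_up_64(off);
        }
        offsets[i] = off;
        off += info[i].size;
    }
    return off;
}
```

With made-up sizes of 256 bytes for component 2, 8 bytes for component 9
and 16 bytes (64-byte aligned) for component 11, the components start at
576, 832 and 896, and the compacted area is 912 bytes long. The offset
always advances for every component in the layout, whether or not its data
is copied, which is the invariant both functions in the patch rely on.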