v4 -> v5: Largely unchanged and resent for review after the 4.20 dev cycle started.
  * Addressed Andrew's nits in v4/patch1
  * Addressed Jan's concern with MTRR overrides in v4/patch6 by keeping the
    same MTRR data in the vCPU contexts for HVM domain creations.

v4: https://lore.kernel.org/xen-devel/cover.1719416329.git.alejandro.vallejo@cloud.com/

v3 -> v4:
  * Fixed cpuid() bug in hvmloader, causing UB in v3
  * Fixed a bogus assert in hvmloader, also causing a crash in v3
  * Used HVM contexts rather than sync'd algos between Xen and toolstack in
    order to initialise the per-vCPU LAPIC state.
  * Formatting adjustments.

v3: https://lore.kernel.org/xen-devel/cover.1716976271.git.alejandro.vallejo@cloud.com/

v2 -> v3: (v2/patch2 and v2/patch4 are already committed)
  * Moved the vlapic check hook addition to v3/patch1, and created a check
    hook for the architectural state too for consistency.
  * Fixed migrations from Xen <= 4.13 by reconstructing the previous topology.
  * Correctly set the APIC ID after a policy change when vlapic is already in
    x2APIC mode.
  * Removed bogus assumption introduced in v1 and v2 in hvmloader about which
    8bit APIC IDs represent ids > 254 (it's "id % 0xff", not "min(id, 0xff)").
    Used an x2apic flag check instead.
  * Various formatting adjustments.

v2: https://lore.kernel.org/xen-devel/cover.1715102098.git.alejandro.vallejo@cloud.com/

v1 -> v2:
  * v1/patch4 replaced by a different strategy (see patches 4 and 5 in v2):
    have hvmloader populate the MADT with the real APIC IDs as read by the
    APs themselves, rather than giving it knowledge of how to derive them.
  * Removed patches 2 and 3 in v1, as no longer relevant.
  * Split v1/patch6 in two parts ((a) creating the generator and (b) plugging
    it in) and used the generator in the unit tests of the vcpuid->apicid
    mapping function. Becomes patches 6 and 8 in v2.

Patch 1: Same as v1/patch1.
Patch 2: Header dependency cleanup in preparation for patch 3.
Patch 3: Adds vlapic_hidden check for the newly introduced reserved area.
Patch 4: [hvmloader] Replaces INIT+SIPI+SIPI sequences with hypercalls.
Patch 5: [hvmloader] Retrieves the per-CPU APIC IDs from the APs themselves.
Patch 6: Split from v1/patch6.
Patch 7: Logically matching v1/patch5, but using v2/patch6 for testing.
Patch 8: Split from v1/patch6.

v1: https://lore.kernel.org/xen-devel/20240109153834.4192-1-alejandro.vallejo@cloud.com/

=== Original cover letter ===

Current topology handling is close to non-existent. As things stand, APIC IDs
are allocated through the apic_id=vcpu_id*2 relation, without giving the OS
any hints on how to parse the x2APIC ID of a given CPU, and assuming the
guest will assume 2 threads per core.

This series brings x2APIC IDs into the migration stream, so older guests keep
operating as they used to, and enhances Xen+toolstack so new guests get
topology information consistent with their x2APIC IDs. As a side effect of
this, x2APIC IDs are now packed and don't have gaps (unless under a
pathological case).

Further work ought to allow combining these topology configurations with
gang-scheduling of guest hyperthreads onto affine physical hyperthreads. For
the time being it purposefully keeps the configuration of "1 socket" +
"1 thread per core" + "1 core per vCPU".

Patch 1: Includes x2APIC IDs in the migration stream. This allows Xen to
         reconstruct the right x2APIC IDs on migrated-in guests, and
         future-proofs itself in the face of x2APIC ID derivation changes.
Patch 2: Minor refactor to expose xc_cpu_policy in libxl.
Patch 3: Refactors xen/lib/x86 to work in non-Xen freestanding environments
         (e.g. hvmloader).
Patch 4: Removes old assumptions about the vcpu_id<->apic_id relationship in
         hvmloader.
Patch 5: Adds logic to derive x2APIC IDs given a CPU policy and vCPU IDs.
Patch 6: Includes a simple topology generator for the toolstack so new guests
         have topologically consistent information in CPUID.

Alejandro Vallejo (10):
  tools/hvmloader: Fix non-deterministic cpuid()
  x86/vlapic: Move lapic migration checks to the check hooks
  xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
  xen/x86: Add supporting code for uploading LAPIC contexts during domain
    create
  tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves
  tools/libguest: Always set vCPU context in vcpu_hvm()
  xen/lib: Add topology generator for x86
  xen/x86: Derive topologically correct x2APIC IDs from the policy
  tools/libguest: Set distinct x2APIC IDs for each vCPU
  xen/x86: Synthesise domain topologies

 tools/firmware/hvmloader/config.h        |   6 +-
 tools/firmware/hvmloader/hvmloader.c     |   4 +-
 tools/firmware/hvmloader/smp.c           |  54 ++++--
 tools/firmware/hvmloader/util.c          |   9 -
 tools/firmware/hvmloader/util.h          |  27 ++-
 tools/include/xen-tools/common-macros.h  |   5 +
 tools/libs/guest/xg_cpuid_x86.c          |  24 ++-
 tools/libs/guest/xg_dom_x86.c            | 114 ++++++++-----
 tools/tests/cpu-policy/test-cpu-policy.c | 201 +++++++++++++++++++++++
 xen/arch/x86/cpu-policy.c                |   9 +-
 xen/arch/x86/cpuid.c                     |  14 +-
 xen/arch/x86/hvm/vlapic.c                | 123 ++++++++++----
 xen/arch/x86/include/asm/hvm/vlapic.h    |   1 +
 xen/include/public/arch-x86/hvm/save.h   |   2 +
 xen/include/xen/lib/x86/cpu-policy.h     |  27 +++
 xen/lib/x86/policy.c                     | 164 ++++++++++++++++++
 xen/lib/x86/private.h                    |   4 +
 17 files changed, 683 insertions(+), 105 deletions(-)

--
2.45.2
hvmloader's cpuid() implementation deviates from Xen's in that the value
passed in ecx is unspecified. This means that, when used on leaves that
implement subleaves, it's unspecified which one you get; though it's more
than likely an invalid one. Import Xen's implementation so there are no
surprises.

Fixes: 318ac791f9f9 ("Add utilities needed for SMBIOS generation to hvmloader")
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v5:
  * Added Fixes tag
  * Cosmetic changes to static inline, as proposed by Andrew
---
 tools/firmware/hvmloader/util.c |  9 ---------
 tools/firmware/hvmloader/util.h | 27 ++++++++++++++++++++++++---
 2 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index XXXXXXX..XXXXXXX 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -XXX,XX +XXX,XX @@ memcmp(const void *s1, const void *s2, unsigned n)
     return 0;
 }

-void
-cpuid(uint32_t idx, uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx)
-{
-    asm volatile (
-        "cpuid"
-        : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
-        : "0" (idx) );
-}
-
 static const char hex_digits[] = "0123456789abcdef";

 /* Write a two-character hex representation of 'byte' to digits[].
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index XXXXXXX..XXXXXXX 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -XXX,XX +XXX,XX @@ int uart_exists(uint16_t uart_base);
 int lpt_exists(uint16_t lpt_base);
 int hpet_exists(unsigned long hpet_base);

-/* Do cpuid instruction, with operation 'idx' */
-void cpuid(uint32_t idx, uint32_t *eax, uint32_t *ebx,
-           uint32_t *ecx, uint32_t *edx);
+/* Some CPUID calls want 'count' to be placed in ecx */
+static inline void cpuid_count(
+    uint32_t leaf,
+    uint32_t subleaf,
+    uint32_t *eax,
+    uint32_t *ebx,
+    uint32_t *ecx,
+    uint32_t *edx)
+{
+    asm volatile ( "cpuid"
+                   : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
+                   : "a" (leaf), "c" (subleaf) );
+}
+
+/* Generic CPUID function (subleaf 0) */
+static inline void cpuid(
+    uint32_t leaf,
+    uint32_t *eax,
+    uint32_t *ebx,
+    uint32_t *ecx,
+    uint32_t *edx)
+{
+    cpuid_count(leaf, 0, eax, ebx, ecx, edx);
+}

 /* Read the TSC register. */
 static inline uint64_t rdtsc(void)
--
2.45.2
While doing this, factor out checks common to architectural and hidden state. Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> --- v5: * No change --- xen/arch/x86/hvm/vlapic.c | 81 +++++++++++++++++++++++++-------------- 1 file changed, 53 insertions(+), 28 deletions(-) diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/vlapic.c +++ b/xen/arch/x86/hvm/vlapic.c @@ -XXX,XX +XXX,XX @@ static void lapic_load_fixup(struct vlapic *vlapic) v, vlapic->loaded.id, vlapic->loaded.ldr, good_ldr); } -static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h) -{ - unsigned int vcpuid = hvm_load_instance(h); - struct vcpu *v; - struct vlapic *s; +static int lapic_check_common(const struct domain *d, unsigned int vcpuid) +{ if ( !has_vlapic(d) ) return -ENODEV; /* Which vlapic to load? */ - if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL ) + if ( !domain_vcpu(d, vcpuid) ) { dprintk(XENLOG_G_ERR, "HVM restore: dom%d has no apic%u\n", d->domain_id, vcpuid); return -EINVAL; } - s = vcpu_vlapic(v); - if ( hvm_load_entry_zeroextend(LAPIC, h, &s->hw) != 0 ) + return 0; +} + +static int cf_check lapic_check_hidden(const struct domain *d, + hvm_domain_context_t *h) +{ + unsigned int vcpuid = hvm_load_instance(h); + struct hvm_hw_lapic s; + int rc; + + if ( (rc = lapic_check_common(d, vcpuid)) ) + return rc; + + if ( hvm_load_entry_zeroextend(LAPIC, h, &s) != 0 ) + return -ENODATA; + + /* EN=0 with EXTD=1 is illegal */ + if ( (s.apic_base_msr & (APIC_BASE_ENABLE | APIC_BASE_EXTD)) == + APIC_BASE_EXTD ) return -EINVAL; + return 0; +} + +static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h) +{ + unsigned int vcpuid = hvm_load_instance(h); + struct vcpu *v = d->vcpu[vcpuid]; + struct vlapic *s = vcpu_vlapic(v); + + if ( hvm_load_entry_zeroextend(LAPIC, h, &s->hw) != 0 ) + ASSERT_UNREACHABLE(); + 
s->loaded.hw = 1; if ( s->loaded.regs ) lapic_load_fixup(s); - if ( !(s->hw.apic_base_msr & APIC_BASE_ENABLE) && - unlikely(vlapic_x2apic_mode(s)) ) - return -EINVAL; - hvm_update_vlapic_mode(v); return 0; } -static int cf_check lapic_load_regs(struct domain *d, hvm_domain_context_t *h) +static int cf_check lapic_check_regs(const struct domain *d, + hvm_domain_context_t *h) { unsigned int vcpuid = hvm_load_instance(h); - struct vcpu *v; - struct vlapic *s; + int rc; - if ( !has_vlapic(d) ) - return -ENODEV; + if ( (rc = lapic_check_common(d, vcpuid)) ) + return rc; - /* Which vlapic to load? */ - if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL ) - { - dprintk(XENLOG_G_ERR, "HVM restore: dom%d has no apic%u\n", - d->domain_id, vcpuid); - return -EINVAL; - } - s = vcpu_vlapic(v); + if ( !hvm_get_entry(LAPIC_REGS, h) ) + return -ENODATA; + + return 0; +} + +static int cf_check lapic_load_regs(struct domain *d, hvm_domain_context_t *h) +{ + unsigned int vcpuid = hvm_load_instance(h); + struct vcpu *v = d->vcpu[vcpuid]; + struct vlapic *s = vcpu_vlapic(v); if ( hvm_load_entry(LAPIC_REGS, h, s->regs) != 0 ) - return -EINVAL; + ASSERT_UNREACHABLE(); s->loaded.id = vlapic_get_reg(s, APIC_ID); s->loaded.ldr = vlapic_get_reg(s, APIC_LDR); @@ -XXX,XX +XXX,XX @@ static int cf_check lapic_load_regs(struct domain *d, hvm_domain_context_t *h) return 0; } -HVM_REGISTER_SAVE_RESTORE(LAPIC, lapic_save_hidden, NULL, +HVM_REGISTER_SAVE_RESTORE(LAPIC, lapic_save_hidden, lapic_check_hidden, lapic_load_hidden, 1, HVMSR_PER_VCPU); -HVM_REGISTER_SAVE_RESTORE(LAPIC_REGS, lapic_save_regs, NULL, +HVM_REGISTER_SAVE_RESTORE(LAPIC_REGS, lapic_save_regs, lapic_check_regs, lapic_load_regs, 1, HVMSR_PER_VCPU); int vlapic_init(struct vcpu *v) -- 2.45.2
This allows the initial x2APIC ID to be sent in the migration stream, which
in turn allows further changes to topology and APIC ID assignment without
breaking existing hosts. Given that the vlapic data is zero-extended on
restore, fix up migrations from hosts without the field by setting it to the
old convention if zero.

The hardcoded mapping x2apic_id=2*vcpu_id is kept for the time being, but
it's meant to be overridden by the toolstack in a later patch with
appropriate values.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v5:
  * No change
---
 xen/arch/x86/cpuid.c                   | 14 +++++---------
 xen/arch/x86/hvm/vlapic.c              | 22 ++++++++++++++++++++--
 xen/arch/x86/include/asm/hvm/vlapic.h  |  1 +
 xen/include/public/arch-x86/hvm/save.h |  2 ++
 4 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -XXX,XX +XXX,XX @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
     const struct cpu_user_regs *regs;

     case 0x1:
-        /* TODO: Rework topology logic. */
         res->b &= 0x00ffffffu;
         if ( is_hvm_domain(d) )
-            res->b |= (v->vcpu_id * 2) << 24;
+            res->b |= vlapic_x2apic_id(vcpu_vlapic(v)) << 24;

         /* TODO: Rework vPMU control in terms of toolstack choices. */
         if ( vpmu_available(v) &&
@@ -XXX,XX +XXX,XX @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
     case 0xb:
         /*
-         * In principle, this leaf is Intel-only. In practice, it is tightly
-         * coupled with x2apic, and we offer an x2apic-capable APIC emulation
-         * to guests on AMD hardware as well.
-         *
-         * TODO: Rework topology logic.
+         * Don't expose topology information to PV guests. Exposed on HVM
+         * along with x2APIC because they are tightly coupled.
          */
-        if ( p->basic.x2apic )
+        if ( is_hvm_domain(d) && p->basic.x2apic )
         {
             *(uint8_t *)&res->c = subleaf;

             /* Fix the x2APIC identifier.
*/ - res->d = v->vcpu_id * 2; + res->d = vlapic_x2apic_id(vcpu_vlapic(v)); } break; diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/vlapic.c +++ b/xen/arch/x86/hvm/vlapic.c @@ -XXX,XX +XXX,XX @@ static uint32_t x2apic_ldr_from_id(uint32_t id) static void set_x2apic_id(struct vlapic *vlapic) { const struct vcpu *v = vlapic_vcpu(vlapic); - uint32_t apic_id = v->vcpu_id * 2; + uint32_t apic_id = vlapic->hw.x2apic_id; uint32_t apic_ldr = x2apic_ldr_from_id(apic_id); /* @@ -XXX,XX +XXX,XX @@ void vlapic_reset(struct vlapic *vlapic) if ( v->vcpu_id == 0 ) vlapic->hw.apic_base_msr |= APIC_BASE_BSP; - vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24); + vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id)); vlapic_do_init(vlapic); } @@ -XXX,XX +XXX,XX @@ static void lapic_load_fixup(struct vlapic *vlapic) const struct vcpu *v = vlapic_vcpu(vlapic); uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id); + /* + * Loading record without hw.x2apic_id in the save stream, calculate using + * the traditional "vcpu_id * 2" relation. There's an implicit assumption + * that vCPU0 always has x2APIC0, which is true for the old relation, and + * still holds under the new x2APIC generation algorithm. While that case + * goes through the conditional it's benign because it still maps to zero. + */ + if ( !vlapic->hw.x2apic_id ) + vlapic->hw.x2apic_id = v->vcpu_id * 2; + /* Skip fixups on xAPIC mode, or if the x2APIC LDR is already correct */ if ( !vlapic_x2apic_mode(vlapic) || (vlapic->loaded.ldr == good_ldr) ) @@ -XXX,XX +XXX,XX @@ static int cf_check lapic_check_hidden(const struct domain *d, APIC_BASE_EXTD ) return -EINVAL; + /* + * Fail migrations from newer versions of Xen where + * rsvd_zero is interpreted as something else. 
+ */ + if ( s.rsvd_zero ) + return -EINVAL; + return 0; } @@ -XXX,XX +XXX,XX @@ int vlapic_init(struct vcpu *v) } vlapic->pt.source = PTSRC_lapic; + vlapic->hw.x2apic_id = 2 * v->vcpu_id; vlapic->regs_page = alloc_domheap_page(v->domain, MEMF_no_owner); if ( !vlapic->regs_page ) diff --git a/xen/arch/x86/include/asm/hvm/vlapic.h b/xen/arch/x86/include/asm/hvm/vlapic.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/include/asm/hvm/vlapic.h +++ b/xen/arch/x86/include/asm/hvm/vlapic.h @@ -XXX,XX +XXX,XX @@ #define vlapic_xapic_mode(vlapic) \ (!vlapic_hw_disabled(vlapic) && \ !((vlapic)->hw.apic_base_msr & APIC_BASE_EXTD)) +#define vlapic_x2apic_id(vlapic) ((vlapic)->hw.x2apic_id) /* * Generic APIC bitmap vector update & search routines. diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/public/arch-x86/hvm/save.h +++ b/xen/include/public/arch-x86/hvm/save.h @@ -XXX,XX +XXX,XX @@ struct hvm_hw_lapic { uint32_t disabled; /* VLAPIC_xx_DISABLED */ uint32_t timer_divisor; uint64_t tdt_msr; + uint32_t x2apic_id; + uint32_t rsvd_zero; }; DECLARE_HVM_SAVE_TYPE(LAPIC, 5, struct hvm_hw_lapic); -- 2.45.2
If the toolstack were to upload LAPIC contexts as part of domain creation, it
would encounter a problem where the architectural state does not reflect the
APIC ID in the hidden state. This patch ensures that updates to the hidden
state trigger an update in the architectural registers, so the APIC ID in
both is consistent.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
We could also let the toolstack synthesise architectural registers, but that
would be adding logic on how architectural state operates to software that
really shouldn't care. I could be persuaded to do it the other way, but I
think it's going to be messier.

v5:
  * No change
---
 xen/arch/x86/hvm/vlapic.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -XXX,XX +XXX,XX @@ static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
     s->loaded.hw = 1;
     if ( s->loaded.regs )
+    {
+        /*
+         * We already processed architectural regs in lapic_load_regs(), so
+         * this must be a migration. Fix up inconsistencies from any older
+         * Xen.
+         */
         lapic_load_fixup(s);
+    }
+    else
+    {
+        /*
+         * We haven't seen architectural regs so this could be a migration or
+         * a plain domain create. In the domain create case it's fine to
+         * modify the architectural state to align it to the APIC ID that was
+         * just uploaded, and in the migrate case it doesn't matter because
+         * the architectural state will be replaced by the LAPIC_REGS ctx
+         * later on.
+         */
+        if ( vlapic_x2apic_mode(s) )
+            set_x2apic_id(s);
+        else
+            vlapic_set_reg(s, APIC_ID, SET_xAPIC_ID(s->hw.x2apic_id));
+    }

     hvm_update_vlapic_mode(v);
--
2.45.2
Make it so the APs expose their own APIC IDs in a LUT. We can use that LUT to populate the MADT, decoupling the algorithm that relates CPU IDs and APIC IDs from hvmloader. While at this also remove ap_callin, as writing the APIC ID may serve the same purpose. Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> --- v5: * No change --- tools/firmware/hvmloader/config.h | 6 ++- tools/firmware/hvmloader/hvmloader.c | 4 +- tools/firmware/hvmloader/smp.c | 54 ++++++++++++++++++++----- tools/include/xen-tools/common-macros.h | 5 +++ 4 files changed, 56 insertions(+), 13 deletions(-) diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h index XXXXXXX..XXXXXXX 100644 --- a/tools/firmware/hvmloader/config.h +++ b/tools/firmware/hvmloader/config.h @@ -XXX,XX +XXX,XX @@ #include <stdint.h> #include <stdbool.h> +#include <xen/hvm/hvm_info_table.h> + enum virtual_vga { VGA_none, VGA_std, VGA_cirrus, VGA_pt }; extern enum virtual_vga virtual_vga; @@ -XXX,XX +XXX,XX @@ extern uint8_t ioapic_version; #define IOAPIC_ID 0x01 +extern uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS]; + #define LAPIC_BASE_ADDRESS 0xfee00000 -#define LAPIC_ID(vcpu_id) ((vcpu_id) * 2) +#define LAPIC_ID(vcpu_id) (CPU_TO_X2APICID[(vcpu_id)]) #define PCI_ISA_DEVFN 0x08 /* dev 1, fn 0 */ #define PCI_ISA_IRQ_MASK 0x0c20U /* ISA IRQs 5,10,11 are PCI connected */ diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c index XXXXXXX..XXXXXXX 100644 --- a/tools/firmware/hvmloader/hvmloader.c +++ b/tools/firmware/hvmloader/hvmloader.c @@ -XXX,XX +XXX,XX @@ int main(void) printf("CPU speed is %u MHz\n", get_cpu_mhz()); + smp_initialise(); + apic_setup(); pci_setup(); - smp_initialise(); - perform_tests(); if ( bios->bios_info_setup ) diff --git a/tools/firmware/hvmloader/smp.c b/tools/firmware/hvmloader/smp.c index XXXXXXX..XXXXXXX 100644 --- a/tools/firmware/hvmloader/smp.c +++ b/tools/firmware/hvmloader/smp.c @@ -XXX,XX +XXX,XX @@ #include 
<xen/vcpu.h>

-static int ap_callin;
+/**
+ * Lookup table of x2APIC IDs.
+ *
+ * Each entry is populated by its respective CPU as it comes online. This is
+ * required for generating the MADT with minimal assumptions about ID
+ * relationships.
+ */
+uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS];
+
+/** Tristate about x2apic being supported. -1=unknown */
+static int has_x2apic = -1;
+
+static uint32_t read_apic_id(void)
+{
+    uint32_t apic_id;
+
+    if ( has_x2apic )
+        cpuid(0xb, NULL, NULL, NULL, &apic_id);
+    else
+    {
+        cpuid(1, NULL, &apic_id, NULL, NULL);
+        apic_id >>= 24;
+    }
+
+    /* Never called by cpu0, so should never return 0 */
+    ASSERT(apic_id);
+
+    return apic_id;
+}

 static void cpu_setup(unsigned int cpu)
 {
@@ -XXX,XX +XXX,XX @@ static void cpu_setup(unsigned int cpu)
     cacheattr_init();
     printf("done.\n");

-    if ( !cpu ) /* Used on the BSP too */
+    /* The BSP exits early because its APIC ID is known to be zero */
+    if ( !cpu )
         return;

     wmb();
-    ap_callin = 1;
+    ACCESS_ONCE(CPU_TO_X2APICID[cpu]) = read_apic_id();

-    /* After this point, the BSP will shut us down. */
+    /*
+     * After this point the BSP will shut us down. A write to
+     * CPU_TO_X2APICID[cpu] signals the BSP to bring down `cpu`.
+     */

     for ( ;; )
         asm volatile ( "hlt" );
@@ -XXX,XX +XXX,XX @@ static void boot_cpu(unsigned int cpu)
     static uint8_t ap_stack[PAGE_SIZE] __attribute__ ((aligned (16)));
     static struct vcpu_hvm_context ap;

-    /* Initialise shared variables. */
-    ap_callin = 0;
-    wmb();
-
     /* Wake up the secondary processor */
     ap = (struct vcpu_hvm_context) {
         .mode = VCPU_HVM_MODE_32B,
@@ -XXX,XX +XXX,XX @@ static void boot_cpu(unsigned int cpu)
         BUG();

     /*
-     * Wait for the secondary processor to complete initialisation.
+     * Wait for the secondary processor to complete initialisation,
+     * which is signaled by its x2APIC ID being written to the LUT.
      * Do not touch shared resources meanwhile.
*/ - while ( !ap_callin ) + while ( !ACCESS_ONCE(CPU_TO_X2APICID[cpu]) ) cpu_relax(); /* Take the secondary processor offline. */ @@ -XXX,XX +XXX,XX @@ static void boot_cpu(unsigned int cpu) void smp_initialise(void) { unsigned int i, nr_cpus = hvm_info->nr_vcpus; + uint32_t ecx; + + cpuid(1, NULL, NULL, &ecx, NULL); + has_x2apic = (ecx >> 21) & 1; + if ( has_x2apic ) + printf("x2APIC supported\n"); printf("Multiprocessor initialisation:\n"); cpu_setup(0); diff --git a/tools/include/xen-tools/common-macros.h b/tools/include/xen-tools/common-macros.h index XXXXXXX..XXXXXXX 100644 --- a/tools/include/xen-tools/common-macros.h +++ b/tools/include/xen-tools/common-macros.h @@ -XXX,XX +XXX,XX @@ #define get_unaligned(ptr) get_unaligned_t(typeof(*(ptr)), ptr) #define put_unaligned(val, ptr) put_unaligned_t(typeof(*(ptr)), val, ptr) +#define __ACCESS_ONCE(x) ({ \ + (void)(typeof(x))0; /* Scalar typecheck. */ \ + (volatile typeof(x) *)&(x); }) +#define ACCESS_ONCE(x) (*__ACCESS_ONCE(x)) + #endif /* __XEN_TOOLS_COMMON_MACROS__ */ -- 2.45.2
Currently used by PVH to set MTRRs; will be used by a later patch to set APIC
state. Unconditionally send the hypercall, and gate overriding the MTRRs so
it remains functionally equivalent. While at it, add a missing "goto out" to
what was the error condition in the loop.

In principle this patch shouldn't affect functionality. An extra record (the
MTRR) is sent to the hypervisor per vCPU on HVM, but these records are
identical to those retrieved in the first place, so there's no expected
functional change.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v5:
  * Ensure MTRRs are only overridden in PVH, and left as-is on HVM
---
 tools/libs/guest/xg_dom_x86.c | 84 ++++++++++++++++++-----------------
 1 file changed, 44 insertions(+), 40 deletions(-)

diff --git a/tools/libs/guest/xg_dom_x86.c b/tools/libs/guest/xg_dom_x86.c
index XXXXXXX..XXXXXXX 100644
--- a/tools/libs/guest/xg_dom_x86.c
+++ b/tools/libs/guest/xg_dom_x86.c
@@ -XXX,XX +XXX,XX @@ const static void *hvm_get_save_record(const void *ctx, unsigned int type,

 static int vcpu_hvm(struct xc_dom_image *dom)
 {
+    /* Initialises the BSP */
     struct {
         struct hvm_save_descriptor header_d;
         HVM_SAVE_TYPE(HEADER) header;
@@ -XXX,XX +XXX,XX @@ static int vcpu_hvm(struct xc_dom_image *dom)
         struct hvm_save_descriptor end_d;
         HVM_SAVE_TYPE(END) end;
     } bsp_ctx;
+    /* Initialises APICs and MTRRs of every vCPU */
+    struct {
+        struct hvm_save_descriptor header_d;
+        HVM_SAVE_TYPE(HEADER) header;
+        struct hvm_save_descriptor mtrr_d;
+        HVM_SAVE_TYPE(MTRR) mtrr;
+        struct hvm_save_descriptor end_d;
+        HVM_SAVE_TYPE(END) end;
+    } vcpu_ctx;
+    /* Context from full_ctx */
+    const HVM_SAVE_TYPE(MTRR) *mtrr_record;
+    /* Raw context as taken from Xen */
     uint8_t *full_ctx = NULL;
     int rc;
@@ -XXX,XX +XXX,XX @@ static int vcpu_hvm(struct xc_dom_image *dom)
     bsp_ctx.end_d.instance = 0;
     bsp_ctx.end_d.length = HVM_SAVE_LENGTH(END);

-    /* TODO: maybe this should be a firmware option instead?
*/ - if ( !dom->device_model ) + /* TODO: maybe setting MTRRs should be a firmware option instead? */ + mtrr_record = hvm_get_save_record(full_ctx, HVM_SAVE_CODE(MTRR), 0); + + if ( !mtrr_record) { - struct { - struct hvm_save_descriptor header_d; - HVM_SAVE_TYPE(HEADER) header; - struct hvm_save_descriptor mtrr_d; - HVM_SAVE_TYPE(MTRR) mtrr; - struct hvm_save_descriptor end_d; - HVM_SAVE_TYPE(END) end; - } mtrr = { - .header_d = bsp_ctx.header_d, - .header = bsp_ctx.header, - .mtrr_d.typecode = HVM_SAVE_CODE(MTRR), - .mtrr_d.length = HVM_SAVE_LENGTH(MTRR), - .end_d = bsp_ctx.end_d, - .end = bsp_ctx.end, - }; - const HVM_SAVE_TYPE(MTRR) *mtrr_record = - hvm_get_save_record(full_ctx, HVM_SAVE_CODE(MTRR), 0); - unsigned int i; - - if ( !mtrr_record ) - { - xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, - "%s: unable to get MTRR save record", __func__); - goto out; - } + xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, + "%s: unable to get MTRR save record", __func__); + goto out; + } - memcpy(&mtrr.mtrr, mtrr_record, sizeof(mtrr.mtrr)); + vcpu_ctx.header_d = bsp_ctx.header_d; + vcpu_ctx.header = bsp_ctx.header; + vcpu_ctx.mtrr_d.typecode = HVM_SAVE_CODE(MTRR); + vcpu_ctx.mtrr_d.length = HVM_SAVE_LENGTH(MTRR); + vcpu_ctx.mtrr = *mtrr_record; + vcpu_ctx.end_d = bsp_ctx.end_d; + vcpu_ctx.end = bsp_ctx.end; - /* - * Enable MTRR, set default type to WB. - * TODO: add MMIO areas as UC when passthrough is supported. - */ - mtrr.mtrr.msr_mtrr_def_type = MTRR_TYPE_WRBACK | MTRR_DEF_TYPE_ENABLE; + /* + * Enable MTRR, set default type to WB. 
+ * TODO: add MMIO areas as UC when passthrough is supported in PVH + */ + if ( !dom->device_model) + vcpu_ctx.mtrr.msr_mtrr_def_type = MTRR_TYPE_WRBACK | MTRR_DEF_TYPE_ENABLE; - for ( i = 0; i < dom->max_vcpus; i++ ) + for ( unsigned int i = 0; i < dom->max_vcpus; i++ ) + { + vcpu_ctx.mtrr_d.instance = i; + + rc = xc_domain_hvm_setcontext(dom->xch, dom->guest_domid, + (uint8_t *)&vcpu_ctx, sizeof(vcpu_ctx)); + if ( rc != 0 ) { - mtrr.mtrr_d.instance = i; - rc = xc_domain_hvm_setcontext(dom->xch, dom->guest_domid, - (uint8_t *)&mtrr, sizeof(mtrr)); - if ( rc != 0 ) - xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, - "%s: SETHVMCONTEXT failed (rc=%d)", __func__, rc); + xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, + "%s: SETHVMCONTEXT failed (rc=%d)", __func__, rc); + goto out; } } -- 2.45.2
Add a helper to populate topology leaves in the cpu policy from threads/core and cores/package counts. It's unit-tested in test-cpu-policy.c, but it's not connected to the rest of the code yet. Adds the ASSERT() macro to xen/lib/x86/private.h, as it was missing. Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> --- v5: * No change --- tools/tests/cpu-policy/test-cpu-policy.c | 133 +++++++++++++++++++++++ xen/include/xen/lib/x86/cpu-policy.h | 16 +++ xen/lib/x86/policy.c | 88 +++++++++++++++ xen/lib/x86/private.h | 4 + 4 files changed, 241 insertions(+) diff --git a/tools/tests/cpu-policy/test-cpu-policy.c b/tools/tests/cpu-policy/test-cpu-policy.c index XXXXXXX..XXXXXXX 100644 --- a/tools/tests/cpu-policy/test-cpu-policy.c +++ b/tools/tests/cpu-policy/test-cpu-policy.c @@ -XXX,XX +XXX,XX @@ static void test_is_compatible_failure(void) } } +static void test_topo_from_parts(void) +{ + static const struct test { + unsigned int threads_per_core; + unsigned int cores_per_pkg; + struct cpu_policy policy; + } tests[] = { + { + .threads_per_core = 3, .cores_per_pkg = 1, + .policy = { + .x86_vendor = X86_VENDOR_AMD, + .topo.subleaf = { + { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, }, + { .nr_logical = 1, .level = 1, .type = 2, .id_shift = 2, }, + }, + }, + }, + { + .threads_per_core = 1, .cores_per_pkg = 3, + .policy = { + .x86_vendor = X86_VENDOR_AMD, + .topo.subleaf = { + { .nr_logical = 1, .level = 0, .type = 1, .id_shift = 0, }, + { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, }, + }, + }, + }, + { + .threads_per_core = 7, .cores_per_pkg = 5, + .policy = { + .x86_vendor = X86_VENDOR_AMD, + .topo.subleaf = { + { .nr_logical = 7, .level = 0, .type = 1, .id_shift = 3, }, + { .nr_logical = 5, .level = 1, .type = 2, .id_shift = 6, }, + }, + }, + }, + { + .threads_per_core = 2, .cores_per_pkg = 128, + .policy = { + .x86_vendor = X86_VENDOR_AMD, + .topo.subleaf = { + { .nr_logical = 2, .level = 0, .type = 1, .id_shift = 1, }, + { .nr_logical 
= 128, .level = 1, .type = 2, + .id_shift = 8, }, + }, + }, + }, + { + .threads_per_core = 3, .cores_per_pkg = 1, + .policy = { + .x86_vendor = X86_VENDOR_INTEL, + .topo.subleaf = { + { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, }, + { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, }, + }, + }, + }, + { + .threads_per_core = 1, .cores_per_pkg = 3, + .policy = { + .x86_vendor = X86_VENDOR_INTEL, + .topo.subleaf = { + { .nr_logical = 1, .level = 0, .type = 1, .id_shift = 0, }, + { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, }, + }, + }, + }, + { + .threads_per_core = 7, .cores_per_pkg = 5, + .policy = { + .x86_vendor = X86_VENDOR_INTEL, + .topo.subleaf = { + { .nr_logical = 7, .level = 0, .type = 1, .id_shift = 3, }, + { .nr_logical = 35, .level = 1, .type = 2, .id_shift = 6, }, + }, + }, + }, + { + .threads_per_core = 2, .cores_per_pkg = 128, + .policy = { + .x86_vendor = X86_VENDOR_INTEL, + .topo.subleaf = { + { .nr_logical = 2, .level = 0, .type = 1, .id_shift = 1, }, + { .nr_logical = 256, .level = 1, .type = 2, + .id_shift = 8, }, + }, + }, + }, + }; + + printf("Testing topology synthesis from parts:\n"); + + for ( size_t i = 0; i < ARRAY_SIZE(tests); ++i ) + { + const struct test *t = &tests[i]; + struct cpu_policy actual = { .x86_vendor = t->policy.x86_vendor }; + int rc = x86_topo_from_parts(&actual, t->threads_per_core, + t->cores_per_pkg); + + if ( rc || memcmp(&actual.topo, &t->policy.topo, sizeof(actual.topo)) ) + { +#define TOPO(n, f) t->policy.topo.subleaf[(n)].f, actual.topo.subleaf[(n)].f + fail("FAIL[%d] - '%s %u t/c, %u c/p'\n", + rc, + x86_cpuid_vendor_to_str(t->policy.x86_vendor), + t->threads_per_core, t->cores_per_pkg); + printf(" subleaf=%u expected_n=%u actual_n=%u\n" + " expected_lvl=%u actual_lvl=%u\n" + " expected_type=%u actual_type=%u\n" + " expected_shift=%u actual_shift=%u\n", + 0, + TOPO(0, nr_logical), + TOPO(0, level), + TOPO(0, type), + TOPO(0, id_shift)); + + printf(" subleaf=%u expected_n=%u 
actual_n=%u\n"
+               "  expected_lvl=%u actual_lvl=%u\n"
+               "  expected_type=%u actual_type=%u\n"
+               "  expected_shift=%u actual_shift=%u\n",
+               1,
+               TOPO(1, nr_logical),
+               TOPO(1, level),
+               TOPO(1, type),
+               TOPO(1, id_shift));
+#undef TOPO
+        }
+    }
+}
+
 int main(int argc, char **argv)
 {
     printf("CPU Policy unit tests\n");
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
     test_is_compatible_success();
     test_is_compatible_failure();

+    test_topo_from_parts();
+
     if ( nr_failures )
         printf("Done: %u failures\n", nr_failures);
     else
diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/include/xen/lib/x86/cpu-policy.h
+++ b/xen/include/xen/lib/x86/cpu-policy.h
@@ -XXX,XX +XXX,XX @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
                                     const struct cpu_policy *guest,
                                     struct cpu_policy_errors *err);

+/**
+ * Synthesise topology information in `p` given high-level constraints
+ *
+ * Topology is given in various fields across several leaves, some of
+ * which are vendor-specific. This function uses the policy itself to
+ * derive such leaves from threads/core and cores/package.
+ *
+ * @param p                CPU policy of the domain.
+ * @param threads_per_core threads/core. Doesn't need to be a power of 2.
+ * @param cores_per_pkg    cores/package. Doesn't need to be a power of 2.
+ * @return 0 on success; -errno on failure
+ */
+int x86_topo_from_parts(struct cpu_policy *p,
+                        unsigned int threads_per_core,
+                        unsigned int cores_per_pkg);
+
 #endif /* !XEN_LIB_X86_POLICIES_H */

 /*
diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/lib/x86/policy.c
+++ b/xen/lib/x86/policy.c
@@ -XXX,XX +XXX,XX @@
 #include <xen/lib/x86/cpu-policy.h>

+static unsigned int order(unsigned int n)
+{
+    ASSERT(n); /* clz(0) is UB */
+
+    return 8 * sizeof(n) - __builtin_clz(n);
+}
+
+int x86_topo_from_parts(struct cpu_policy *p,
+                        unsigned int threads_per_core,
+                        unsigned int cores_per_pkg)
+{
+    unsigned int threads_per_pkg = threads_per_core * cores_per_pkg;
+    unsigned int apic_id_size;
+
+    if ( !p || !threads_per_core || !cores_per_pkg )
+        return -EINVAL;
+
+    p->basic.max_leaf = MAX(0xb, p->basic.max_leaf);
+
+    memset(p->topo.raw, 0, sizeof(p->topo.raw));
+
+    /* thread level */
+    p->topo.subleaf[0].nr_logical = threads_per_core;
+    p->topo.subleaf[0].id_shift = 0;
+    p->topo.subleaf[0].level = 0;
+    p->topo.subleaf[0].type = 1;
+    if ( threads_per_core > 1 )
+        p->topo.subleaf[0].id_shift = order(threads_per_core - 1);
+
+    /* core level */
+    p->topo.subleaf[1].nr_logical = cores_per_pkg;
+    if ( p->x86_vendor == X86_VENDOR_INTEL )
+        p->topo.subleaf[1].nr_logical = threads_per_pkg;
+    p->topo.subleaf[1].id_shift = p->topo.subleaf[0].id_shift;
+    p->topo.subleaf[1].level = 1;
+    p->topo.subleaf[1].type = 2;
+    if ( cores_per_pkg > 1 )
+        p->topo.subleaf[1].id_shift += order(cores_per_pkg - 1);
+
+    apic_id_size = p->topo.subleaf[1].id_shift;
+
+    /*
+     * Contrary to what the name might seem to imply, HTT is an enabler for
+     * SMP and there's no harm in setting it even with a single vCPU.
+ */ + p->basic.htt = true; + p->basic.lppp = MIN(0xff, threads_per_pkg); + + switch ( p->x86_vendor ) + { + case X86_VENDOR_INTEL: { + struct cpuid_cache_leaf *sl = p->cache.subleaf; + + for ( size_t i = 0; sl->type && + i < ARRAY_SIZE(p->cache.raw); i++, sl++ ) + { + sl->cores_per_package = cores_per_pkg - 1; + sl->threads_per_cache = threads_per_core - 1; + if ( sl->type == 3 /* unified cache */ ) + sl->threads_per_cache = threads_per_pkg - 1; + } + break; + } + + case X86_VENDOR_AMD: + case X86_VENDOR_HYGON: + /* Expose p->basic.lppp */ + p->extd.cmp_legacy = true; + + /* Clip NC to the maximum value it can hold */ + p->extd.nc = MIN(0xff, threads_per_pkg - 1); + + /* TODO: Expose leaf e1E */ + p->extd.topoext = false; + + /* + * Clip APIC ID to 8 bits, as that's what high core-count machines do. + * + * That's what AMD EPYC 9654 does with >256 CPUs. + */ + p->extd.apic_id_size = MIN(8, apic_id_size); + + break; + } + + return 0; +} + int x86_cpu_policies_are_compatible(const struct cpu_policy *host, const struct cpu_policy *guest, struct cpu_policy_errors *err) diff --git a/xen/lib/x86/private.h b/xen/lib/x86/private.h index XXXXXXX..XXXXXXX 100644 --- a/xen/lib/x86/private.h +++ b/xen/lib/x86/private.h @@ -XXX,XX +XXX,XX @@ #ifdef __XEN__ #include <xen/bitops.h> +#include <xen/bug.h> #include <xen/guest_access.h> #include <xen/kernel.h> #include <xen/lib.h> @@ -XXX,XX +XXX,XX @@ #else +#include <assert.h> #include <errno.h> #include <inttypes.h> #include <stdbool.h> @@ -XXX,XX +XXX,XX @@ #include <xen-tools/common-macros.h> +#define ASSERT(x) assert(x) + static inline bool test_bit(unsigned int bit, const void *vaddr) { const char *addr = vaddr; -- 2.45.2
Implements the helper for mapping vcpu_id to x2apic_id given a valid topology in a policy. The algo is written with the intention of extending it to leaves 0x1f and extended 0x26 in the future. Toolstack doesn't set leaf 0xb and the HVM default policy has it cleared, so the leaf is not implemented. In that case, the new helper just returns the legacy mapping. Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> --- v5: * No change --- tools/tests/cpu-policy/test-cpu-policy.c | 68 +++++++++++++++++++++ xen/include/xen/lib/x86/cpu-policy.h | 11 ++++ xen/lib/x86/policy.c | 76 ++++++++++++++++++++++++ 3 files changed, 155 insertions(+) diff --git a/tools/tests/cpu-policy/test-cpu-policy.c b/tools/tests/cpu-policy/test-cpu-policy.c index XXXXXXX..XXXXXXX 100644 --- a/tools/tests/cpu-policy/test-cpu-policy.c +++ b/tools/tests/cpu-policy/test-cpu-policy.c @@ -XXX,XX +XXX,XX @@ static void test_topo_from_parts(void) } } +static void test_x2apic_id_from_vcpu_id_success(void) +{ + static const struct test { + unsigned int vcpu_id; + unsigned int threads_per_core; + unsigned int cores_per_pkg; + uint32_t x2apic_id; + uint8_t x86_vendor; + } tests[] = { + { + .vcpu_id = 3, .threads_per_core = 3, .cores_per_pkg = 8, + .x2apic_id = 1 << 2, + }, + { + .vcpu_id = 6, .threads_per_core = 3, .cores_per_pkg = 8, + .x2apic_id = 2 << 2, + }, + { + .vcpu_id = 24, .threads_per_core = 3, .cores_per_pkg = 8, + .x2apic_id = 1 << 5, + }, + { + .vcpu_id = 35, .threads_per_core = 3, .cores_per_pkg = 8, + .x2apic_id = (35 % 3) | (((35 / 3) % 8) << 2) | ((35 / 24) << 5), + }, + { + .vcpu_id = 96, .threads_per_core = 7, .cores_per_pkg = 3, + .x2apic_id = (96 % 7) | (((96 / 7) % 3) << 3) | ((96 / 21) << 5), + }, + }; + + const uint8_t vendors[] = { + X86_VENDOR_INTEL, + X86_VENDOR_AMD, + X86_VENDOR_CENTAUR, + X86_VENDOR_SHANGHAI, + X86_VENDOR_HYGON, + }; + + printf("Testing x2apic id from vcpu id success:\n"); + + /* Perform the test run on every vendor we know about */ + for ( size_t i = 
0; i < ARRAY_SIZE(vendors); ++i ) + { + for ( size_t j = 0; j < ARRAY_SIZE(tests); ++j ) + { + struct cpu_policy policy = { .x86_vendor = vendors[i] }; + const struct test *t = &tests[j]; + uint32_t x2apic_id; + int rc = x86_topo_from_parts(&policy, t->threads_per_core, + t->cores_per_pkg); + + if ( rc ) { + fail("FAIL[%d] - 'x86_topo_from_parts() failed", rc); + continue; + } + + x2apic_id = x86_x2apic_id_from_vcpu_id(&policy, t->vcpu_id); + if ( x2apic_id != t->x2apic_id ) + fail("FAIL - '%s cpu%u %u t/c %u c/p'. bad x2apic_id: expected=%u actual=%u\n", + x86_cpuid_vendor_to_str(policy.x86_vendor), + t->vcpu_id, t->threads_per_core, t->cores_per_pkg, + t->x2apic_id, x2apic_id); + } + } +} + int main(int argc, char **argv) { printf("CPU Policy unit tests\n"); @@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv) test_is_compatible_failure(); test_topo_from_parts(); + test_x2apic_id_from_vcpu_id_success(); if ( nr_failures ) printf("Done: %u failures\n", nr_failures); diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/xen/lib/x86/cpu-policy.h +++ b/xen/include/xen/lib/x86/cpu-policy.h @@ -XXX,XX +XXX,XX @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host, const struct cpu_policy *guest, struct cpu_policy_errors *err); +/** + * Calculates the x2APIC ID of a vCPU given a CPU policy + * + * If the policy lacks leaf 0xb falls back to legacy mapping of apic_id=cpu*2 + * + * @param p CPU policy of the domain. + * @param id vCPU ID of the vCPU. + * @returns x2APIC ID of the vCPU. 
+ */ +uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id); + /** * Synthesise topology information in `p` given high-level constraints * diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c index XXXXXXX..XXXXXXX 100644 --- a/xen/lib/x86/policy.c +++ b/xen/lib/x86/policy.c @@ -XXX,XX +XXX,XX @@ #include <xen/lib/x86/cpu-policy.h> +static uint32_t parts_per_higher_scoped_level(const struct cpu_policy *p, + size_t lvl) +{ + /* + * `nr_logical` reported by Intel is the number of THREADS contained in + * the next topological scope. For example, assuming a system with 2 + * threads/core and 3 cores/module in a fully symmetric topology, + * `nr_logical` at the core level will report 6. Because it's reporting + * the number of threads in a module. + * + * On AMD/Hygon, nr_logical is already normalized by the higher scoped + * level (cores/complex, etc) so we can return it as-is. + */ + if ( p->x86_vendor != X86_VENDOR_INTEL || !lvl ) + return p->topo.subleaf[lvl].nr_logical; + + return p->topo.subleaf[lvl].nr_logical / + p->topo.subleaf[lvl - 1].nr_logical; +} + +uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id) +{ + uint32_t shift = 0, x2apic_id = 0; + + /* In the absence of topology leaves, fallback to traditional mapping */ + if ( !p->topo.subleaf[0].type ) + return id * 2; + + /* + * `id` means different things at different points of the algo + * + * At lvl=0: global thread_id (same as vcpu_id) + * At lvl=1: global core_id + * At lvl=2: global socket_id (actually complex_id in AMD, module_id + * in Intel, but the name is inconsequential) + * + * +--+ + * ____ |#0| ______ <= 1 socket + * / +--+ \+--+ + * __#0__ __|#1|__ <= 2 cores/socket + * / | \ +--+/ +-|+ \ + * #0 #1 #2 |#3| #4 #5 <= 3 threads/core + * +--+ + * + * ... and so on. Global in this context means that it's a unique + * identifier for the whole topology, and not relative to the level + * it's in. 
For example, in the diagram shown above, we're looking at + * thread #3 in the global sense, though it's #0 within its core. + * + * Note that dividing a global thread_id by the number of threads per + * core returns the global core id that contains it. e.g: 0, 1 or 2 + * divided by 3 returns core_id=0. 3, 4 or 5 divided by 3 returns core + * 1, and so on. An analogous argument holds for higher levels. This is + * the property we exploit to derive x2apic_id from vcpu_id. + * + * NOTE: `topo` is currently derived from leaf 0xb, which is bound to two + * levels, but once we track leaves 0x1f (or extended 0x26) there will be a + * few more. The algorithm is written to cope with that case. + */ + for ( uint32_t i = 0; i < ARRAY_SIZE(p->topo.raw); i++ ) + { + uint32_t nr_parts; + + if ( !p->topo.subleaf[i].type ) + /* sentinel subleaf */ + break; + + nr_parts = parts_per_higher_scoped_level(p, i); + x2apic_id |= (id % nr_parts) << shift; + id /= nr_parts; + shift = p->topo.subleaf[i].id_shift; + } + + return (id << shift) | x2apic_id; +} + static unsigned int order(unsigned int n) { ASSERT(n); /* clz(0) is UB */ -- 2.45.2
Have toolstack populate the new x2APIC ID in the LAPIC save record with the proper IDs intended for each vCPU. Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> --- v5: * No change --- tools/libs/guest/xg_dom_x86.c | 38 ++++++++++++++++++++++++++++++++++- 1 file changed, 37 insertions(+), 1 deletion(-) diff --git a/tools/libs/guest/xg_dom_x86.c b/tools/libs/guest/xg_dom_x86.c index XXXXXXX..XXXXXXX 100644 --- a/tools/libs/guest/xg_dom_x86.c +++ b/tools/libs/guest/xg_dom_x86.c @@ -XXX,XX +XXX,XX @@ static int vcpu_hvm(struct xc_dom_image *dom) HVM_SAVE_TYPE(HEADER) header; struct hvm_save_descriptor mtrr_d; HVM_SAVE_TYPE(MTRR) mtrr; + struct hvm_save_descriptor lapic_d; + HVM_SAVE_TYPE(LAPIC) lapic; struct hvm_save_descriptor end_d; HVM_SAVE_TYPE(END) end; } vcpu_ctx; - /* Context from full_ctx */ + /* Contexts from full_ctx */ const HVM_SAVE_TYPE(MTRR) *mtrr_record; + const HVM_SAVE_TYPE(LAPIC) *lapic_record; /* Raw context as taken from Xen */ uint8_t *full_ctx = NULL; + xc_cpu_policy_t *policy = xc_cpu_policy_init(); int rc; DOMPRINTF_CALLED(dom->xch); assert(dom->max_vcpus); + /* + * Fetch the CPU policy of this domain. We need it to determine the APIC IDs + * each of vCPU in a manner consistent with the exported topology. + * + * TODO: It's silly to query a policy we have ourselves created. 
It should + * instead be part of xc_dom_image + */ + + rc = xc_cpu_policy_get_domain(dom->xch, dom->guest_domid, policy); + if ( rc != 0 ) + { + xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, + "%s: unable to fetch cpu policy for dom%u (rc=%d)", + __func__, dom->guest_domid, rc); + goto out; + } + /* * Get the full HVM context in order to have the header, it is not * possible to get the header with getcontext_partial, and crafting one @@ -XXX,XX +XXX,XX @@ static int vcpu_hvm(struct xc_dom_image *dom) vcpu_ctx.mtrr_d.typecode = HVM_SAVE_CODE(MTRR); vcpu_ctx.mtrr_d.length = HVM_SAVE_LENGTH(MTRR); vcpu_ctx.mtrr = *mtrr_record; + vcpu_ctx.lapic_d.typecode = HVM_SAVE_CODE(LAPIC); + vcpu_ctx.lapic_d.length = HVM_SAVE_LENGTH(LAPIC); vcpu_ctx.end_d = bsp_ctx.end_d; vcpu_ctx.end = bsp_ctx.end; @@ -XXX,XX +XXX,XX @@ static int vcpu_hvm(struct xc_dom_image *dom) { vcpu_ctx.mtrr_d.instance = i; + lapic_record = hvm_get_save_record(full_ctx, HVM_SAVE_CODE(LAPIC), i); + if ( !lapic_record ) + { + xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, + "%s: unable to get LAPIC[%d] save record", __func__, i); + goto out; + } + + vcpu_ctx.lapic = *lapic_record; + vcpu_ctx.lapic.x2apic_id = x86_x2apic_id_from_vcpu_id(&policy->policy, i); + vcpu_ctx.lapic_d.instance = i; + rc = xc_domain_hvm_setcontext(dom->xch, dom->guest_domid, (uint8_t *)&vcpu_ctx, sizeof(vcpu_ctx)); if ( rc != 0 ) @@ -XXX,XX +XXX,XX @@ static int vcpu_hvm(struct xc_dom_image *dom) out: free(full_ctx); + xc_cpu_policy_destroy(policy); return rc; } -- 2.45.2
Expose sensible topologies in leaf 0xb. At the moment it synthesises non-HT systems, in line with the previous code intent. Leaf 0xb in the host policy is no longer zapped and the guest {max,def} policies have their topology leaves zapped instead. The intent is for toolstack to populate them. There's no current use for the topology information in the host policy, but it makes no harm. Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> --- v5: * No change --- tools/libs/guest/xg_cpuid_x86.c | 24 +++++++++++++++++++++++- xen/arch/x86/cpu-policy.c | 9 ++++++--- 2 files changed, 29 insertions(+), 4 deletions(-) diff --git a/tools/libs/guest/xg_cpuid_x86.c b/tools/libs/guest/xg_cpuid_x86.c index XXXXXXX..XXXXXXX 100644 --- a/tools/libs/guest/xg_cpuid_x86.c +++ b/tools/libs/guest/xg_cpuid_x86.c @@ -XXX,XX +XXX,XX @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore, p->policy.basic.htt = test_bit(X86_FEATURE_HTT, host_featureset); p->policy.extd.cmp_legacy = test_bit(X86_FEATURE_CMP_LEGACY, host_featureset); } - else + else if ( restore ) { + /* + * Reconstruct the topology exposed on Xen <= 4.13. It makes very little + * sense, but it's what those guests saw so it's set in stone now. + * + * Guests from Xen 4.14 onwards carry their own CPUID leaves in the + * migration stream so they don't need special treatment. + */ + /* * Topology for HVM guests is entirely controlled by Xen. For now, we * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT. 
@@ -XXX,XX +XXX,XX @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore, break; } } + else + { + /* TODO: Expose the ability to choose a custom topology for HVM/PVH */ + unsigned int threads_per_core = 1; + unsigned int cores_per_pkg = di.max_vcpu_id + 1; + + rc = x86_topo_from_parts(&p->policy, threads_per_core, cores_per_pkg); + if ( rc ) + { + ERROR("Failed to generate topology: rc=%d t/c=%u c/p=%u", + rc, threads_per_core, cores_per_pkg); + goto out; + } + } nr_leaves = ARRAY_SIZE(p->leaves); rc = x86_cpuid_copy_to_buffer(&p->policy, p->leaves, &nr_leaves); diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/cpu-policy.c +++ b/xen/arch/x86/cpu-policy.c @@ -XXX,XX +XXX,XX @@ static void recalculate_misc(struct cpu_policy *p) p->basic.raw[0x8] = EMPTY_LEAF; - /* TODO: Rework topology logic. */ - memset(p->topo.raw, 0, sizeof(p->topo.raw)); - p->basic.raw[0xc] = EMPTY_LEAF; p->extd.e1d &= ~CPUID_COMMON_1D_FEATURES; @@ -XXX,XX +XXX,XX @@ static void __init calculate_pv_max_policy(void) recalculate_xstate(p); p->extd.raw[0xa] = EMPTY_LEAF; /* No SVM for PV guests. */ + + /* Wipe host topology. Populated by toolstack */ + memset(p->topo.raw, 0, sizeof(p->topo.raw)); } static void __init calculate_pv_def_policy(void) @@ -XXX,XX +XXX,XX @@ static void __init calculate_hvm_max_policy(void) /* It's always possible to emulate CPUID faulting for HVM guests */ p->platform_info.cpuid_faulting = true; + + /* Wipe host topology. Populated by toolstack */ + memset(p->topo.raw, 0, sizeof(p->topo.raw)); } static void __init calculate_hvm_def_policy(void) -- 2.45.2
(original cover letter at the bottom)

The series wasn't taking proper precautions not to break PVH. In particular,
because the new APIC IDs break with the old convention, PVH is not something
that can be left for later, as it would otherwise suffer a mismatch between
the APIC IDs in the vLAPICs and those in the MADT table.

This version is a rebased v5 with the additional fixes of:

  1. PVH should now work, because libacpi was refactored to stop taking a
     function pointer and start taking a LUT for the cpu->apic_id mapping.
  2. Expose leaf 0xb to guests even on hosts that don't themselves do so
     (e.g: AMD Lisbon). Otherwise all such hosts are unable to create guests
     with this patch series applied, and there's no good reason not to do so.

Hypervisor prerequisites:
  patch 1: lib/x86: Relax checks about policy compatibility
    * New patch to properly operate (after this series) on older AMD hardware.
  patch 2: x86/vlapic: Move lapic migration checks to the check hooks
    * Same as in v5.
  patch 3: xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
    * Same as in v5.
  patch 4: xen/x86: Add supporting code for uploading LAPIC contexts during
           domain create
    * Same as in v5.

hvmloader prerequisites:
  patch 5: tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves
    * Same as in v5.

Toolstack prerequisites:
  patch 6: tools/libacpi: Use LUT of APIC IDs rather than function pointer
    * New patch to enable proper topology reporting to PVH guests.
  patch 7: tools/libguest: Always set vCPU context in vcpu_hvm()
    * Same as in v5.

No functional changes:
  patch 8: xen/lib: Add topology generator for x86
    * Same as in v5.
  patch 9: xen/x86: Derive topologically correct x2APIC IDs from the policy
    * Same as in v5.

Final toolstack/xen stitching:
  patch 10: tools/libguest: Set distinct x2APIC IDs for each vCPU
    * Unlike in v5, this patch takes the APIC IDs from a LUT stored in the
      xc_dom_image structure.
  patch 11: xen/x86: Synthesise domain topologies
    * Same as in v5.
====================================================================

v5: https://lore.kernel.org/xen-devel/20240808134251.29995-1-alejandro.vallejo@cloud.com/

v4 -> v5: Largely unchanged and resent for review after the 4.20 dev cycle
started.
  * Addressed Andrew's nits in v4/patch1
  * Addressed Jan's concern with MTRR overrides in v4/patch6 by keeping the
    same MTRR data in the vCPU contexts for HVM domain creations.

v4: https://lore.kernel.org/xen-devel/cover.1719416329.git.alejandro.vallejo@cloud.com/

v3 -> v4:
  * Fixed cpuid() bug in hvmloader, causing UB in v3
  * Fixed a bogus assert in hvmloader, also causing a crash in v3
  * Used HVM contexts rather than sync'd algos between Xen and toolstack in
    order to initialise the per-vCPU LAPIC state.
  * Formatting adjustments.

v3: https://lore.kernel.org/xen-devel/cover.1716976271.git.alejandro.vallejo@cloud.com/

v2 -> v3: (v2/patch2 and v2/patch4 are already committed)
  * Moved the vlapic check hook addition to v3/patch1
    * And created a check hook for the architectural state too for
      consistency.
  * Fixed migrations from Xen <= 4.13 by reconstructing the previous topology.
  * Correctly set the APIC ID after a policy change when vlapic is already in
    x2APIC mode.
  * Removed bogus assumption introduced in v1 and v2 on hvmloader about which
    8bit APIC IDs represent ids > 254 (it's "id % 0xff", not "min(id, 0xff)").
    * Used an x2apic flag check instead.
  * Various formatting adjustments.

v2: https://lore.kernel.org/xen-devel/cover.1715102098.git.alejandro.vallejo@cloud.com/

v1 -> v2:
  * v1/patch 4 replaced by a different strategy (See patches 4 and 5 in v2):
    * Have hvmloader populate MADT with the real APIC IDs as read by the APs
      themselves rather than giving it knowledge on how to derive them.
  * Removed patches 2 and 3 in v1, as no longer relevant.
  * Split v1/patch6 in two parts ((a) creating the generator and (b) plugging
    it in) and use the generator in the unit tests of the vcpuid->apicid
    mapping function.
Becomes patches 6 and 8 in v2. Patch 1: Same as v1/patch1. Patch 2: Header dependency cleanup in preparation for patch3. Patch 3: Adds vlapic_hidden check for the newly introduced reserved area. Patch 4: [hvmloader] Replaces INIT+SIPI+SIPI sequences with hypercalls. Patch 5: [hvmloader] Retrieve the per-CPU APIC IDs from the APs themselves. Patch 6: Split from v1/patch6. Patch 7: Logically matching v1/patch5, but using v2/patch6 for testing. Patch 8: Split from v1/patch6. v1: https://lore.kernel.org/xen-devel/20240109153834.4192-1-alejandro.vallejo@cloud.com/ === Original cover letter === Current topology handling is close to non-existent. As things stand, APIC IDs are allocated through the apic_id=vcpu_id*2 relation without giving any hints to the OS on how to parse the x2APIC ID of a given CPU and assuming the guest will assume 2 threads per core. This series involves bringing x2APIC IDs into the migration stream, so older guests keep operating as they used to and enhancing Xen+toolstack so new guests get topology information consistent with their x2APIC IDs. As a side effect of this, x2APIC IDs are now packed and don't have (unless under a pathological case) gaps. Further work ought to allow combining this topology configurations with gang-scheduling of guest hyperthreads into affine physical hyperthreads. For the time being it purposefully keeps the configuration of "1 socket" + "1 thread per core" + "1 core per vCPU". Patch 1: Includes x2APIC IDs in the migration stream. This allows Xen to reconstruct the right x2APIC IDs on migrated-in guests, and future-proofs itself in the face of x2APIC ID derivation changes. 
Patch 2: Minor refactor to expose xc_cpu_policy in libxl Patch 3: Refactors xen/lib/x86 to work on non-Xen freestanding environments (e.g: hvmloader) Patch 4: Remove old assumptions about vcpu_id<->apic_id relationship in hvmloader Patch 5: Add logic to derive x2APIC IDs given a CPU policy and vCPU IDs Patch 6: Includes a simple topology generator for toolstack so new guests have topologically consistent information in CPUID =================================================================== v6: Alejandro Vallejo (11): lib/x86: Relax checks about policy compatibility x86/vlapic: Move lapic migration checks to the check hooks xen/x86: Add initial x2APIC ID to the per-vLAPIC save area xen/x86: Add supporting code for uploading LAPIC contexts during domain create tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves tools/libacpi: Use LUT of APIC IDs rather than function pointer tools/libguest: Always set vCPU context in vcpu_hvm() xen/lib: Add topology generator for x86 xen/x86: Derive topologically correct x2APIC IDs from the policy tools/libguest: Set distinct x2APIC IDs for each vCPU xen/x86: Synthesise domain topologies tools/firmware/hvmloader/config.h | 5 +- tools/firmware/hvmloader/hvmloader.c | 6 +- tools/firmware/hvmloader/mp_tables.c | 4 +- tools/firmware/hvmloader/smp.c | 54 ++++-- tools/firmware/hvmloader/util.c | 7 +- tools/include/xen-tools/common-macros.h | 5 + tools/include/xenguest.h | 8 + tools/libacpi/build.c | 6 +- tools/libacpi/libacpi.h | 2 +- tools/libs/guest/xg_cpuid_x86.c | 29 +++- tools/libs/guest/xg_dom_x86.c | 93 ++++++---- tools/libs/light/libxl_dom.c | 25 +++ tools/libs/light/libxl_x86_acpi.c | 7 +- tools/tests/cpu-policy/test-cpu-policy.c | 207 ++++++++++++++++++++++- xen/arch/x86/cpu-policy.c | 9 +- xen/arch/x86/cpuid.c | 14 +- xen/arch/x86/hvm/vlapic.c | 126 ++++++++++---- xen/arch/x86/include/asm/hvm/vlapic.h | 1 + xen/include/public/arch-x86/hvm/save.h | 2 + xen/include/xen/lib/x86/cpu-policy.h | 27 +++ xen/lib/x86/policy.c 
| 175 ++++++++++++++++++- xen/lib/x86/private.h | 4 + 22 files changed, 704 insertions(+), 112 deletions(-) -- 2.46.0
Allow a guest policy to have up to leaf 0xb even if the host doesn't.
Otherwise it's not possible to show leaf 0xb to guests we're emulating an
x2APIC for on old AMD machines.

No externally visible changes though, because the toolstack doesn't yet
populate that leaf.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
 tools/tests/cpu-policy/test-cpu-policy.c |  6 +++++-
 xen/lib/x86/policy.c                     | 11 ++++++++++-
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/tools/tests/cpu-policy/test-cpu-policy.c b/tools/tests/cpu-policy/test-cpu-policy.c
index XXXXXXX..XXXXXXX 100644
--- a/tools/tests/cpu-policy/test-cpu-policy.c
+++ b/tools/tests/cpu-policy/test-cpu-policy.c
@@ -XXX,XX +XXX,XX @@ static void test_is_compatible_success(void)
                 .platform_info.cpuid_faulting = true,
             },
         },
+        {
+            .name = "Host missing leaf 0xb, Guest wanted",
+            .guest.basic.max_leaf = 0xb,
+        },
     };
     struct cpu_policy_errors no_errors = INIT_CPU_POLICY_ERRORS;
@@ -XXX,XX +XXX,XX @@ static void test_is_compatible_failure(void)
     } tests[] = {
         {
             .name = "Host basic.max_leaf out of range",
-            .guest.basic.max_leaf = 1,
+            .guest.basic.max_leaf = 0xc,
             .e = { 0, -1, -1 },
         },
         {
diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/lib/x86/policy.c
+++ b/xen/lib/x86/policy.c
@@ -XXX,XX +XXX,XX @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
 #define FAIL_MSR(m) \
     do { e.msr = (m); goto out; } while ( 0 )

-    if ( guest->basic.max_leaf > host->basic.max_leaf )
+    /*
+     * Old AMD hardware doesn't expose topology information in leaf 0xb. We
+     * want to emulate that leaf with credible information because it must be
+     * present on systems in which we emulate the x2APIC.
+     *
+     * For that reason, allow the max basic guest leaf to be larger than the
+     * host's, up until 0xb.
+     */
+    if ( guest->basic.max_leaf > 0xb &&
+         guest->basic.max_leaf > host->basic.max_leaf )
         FAIL_CPUID(0, NA);

     if ( guest->feat.max_subleaf > host->feat.max_subleaf )
--
2.46.0
While doing this, factor out checks common to architectural and hidden state. Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> -- Last reviewed in the topology series v3. Fell under the cracks. https://lore.kernel.org/xen-devel/ZlhP11Vvk6c1Ix36@macbook/ --- xen/arch/x86/hvm/vlapic.c | 84 ++++++++++++++++++++++++++------------- 1 file changed, 56 insertions(+), 28 deletions(-) diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/vlapic.c +++ b/xen/arch/x86/hvm/vlapic.c @@ -XXX,XX +XXX,XX @@ static void lapic_load_fixup(struct vlapic *vlapic) v, vlapic->loaded.id, vlapic->loaded.ldr, good_ldr); } -static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h) -{ - unsigned int vcpuid = hvm_load_instance(h); - struct vcpu *v; - struct vlapic *s; +static int lapic_check_common(const struct domain *d, unsigned int vcpuid) +{ if ( !has_vlapic(d) ) return -ENODEV; /* Which vlapic to load? 
*/ - if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL ) + if ( !domain_vcpu(d, vcpuid) ) { - dprintk(XENLOG_G_ERR, "HVM restore: dom%d has no apic%u\n", + dprintk(XENLOG_G_ERR, "HVM restore: dom%d has no vCPU %u\n", d->domain_id, vcpuid); return -EINVAL; } - s = vcpu_vlapic(v); + + return 0; +} + +static int cf_check lapic_check_hidden(const struct domain *d, + hvm_domain_context_t *h) +{ + unsigned int vcpuid = hvm_load_instance(h); + struct hvm_hw_lapic s; + int rc = lapic_check_common(d, vcpuid); + + if ( rc ) + return rc; + + if ( hvm_load_entry_zeroextend(LAPIC, h, &s) != 0 ) + return -ENODATA; + + /* EN=0 with EXTD=1 is illegal */ + if ( (s.apic_base_msr & (APIC_BASE_ENABLE | APIC_BASE_EXTD)) == + APIC_BASE_EXTD ) + return -EINVAL; + + return 0; +} + +static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h) +{ + unsigned int vcpuid = hvm_load_instance(h); + struct vcpu *v = d->vcpu[vcpuid]; + struct vlapic *s = vcpu_vlapic(v); if ( hvm_load_entry_zeroextend(LAPIC, h, &s->hw) != 0 ) + { + ASSERT_UNREACHABLE(); return -EINVAL; + } s->loaded.hw = 1; if ( s->loaded.regs ) lapic_load_fixup(s); - if ( !(s->hw.apic_base_msr & APIC_BASE_ENABLE) && - unlikely(vlapic_x2apic_mode(s)) ) - return -EINVAL; - hvm_update_vlapic_mode(v); return 0; } -static int cf_check lapic_load_regs(struct domain *d, hvm_domain_context_t *h) +static int cf_check lapic_check_regs(const struct domain *d, + hvm_domain_context_t *h) { unsigned int vcpuid = hvm_load_instance(h); - struct vcpu *v; - struct vlapic *s; + int rc; - if ( !has_vlapic(d) ) - return -ENODEV; + if ( (rc = lapic_check_common(d, vcpuid)) ) + return rc; - /* Which vlapic to load? 
*/ - if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL ) - { - dprintk(XENLOG_G_ERR, "HVM restore: dom%d has no apic%u\n", - d->domain_id, vcpuid); - return -EINVAL; - } - s = vcpu_vlapic(v); + if ( !hvm_get_entry(LAPIC_REGS, h) ) + return -ENODATA; + + return 0; +} + +static int cf_check lapic_load_regs(struct domain *d, hvm_domain_context_t *h) +{ + unsigned int vcpuid = hvm_load_instance(h); + struct vcpu *v = d->vcpu[vcpuid]; + struct vlapic *s = vcpu_vlapic(v); if ( hvm_load_entry(LAPIC_REGS, h, s->regs) != 0 ) - return -EINVAL; + ASSERT_UNREACHABLE(); s->loaded.id = vlapic_get_reg(s, APIC_ID); s->loaded.ldr = vlapic_get_reg(s, APIC_LDR); @@ -XXX,XX +XXX,XX @@ static int cf_check lapic_load_regs(struct domain *d, hvm_domain_context_t *h) return 0; } -HVM_REGISTER_SAVE_RESTORE(LAPIC, lapic_save_hidden, NULL, +HVM_REGISTER_SAVE_RESTORE(LAPIC, lapic_save_hidden, lapic_check_hidden, lapic_load_hidden, 1, HVMSR_PER_VCPU); -HVM_REGISTER_SAVE_RESTORE(LAPIC_REGS, lapic_save_regs, NULL, +HVM_REGISTER_SAVE_RESTORE(LAPIC_REGS, lapic_save_regs, lapic_check_regs, lapic_load_regs, 1, HVMSR_PER_VCPU); int vlapic_init(struct vcpu *v) -- 2.46.0
This allows the initial x2APIC ID to be sent on the migration stream. This allows further changes to topology and APIC ID assignment without breaking existing hosts. Given the vlapic data is zero-extended on restore, fix up migrations from hosts without the field by setting it to the old convention if zero. The hardcoded mapping x2apic_id=2*vcpu_id is kept for the time being, but it's meant to be overriden by toolstack on a later patch with appropriate values. Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> --- xen/arch/x86/cpuid.c | 14 +++++--------- xen/arch/x86/hvm/vlapic.c | 22 ++++++++++++++++++++-- xen/arch/x86/include/asm/hvm/vlapic.h | 1 + xen/include/public/arch-x86/hvm/save.h | 2 ++ 4 files changed, 28 insertions(+), 11 deletions(-) diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/cpuid.c +++ b/xen/arch/x86/cpuid.c @@ -XXX,XX +XXX,XX @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf, const struct cpu_user_regs *regs; case 0x1: - /* TODO: Rework topology logic. */ res->b &= 0x00ffffffu; if ( is_hvm_domain(d) ) - res->b |= (v->vcpu_id * 2) << 24; + res->b |= vlapic_x2apic_id(vcpu_vlapic(v)) << 24; /* TODO: Rework vPMU control in terms of toolstack choices. */ if ( vpmu_available(v) && @@ -XXX,XX +XXX,XX @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf, case 0xb: /* - * In principle, this leaf is Intel-only. In practice, it is tightly - * coupled with x2apic, and we offer an x2apic-capable APIC emulation - * to guests on AMD hardware as well. - * - * TODO: Rework topology logic. + * Don't expose topology information to PV guests. Exposed on HVM + * along with x2APIC because they are tightly coupled. */ - if ( p->basic.x2apic ) + if ( is_hvm_domain(d) && p->basic.x2apic ) { *(uint8_t *)&res->c = subleaf; /* Fix the x2APIC identifier. 
*/ - res->d = v->vcpu_id * 2; + res->d = vlapic_x2apic_id(vcpu_vlapic(v)); } break; diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/vlapic.c +++ b/xen/arch/x86/hvm/vlapic.c @@ -XXX,XX +XXX,XX @@ static uint32_t x2apic_ldr_from_id(uint32_t id) static void set_x2apic_id(struct vlapic *vlapic) { const struct vcpu *v = vlapic_vcpu(vlapic); - uint32_t apic_id = v->vcpu_id * 2; + uint32_t apic_id = vlapic->hw.x2apic_id; uint32_t apic_ldr = x2apic_ldr_from_id(apic_id); /* @@ -XXX,XX +XXX,XX @@ void vlapic_reset(struct vlapic *vlapic) if ( v->vcpu_id == 0 ) vlapic->hw.apic_base_msr |= APIC_BASE_BSP; - vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24); + vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id)); vlapic_do_init(vlapic); } @@ -XXX,XX +XXX,XX @@ static void lapic_load_fixup(struct vlapic *vlapic) const struct vcpu *v = vlapic_vcpu(vlapic); uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id); + /* + * Loading record without hw.x2apic_id in the save stream, calculate using + * the traditional "vcpu_id * 2" relation. There's an implicit assumption + * that vCPU0 always has x2APIC0, which is true for the old relation, and + * still holds under the new x2APIC generation algorithm. While that case + * goes through the conditional it's benign because it still maps to zero. + */ + if ( !vlapic->hw.x2apic_id ) + vlapic->hw.x2apic_id = v->vcpu_id * 2; + /* Skip fixups on xAPIC mode, or if the x2APIC LDR is already correct */ if ( !vlapic_x2apic_mode(vlapic) || (vlapic->loaded.ldr == good_ldr) ) @@ -XXX,XX +XXX,XX @@ static int cf_check lapic_check_hidden(const struct domain *d, APIC_BASE_EXTD ) return -EINVAL; + /* + * Fail migrations from newer versions of Xen where + * rsvd_zero is interpreted as something else. 
+ */ + if ( s.rsvd_zero ) + return -EINVAL; + return 0; } @@ -XXX,XX +XXX,XX @@ int vlapic_init(struct vcpu *v) } vlapic->pt.source = PTSRC_lapic; + vlapic->hw.x2apic_id = 2 * v->vcpu_id; vlapic->regs_page = alloc_domheap_page(v->domain, MEMF_no_owner); if ( !vlapic->regs_page ) diff --git a/xen/arch/x86/include/asm/hvm/vlapic.h b/xen/arch/x86/include/asm/hvm/vlapic.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/include/asm/hvm/vlapic.h +++ b/xen/arch/x86/include/asm/hvm/vlapic.h @@ -XXX,XX +XXX,XX @@ #define vlapic_xapic_mode(vlapic) \ (!vlapic_hw_disabled(vlapic) && \ !((vlapic)->hw.apic_base_msr & APIC_BASE_EXTD)) +#define vlapic_x2apic_id(vlapic) ((vlapic)->hw.x2apic_id) /* * Generic APIC bitmap vector update & search routines. diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/public/arch-x86/hvm/save.h +++ b/xen/include/public/arch-x86/hvm/save.h @@ -XXX,XX +XXX,XX @@ struct hvm_hw_lapic { uint32_t disabled; /* VLAPIC_xx_DISABLED */ uint32_t timer_divisor; uint64_t tdt_msr; + uint32_t x2apic_id; + uint32_t rsvd_zero; }; DECLARE_HVM_SAVE_TYPE(LAPIC, 5, struct hvm_hw_lapic); -- 2.46.0
If the toolstack were to upload LAPIC contexts as part of domain creation it would encounter a problem where the architectural state does not reflect the APIC ID in the hidden state. This patch ensures updates to the hidden state trigger an update in the architectural registers so the APIC ID in both is consistent. Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> --- xen/arch/x86/hvm/vlapic.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/vlapic.c +++ b/xen/arch/x86/hvm/vlapic.c @@ -XXX,XX +XXX,XX @@ static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h) s->loaded.hw = 1; if ( s->loaded.regs ) + { + /* + * We already processed architectural regs in lapic_load_regs(), so + * this must be a migration. Fix up inconsistencies from any older Xen. + */ lapic_load_fixup(s); + } + else + { + /* + * We haven't seen architectural regs so this could be a migration or a + * plain domain create. In the domain create case it's fine to modify + * the architectural state to align it to the APIC ID that was just + * uploaded and in the migrate case it doesn't matter because the + * architectural state will be replaced by the LAPIC_REGS ctx later on. + */ + if ( vlapic_x2apic_mode(s) ) + set_x2apic_id(s); + else + vlapic_set_reg(s, APIC_ID, SET_xAPIC_ID(s->hw.x2apic_id)); + } hvm_update_vlapic_mode(v); -- 2.46.0
Make it so the APs expose their own APIC IDs in a LUT. We can use that LUT to populate the MADT, decoupling the algorithm that relates CPU IDs and APIC IDs from hvmloader. While at it, also remove ap_callin, as writing the APIC ID serves the same purpose. Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> --- tools/firmware/hvmloader/config.h | 5 ++- tools/firmware/hvmloader/hvmloader.c | 6 +-- tools/firmware/hvmloader/mp_tables.c | 4 +- tools/firmware/hvmloader/smp.c | 54 ++++++++++++++++++++----- tools/firmware/hvmloader/util.c | 2 +- tools/include/xen-tools/common-macros.h | 5 +++ 6 files changed, 60 insertions(+), 16 deletions(-) diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h index XXXXXXX..XXXXXXX 100644 --- a/tools/firmware/hvmloader/config.h +++ b/tools/firmware/hvmloader/config.h @@ -XXX,XX +XXX,XX @@ #include <stdint.h> #include <stdbool.h> +#include <xen/hvm/hvm_info_table.h> + enum virtual_vga { VGA_none, VGA_std, VGA_cirrus, VGA_pt }; extern enum virtual_vga virtual_vga; @@ -XXX,XX +XXX,XX @@ extern uint8_t ioapic_version; #define IOAPIC_ID 0x01 +extern uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS]; + #define LAPIC_BASE_ADDRESS 0xfee00000 -#define LAPIC_ID(vcpu_id) ((vcpu_id) * 2) #define PCI_ISA_DEVFN 0x08 /* dev 1, fn 0 */ #define PCI_ISA_IRQ_MASK 0x0c20U /* ISA IRQs 5,10,11 are PCI connected */ diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c index XXXXXXX..XXXXXXX 100644 --- a/tools/firmware/hvmloader/hvmloader.c +++ b/tools/firmware/hvmloader/hvmloader.c @@ -XXX,XX +XXX,XX @@ static void apic_setup(void) /* 8259A ExtInts are delivered through IOAPIC pin 0 (Virtual Wire Mode). 
*/ ioapic_write(0x10, APIC_DM_EXTINT); - ioapic_write(0x11, SET_APIC_ID(LAPIC_ID(0))); + ioapic_write(0x11, SET_APIC_ID(CPU_TO_X2APICID[0])); } struct bios_info { @@ -XXX,XX +XXX,XX @@ int main(void) printf("CPU speed is %u MHz\n", get_cpu_mhz()); + smp_initialise(); + apic_setup(); pci_setup(); - smp_initialise(); - perform_tests(); if ( bios->bios_info_setup ) diff --git a/tools/firmware/hvmloader/mp_tables.c b/tools/firmware/hvmloader/mp_tables.c index XXXXXXX..XXXXXXX 100644 --- a/tools/firmware/hvmloader/mp_tables.c +++ b/tools/firmware/hvmloader/mp_tables.c @@ -XXX,XX +XXX,XX @@ static void fill_mp_config_table(struct mp_config_table *mpct, int length) /* fills in an MP processor entry for VCPU 'vcpu_id' */ static void fill_mp_proc_entry(struct mp_proc_entry *mppe, int vcpu_id) { + ASSERT(CPU_TO_X2APICID[vcpu_id] < 0xFF); + mppe->type = ENTRY_TYPE_PROCESSOR; - mppe->lapic_id = LAPIC_ID(vcpu_id); + mppe->lapic_id = CPU_TO_X2APICID[vcpu_id]; mppe->lapic_version = 0x11; mppe->cpu_flags = CPU_FLAG_ENABLED; if ( vcpu_id == 0 ) diff --git a/tools/firmware/hvmloader/smp.c b/tools/firmware/hvmloader/smp.c index XXXXXXX..XXXXXXX 100644 --- a/tools/firmware/hvmloader/smp.c +++ b/tools/firmware/hvmloader/smp.c @@ -XXX,XX +XXX,XX @@ #include <xen/vcpu.h> -static int ap_callin; +/** + * Lookup table of x2APIC IDs. + * + * Each entry is populated by its respective CPU as it comes online. This is + * required for generating the MADT with minimal assumptions about ID + * relationships. + */ +uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS]; + +/** Tristate about x2apic being supported. 
-1=unknown */ +static int has_x2apic = -1; + +static uint32_t read_apic_id(void) +{ + uint32_t apic_id; + + if ( has_x2apic ) + cpuid(0xb, NULL, NULL, NULL, &apic_id); + else + { + cpuid(1, NULL, &apic_id, NULL, NULL); + apic_id >>= 24; + } + + /* Never called by cpu0, so should never return 0 */ + ASSERT(apic_id); + + return apic_id; +} static void cpu_setup(unsigned int cpu) { @@ -XXX,XX +XXX,XX @@ static void cpu_setup(unsigned int cpu) cacheattr_init(); printf("done.\n"); - if ( !cpu ) /* Used on the BSP too */ + /* The BSP exits early because its APIC ID is known to be zero */ + if ( !cpu ) return; wmb(); - ap_callin = 1; + ACCESS_ONCE(CPU_TO_X2APICID[cpu]) = read_apic_id(); - /* After this point, the BSP will shut us down. */ + /* + * After this point the BSP will shut us down. A write to + * CPU_TO_X2APICID[cpu] signals the BSP to bring down `cpu`. + */ for ( ;; ) asm volatile ( "hlt" ); @@ -XXX,XX +XXX,XX @@ static void boot_cpu(unsigned int cpu) static uint8_t ap_stack[PAGE_SIZE] __attribute__ ((aligned (16))); static struct vcpu_hvm_context ap; - /* Initialise shared variables. */ - ap_callin = 0; - wmb(); - /* Wake up the secondary processor */ ap = (struct vcpu_hvm_context) { .mode = VCPU_HVM_MODE_32B, @@ -XXX,XX +XXX,XX @@ static void boot_cpu(unsigned int cpu) BUG(); /* - * Wait for the secondary processor to complete initialisation. + * Wait for the secondary processor to complete initialisation, + * which is signaled by its x2APIC ID being written to the LUT. * Do not touch shared resources meanwhile. */ - while ( !ap_callin ) + while ( !ACCESS_ONCE(CPU_TO_X2APICID[cpu]) ) cpu_relax(); /* Take the secondary processor offline. 
*/ @@ -XXX,XX +XXX,XX @@ static void boot_cpu(unsigned int cpu) void smp_initialise(void) { unsigned int i, nr_cpus = hvm_info->nr_vcpus; + uint32_t ecx; + + cpuid(1, NULL, NULL, &ecx, NULL); + has_x2apic = (ecx >> 21) & 1; + if ( has_x2apic ) + printf("x2APIC supported\n"); printf("Multiprocessor initialisation:\n"); cpu_setup(0); diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c index XXXXXXX..XXXXXXX 100644 --- a/tools/firmware/hvmloader/util.c +++ b/tools/firmware/hvmloader/util.c @@ -XXX,XX +XXX,XX @@ static void acpi_mem_free(struct acpi_ctxt *ctxt, static uint32_t acpi_lapic_id(unsigned cpu) { - return LAPIC_ID(cpu); + return CPU_TO_X2APICID[cpu]; } void hvmloader_acpi_build_tables(struct acpi_config *config, diff --git a/tools/include/xen-tools/common-macros.h b/tools/include/xen-tools/common-macros.h index XXXXXXX..XXXXXXX 100644 --- a/tools/include/xen-tools/common-macros.h +++ b/tools/include/xen-tools/common-macros.h @@ -XXX,XX +XXX,XX @@ #define get_unaligned(ptr) get_unaligned_t(typeof(*(ptr)), ptr) #define put_unaligned(val, ptr) put_unaligned_t(typeof(*(ptr)), val, ptr) +#define __ACCESS_ONCE(x) ({ \ + (void)(typeof(x))0; /* Scalar typecheck. */ \ + (volatile typeof(x) *)&(x); }) +#define ACCESS_ONCE(x) (*__ACCESS_ONCE(x)) + #endif /* __XEN_TOOLS_COMMON_MACROS__ */ -- 2.46.0
Refactors libacpi so that a single LUT is the authoritative source of truth for the CPU to APIC ID mappings. This has a knock-on effect of reducing complexity in future patches, as the same LUT can be used for configuring the APICs and configuring the ACPI tables for PVH. No functional change intended, as the same mappings are preserved. Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> --- tools/firmware/hvmloader/util.c | 7 +------ tools/include/xenguest.h | 5 +++++ tools/libacpi/build.c | 6 +++--- tools/libacpi/libacpi.h | 2 +- tools/libs/light/libxl_dom.c | 5 +++++ tools/libs/light/libxl_x86_acpi.c | 7 +------ 6 files changed, 16 insertions(+), 16 deletions(-) diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c index XXXXXXX..XXXXXXX 100644 --- a/tools/firmware/hvmloader/util.c +++ b/tools/firmware/hvmloader/util.c @@ -XXX,XX +XXX,XX @@ static void acpi_mem_free(struct acpi_ctxt *ctxt, /* ACPI builder currently doesn't free memory so this is just a stub */ } -static uint32_t acpi_lapic_id(unsigned cpu) -{ - return CPU_TO_X2APICID[cpu]; -} - void hvmloader_acpi_build_tables(struct acpi_config *config, unsigned int physical) { @@ -XXX,XX +XXX,XX @@ void hvmloader_acpi_build_tables(struct acpi_config *config, } config->lapic_base_address = LAPIC_BASE_ADDRESS; - config->lapic_id = acpi_lapic_id; + config->cpu_to_apicid = CPU_TO_X2APICID; config->ioapic_base_address = IOAPIC_BASE_ADDRESS; config->ioapic_id = IOAPIC_ID; config->pci_isa_irq_mask = PCI_ISA_IRQ_MASK; diff --git a/tools/include/xenguest.h b/tools/include/xenguest.h index XXXXXXX..XXXXXXX 100644 --- a/tools/include/xenguest.h +++ b/tools/include/xenguest.h @@ -XXX,XX +XXX,XX @@ #ifndef XENGUEST_H #define XENGUEST_H +#include "xen/hvm/hvm_info_table.h" + #define XC_NUMA_NO_NODE (~0U) #define XCFLAGS_LIVE (1 << 0) @@ -XXX,XX +XXX,XX @@ struct xc_dom_image { #if defined(__i386__) || defined(__x86_64__) struct e820entry *e820; unsigned int e820_entries; + + /* LUT 
mapping cpu id to (x2)APIC ID */ + uint32_t cpu_to_apicid[HVM_MAX_VCPUS]; #endif xen_pfn_t vuart_gfn; diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c index XXXXXXX..XXXXXXX 100644 --- a/tools/libacpi/build.c +++ b/tools/libacpi/build.c @@ -XXX,XX +XXX,XX @@ static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt, const struct hvm_info_table *hvminfo = config->hvminfo; int i, sz; - if ( config->lapic_id == NULL ) + if ( config->cpu_to_apicid == NULL ) return NULL; sz = sizeof(struct acpi_20_madt); @@ -XXX,XX +XXX,XX @@ static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt, lapic->length = sizeof(*lapic); /* Processor ID must match processor-object IDs in the DSDT. */ lapic->acpi_processor_id = i; - lapic->apic_id = config->lapic_id(i); + lapic->apic_id = config->cpu_to_apicid[i]; lapic->flags = (test_bit(i, hvminfo->vcpu_online) ? ACPI_LOCAL_APIC_ENABLED : 0); lapic++; @@ -XXX,XX +XXX,XX @@ static struct acpi_20_srat *construct_srat(struct acpi_ctxt *ctxt, processor->type = ACPI_PROCESSOR_AFFINITY; processor->length = sizeof(*processor); processor->domain = config->numa.vcpu_to_vnode[i]; - processor->apic_id = config->lapic_id(i); + processor->apic_id = config->cpu_to_apicid[i]; processor->flags = ACPI_LOCAL_APIC_AFFIN_ENABLED; processor++; } diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h index XXXXXXX..XXXXXXX 100644 --- a/tools/libacpi/libacpi.h +++ b/tools/libacpi/libacpi.h @@ -XXX,XX +XXX,XX @@ struct acpi_config { unsigned long rsdp; /* x86-specific parameters */ - uint32_t (*lapic_id)(unsigned cpu); + uint32_t *cpu_to_apicid; /* LUT mapping cpu id to (x2)APIC ID */ uint32_t lapic_base_address; uint32_t ioapic_base_address; uint16_t pci_isa_irq_mask; diff --git a/tools/libs/light/libxl_dom.c b/tools/libs/light/libxl_dom.c index XXXXXXX..XXXXXXX 100644 --- a/tools/libs/light/libxl_dom.c +++ b/tools/libs/light/libxl_dom.c @@ -XXX,XX +XXX,XX @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid, dom->container_type 
= XC_DOM_HVM_CONTAINER; +#if defined(__i386__) || defined(__x86_64__) + for ( uint32_t i = 0; i < info->max_vcpus; i++ ) + dom->cpu_to_apicid[i] = 2 * i; /* TODO: Replace by topo calculation */ +#endif + /* The params from the configuration file are in Mb, which are then * multiplied by 1 Kb. This was then divided off when calling * the old xc_hvm_build_target_mem() which then turned them to bytes. diff --git a/tools/libs/light/libxl_x86_acpi.c b/tools/libs/light/libxl_x86_acpi.c index XXXXXXX..XXXXXXX 100644 --- a/tools/libs/light/libxl_x86_acpi.c +++ b/tools/libs/light/libxl_x86_acpi.c @@ -XXX,XX +XXX,XX @@ static void acpi_mem_free(struct acpi_ctxt *ctxt, { } -static uint32_t acpi_lapic_id(unsigned cpu) -{ - return cpu * 2; -} - static int init_acpi_config(libxl__gc *gc, struct xc_dom_image *dom, const libxl_domain_build_info *b_info, @@ -XXX,XX +XXX,XX @@ static int init_acpi_config(libxl__gc *gc, config->hvminfo = hvminfo; config->lapic_base_address = LAPIC_BASE_ADDRESS; - config->lapic_id = acpi_lapic_id; + config->cpu_to_apicid = dom->cpu_to_apicid; config->acpi_revision = 5; rc = 0; -- 2.46.0
Currently used by PVH to set MTRR, will be used by a later patch to set APIC state. Unconditionally send the hypercall, and gate overriding the MTRR so it remains functionally equivalent. While at it, add a missing "goto out" to what was the error condition in the loop. In principle this patch shouldn't affect functionality. An extra record (the MTRR) is sent to the hypervisor per vCPU on HVM, but these records are identical to those retrieved in the first place so there's no expected functional change. Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> --- tools/libs/guest/xg_dom_x86.c | 84 ++++++++++++++++++----------------- 1 file changed, 44 insertions(+), 40 deletions(-) diff --git a/tools/libs/guest/xg_dom_x86.c b/tools/libs/guest/xg_dom_x86.c index XXXXXXX..XXXXXXX 100644 --- a/tools/libs/guest/xg_dom_x86.c +++ b/tools/libs/guest/xg_dom_x86.c @@ -XXX,XX +XXX,XX @@ const static void *hvm_get_save_record(const void *ctx, unsigned int type, static int vcpu_hvm(struct xc_dom_image *dom) { + /* Initialises the BSP */ struct { struct hvm_save_descriptor header_d; HVM_SAVE_TYPE(HEADER) header; @@ -XXX,XX +XXX,XX @@ static int vcpu_hvm(struct xc_dom_image *dom) struct hvm_save_descriptor end_d; HVM_SAVE_TYPE(END) end; } bsp_ctx; + /* Initialises APICs and MTRRs of every vCPU */ + struct { + struct hvm_save_descriptor header_d; + HVM_SAVE_TYPE(HEADER) header; + struct hvm_save_descriptor mtrr_d; + HVM_SAVE_TYPE(MTRR) mtrr; + struct hvm_save_descriptor end_d; + HVM_SAVE_TYPE(END) end; + } vcpu_ctx; + /* Context from full_ctx */ + const HVM_SAVE_TYPE(MTRR) *mtrr_record; + /* Raw context as taken from Xen */ uint8_t *full_ctx = NULL; int rc; @@ -XXX,XX +XXX,XX @@ static int vcpu_hvm(struct xc_dom_image *dom) bsp_ctx.end_d.instance = 0; bsp_ctx.end_d.length = HVM_SAVE_LENGTH(END); - /* TODO: maybe this should be a firmware option instead? */ - if ( !dom->device_model ) + /* TODO: maybe setting MTRRs should be a firmware option instead? 
*/ + mtrr_record = hvm_get_save_record(full_ctx, HVM_SAVE_CODE(MTRR), 0); + + if ( !mtrr_record) { - struct { - struct hvm_save_descriptor header_d; - HVM_SAVE_TYPE(HEADER) header; - struct hvm_save_descriptor mtrr_d; - HVM_SAVE_TYPE(MTRR) mtrr; - struct hvm_save_descriptor end_d; - HVM_SAVE_TYPE(END) end; - } mtrr = { - .header_d = bsp_ctx.header_d, - .header = bsp_ctx.header, - .mtrr_d.typecode = HVM_SAVE_CODE(MTRR), - .mtrr_d.length = HVM_SAVE_LENGTH(MTRR), - .end_d = bsp_ctx.end_d, - .end = bsp_ctx.end, - }; - const HVM_SAVE_TYPE(MTRR) *mtrr_record = - hvm_get_save_record(full_ctx, HVM_SAVE_CODE(MTRR), 0); - unsigned int i; - - if ( !mtrr_record ) - { - xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, - "%s: unable to get MTRR save record", __func__); - goto out; - } + xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, + "%s: unable to get MTRR save record", __func__); + goto out; + } - memcpy(&mtrr.mtrr, mtrr_record, sizeof(mtrr.mtrr)); + vcpu_ctx.header_d = bsp_ctx.header_d; + vcpu_ctx.header = bsp_ctx.header; + vcpu_ctx.mtrr_d.typecode = HVM_SAVE_CODE(MTRR); + vcpu_ctx.mtrr_d.length = HVM_SAVE_LENGTH(MTRR); + vcpu_ctx.mtrr = *mtrr_record; + vcpu_ctx.end_d = bsp_ctx.end_d; + vcpu_ctx.end = bsp_ctx.end; - /* - * Enable MTRR, set default type to WB. - * TODO: add MMIO areas as UC when passthrough is supported. - */ - mtrr.mtrr.msr_mtrr_def_type = MTRR_TYPE_WRBACK | MTRR_DEF_TYPE_ENABLE; + /* + * Enable MTRR, set default type to WB. 
+ * TODO: add MMIO areas as UC when passthrough is supported in PVH + */ + if ( !dom->device_model ) + vcpu_ctx.mtrr.msr_mtrr_def_type = MTRR_TYPE_WRBACK | MTRR_DEF_TYPE_ENABLE; + + for ( unsigned int i = 0; i < dom->max_vcpus; i++ ) + { + vcpu_ctx.mtrr_d.instance = i; - for ( i = 0; i < dom->max_vcpus; i++ ) + rc = xc_domain_hvm_setcontext(dom->xch, dom->guest_domid, + (uint8_t *)&vcpu_ctx, sizeof(vcpu_ctx)); + if ( rc != 0 ) { - mtrr.mtrr_d.instance = i; - rc = xc_domain_hvm_setcontext(dom->xch, dom->guest_domid, - (uint8_t *)&mtrr, sizeof(mtrr)); - if ( rc != 0 ) - xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, - "%s: SETHVMCONTEXT failed (rc=%d)", __func__, rc); + xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, + "%s: SETHVMCONTEXT failed (rc=%d)", __func__, rc); + goto out; } } -- 2.46.0
Add a helper to populate topology leaves in the cpu policy from threads/core and cores/package counts. It's unit-tested in test-cpu-policy.c, but it's not connected to the rest of the code yet. Adds the ASSERT() macro to xen/lib/x86/private.h, as it was missing. Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> --- tools/tests/cpu-policy/test-cpu-policy.c | 133 +++++++++++++++++++++++ xen/include/xen/lib/x86/cpu-policy.h | 16 +++ xen/lib/x86/policy.c | 88 +++++++++++++++ xen/lib/x86/private.h | 4 + 4 files changed, 241 insertions(+) diff --git a/tools/tests/cpu-policy/test-cpu-policy.c b/tools/tests/cpu-policy/test-cpu-policy.c index XXXXXXX..XXXXXXX 100644 --- a/tools/tests/cpu-policy/test-cpu-policy.c +++ b/tools/tests/cpu-policy/test-cpu-policy.c @@ -XXX,XX +XXX,XX @@ static void test_is_compatible_failure(void) } } +static void test_topo_from_parts(void) +{ + static const struct test { + unsigned int threads_per_core; + unsigned int cores_per_pkg; + struct cpu_policy policy; + } tests[] = { + { + .threads_per_core = 3, .cores_per_pkg = 1, + .policy = { + .x86_vendor = X86_VENDOR_AMD, + .topo.subleaf = { + { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, }, + { .nr_logical = 1, .level = 1, .type = 2, .id_shift = 2, }, + }, + }, + }, + { + .threads_per_core = 1, .cores_per_pkg = 3, + .policy = { + .x86_vendor = X86_VENDOR_AMD, + .topo.subleaf = { + { .nr_logical = 1, .level = 0, .type = 1, .id_shift = 0, }, + { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, }, + }, + }, + }, + { + .threads_per_core = 7, .cores_per_pkg = 5, + .policy = { + .x86_vendor = X86_VENDOR_AMD, + .topo.subleaf = { + { .nr_logical = 7, .level = 0, .type = 1, .id_shift = 3, }, + { .nr_logical = 5, .level = 1, .type = 2, .id_shift = 6, }, + }, + }, + }, + { + .threads_per_core = 2, .cores_per_pkg = 128, + .policy = { + .x86_vendor = X86_VENDOR_AMD, + .topo.subleaf = { + { .nr_logical = 2, .level = 0, .type = 1, .id_shift = 1, }, + { .nr_logical = 128, .level = 1, 
.type = 2, + .id_shift = 8, }, + }, + }, + }, + { + .threads_per_core = 3, .cores_per_pkg = 1, + .policy = { + .x86_vendor = X86_VENDOR_INTEL, + .topo.subleaf = { + { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, }, + { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, }, + }, + }, + }, + { + .threads_per_core = 1, .cores_per_pkg = 3, + .policy = { + .x86_vendor = X86_VENDOR_INTEL, + .topo.subleaf = { + { .nr_logical = 1, .level = 0, .type = 1, .id_shift = 0, }, + { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, }, + }, + }, + }, + { + .threads_per_core = 7, .cores_per_pkg = 5, + .policy = { + .x86_vendor = X86_VENDOR_INTEL, + .topo.subleaf = { + { .nr_logical = 7, .level = 0, .type = 1, .id_shift = 3, }, + { .nr_logical = 35, .level = 1, .type = 2, .id_shift = 6, }, + }, + }, + }, + { + .threads_per_core = 2, .cores_per_pkg = 128, + .policy = { + .x86_vendor = X86_VENDOR_INTEL, + .topo.subleaf = { + { .nr_logical = 2, .level = 0, .type = 1, .id_shift = 1, }, + { .nr_logical = 256, .level = 1, .type = 2, + .id_shift = 8, }, + }, + }, + }, + }; + + printf("Testing topology synthesis from parts:\n"); + + for ( size_t i = 0; i < ARRAY_SIZE(tests); ++i ) + { + const struct test *t = &tests[i]; + struct cpu_policy actual = { .x86_vendor = t->policy.x86_vendor }; + int rc = x86_topo_from_parts(&actual, t->threads_per_core, + t->cores_per_pkg); + + if ( rc || memcmp(&actual.topo, &t->policy.topo, sizeof(actual.topo)) ) + { +#define TOPO(n, f) t->policy.topo.subleaf[(n)].f, actual.topo.subleaf[(n)].f + fail("FAIL[%d] - '%s %u t/c, %u c/p'\n", + rc, + x86_cpuid_vendor_to_str(t->policy.x86_vendor), + t->threads_per_core, t->cores_per_pkg); + printf(" subleaf=%u expected_n=%u actual_n=%u\n" + " expected_lvl=%u actual_lvl=%u\n" + " expected_type=%u actual_type=%u\n" + " expected_shift=%u actual_shift=%u\n", + 0, + TOPO(0, nr_logical), + TOPO(0, level), + TOPO(0, type), + TOPO(0, id_shift)); + + printf(" subleaf=%u expected_n=%u actual_n=%u\n" + " 
expected_lvl=%u actual_lvl=%u\n" + " expected_type=%u actual_type=%u\n" + " expected_shift=%u actual_shift=%u\n", + 1, + TOPO(1, nr_logical), + TOPO(1, level), + TOPO(1, type), + TOPO(1, id_shift)); +#undef TOPO + } + } +} + int main(int argc, char **argv) { printf("CPU Policy unit tests\n"); @@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv) test_is_compatible_success(); test_is_compatible_failure(); + test_topo_from_parts(); + if ( nr_failures ) printf("Done: %u failures\n", nr_failures); else diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/xen/lib/x86/cpu-policy.h +++ b/xen/include/xen/lib/x86/cpu-policy.h @@ -XXX,XX +XXX,XX @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host, const struct cpu_policy *guest, struct cpu_policy_errors *err); +/** + * Synthesise topology information in `p` given high-level constraints + * + * Topology is given in various fields accross several leaves, some of + * which are vendor-specific. This function uses the policy itself to + * derive such leaves from threads/core and cores/package. + * + * @param p CPU policy of the domain. + * @param threads_per_core threads/core. Doesn't need to be a power of 2. + * @param cores_per_package cores/package. Doesn't need to be a power of 2. 
+ * @return 0 on success; -errno on failure + */ +int x86_topo_from_parts(struct cpu_policy *p, + unsigned int threads_per_core, + unsigned int cores_per_pkg); + #endif /* !XEN_LIB_X86_POLICIES_H */ /* diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c index XXXXXXX..XXXXXXX 100644 --- a/xen/lib/x86/policy.c +++ b/xen/lib/x86/policy.c @@ -XXX,XX +XXX,XX @@ #include <xen/lib/x86/cpu-policy.h> +static unsigned int order(unsigned int n) +{ + ASSERT(n); /* clz(0) is UB */ + + return 8 * sizeof(n) - __builtin_clz(n); +} + +int x86_topo_from_parts(struct cpu_policy *p, + unsigned int threads_per_core, + unsigned int cores_per_pkg) +{ + unsigned int threads_per_pkg = threads_per_core * cores_per_pkg; + unsigned int apic_id_size; + + if ( !p || !threads_per_core || !cores_per_pkg ) + return -EINVAL; + + p->basic.max_leaf = MAX(0xb, p->basic.max_leaf); + + memset(p->topo.raw, 0, sizeof(p->topo.raw)); + + /* thread level */ + p->topo.subleaf[0].nr_logical = threads_per_core; + p->topo.subleaf[0].id_shift = 0; + p->topo.subleaf[0].level = 0; + p->topo.subleaf[0].type = 1; + if ( threads_per_core > 1 ) + p->topo.subleaf[0].id_shift = order(threads_per_core - 1); + + /* core level */ + p->topo.subleaf[1].nr_logical = cores_per_pkg; + if ( p->x86_vendor == X86_VENDOR_INTEL ) + p->topo.subleaf[1].nr_logical = threads_per_pkg; + p->topo.subleaf[1].id_shift = p->topo.subleaf[0].id_shift; + p->topo.subleaf[1].level = 1; + p->topo.subleaf[1].type = 2; + if ( cores_per_pkg > 1 ) + p->topo.subleaf[1].id_shift += order(cores_per_pkg - 1); + + apic_id_size = p->topo.subleaf[1].id_shift; + + /* + * Contrary to what the name might seem to imply. HTT is an enabler for + * SMP and there's no harm in setting it even with a single vCPU. 
+ */ + p->basic.htt = true; + p->basic.lppp = MIN(0xff, threads_per_pkg); + + switch ( p->x86_vendor ) + { + case X86_VENDOR_INTEL: { + struct cpuid_cache_leaf *sl = p->cache.subleaf; + + for ( size_t i = 0; sl->type && + i < ARRAY_SIZE(p->cache.raw); i++, sl++ ) + { + sl->cores_per_package = cores_per_pkg - 1; + sl->threads_per_cache = threads_per_core - 1; + if ( sl->type == 3 /* unified cache */ ) + sl->threads_per_cache = threads_per_pkg - 1; + } + break; + } + + case X86_VENDOR_AMD: + case X86_VENDOR_HYGON: + /* Expose p->basic.lppp */ + p->extd.cmp_legacy = true; + + /* Clip NC to the maximum value it can hold */ + p->extd.nc = MIN(0xff, threads_per_pkg - 1); + + /* TODO: Expose leaf e1E */ + p->extd.topoext = false; + + /* + * Clip APIC ID to 8 bits, as that's what high core-count machines do. + * + * That's what AMD EPYC 9654 does with >256 CPUs. + */ + p->extd.apic_id_size = MIN(8, apic_id_size); + + break; + } + + return 0; +} + int x86_cpu_policies_are_compatible(const struct cpu_policy *host, const struct cpu_policy *guest, struct cpu_policy_errors *err) diff --git a/xen/lib/x86/private.h b/xen/lib/x86/private.h index XXXXXXX..XXXXXXX 100644 --- a/xen/lib/x86/private.h +++ b/xen/lib/x86/private.h @@ -XXX,XX +XXX,XX @@ #ifdef __XEN__ #include <xen/bitops.h> +#include <xen/bug.h> #include <xen/guest_access.h> #include <xen/kernel.h> #include <xen/lib.h> @@ -XXX,XX +XXX,XX @@ #else +#include <assert.h> #include <errno.h> #include <inttypes.h> #include <stdbool.h> @@ -XXX,XX +XXX,XX @@ #include <xen-tools/common-macros.h> +#define ASSERT(x) assert(x) + static inline bool test_bit(unsigned int bit, const void *vaddr) { const char *addr = vaddr; -- 2.46.0
Implements the helper for mapping vcpu_id to x2apic_id given a valid topology in a policy. The algo is written with the intention of extending it to leaves 0x1f and extended 0x26 in the future. Toolstack doesn't set leaf 0xb and the HVM default policy has it cleared, so the leaf is not implemented. In that case, the new helper just returns the legacy mapping. Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> --- tools/tests/cpu-policy/test-cpu-policy.c | 68 +++++++++++++++++++++ xen/include/xen/lib/x86/cpu-policy.h | 11 ++++ xen/lib/x86/policy.c | 76 ++++++++++++++++++++++++ 3 files changed, 155 insertions(+) diff --git a/tools/tests/cpu-policy/test-cpu-policy.c b/tools/tests/cpu-policy/test-cpu-policy.c index XXXXXXX..XXXXXXX 100644 --- a/tools/tests/cpu-policy/test-cpu-policy.c +++ b/tools/tests/cpu-policy/test-cpu-policy.c @@ -XXX,XX +XXX,XX @@ static void test_topo_from_parts(void) } } +static void test_x2apic_id_from_vcpu_id_success(void) +{ + static const struct test { + unsigned int vcpu_id; + unsigned int threads_per_core; + unsigned int cores_per_pkg; + uint32_t x2apic_id; + uint8_t x86_vendor; + } tests[] = { + { + .vcpu_id = 3, .threads_per_core = 3, .cores_per_pkg = 8, + .x2apic_id = 1 << 2, + }, + { + .vcpu_id = 6, .threads_per_core = 3, .cores_per_pkg = 8, + .x2apic_id = 2 << 2, + }, + { + .vcpu_id = 24, .threads_per_core = 3, .cores_per_pkg = 8, + .x2apic_id = 1 << 5, + }, + { + .vcpu_id = 35, .threads_per_core = 3, .cores_per_pkg = 8, + .x2apic_id = (35 % 3) | (((35 / 3) % 8) << 2) | ((35 / 24) << 5), + }, + { + .vcpu_id = 96, .threads_per_core = 7, .cores_per_pkg = 3, + .x2apic_id = (96 % 7) | (((96 / 7) % 3) << 3) | ((96 / 21) << 5), + }, + }; + + const uint8_t vendors[] = { + X86_VENDOR_INTEL, + X86_VENDOR_AMD, + X86_VENDOR_CENTAUR, + X86_VENDOR_SHANGHAI, + X86_VENDOR_HYGON, + }; + + printf("Testing x2apic id from vcpu id success:\n"); + + /* Perform the test run on every vendor we know about */ + for ( size_t i = 0; i < 
ARRAY_SIZE(vendors); ++i ) + { + for ( size_t j = 0; j < ARRAY_SIZE(tests); ++j ) + { + struct cpu_policy policy = { .x86_vendor = vendors[i] }; + const struct test *t = &tests[j]; + uint32_t x2apic_id; + int rc = x86_topo_from_parts(&policy, t->threads_per_core, + t->cores_per_pkg); + + if ( rc ) { + fail("FAIL[%d] - 'x86_topo_from_parts() failed", rc); + continue; + } + + x2apic_id = x86_x2apic_id_from_vcpu_id(&policy, t->vcpu_id); + if ( x2apic_id != t->x2apic_id ) + fail("FAIL - '%s cpu%u %u t/c %u c/p'. bad x2apic_id: expected=%u actual=%u\n", + x86_cpuid_vendor_to_str(policy.x86_vendor), + t->vcpu_id, t->threads_per_core, t->cores_per_pkg, + t->x2apic_id, x2apic_id); + } + } +} + int main(int argc, char **argv) { printf("CPU Policy unit tests\n"); @@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv) test_is_compatible_failure(); test_topo_from_parts(); + test_x2apic_id_from_vcpu_id_success(); if ( nr_failures ) printf("Done: %u failures\n", nr_failures); diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/xen/lib/x86/cpu-policy.h +++ b/xen/include/xen/lib/x86/cpu-policy.h @@ -XXX,XX +XXX,XX @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host, const struct cpu_policy *guest, struct cpu_policy_errors *err); +/** + * Calculates the x2APIC ID of a vCPU given a CPU policy + * + * If the policy lacks leaf 0xb falls back to legacy mapping of apic_id=cpu*2 + * + * @param p CPU policy of the domain. + * @param id vCPU ID of the vCPU. + * @returns x2APIC ID of the vCPU. 
+ */ +uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id); + /** * Synthesise topology information in `p` given high-level constraints * diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c index XXXXXXX..XXXXXXX 100644 --- a/xen/lib/x86/policy.c +++ b/xen/lib/x86/policy.c @@ -XXX,XX +XXX,XX @@ #include <xen/lib/x86/cpu-policy.h> +static uint32_t parts_per_higher_scoped_level(const struct cpu_policy *p, + size_t lvl) +{ + /* + * `nr_logical` reported by Intel is the number of THREADS contained in + * the next topological scope. For example, assuming a system with 2 + * threads/core and 3 cores/module in a fully symmetric topology, + * `nr_logical` at the core level will report 6. Because it's reporting + * the number of threads in a module. + * + * On AMD/Hygon, nr_logical is already normalized by the higher scoped + * level (cores/complex, etc) so we can return it as-is. + */ + if ( p->x86_vendor != X86_VENDOR_INTEL || !lvl ) + return p->topo.subleaf[lvl].nr_logical; + + return p->topo.subleaf[lvl].nr_logical / + p->topo.subleaf[lvl - 1].nr_logical; +} + +uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id) +{ + uint32_t shift = 0, x2apic_id = 0; + + /* In the absence of topology leaves, fallback to traditional mapping */ + if ( !p->topo.subleaf[0].type ) + return id * 2; + + /* + * `id` means different things at different points of the algo + * + * At lvl=0: global thread_id (same as vcpu_id) + * At lvl=1: global core_id + * At lvl=2: global socket_id (actually complex_id in AMD, module_id + * in Intel, but the name is inconsequential) + * + * +--+ + * ____ |#0| ______ <= 1 socket + * / +--+ \+--+ + * __#0__ __|#1|__ <= 2 cores/socket + * / | \ +--+/ +-|+ \ + * #0 #1 #2 |#3| #4 #5 <= 3 threads/core + * +--+ + * + * ... and so on. Global in this context means that it's a unique + * identifier for the whole topology, and not relative to the level + * it's in. 
For example, in the diagram shown above, we're looking at + * thread #3 in the global sense, though it's #0 within its core. + * + * Note that dividing a global thread_id by the number of threads per + * core returns the global core id that contains it. e.g: 0, 1 or 2 + * divided by 3 returns core_id=0. 3, 4 or 5 divided by 3 returns core + * 1, and so on. An analogous argument holds for higher levels. This is + * the property we exploit to derive x2apic_id from vcpu_id. + * + * NOTE: `topo` is currently derived from leaf 0xb, which is bound to two + * levels, but once we track leaves 0x1f (or extended 0x26) there will be a + * few more. The algorithm is written to cope with that case. + */ + for ( uint32_t i = 0; i < ARRAY_SIZE(p->topo.raw); i++ ) + { + uint32_t nr_parts; + + if ( !p->topo.subleaf[i].type ) + /* sentinel subleaf */ + break; + + nr_parts = parts_per_higher_scoped_level(p, i); + x2apic_id |= (id % nr_parts) << shift; + id /= nr_parts; + shift = p->topo.subleaf[i].id_shift; + } + + return (id << shift) | x2apic_id; +} + static unsigned int order(unsigned int n) { ASSERT(n); /* clz(0) is UB */ -- 2.46.0
Have toolstack populate the new x2APIC ID in the LAPIC save record with the proper IDs intended for each vCPU. Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> --- v6: * Rely on the new LUT in xc_dom_image rather than recalculating APIC IDs. --- tools/libs/guest/xg_dom_x86.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/tools/libs/guest/xg_dom_x86.c b/tools/libs/guest/xg_dom_x86.c index XXXXXXX..XXXXXXX 100644 --- a/tools/libs/guest/xg_dom_x86.c +++ b/tools/libs/guest/xg_dom_x86.c @@ -XXX,XX +XXX,XX @@ static int vcpu_hvm(struct xc_dom_image *dom) HVM_SAVE_TYPE(HEADER) header; struct hvm_save_descriptor mtrr_d; HVM_SAVE_TYPE(MTRR) mtrr; + struct hvm_save_descriptor lapic_d; + HVM_SAVE_TYPE(LAPIC) lapic; struct hvm_save_descriptor end_d; HVM_SAVE_TYPE(END) end; } vcpu_ctx; - /* Context from full_ctx */ + /* Contexts from full_ctx */ const HVM_SAVE_TYPE(MTRR) *mtrr_record; + const HVM_SAVE_TYPE(LAPIC) *lapic_record; /* Raw context as taken from Xen */ uint8_t *full_ctx = NULL; int rc; @@ -XXX,XX +XXX,XX @@ static int vcpu_hvm(struct xc_dom_image *dom) vcpu_ctx.mtrr_d.typecode = HVM_SAVE_CODE(MTRR); vcpu_ctx.mtrr_d.length = HVM_SAVE_LENGTH(MTRR); vcpu_ctx.mtrr = *mtrr_record; + vcpu_ctx.lapic_d.typecode = HVM_SAVE_CODE(LAPIC); + vcpu_ctx.lapic_d.length = HVM_SAVE_LENGTH(LAPIC); vcpu_ctx.end_d = bsp_ctx.end_d; vcpu_ctx.end = bsp_ctx.end; @@ -XXX,XX +XXX,XX @@ static int vcpu_hvm(struct xc_dom_image *dom) { vcpu_ctx.mtrr_d.instance = i; + lapic_record = hvm_get_save_record(full_ctx, HVM_SAVE_CODE(LAPIC), i); + if ( !lapic_record ) + { + xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, + "%s: unable to get LAPIC[%d] save record", __func__, i); + goto out; + } + + vcpu_ctx.lapic = *lapic_record; + vcpu_ctx.lapic.x2apic_id = dom->cpu_to_apicid[i]; + vcpu_ctx.lapic_d.instance = i; + rc = xc_domain_hvm_setcontext(dom->xch, dom->guest_domid, (uint8_t *)&vcpu_ctx, sizeof(vcpu_ctx)); if ( rc != 0 ) -- 2.46.0
Expose sensible topologies in leaf 0xb. At the moment it synthesises non-HT systems, in line with the previous code intent. Leaf 0xb in the host policy is no longer zapped and the guest {max,def} policies have their topology leaves zapped instead. The intent is for toolstack to populate them. There's no current use for the topology information in the host policy, but it does no harm. Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> --- v6: * Assign accurate APIC IDs to xc_dom_img->cpu_to_apicid * New field in v6. Allows ACPI generation to be correct on PVH too. --- tools/include/xenguest.h | 3 +++ tools/libs/guest/xg_cpuid_x86.c | 29 ++++++++++++++++++++++++++++- tools/libs/light/libxl_dom.c | 22 +++++++++++++++++++++- xen/arch/x86/cpu-policy.c | 9 ++++++--- 4 files changed, 58 insertions(+), 5 deletions(-) diff --git a/tools/include/xenguest.h b/tools/include/xenguest.h index XXXXXXX..XXXXXXX 100644 --- a/tools/include/xenguest.h +++ b/tools/include/xenguest.h @@ -XXX,XX +XXX,XX @@ int xc_set_domain_cpu_policy(xc_interface *xch, uint32_t domid, uint32_t xc_get_cpu_featureset_size(void); +/* Returns the APIC ID of the `cpu`-th CPU according to `policy` */ +uint32_t xc_cpu_to_apicid(const xc_cpu_policy_t *policy, unsigned int cpu); + enum xc_static_cpu_featuremask { XC_FEATUREMASK_KNOWN, XC_FEATUREMASK_SPECIAL, diff --git a/tools/libs/guest/xg_cpuid_x86.c b/tools/libs/guest/xg_cpuid_x86.c index XXXXXXX..XXXXXXX 100644 --- a/tools/libs/guest/xg_cpuid_x86.c +++ b/tools/libs/guest/xg_cpuid_x86.c @@ -XXX,XX +XXX,XX @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore, p->policy.basic.htt = test_bit(X86_FEATURE_HTT, host_featureset); p->policy.extd.cmp_legacy = test_bit(X86_FEATURE_CMP_LEGACY, host_featureset); } - else + else if ( restore ) { + /* + * Reconstruct the topology exposed on Xen <= 4.13. It makes very little + * sense, but it's what those guests saw so it's set in stone now. 
+ * + * Guests from Xen 4.14 onwards carry their own CPUID leaves in the + * migration stream so they don't need special treatment. + */ + /* * Topology for HVM guests is entirely controlled by Xen. For now, we * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT. @@ -XXX,XX +XXX,XX @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore, break; } } + else + { + /* TODO: Expose the ability to choose a custom topology for HVM/PVH */ + unsigned int threads_per_core = 1; + unsigned int cores_per_pkg = di.max_vcpu_id + 1; + + rc = x86_topo_from_parts(&p->policy, threads_per_core, cores_per_pkg); + if ( rc ) + { + ERROR("Failed to generate topology: rc=%d t/c=%u c/p=%u", + rc, threads_per_core, cores_per_pkg); + goto out; + } + } nr_leaves = ARRAY_SIZE(p->leaves); rc = x86_cpuid_copy_to_buffer(&p->policy, p->leaves, &nr_leaves); @@ -XXX,XX +XXX,XX @@ bool xc_cpu_policy_is_compatible(xc_interface *xch, xc_cpu_policy_t *host, return false; } + +uint32_t xc_cpu_to_apicid(const xc_cpu_policy_t *policy, unsigned int cpu) +{ + return x86_x2apic_id_from_vcpu_id(&policy->policy, cpu); +} diff --git a/tools/libs/light/libxl_dom.c b/tools/libs/light/libxl_dom.c index XXXXXXX..XXXXXXX 100644 --- a/tools/libs/light/libxl_dom.c +++ b/tools/libs/light/libxl_dom.c @@ -XXX,XX +XXX,XX @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid, libxl_domain_build_info *const info = &d_config->b_info; struct xc_dom_image *dom = NULL; bool device_model = info->type == LIBXL_DOMAIN_TYPE_HVM ? 
true : false; +#if defined(__i386__) || defined(__x86_64__) + struct xc_cpu_policy *policy = NULL; +#endif xc_dom_loginit(ctx->xch); @@ -XXX,XX +XXX,XX @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid, dom->container_type = XC_DOM_HVM_CONTAINER; #if defined(__i386__) || defined(__x86_64__) + policy = xc_cpu_policy_init(); + if (!policy) { + LOGE(ERROR, "xc_cpu_policy_init failed d%u", domid); + rc = ERROR_NOMEM; + goto out; + } + + rc = xc_cpu_policy_get_domain(ctx->xch, domid, policy); + if (rc != 0) { + LOGE(ERROR, "xc_cpu_policy_get_domain failed d%u", domid); + rc = ERROR_FAIL; + goto out; + } + for ( uint32_t i = 0; i < info->max_vcpus; i++ ) - dom->cpu_to_apicid[i] = 2 * i; /* TODO: Replace by topo calculation */ + dom->cpu_to_apicid[i] = xc_cpu_to_apicid(policy, i); #endif /* The params from the configuration file are in Mb, which are then @@ -XXX,XX +XXX,XX @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid, out: assert(rc != 0); if (dom != NULL) xc_dom_release(dom); +#if defined(__i386__) || defined(__x86_64__) + xc_cpu_policy_destroy(policy); +#endif return rc; } diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/cpu-policy.c +++ b/xen/arch/x86/cpu-policy.c @@ -XXX,XX +XXX,XX @@ static void recalculate_misc(struct cpu_policy *p) p->basic.raw[0x8] = EMPTY_LEAF; - /* TODO: Rework topology logic. */ - memset(p->topo.raw, 0, sizeof(p->topo.raw)); - p->basic.raw[0xc] = EMPTY_LEAF; p->extd.e1d &= ~CPUID_COMMON_1D_FEATURES; @@ -XXX,XX +XXX,XX @@ static void __init calculate_pv_max_policy(void) recalculate_xstate(p); p->extd.raw[0xa] = EMPTY_LEAF; /* No SVM for PV guests. */ + + /* Wipe host topology. 
Populated by toolstack */ + memset(p->topo.raw, 0, sizeof(p->topo.raw)); } static void __init calculate_pv_def_policy(void) @@ -XXX,XX +XXX,XX @@ static void __init calculate_hvm_max_policy(void) /* It's always possible to emulate CPUID faulting for HVM guests */ p->platform_info.cpuid_faulting = true; + + /* Wipe host topology. Populated by toolstack */ + memset(p->topo.raw, 0, sizeof(p->topo.raw)); } static void __init calculate_hvm_def_policy(void) -- 2.46.0