xen/arch/x86/hvm/vlapic.c | 1 + 1 file changed, 1 insertion(+)
From: Grygorii Strashko <grygorii_strashko@epam.com>
The LAPIC LVTx registers have two RO bits:
- all: Delivery Status (DS) bit 12
- LINT0/LINT1: Remote IRR Flag (RIR) bit 14.
This bit is reserved for other LVTx regs with RAZ/WI access type (MMIO), while
WRMSR (guest_wrmsr_x2apic()) has appropiate checks for reserved bits
(MBZ access type).
and the current vLAPIC implementations allows guest to write to these RO bits.
The Delivery Status (DS) is not emulated by Xen - there is no IRQ msg bus, and
the IRQ is:
- or accepted at destination and appears as pending
(vLAPIC Interrupt Request Register (IRR))
- or get rejected immediately.
The Remote IRR Flag (RIR) behavior emulation is not implemented for LINT0/LINT1
in Xen for now.
Hence it is definitely wrong to allow guest to write to LVTx regs RO bits,
fix it by unconditionally cleaning up those bits in vlapic_reg_write().
Signed-off-by: Grygorii Strashko <grygorii_strashko@epam.com>
---
xen/arch/x86/hvm/vlapic.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 79697487ba90..78162afe7711 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -880,6 +880,7 @@ void vlapic_reg_write(struct vcpu *v, unsigned int reg, uint32_t val)
if ( vlapic_sw_disabled(vlapic) )
val |= APIC_LVT_MASKED;
val &= array_access_nospec(vlapic_lvt_mask, (reg - APIC_LVTT) >> 4);
+ val &= ~(APIC_LVT_REMOTE_IRR | APIC_SEND_PENDING);
vlapic_set_reg(vlapic, reg, val);
if ( reg == APIC_LVT0 )
{
--
2.34.1
On 25.09.2025 21:55, Grygorii Strashko wrote: > From: Grygorii Strashko <grygorii_strashko@epam.com> > > The LAPIC LVTx registers have two RO bits: > - all: Delivery Status (DS) bit 12 > - LINT0/LINT1: Remote IRR Flag (RIR) bit 14. > This bit is reserved for other LVTx regs with RAZ/WI access type (MMIO), while > WRMSR (guest_wrmsr_x2apic()) has appropiate checks for reserved bits > (MBZ access type). Question is what the behavior is for writing the r/o (but not reserved) bits. I wasn't able to find any statement in the SDM. > and the current vLAPIC implementations allows guest to write to these RO bits. > > The Delivery Status (DS) is not emulated by Xen - there is no IRQ msg bus, and > the IRQ is: > - or accepted at destination and appears as pending > (vLAPIC Interrupt Request Register (IRR)) > - or get rejected immediately. > > The Remote IRR Flag (RIR) behavior emulation is not implemented for LINT0/LINT1 > in Xen for now. > > Hence it is definitely wrong to allow guest to write to LVTx regs RO bits, > fix it by unconditionally cleaning up those bits in vlapic_reg_write(). > > Signed-off-by: Grygorii Strashko <grygorii_strashko@epam.com> Please also add a Fixes: tag when correcting code. > --- a/xen/arch/x86/hvm/vlapic.c > +++ b/xen/arch/x86/hvm/vlapic.c > @@ -880,6 +880,7 @@ void vlapic_reg_write(struct vcpu *v, unsigned int reg, uint32_t val) > if ( vlapic_sw_disabled(vlapic) ) > val |= APIC_LVT_MASKED; > val &= array_access_nospec(vlapic_lvt_mask, (reg - APIC_LVTT) >> 4); > + val &= ~(APIC_LVT_REMOTE_IRR | APIC_SEND_PENDING); There shouldn't be a 2nd &= here; what needs adding should imo be added to (really: removed from) vlapic_lvt_mask[]. (Orthogonal to this I wonder whether guest_wrmsr_x2apic() wouldn't better use that array, too.) While looking at this, don't we have an issue with CMCI as well? guest_{rd,wr}msr_x2apic() handle it, but vlapic_reg_write() doesn't. I.e. on AMD we would fail to deliver #GP when the guest accesses it, while on Intel we would lose the value written. And we also don't set its mask bit in vlapic_do_init(). I guess I need to make a patch ... Jan
On 26.09.25 11:17, Jan Beulich wrote: > On 25.09.2025 21:55, Grygorii Strashko wrote: >> From: Grygorii Strashko <grygorii_strashko@epam.com> >> >> The LAPIC LVTx registers have two RO bits: >> - all: Delivery Status (DS) bit 12 >> - LINT0/LINT1: Remote IRR Flag (RIR) bit 14. >> This bit is reserved for other LVTx regs with RAZ/WI access type (MMIO), while >> WRMSR (guest_wrmsr_x2apic()) has appropiate checks for reserved bits >> (MBZ access type). > > Question is what the behavior is for writing the r/o (but not reserved) bits. > I wasn't able to find any statement in the SDM. Me too. Usually RO/WI on most HW. For example, LAPIC MMIO "Write" will be ignored (WRMSR will trigger exception). > >> and the current vLAPIC implementations allows guest to write to these RO bits. >> >> The Delivery Status (DS) is not emulated by Xen - there is no IRQ msg bus, and >> the IRQ is: >> - or accepted at destination and appears as pending >> (vLAPIC Interrupt Request Register (IRR)) >> - or get rejected immediately. >> >> The Remote IRR Flag (RIR) behavior emulation is not implemented for LINT0/LINT1 >> in Xen for now. >> >> Hence it is definitely wrong to allow guest to write to LVTx regs RO bits, >> fix it by unconditionally cleaning up those bits in vlapic_reg_write(). >> >> Signed-off-by: Grygorii Strashko <grygorii_strashko@epam.com> > > Please also add a Fixes: tag when correcting code. This is veeery old code. Probably f7c8af3a6476e ("[XEN] HVM: Clean up and simplify vlapic device-model code") or 50b3cef2eecb0 ("[HVM] Place all APIC registers into one page in native format.") > >> --- a/xen/arch/x86/hvm/vlapic.c >> +++ b/xen/arch/x86/hvm/vlapic.c >> @@ -880,6 +880,7 @@ void vlapic_reg_write(struct vcpu *v, unsigned int reg, uint32_t val) >> if ( vlapic_sw_disabled(vlapic) ) >> val |= APIC_LVT_MASKED; >> val &= array_access_nospec(vlapic_lvt_mask, (reg - APIC_LVTT) >> 4); >> + val &= ~(APIC_LVT_REMOTE_IRR | APIC_SEND_PENDING); > > There shouldn't be a 2nd &= here; what needs adding should imo be added to > (really: removed from) vlapic_lvt_mask[]. I'll try it. (Orthogonal to this I wonder whether > guest_wrmsr_x2apic() wouldn't better use that array, too.) WRMSR checks for MBZ. RO bits are not MBZ, so masks are different. > > While looking at this, don't we have an issue with CMCI as well? I see no APIC_CMCI write emulation. only read. > guest_{rd,wr}msr_x2apic() handle it, but vlapic_reg_write() doesn't. I.e. on > AMD we would fail to deliver #GP when the guest accesses it, while on Intel > we would lose the value written. And we also don't set its mask bit in > vlapic_do_init(). I guess I need to make a patch ... Is'n it depends on CMCI capability exposing to guest? (have no idea what's CMCI :) -- Best regards, -grygorii
On 26.09.2025 12:38, Grygorii Strashko wrote: > On 26.09.25 11:17, Jan Beulich wrote: >> On 25.09.2025 21:55, Grygorii Strashko wrote: >>> From: Grygorii Strashko <grygorii_strashko@epam.com> >>> >>> The LAPIC LVTx registers have two RO bits: >>> - all: Delivery Status (DS) bit 12 >>> - LINT0/LINT1: Remote IRR Flag (RIR) bit 14. >>> This bit is reserved for other LVTx regs with RAZ/WI access type (MMIO), while >>> WRMSR (guest_wrmsr_x2apic()) has appropiate checks for reserved bits >>> (MBZ access type). >> >> Question is what the behavior is for writing the r/o (but not reserved) bits. >> I wasn't able to find any statement in the SDM. > > Me too. Usually RO/WI on most HW. > For example, LAPIC MMIO "Write" will be ignored (WRMSR will trigger exception). My remark was specifically about WRMSR, and what you say here contradicts ... >>> --- a/xen/arch/x86/hvm/vlapic.c >>> +++ b/xen/arch/x86/hvm/vlapic.c >>> @@ -880,6 +880,7 @@ void vlapic_reg_write(struct vcpu *v, unsigned int reg, uint32_t val) >>> if ( vlapic_sw_disabled(vlapic) ) >>> val |= APIC_LVT_MASKED; >>> val &= array_access_nospec(vlapic_lvt_mask, (reg - APIC_LVTT) >> 4); >>> + val &= ~(APIC_LVT_REMOTE_IRR | APIC_SEND_PENDING); >> >> There shouldn't be a 2nd &= here; what needs adding should imo be added to >> (really: removed from) vlapic_lvt_mask[]. > > I'll try it. > > (Orthogonal to this I wonder whether >> guest_wrmsr_x2apic() wouldn't better use that array, too.) > > WRMSR checks for MBZ. RO bits are not MBZ, so masks are different. ... what you say here. >> While looking at this, don't we have an issue with CMCI as well? > > I see no APIC_CMCI write emulation. only read. guest_wrmsr_x2apic() has case APIC_CMCI: >> guest_{rd,wr}msr_x2apic() handle it, but vlapic_reg_write() doesn't. I.e. on >> AMD we would fail to deliver #GP when the guest accesses it, while on Intel >> we would lose the value written. And we also don't set its mask bit in >> vlapic_do_init(). I guess I need to make a patch ... > > Is'n it depends on CMCI capability exposing to guest? Yes, that's part of what I was (effectively) saying. > (have no idea what's CMCI :) Corrected Machine Check Interrupt. Jan
On 26.09.25 13:52, Jan Beulich wrote: > On 26.09.2025 12:38, Grygorii Strashko wrote: >> On 26.09.25 11:17, Jan Beulich wrote: >>> On 25.09.2025 21:55, Grygorii Strashko wrote: >>>> From: Grygorii Strashko <grygorii_strashko@epam.com> >>>> >>>> The LAPIC LVTx registers have two RO bits: >>>> - all: Delivery Status (DS) bit 12 >>>> - LINT0/LINT1: Remote IRR Flag (RIR) bit 14. >>>> This bit is reserved for other LVTx regs with RAZ/WI access type (MMIO), while >>>> WRMSR (guest_wrmsr_x2apic()) has appropiate checks for reserved bits >>>> (MBZ access type). >>> >>> Question is what the behavior is for writing the r/o (but not reserved) bits. >>> I wasn't able to find any statement in the SDM. >> >> Me too. Usually RO/WI on most HW. >> For example, LAPIC MMIO "Write" will be ignored (WRMSR will trigger exception). Sorry, ignore ^ sentence. > > My remark was specifically about WRMSR, and what you say here contradicts ... > >>>> --- a/xen/arch/x86/hvm/vlapic.c >>>> +++ b/xen/arch/x86/hvm/vlapic.c >>>> @@ -880,6 +880,7 @@ void vlapic_reg_write(struct vcpu *v, unsigned int reg, uint32_t val) >>>> if ( vlapic_sw_disabled(vlapic) ) >>>> val |= APIC_LVT_MASKED; >>>> val &= array_access_nospec(vlapic_lvt_mask, (reg - APIC_LVTT) >> 4); >>>> + val &= ~(APIC_LVT_REMOTE_IRR | APIC_SEND_PENDING); >>> >>> There shouldn't be a 2nd &= here; what needs adding should imo be added to >>> (really: removed from) vlapic_lvt_mask[]. >> >> I'll try it. >> >> (Orthogonal to this I wonder whether >>> guest_wrmsr_x2apic() wouldn't better use that array, too.) >> >> WRMSR checks for MBZ. RO bits are not MBZ, so masks are different. > > ... what you say here. Above reflects current code - guest_wrmsr_x2apic() allows valid RO bits, then vlapic_reg_write() ignores (W/I) them. That's why masks are different between guest_wrmsr_x2apic and vlapic_reg_write. > >>> While looking at this, don't we have an issue with CMCI as well? >> >> I see no APIC_CMCI write emulation. only read. > > guest_wrmsr_x2apic() has > > case APIC_CMCI: it will end up calling vlapic_reg_write() which doesn't have case statement for APIC_CMCI - write ignored. > >>> guest_{rd,wr}msr_x2apic() handle it, but vlapic_reg_write() doesn't. I.e. on >>> AMD we would fail to deliver #GP when the guest accesses it, while on Intel >>> we would lose the value written. And we also don't set its mask bit in >>> vlapic_do_init(). I guess I need to make a patch ... >> >> Is'n it depends on CMCI capability exposing to guest? > > Yes, that's part of what I was (effectively) saying. > >> (have no idea what's CMCI :) > > Corrected Machine Check Interrupt. Looking at: #define VLAPIC_VERSION 0x00050014 which means "Max LVT Entries" = 6 (5+1) Looking at linux kernel apic.c: #ifdef CONFIG_X86_MCE_INTEL if (maxlvt >= 6) { v = apic_read(APIC_LVTCMCI); if (!(v & APIC_LVT_MASKED)) apic_write(APIC_LVTCMCI, v | APIC_LVT_MASKED); } #endif Which means Xen never really emulated APIC_CMCI, so wouldn't it be correct to just drop APIC_CMCI? Also, taking into account that it's Intel specific. -- Best regards, -grygorii
On 26.09.2025 13:34, Grygorii Strashko wrote: > On 26.09.25 13:52, Jan Beulich wrote: >> On 26.09.2025 12:38, Grygorii Strashko wrote: >>> On 26.09.25 11:17, Jan Beulich wrote: >>>> On 25.09.2025 21:55, Grygorii Strashko wrote: >>>> While looking at this, don't we have an issue with CMCI as well? >>> >>> I see no APIC_CMCI write emulation. only read. >> >> guest_wrmsr_x2apic() has >> >> case APIC_CMCI: > > it will end up calling vlapic_reg_write() which doesn't have case statement for > APIC_CMCI - write ignored. Which again is what I had described. >>>> guest_{rd,wr}msr_x2apic() handle it, but vlapic_reg_write() doesn't. I.e. on >>>> AMD we would fail to deliver #GP when the guest accesses it, while on Intel >>>> we would lose the value written. And we also don't set its mask bit in >>>> vlapic_do_init(). I guess I need to make a patch ... >>> >>> Is'n it depends on CMCI capability exposing to guest? >> >> Yes, that's part of what I was (effectively) saying. >> >>> (have no idea what's CMCI :) >> >> Corrected Machine Check Interrupt. > > Looking at: > > #define VLAPIC_VERSION 0x00050014 > > which means "Max LVT Entries" = 6 (5+1) > > Looking at linux kernel apic.c: > #ifdef CONFIG_X86_MCE_INTEL > if (maxlvt >= 6) { > v = apic_read(APIC_LVTCMCI); > if (!(v & APIC_LVT_MASKED)) > apic_write(APIC_LVTCMCI, v | APIC_LVT_MASKED); > } > #endif > > Which means Xen never really emulated APIC_CMCI, so wouldn't it be correct to just drop APIC_CMCI? From the SDM it's not quite clear to me whether using the LVT count is the correct / only way to determine whether there's a CMCI LVT entry. To me it reads more like the MCG_CAP bit is what is the basis. Perhaps based on that we should conditionally set the LVT count to 6 (AMD) or 7 (Intel). > Also, taking into account that it's Intel specific. Another part of what I said I think needs correcting. Jan
On Fri Sep 26, 2025 at 12:52 PM CEST, Jan Beulich wrote: > On 26.09.2025 12:38, Grygorii Strashko wrote: >> On 26.09.25 11:17, Jan Beulich wrote: >>> On 25.09.2025 21:55, Grygorii Strashko wrote: >>>> From: Grygorii Strashko <grygorii_strashko@epam.com> >>>> >>>> The LAPIC LVTx registers have two RO bits: >>>> - all: Delivery Status (DS) bit 12 >>>> - LINT0/LINT1: Remote IRR Flag (RIR) bit 14. >>>> This bit is reserved for other LVTx regs with RAZ/WI access type (MMIO), while >>>> WRMSR (guest_wrmsr_x2apic()) has appropiate checks for reserved bits >>>> (MBZ access type). >>> >>> Question is what the behavior is for writing the r/o (but not reserved) bits. >>> I wasn't able to find any statement in the SDM. >> >> Me too. Usually RO/WI on most HW. >> For example, LAPIC MMIO "Write" will be ignored (WRMSR will trigger exception). > > My remark was specifically about WRMSR, and what you say here contradicts ... Not quite what you're asking, but writing to the X2APIC_ID register does trigger #GP(0), so one would hope writing to RO bits triggers an exception too rather than being WI when mixed with RW bits in a register. Now again, it might not in order to avoid #GP(0) on a race. Definitely worth running a silly test with wrmsr_safe() to make sure. I could see real hardware going either way. Cheers, Alejandro
On 26.09.25 14:12, Alejandro Vallejo wrote: > On Fri Sep 26, 2025 at 12:52 PM CEST, Jan Beulich wrote: >> On 26.09.2025 12:38, Grygorii Strashko wrote: >>> On 26.09.25 11:17, Jan Beulich wrote: >>>> On 25.09.2025 21:55, Grygorii Strashko wrote: >>>>> From: Grygorii Strashko <grygorii_strashko@epam.com> >>>>> >>>>> The LAPIC LVTx registers have two RO bits: >>>>> - all: Delivery Status (DS) bit 12 >>>>> - LINT0/LINT1: Remote IRR Flag (RIR) bit 14. >>>>> This bit is reserved for other LVTx regs with RAZ/WI access type (MMIO), while >>>>> WRMSR (guest_wrmsr_x2apic()) has appropiate checks for reserved bits >>>>> (MBZ access type). >>>> >>>> Question is what the behavior is for writing the r/o (but not reserved) bits. >>>> I wasn't able to find any statement in the SDM. >>> >>> Me too. Usually RO/WI on most HW. >>> For example, LAPIC MMIO "Write" will be ignored (WRMSR will trigger exception). >> >> My remark was specifically about WRMSR, and what you say here contradicts ... > > Not quite what you're asking, but writing to the X2APIC_ID register does trigger > #GP(0), so one would hope writing to RO bits triggers an exception too rather > than being WI when mixed with RW bits in a register. > > Now again, it might not in order to avoid #GP(0) on a race. > > Definitely worth running a silly test with wrmsr_safe() to make sure. I could > see real hardware going either way. I see following code in Linux apic.c value = apic_read(APIC_LVT0); value &= ~(APIC_MODE_MASK | APIC_SEND_PENDING | APIC_INPUT_POLARITY | APIC_LVT_REMOTE_IRR | APIC_LVT_LEVEL_TRIGGER | APIC_LVT_MASKED); value |= APIC_LVT_REMOTE_IRR | APIC_SEND_PENDING; value = SET_APIC_DELIVERY_MODE(value, APIC_MODE_EXTINT); apic_write(APIC_LVT0, value); where RO #define APIC_LVT_REMOTE_IRR (1 << 14) #define APIC_SEND_PENDING (1 << 12) This means writing to RO bits (at least LVT) doesn't expect to trigger exception and changing that in guest_wrmsr_x2apic() will break existing guests. Xen has the similar code [2]. [1] https://github.com/torvalds/linux/blob/4ff71af020ae59ae2d83b174646fc2ad9fcd4dc4/arch/x86/kernel/apic/apic.c#L2251 [2] https://github.com/xen-project/xen/blob/382dd0d166cb85139d86ff26fd96af102ae4fef3/xen/arch/x86/apic.c#L244 -- Best regards, -grygorii
© 2016 - 2025 Red Hat, Inc.