[XEN][PATCH] x86/hvm: vlapic: fix RO bits emulation in LVTx regs

Grygorii Strashko posted 1 patch 1 week, 2 days ago
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20250925195558.519568-1-grygorii._5Fstrashko@epam.com
There is a newer version of this series
xen/arch/x86/hvm/vlapic.c | 1 +
1 file changed, 1 insertion(+)
[XEN][PATCH] x86/hvm: vlapic: fix RO bits emulation in LVTx regs
Posted by Grygorii Strashko 1 week, 2 days ago
From: Grygorii Strashko <grygorii_strashko@epam.com>

The LAPIC LVTx registers have two RO bits:
- all: Delivery Status (DS) bit 12
- LINT0/LINT1: Remote IRR Flag (RIR) bit 14.
  This bit is reserved for other LVTx regs with RAZ/WI access type (MMIO), while
  WRMSR (guest_wrmsr_x2apic()) has appropiate checks for reserved bits
  (MBZ access type).
and the current vLAPIC implementations allows guest to write to these RO bits.

The Delivery Status (DS) is not emulated by Xen - there is no IRQ msg bus, and
the IRQ is:
- or accepted at destination and appears as pending
  (vLAPIC Interrupt Request Register (IRR))
- or get rejected immediately.

The Remote IRR Flag (RIR) behavior emulation is not implemented for LINT0/LINT1
in Xen for now.

Hence it is definitely wrong to allow guest to write to LVTx regs RO bits,
fix it by unconditionally cleaning up those bits in vlapic_reg_write().

Signed-off-by: Grygorii Strashko <grygorii_strashko@epam.com>
---
 xen/arch/x86/hvm/vlapic.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 79697487ba90..78162afe7711 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -880,6 +880,7 @@ void vlapic_reg_write(struct vcpu *v, unsigned int reg, uint32_t val)
         if ( vlapic_sw_disabled(vlapic) )
             val |= APIC_LVT_MASKED;
         val &= array_access_nospec(vlapic_lvt_mask, (reg - APIC_LVTT) >> 4);
+        val &= ~(APIC_LVT_REMOTE_IRR | APIC_SEND_PENDING);
         vlapic_set_reg(vlapic, reg, val);
         if ( reg == APIC_LVT0 )
         {
-- 
2.34.1
Re: [XEN][PATCH] x86/hvm: vlapic: fix RO bits emulation in LVTx regs
Posted by Jan Beulich 1 week, 1 day ago
On 25.09.2025 21:55, Grygorii Strashko wrote:
> From: Grygorii Strashko <grygorii_strashko@epam.com>
> 
> The LAPIC LVTx registers have two RO bits:
> - all: Delivery Status (DS) bit 12
> - LINT0/LINT1: Remote IRR Flag (RIR) bit 14.
>   This bit is reserved for other LVTx regs with RAZ/WI access type (MMIO), while
>   WRMSR (guest_wrmsr_x2apic()) has appropiate checks for reserved bits
>   (MBZ access type).

Question is what the behavior is for writing the r/o (but not reserved) bits.
I wasn't able to find any statement in the SDM.

> and the current vLAPIC implementations allows guest to write to these RO bits.
> 
> The Delivery Status (DS) is not emulated by Xen - there is no IRQ msg bus, and
> the IRQ is:
> - or accepted at destination and appears as pending
>   (vLAPIC Interrupt Request Register (IRR))
> - or get rejected immediately.
> 
> The Remote IRR Flag (RIR) behavior emulation is not implemented for LINT0/LINT1
> in Xen for now.
> 
> Hence it is definitely wrong to allow guest to write to LVTx regs RO bits,
> fix it by unconditionally cleaning up those bits in vlapic_reg_write().
> 
> Signed-off-by: Grygorii Strashko <grygorii_strashko@epam.com>

Please also add a Fixes: tag when correcting code.

> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -880,6 +880,7 @@ void vlapic_reg_write(struct vcpu *v, unsigned int reg, uint32_t val)
>          if ( vlapic_sw_disabled(vlapic) )
>              val |= APIC_LVT_MASKED;
>          val &= array_access_nospec(vlapic_lvt_mask, (reg - APIC_LVTT) >> 4);
> +        val &= ~(APIC_LVT_REMOTE_IRR | APIC_SEND_PENDING);

There shouldn't be a 2nd &= here; what needs adding should imo be added to
(really: removed from) vlapic_lvt_mask[]. (Orthogonal to this I wonder whether
guest_wrmsr_x2apic() wouldn't better use that array, too.)

While looking at this, don't we have an issue with CMCI as well?
guest_{rd,wr}msr_x2apic() handle it, but vlapic_reg_write() doesn't. I.e. on
AMD we would fail to deliver #GP when the guest accesses it, while on Intel
we would lose the value written. And we also don't set its mask bit in
vlapic_do_init(). I guess I need to make a patch ...

Jan
Re: [XEN][PATCH] x86/hvm: vlapic: fix RO bits emulation in LVTx regs
Posted by Grygorii Strashko 1 week, 1 day ago

On 26.09.25 11:17, Jan Beulich wrote:
> On 25.09.2025 21:55, Grygorii Strashko wrote:
>> From: Grygorii Strashko <grygorii_strashko@epam.com>
>>
>> The LAPIC LVTx registers have two RO bits:
>> - all: Delivery Status (DS) bit 12
>> - LINT0/LINT1: Remote IRR Flag (RIR) bit 14.
>>    This bit is reserved for other LVTx regs with RAZ/WI access type (MMIO), while
>>    WRMSR (guest_wrmsr_x2apic()) has appropiate checks for reserved bits
>>    (MBZ access type).
> 
> Question is what the behavior is for writing the r/o (but not reserved) bits.
> I wasn't able to find any statement in the SDM.

Me too. Usually RO/WI on most HW.
For example, LAPIC MMIO "Write" will be ignored (WRMSR will trigger exception).

> 
>> and the current vLAPIC implementations allows guest to write to these RO bits.
>>
>> The Delivery Status (DS) is not emulated by Xen - there is no IRQ msg bus, and
>> the IRQ is:
>> - or accepted at destination and appears as pending
>>    (vLAPIC Interrupt Request Register (IRR))
>> - or get rejected immediately.
>>
>> The Remote IRR Flag (RIR) behavior emulation is not implemented for LINT0/LINT1
>> in Xen for now.
>>
>> Hence it is definitely wrong to allow guest to write to LVTx regs RO bits,
>> fix it by unconditionally cleaning up those bits in vlapic_reg_write().
>>
>> Signed-off-by: Grygorii Strashko <grygorii_strashko@epam.com>
> 
> Please also add a Fixes: tag when correcting code.

This is veeery old code. Probably
f7c8af3a6476e ("[XEN] HVM: Clean up and simplify vlapic device-model code")
or
50b3cef2eecb0 ("[HVM] Place all APIC registers into one page in native format.")

> 
>> --- a/xen/arch/x86/hvm/vlapic.c
>> +++ b/xen/arch/x86/hvm/vlapic.c
>> @@ -880,6 +880,7 @@ void vlapic_reg_write(struct vcpu *v, unsigned int reg, uint32_t val)
>>           if ( vlapic_sw_disabled(vlapic) )
>>               val |= APIC_LVT_MASKED;
>>           val &= array_access_nospec(vlapic_lvt_mask, (reg - APIC_LVTT) >> 4);
>> +        val &= ~(APIC_LVT_REMOTE_IRR | APIC_SEND_PENDING);
> 
> There shouldn't be a 2nd &= here; what needs adding should imo be added to
> (really: removed from) vlapic_lvt_mask[].

I'll try it.

  (Orthogonal to this I wonder whether
> guest_wrmsr_x2apic() wouldn't better use that array, too.)

WRMSR checks for MBZ. RO bits are not MBZ, so masks are different.

> 
> While looking at this, don't we have an issue with CMCI as well?

I see no APIC_CMCI write emulation. only read.

> guest_{rd,wr}msr_x2apic() handle it, but vlapic_reg_write() doesn't. I.e. on
> AMD we would fail to deliver #GP when the guest accesses it, while on Intel
> we would lose the value written. And we also don't set its mask bit in
> vlapic_do_init(). I guess I need to make a patch ...

Is'n it depends on CMCI capability exposing to guest? (have no idea what's CMCI :)

-- 
Best regards,
-grygorii
Re: [XEN][PATCH] x86/hvm: vlapic: fix RO bits emulation in LVTx regs
Posted by Jan Beulich 1 week, 1 day ago
On 26.09.2025 12:38, Grygorii Strashko wrote:
> On 26.09.25 11:17, Jan Beulich wrote:
>> On 25.09.2025 21:55, Grygorii Strashko wrote:
>>> From: Grygorii Strashko <grygorii_strashko@epam.com>
>>>
>>> The LAPIC LVTx registers have two RO bits:
>>> - all: Delivery Status (DS) bit 12
>>> - LINT0/LINT1: Remote IRR Flag (RIR) bit 14.
>>>    This bit is reserved for other LVTx regs with RAZ/WI access type (MMIO), while
>>>    WRMSR (guest_wrmsr_x2apic()) has appropiate checks for reserved bits
>>>    (MBZ access type).
>>
>> Question is what the behavior is for writing the r/o (but not reserved) bits.
>> I wasn't able to find any statement in the SDM.
> 
> Me too. Usually RO/WI on most HW.
> For example, LAPIC MMIO "Write" will be ignored (WRMSR will trigger exception).

My remark was specifically about WRMSR, and what you say here contradicts ...

>>> --- a/xen/arch/x86/hvm/vlapic.c
>>> +++ b/xen/arch/x86/hvm/vlapic.c
>>> @@ -880,6 +880,7 @@ void vlapic_reg_write(struct vcpu *v, unsigned int reg, uint32_t val)
>>>           if ( vlapic_sw_disabled(vlapic) )
>>>               val |= APIC_LVT_MASKED;
>>>           val &= array_access_nospec(vlapic_lvt_mask, (reg - APIC_LVTT) >> 4);
>>> +        val &= ~(APIC_LVT_REMOTE_IRR | APIC_SEND_PENDING);
>>
>> There shouldn't be a 2nd &= here; what needs adding should imo be added to
>> (really: removed from) vlapic_lvt_mask[].
> 
> I'll try it.
> 
>   (Orthogonal to this I wonder whether
>> guest_wrmsr_x2apic() wouldn't better use that array, too.)
> 
> WRMSR checks for MBZ. RO bits are not MBZ, so masks are different.

... what you say here.

>> While looking at this, don't we have an issue with CMCI as well?
> 
> I see no APIC_CMCI write emulation. only read.

guest_wrmsr_x2apic() has

    case APIC_CMCI:

>> guest_{rd,wr}msr_x2apic() handle it, but vlapic_reg_write() doesn't. I.e. on
>> AMD we would fail to deliver #GP when the guest accesses it, while on Intel
>> we would lose the value written. And we also don't set its mask bit in
>> vlapic_do_init(). I guess I need to make a patch ...
> 
> Is'n it depends on CMCI capability exposing to guest?

Yes, that's part of what I was (effectively) saying.

> (have no idea what's CMCI :)

Corrected Machine Check Interrupt.

Jan
Re: [XEN][PATCH] x86/hvm: vlapic: fix RO bits emulation in LVTx regs
Posted by Grygorii Strashko 1 week, 1 day ago

On 26.09.25 13:52, Jan Beulich wrote:
> On 26.09.2025 12:38, Grygorii Strashko wrote:
>> On 26.09.25 11:17, Jan Beulich wrote:
>>> On 25.09.2025 21:55, Grygorii Strashko wrote:
>>>> From: Grygorii Strashko <grygorii_strashko@epam.com>
>>>>
>>>> The LAPIC LVTx registers have two RO bits:
>>>> - all: Delivery Status (DS) bit 12
>>>> - LINT0/LINT1: Remote IRR Flag (RIR) bit 14.
>>>>     This bit is reserved for other LVTx regs with RAZ/WI access type (MMIO), while
>>>>     WRMSR (guest_wrmsr_x2apic()) has appropiate checks for reserved bits
>>>>     (MBZ access type).
>>>
>>> Question is what the behavior is for writing the r/o (but not reserved) bits.
>>> I wasn't able to find any statement in the SDM.
>>
>> Me too. Usually RO/WI on most HW.
>> For example, LAPIC MMIO "Write" will be ignored (WRMSR will trigger exception).

Sorry, ignore ^ sentence.

> 
> My remark was specifically about WRMSR, and what you say here contradicts ...

> 
>>>> --- a/xen/arch/x86/hvm/vlapic.c
>>>> +++ b/xen/arch/x86/hvm/vlapic.c
>>>> @@ -880,6 +880,7 @@ void vlapic_reg_write(struct vcpu *v, unsigned int reg, uint32_t val)
>>>>            if ( vlapic_sw_disabled(vlapic) )
>>>>                val |= APIC_LVT_MASKED;
>>>>            val &= array_access_nospec(vlapic_lvt_mask, (reg - APIC_LVTT) >> 4);
>>>> +        val &= ~(APIC_LVT_REMOTE_IRR | APIC_SEND_PENDING);
>>>
>>> There shouldn't be a 2nd &= here; what needs adding should imo be added to
>>> (really: removed from) vlapic_lvt_mask[].
>>
>> I'll try it.
>>
>>    (Orthogonal to this I wonder whether
>>> guest_wrmsr_x2apic() wouldn't better use that array, too.)
>>
>> WRMSR checks for MBZ. RO bits are not MBZ, so masks are different.
> 
> ... what you say here.

Above reflects current code - guest_wrmsr_x2apic() allows valid RO bits,
then vlapic_reg_write() ignores (W/I) them.
That's why masks are different between guest_wrmsr_x2apic and vlapic_reg_write.

> 
>>> While looking at this, don't we have an issue with CMCI as well?
>>
>> I see no APIC_CMCI write emulation. only read.
> 
> guest_wrmsr_x2apic() has
> 
>      case APIC_CMCI:

it will end up calling vlapic_reg_write() which doesn't have case statement for
APIC_CMCI - write ignored.

> 
>>> guest_{rd,wr}msr_x2apic() handle it, but vlapic_reg_write() doesn't. I.e. on
>>> AMD we would fail to deliver #GP when the guest accesses it, while on Intel
>>> we would lose the value written. And we also don't set its mask bit in
>>> vlapic_do_init(). I guess I need to make a patch ...
>>
>> Is'n it depends on CMCI capability exposing to guest?
> 
> Yes, that's part of what I was (effectively) saying.
> 
>> (have no idea what's CMCI :)
> 
> Corrected Machine Check Interrupt.

Looking at:

  #define VLAPIC_VERSION                  0x00050014

which means "Max LVT Entries" = 6 (5+1)

Looking at linux kernel apic.c:
#ifdef CONFIG_X86_MCE_INTEL
	if (maxlvt >= 6) {
		v = apic_read(APIC_LVTCMCI);
		if (!(v & APIC_LVT_MASKED))
			apic_write(APIC_LVTCMCI, v | APIC_LVT_MASKED);
	}
#endif

Which means Xen never really emulated APIC_CMCI, so wouldn't it be correct to just drop APIC_CMCI?
Also, taking into account that it's Intel specific.

-- 
Best regards,
-grygorii
Re: [XEN][PATCH] x86/hvm: vlapic: fix RO bits emulation in LVTx regs
Posted by Jan Beulich 1 week, 1 day ago
On 26.09.2025 13:34, Grygorii Strashko wrote:
> On 26.09.25 13:52, Jan Beulich wrote:
>> On 26.09.2025 12:38, Grygorii Strashko wrote:
>>> On 26.09.25 11:17, Jan Beulich wrote:
>>>> On 25.09.2025 21:55, Grygorii Strashko wrote:
>>>> While looking at this, don't we have an issue with CMCI as well?
>>>
>>> I see no APIC_CMCI write emulation. only read.
>>
>> guest_wrmsr_x2apic() has
>>
>>      case APIC_CMCI:
> 
> it will end up calling vlapic_reg_write() which doesn't have case statement for
> APIC_CMCI - write ignored.

Which again is what I had described.

>>>> guest_{rd,wr}msr_x2apic() handle it, but vlapic_reg_write() doesn't. I.e. on
>>>> AMD we would fail to deliver #GP when the guest accesses it, while on Intel
>>>> we would lose the value written. And we also don't set its mask bit in
>>>> vlapic_do_init(). I guess I need to make a patch ...
>>>
>>> Is'n it depends on CMCI capability exposing to guest?
>>
>> Yes, that's part of what I was (effectively) saying.
>>
>>> (have no idea what's CMCI :)
>>
>> Corrected Machine Check Interrupt.
> 
> Looking at:
> 
>   #define VLAPIC_VERSION                  0x00050014
> 
> which means "Max LVT Entries" = 6 (5+1)
> 
> Looking at linux kernel apic.c:
> #ifdef CONFIG_X86_MCE_INTEL
> 	if (maxlvt >= 6) {
> 		v = apic_read(APIC_LVTCMCI);
> 		if (!(v & APIC_LVT_MASKED))
> 			apic_write(APIC_LVTCMCI, v | APIC_LVT_MASKED);
> 	}
> #endif
> 
> Which means Xen never really emulated APIC_CMCI, so wouldn't it be correct to just drop APIC_CMCI?

From the SDM it's not quite clear to me whether using the LVT count is the
correct / only way to determine whether there's a CMCI LVT entry. To me
it reads more like the MCG_CAP bit is what is the basis. Perhaps based on
that we should conditionally set the LVT count to 6 (AMD) or 7 (Intel).

> Also, taking into account that it's Intel specific.

Another part of what I said I think needs correcting.

Jan
Re: [XEN][PATCH] x86/hvm: vlapic: fix RO bits emulation in LVTx regs
Posted by Alejandro Vallejo 1 week, 1 day ago
On Fri Sep 26, 2025 at 12:52 PM CEST, Jan Beulich wrote:
> On 26.09.2025 12:38, Grygorii Strashko wrote:
>> On 26.09.25 11:17, Jan Beulich wrote:
>>> On 25.09.2025 21:55, Grygorii Strashko wrote:
>>>> From: Grygorii Strashko <grygorii_strashko@epam.com>
>>>>
>>>> The LAPIC LVTx registers have two RO bits:
>>>> - all: Delivery Status (DS) bit 12
>>>> - LINT0/LINT1: Remote IRR Flag (RIR) bit 14.
>>>>    This bit is reserved for other LVTx regs with RAZ/WI access type (MMIO), while
>>>>    WRMSR (guest_wrmsr_x2apic()) has appropiate checks for reserved bits
>>>>    (MBZ access type).
>>>
>>> Question is what the behavior is for writing the r/o (but not reserved) bits.
>>> I wasn't able to find any statement in the SDM.
>> 
>> Me too. Usually RO/WI on most HW.
>> For example, LAPIC MMIO "Write" will be ignored (WRMSR will trigger exception).
>
> My remark was specifically about WRMSR, and what you say here contradicts ...

Not quite what you're asking, but writing to the X2APIC_ID register does trigger
#GP(0), so one would hope writing to RO bits triggers an exception too rather
than being WI when mixed with RW bits in a register.

Now again, it might not in order to avoid #GP(0) on a race.

Definitely worth running a silly test with wrmsr_safe() to make sure. I could
see real hardware going either way.

Cheers,
Alejandro
Re: [XEN][PATCH] x86/hvm: vlapic: fix RO bits emulation in LVTx regs
Posted by Grygorii Strashko 1 week, 1 day ago

On 26.09.25 14:12, Alejandro Vallejo wrote:
> On Fri Sep 26, 2025 at 12:52 PM CEST, Jan Beulich wrote:
>> On 26.09.2025 12:38, Grygorii Strashko wrote:
>>> On 26.09.25 11:17, Jan Beulich wrote:
>>>> On 25.09.2025 21:55, Grygorii Strashko wrote:
>>>>> From: Grygorii Strashko <grygorii_strashko@epam.com>
>>>>>
>>>>> The LAPIC LVTx registers have two RO bits:
>>>>> - all: Delivery Status (DS) bit 12
>>>>> - LINT0/LINT1: Remote IRR Flag (RIR) bit 14.
>>>>>     This bit is reserved for other LVTx regs with RAZ/WI access type (MMIO), while
>>>>>     WRMSR (guest_wrmsr_x2apic()) has appropiate checks for reserved bits
>>>>>     (MBZ access type).
>>>>
>>>> Question is what the behavior is for writing the r/o (but not reserved) bits.
>>>> I wasn't able to find any statement in the SDM.
>>>
>>> Me too. Usually RO/WI on most HW.
>>> For example, LAPIC MMIO "Write" will be ignored (WRMSR will trigger exception).
>>
>> My remark was specifically about WRMSR, and what you say here contradicts ...
> 
> Not quite what you're asking, but writing to the X2APIC_ID register does trigger
> #GP(0), so one would hope writing to RO bits triggers an exception too rather
> than being WI when mixed with RW bits in a register.
> 
> Now again, it might not in order to avoid #GP(0) on a race.
> 
> Definitely worth running a silly test with wrmsr_safe() to make sure. I could
> see real hardware going either way.

I see following code in Linux apic.c

	value = apic_read(APIC_LVT0);
	value &= ~(APIC_MODE_MASK | APIC_SEND_PENDING |
		APIC_INPUT_POLARITY | APIC_LVT_REMOTE_IRR |
		APIC_LVT_LEVEL_TRIGGER | APIC_LVT_MASKED);
	value |= APIC_LVT_REMOTE_IRR | APIC_SEND_PENDING;
	value = SET_APIC_DELIVERY_MODE(value, APIC_MODE_EXTINT);
	apic_write(APIC_LVT0, value);

where RO
#define		APIC_LVT_REMOTE_IRR		(1 << 14)
#define		APIC_SEND_PENDING		(1 << 12)

This means writing to RO bits (at least LVT) doesn't expect to trigger exception and
changing that in guest_wrmsr_x2apic() will break existing guests.

Xen has the similar code [2].

[1] https://github.com/torvalds/linux/blob/4ff71af020ae59ae2d83b174646fc2ad9fcd4dc4/arch/x86/kernel/apic/apic.c#L2251
[2] https://github.com/xen-project/xen/blob/382dd0d166cb85139d86ff26fd96af102ae4fef3/xen/arch/x86/apic.c#L244

-- 
Best regards,
-grygorii