ioreq: Assert with out of bounds vCPU ID

[PATCH] ioreq: Assert with out of bounds vCPU ID

Posted by Teddy Astie 2 months, 3 weeks ago

A 4K page appears to be able to hold 128 ioreq entries, which luckly
matches the current vCPU limit. However, if we decide to increase the
domain vCPU limit, that doesn't hold anymore and this function would now
silently create a out of bounds pointer leading to confusing problems.

All architectures have no more than 128 as vCPU limit on HVM guests,
and have pages that are at most 4 KB, so this case doesn't occurs in
with the current limits.

Assert if the vCPU ID will lead to a out of bounds pointer.

No functional change.

Reported-by: Julian Vetter <julian.vetter@vates.tech>
Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
---
Not sure if this is the best approach, perhaps preventing compilation if the
vCPU limit is higher than what the ioreq page can hold is preferable ?

 xen/common/ioreq.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index f5fd30ce12..b2ef46ed7b 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -99,6 +99,7 @@ static ioreq_t *get_ioreq(struct ioreq_server *s, struct vcpu *v)
 
     ASSERT((v == current) || !vcpu_runnable(v));
     ASSERT(p != NULL);
+    ASSERT(v->vcpu_id < (PAGE_SIZE / sizeof(struct ioreq)));
 
     return &p->vcpu_ioreq[v->vcpu_id];
 }
-- 
2.51.2



--
Teddy Astie | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech

Re: [PATCH] ioreq: Assert with out of bounds vCPU ID

Posted by Andrew Cooper 2 months, 3 weeks ago

On 14/11/2025 1:05 pm, Teddy Astie wrote:
> A 4K page appears to be able to hold 128 ioreq entries, which luckly
> matches the current vCPU limit. However, if we decide to increase the
> domain vCPU limit, that doesn't hold anymore and this function would now
> silently create a out of bounds pointer leading to confusing problems.
>
> All architectures have no more than 128 as vCPU limit on HVM guests,
> and have pages that are at most 4 KB, so this case doesn't occurs in
> with the current limits.
>
> Assert if the vCPU ID will lead to a out of bounds pointer.
>
> No functional change.
>
> Reported-by: Julian Vetter <julian.vetter@vates.tech>
> Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
> ---
> Not sure if this is the best approach, perhaps preventing compilation if the
> vCPU limit is higher than what the ioreq page can hold is preferable ?
>
>  xen/common/ioreq.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
> index f5fd30ce12..b2ef46ed7b 100644
> --- a/xen/common/ioreq.c
> +++ b/xen/common/ioreq.c
> @@ -99,6 +99,7 @@ static ioreq_t *get_ioreq(struct ioreq_server *s, struct vcpu *v)
>  
>      ASSERT((v == current) || !vcpu_runnable(v));
>      ASSERT(p != NULL);
> +    ASSERT(v->vcpu_id < (PAGE_SIZE / sizeof(struct ioreq)));
>  
>      return &p->vcpu_ioreq[v->vcpu_id];
>  }

The 128 vCPU problem was known and IOERQ servers intentionally had
multi-page capabilities from the outset to address this problem.

See the calculation in ioreq_server_max_frames().

It hasn't been exercised in anger because there's still the APIC ID
limit at 128 still, but IOREQ servers are expected to work correctly
above this limit.  That said, this function is clearly buggy and in need
of fixing.

To the assert specifically, that's really not appropriate.  If it were
to ever fire, we'd have an XSA.  Logic if this nature either needs to be
fail safe (if returning NULL is ok), or be a BUG() so it doesn't become
unsafe in release builds.

~Andrew

Re: [PATCH] ioreq: Assert with out of bounds vCPU ID

Posted by Teddy Astie 2 months, 3 weeks ago

Le 14/11/2025 à 14:24, Andrew Cooper a écrit :
> On 14/11/2025 1:05 pm, Teddy Astie wrote:
>> A 4K page appears to be able to hold 128 ioreq entries, which luckly
>> matches the current vCPU limit. However, if we decide to increase the
>> domain vCPU limit, that doesn't hold anymore and this function would now
>> silently create a out of bounds pointer leading to confusing problems.
>>
>> All architectures have no more than 128 as vCPU limit on HVM guests,
>> and have pages that are at most 4 KB, so this case doesn't occurs in
>> with the current limits.
>>
>> Assert if the vCPU ID will lead to a out of bounds pointer.
>>
>> No functional change.
>>
>> Reported-by: Julian Vetter <julian.vetter@vates.tech>
>> Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
>> ---
>> Not sure if this is the best approach, perhaps preventing compilation if the
>> vCPU limit is higher than what the ioreq page can hold is preferable ?
>>
>>   xen/common/ioreq.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
>> index f5fd30ce12..b2ef46ed7b 100644
>> --- a/xen/common/ioreq.c
>> +++ b/xen/common/ioreq.c
>> @@ -99,6 +99,7 @@ static ioreq_t *get_ioreq(struct ioreq_server *s, struct vcpu *v)
>>   
>>       ASSERT((v == current) || !vcpu_runnable(v));
>>       ASSERT(p != NULL);
>> +    ASSERT(v->vcpu_id < (PAGE_SIZE / sizeof(struct ioreq)));
>>   
>>       return &p->vcpu_ioreq[v->vcpu_id];
>>   }
> 
> The 128 vCPU problem was known and IOERQ servers intentionally had
> multi-page capabilities from the outset to address this problem.
> 
> See the calculation in ioreq_server_max_frames().
> 
> It hasn't been exercised in anger because there's still the APIC ID
> limit at 128 still, but IOREQ servers are expected to work correctly
> above this limit.  


Yes, although only the first ioreq page exists and can be used (even 
though ioreq_server_max_frames may report more). It looks like the logic 
is incomplete there.


> That said, this function is clearly buggy and in need
> of fixing.
> 
> To the assert specifically, that's really not appropriate.  If it were
> to ever fire, we'd have an XSA.  Logic if this nature either needs to be
> fail safe (if returning NULL is ok), or be a BUG() so it doesn't become
> unsafe in release builds.
> 

It looks like it is possible to somewhat gracefully handle this, by e.g 
crashing the offending domain instead of stepping in a assert. I'm 
trying this for v2.

> ~Andrew



--
Teddy Astie | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech