net/xdp/xsk_queue.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
The desc->len value can be set up to U32_MAX. If umem tx_metadata_len
option is also set, then the value of the expression
'desc->len + pool->tx_metadata_len' can overflow and validation
of the incorrect descriptor will be successfully passed.
This can lead to a subsequent chain of arithmetic overflows
in the xsk_build_skb() function and incorrect sk_buff allocation.
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Fixes: 341ac980eab9 ("xsk: Support tx_metadata_len")
Cc: stable@vger.kernel.org
Signed-off-by: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru>
---
net/xdp/xsk_queue.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index f16f390370dc..b206a8839b39 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -144,7 +144,7 @@ static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool,
struct xdp_desc *desc)
{
u64 addr = desc->addr - pool->tx_metadata_len;
- u64 len = desc->len + pool->tx_metadata_len;
+ u64 len = (u64)desc->len + pool->tx_metadata_len;
u64 offset = addr & (pool->chunk_size - 1);
if (!desc->len)
@@ -165,7 +165,7 @@ static inline bool xp_unaligned_validate_desc(struct xsk_buff_pool *pool,
struct xdp_desc *desc)
{
u64 addr = xp_unaligned_add_offset_to_addr(desc->addr) - pool->tx_metadata_len;
- u64 len = desc->len + pool->tx_metadata_len;
+ u64 len = (u64)desc->len + pool->tx_metadata_len;
if (!desc->len)
return false;
--
2.39.5
From: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru>
Date: Mon, 6 Oct 2025 08:53:17 +0000
> The desc->len value can be set up to U32_MAX. If umem tx_metadata_len
In theory. Never in practice.
> option is also set, then the value of the expression
> 'desc->len + pool->tx_metadata_len' can overflow and validation
> of the incorrect descriptor will be successfully passed.
> This can lead to a subsequent chain of arithmetic overflows
> in the xsk_build_skb() function and incorrect sk_buff allocation.
>
> Found by InfoTeCS on behalf of Linux Verification Center
> (linuxtesting.org) with SVACE.
I think the general rule for sending fixes is that a fix must fix a real
bug which can be reproduced in real life scenarios.
Static Analysis Tools have no idea that nobody sends 4 Gb sized network
packets.
>
> Fixes: 341ac980eab9 ("xsk: Support tx_metadata_len")
> Cc: stable@vger.kernel.org
> Signed-off-by: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru>
> ---
> net/xdp/xsk_queue.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
Thanks,
Olek
On 10/6/25 18:19, Alexander Lobakin wrote:
> From: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru>
> Date: Mon, 6 Oct 2025 08:53:17 +0000
>
>> The desc->len value can be set up to U32_MAX. If umem tx_metadata_len
>
> In theory. Never in practice.
>
Hi Alexander,
Thank you for the review.
It seems to me that this problem should be considered not from the point of view of practical use,
but from the point of view of security. An attacker can set any length of the packet in the descriptor
from the user space and descriptor validation will pass.
>> option is also set, then the value of the expression
>> 'desc->len + pool->tx_metadata_len' can overflow and validation
>> of the incorrect descriptor will be successfully passed.
>> This can lead to a subsequent chain of arithmetic overflows
>> in the xsk_build_skb() function and incorrect sk_buff allocation.
>>
>> Found by InfoTeCS on behalf of Linux Verification Center
>> (linuxtesting.org) with SVACE.
>
> I think the general rule for sending fixes is that a fix must fix a real
> bug which can be reproduced in real life scenarios.
I agree with that, so I make a test program (PoC). Something like that:
struct xdp_umem_reg umem_reg;
umem_reg.addr = (__u64)(void *)umem;
...
umem_reg.chunk_size = 4096;
umem_reg.tx_metadata_len = 16;
umem_reg.flags = XDP_UMEM_TX_METADATA_LEN;
setsockopt(sfd, SOL_XDP, XDP_UMEM_REG, &umem_reg, sizeof(umem_reg));
...
xsk_ring_prod__reserve(tq, batch_size, &idx);
for (i = 0; i < nr_packets; ++i) {
struct xdp_desc *tx_desc = xsk_ring_prod__tx_desc(tq, idx + i);
tx_desc->addr = packets[i].addr;
tx_desc->addr += umem->tx_metadata_len;
tx_desc->options = XDP_TX_METADATA;
tx_desc->len = UINT32_MAX;
}
xsk_ring_prod__submit(tq, nr_packets);
...
sendto(sfd, NULL, 0, MSG_DONTWAIT, NULL, 0);
Since the check of an invalid descriptor has passed, kernel try to allocate
a skb with size of 'hr + len + tr' in the sock_alloc_send_pskb() function
and this is where the next overflow occurs.
skb allocates with a size of 63. Next the skb_put() is called, which adds U32_MAX to skb->tail and skb->end.
Next the skb_store_bits() tries to copy -1 bytes, but fails.
__xsk_generic_xmit
xsk_build_skb
len = desc->len; // from descriptor
sock_alloc_send_skb(..., hr + len + tr, ...) // the next overflow
sock_alloc_send_pskb
alloc_skb_with_frags
skb_put(skb, len) // len casts to int
skb_store_bits(skb, 0, buffer, len)
> Static Analysis Tools have no idea that nobody sends 4 Gb sized network
> packets.
>
That's right. Static analyzer is only a tool, but in this case, the overflow
highlighted by the static analyzer can be used for malicious purposes.
>>
>> Fixes: 341ac980eab9 ("xsk: Support tx_metadata_len")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru>
>> ---
>> net/xdp/xsk_queue.h | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
> Thanks,
> Olek
Thanks,
Ilia
From: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru>
Date: Tue, 7 Oct 2025 11:19:19 +0000
> On 10/6/25 18:19, Alexander Lobakin wrote:
>> From: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru>
>> Date: Mon, 6 Oct 2025 08:53:17 +0000
>>
>>> The desc->len value can be set up to U32_MAX. If umem tx_metadata_len
>>
>> In theory. Never in practice.
>>
>
> Hi Alexander,
> Thank you for the review.
>
> It seems to me that this problem should be considered not from the point of view of practical use,
> but from the point of view of security. An attacker can set any length of the packet in the descriptor
> from the user space and descriptor validation will pass.
>
>
>>> option is also set, then the value of the expression
>>> 'desc->len + pool->tx_metadata_len' can overflow and validation
>>> of the incorrect descriptor will be successfully passed.
>>> This can lead to a subsequent chain of arithmetic overflows
>>> in the xsk_build_skb() function and incorrect sk_buff allocation.
>>>
>>> Found by InfoTeCS on behalf of Linux Verification Center
>>> (linuxtesting.org) with SVACE.
>>
>> I think the general rule for sending fixes is that a fix must fix a real
>> bug which can be reproduced in real life scenarios.
>
> I agree with that, so I make a test program (PoC). Something like that:
>
> struct xdp_umem_reg umem_reg;
> umem_reg.addr = (__u64)(void *)umem;
> ...
> umem_reg.chunk_size = 4096;
> umem_reg.tx_metadata_len = 16;
> umem_reg.flags = XDP_UMEM_TX_METADATA_LEN;
> setsockopt(sfd, SOL_XDP, XDP_UMEM_REG, &umem_reg, sizeof(umem_reg));
> ...
>
> xsk_ring_prod__reserve(tq, batch_size, &idx);
>
> for (i = 0; i < nr_packets; ++i) {
> struct xdp_desc *tx_desc = xsk_ring_prod__tx_desc(tq, idx + i);
> tx_desc->addr = packets[i].addr;
> tx_desc->addr += umem->tx_metadata_len;
> tx_desc->options = XDP_TX_METADATA;
> tx_desc->len = UINT32_MAX;
> }
>
> xsk_ring_prod__submit(tq, nr_packets);
> ...
> sendto(sfd, NULL, 0, MSG_DONTWAIT, NULL, 0);
>
> Since the check of an invalid descriptor has passed, kernel try to allocate
> a skb with size of 'hr + len + tr' in the sock_alloc_send_pskb() function
> and this is where the next overflow occurs.
> skb allocates with a size of 63. Next the skb_put() is called, which adds U32_MAX to skb->tail and skb->end.
> Next the skb_store_bits() tries to copy -1 bytes, but fails.
>
> __xsk_generic_xmit
> xsk_build_skb
> len = desc->len; // from descriptor
> sock_alloc_send_skb(..., hr + len + tr, ...) // the next overflow
> sock_alloc_send_pskb
> alloc_skb_with_frags
> skb_put(skb, len) // len casts to int
> skb_store_bits(skb, 0, buffer, len)
Oh, so you actually have a repro for this. This is good. I suggest you
resubmitting the patch and include this repro in the commit message, so
that it will be clear that it's actually possible to trigger the problem
in the kernel using a malicious/broken userspace application.
(also pls remove those double `@@` from the subject next time)
I'd also like to hear from Maciej and/or others what they think about
this problem (that the userspace can set packet len to U32_MAX). Should
we just go with this proposed u64 propagation or maybe we need to limit
the maximum length which could be sent from the userspace?
In any case, you raised a good topic.
>
>> Static Analysis Tools have no idea that nobody sends 4 Gb sized network
>> packets.
>>
>
> That's right. Static analyzer is only a tool, but in this case, the overflow
> highlighted by the static analyzer can be used for malicious purposes.
+1
Also I really do hope Infotecs stayed independent from the govs and
doesn't take part in any dual-purpose/gov-related projects.
Thanks,
Olek
On Tue, 7 Oct 2025 at 14:11, Alexander Lobakin
<aleksander.lobakin@intel.com> wrote:
>
> From: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru>
> Date: Tue, 7 Oct 2025 11:19:19 +0000
>
> > On 10/6/25 18:19, Alexander Lobakin wrote:
> >> From: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru>
> >> Date: Mon, 6 Oct 2025 08:53:17 +0000
> >>
> >>> The desc->len value can be set up to U32_MAX. If umem tx_metadata_len
> >>
> >> In theory. Never in practice.
> >>
> >
> > Hi Alexander,
> > Thank you for the review.
> >
> > It seems to me that this problem should be considered not from the point of view of practical use,
> > but from the point of view of security. An attacker can set any length of the packet in the descriptor
> > from the user space and descriptor validation will pass.
> >
> >
> >>> option is also set, then the value of the expression
> >>> 'desc->len + pool->tx_metadata_len' can overflow and validation
> >>> of the incorrect descriptor will be successfully passed.
> >>> This can lead to a subsequent chain of arithmetic overflows
> >>> in the xsk_build_skb() function and incorrect sk_buff allocation.
> >>>
> >>> Found by InfoTeCS on behalf of Linux Verification Center
> >>> (linuxtesting.org) with SVACE.
> >>
> >> I think the general rule for sending fixes is that a fix must fix a real
> >> bug which can be reproduced in real life scenarios.
> >
> > I agree with that, so I make a test program (PoC). Something like that:
> >
> > struct xdp_umem_reg umem_reg;
> > umem_reg.addr = (__u64)(void *)umem;
> > ...
> > umem_reg.chunk_size = 4096;
> > umem_reg.tx_metadata_len = 16;
> > umem_reg.flags = XDP_UMEM_TX_METADATA_LEN;
> > setsockopt(sfd, SOL_XDP, XDP_UMEM_REG, &umem_reg, sizeof(umem_reg));
> > ...
> >
> > xsk_ring_prod__reserve(tq, batch_size, &idx);
> >
> > for (i = 0; i < nr_packets; ++i) {
> > struct xdp_desc *tx_desc = xsk_ring_prod__tx_desc(tq, idx + i);
> > tx_desc->addr = packets[i].addr;
> > tx_desc->addr += umem->tx_metadata_len;
> > tx_desc->options = XDP_TX_METADATA;
> > tx_desc->len = UINT32_MAX;
> > }
> >
> > xsk_ring_prod__submit(tq, nr_packets);
> > ...
> > sendto(sfd, NULL, 0, MSG_DONTWAIT, NULL, 0);
> >
> > Since the check of an invalid descriptor has passed, kernel try to allocate
> > a skb with size of 'hr + len + tr' in the sock_alloc_send_pskb() function
> > and this is where the next overflow occurs.
> > skb allocates with a size of 63. Next the skb_put() is called, which adds U32_MAX to skb->tail and skb->end.
> > Next the skb_store_bits() tries to copy -1 bytes, but fails.
> >
> > __xsk_generic_xmit
> > xsk_build_skb
> > len = desc->len; // from descriptor
> > sock_alloc_send_skb(..., hr + len + tr, ...) // the next overflow
> > sock_alloc_send_pskb
> > alloc_skb_with_frags
> > skb_put(skb, len) // len casts to int
> > skb_store_bits(skb, 0, buffer, len)
>
> Oh, so you actually have a repro for this. This is good. I suggest you
> resubmitting the patch and include this repro in the commit message, so
> that it will be clear that it's actually possible to trigger the problem
> in the kernel using a malicious/broken userspace application.
>
> (also pls remove those double `@@` from the subject next time)
>
> I'd also like to hear from Maciej and/or others what they think about
> this problem (that the userspace can set packet len to U32_MAX). Should
> we just go with this proposed u64 propagation or maybe we need to limit
> the maximum length which could be sent from the userspace?
I prefer that we do not set a limit on it and go with the proposed
solution since I do not know what a future proof size limit would be.
Somebody could come up with a new virtual device that can send really
large packets, who knows.
> In any case, you raised a good topic.
>
> >
> >> Static Analysis Tools have no idea that nobody sends 4 Gb sized network
> >> packets.
> >>
> >
> > That's right. Static analyzer is only a tool, but in this case, the overflow
> > highlighted by the static analyzer can be used for malicious purposes.
>
> +1
>
> Also I really do hope Infotecs stayed independent from the govs and
> doesn't take part in any dual-purpose/gov-related projects.
>
> Thanks,
> Olek
>
On 10/7/25 15:44, Magnus Karlsson wrote:
> On Tue, 7 Oct 2025 at 14:11, Alexander Lobakin
> <aleksander.lobakin@intel.com> wrote:
>>
>> From: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru>
>> Date: Tue, 7 Oct 2025 11:19:19 +0000
>>
>>> On 10/6/25 18:19, Alexander Lobakin wrote:
>>>> From: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru>
>>>> Date: Mon, 6 Oct 2025 08:53:17 +0000
>>>>
>>>>> The desc->len value can be set up to U32_MAX. If umem tx_metadata_len
>>>>
>>>> In theory. Never in practice.
>>>>
>>>
>>> Hi Alexander,
>>> Thank you for the review.
>>>
>>> It seems to me that this problem should be considered not from the point of view of practical use,
>>> but from the point of view of security. An attacker can set any length of the packet in the descriptor
>>> from the user space and descriptor validation will pass.
>>>
>>>
>>>>> option is also set, then the value of the expression
>>>>> 'desc->len + pool->tx_metadata_len' can overflow and validation
>>>>> of the incorrect descriptor will be successfully passed.
>>>>> This can lead to a subsequent chain of arithmetic overflows
>>>>> in the xsk_build_skb() function and incorrect sk_buff allocation.
>>>>>
>>>>> Found by InfoTeCS on behalf of Linux Verification Center
>>>>> (linuxtesting.org) with SVACE.
>>>>
>>>> I think the general rule for sending fixes is that a fix must fix a real
>>>> bug which can be reproduced in real life scenarios.
>>>
>>> I agree with that, so I make a test program (PoC). Something like that:
>>>
>>> struct xdp_umem_reg umem_reg;
>>> umem_reg.addr = (__u64)(void *)umem;
>>> ...
>>> umem_reg.chunk_size = 4096;
>>> umem_reg.tx_metadata_len = 16;
>>> umem_reg.flags = XDP_UMEM_TX_METADATA_LEN;
>>> setsockopt(sfd, SOL_XDP, XDP_UMEM_REG, &umem_reg, sizeof(umem_reg));
>>> ...
>>>
>>> xsk_ring_prod__reserve(tq, batch_size, &idx);
>>>
>>> for (i = 0; i < nr_packets; ++i) {
>>> struct xdp_desc *tx_desc = xsk_ring_prod__tx_desc(tq, idx + i);
>>> tx_desc->addr = packets[i].addr;
>>> tx_desc->addr += umem->tx_metadata_len;
>>> tx_desc->options = XDP_TX_METADATA;
>>> tx_desc->len = UINT32_MAX;
>>> }
>>>
>>> xsk_ring_prod__submit(tq, nr_packets);
>>> ...
>>> sendto(sfd, NULL, 0, MSG_DONTWAIT, NULL, 0);
>>>
>>> Since the check of an invalid descriptor has passed, kernel try to allocate
>>> a skb with size of 'hr + len + tr' in the sock_alloc_send_pskb() function
>>> and this is where the next overflow occurs.
>>> skb allocates with a size of 63. Next the skb_put() is called, which adds U32_MAX to skb->tail and skb->end.
>>> Next the skb_store_bits() tries to copy -1 bytes, but fails.
>>>
>>> __xsk_generic_xmit
>>> xsk_build_skb
>>> len = desc->len; // from descriptor
>>> sock_alloc_send_skb(..., hr + len + tr, ...) // the next overflow
>>> sock_alloc_send_pskb
>>> alloc_skb_with_frags
>>> skb_put(skb, len) // len casts to int
>>> skb_store_bits(skb, 0, buffer, len)
>>
>> Oh, so you actually have a repro for this. This is good. I suggest you
>> resubmitting the patch and include this repro in the commit message, so
>> that it will be clear that it's actually possible to trigger the problem
>> in the kernel using a malicious/broken userspace application.
>>
I'll add the repro from this e-mail in the next patch version,
the full source is too long.
>> (also pls remove those double `@@` from the subject next time)
>>
ok
>> I'd also like to hear from Maciej and/or others what they think about
>> this problem (that the userspace can set packet len to U32_MAX). Should
>> we just go with this proposed u64 propagation or maybe we need to limit
>> the maximum length which could be sent from the userspace?
>
> I prefer that we do not set a limit on it and go with the proposed
> solution since I do not know what a future proof size limit would be.
> Somebody could come up with a new virtual device that can send really
> large packets, who knows.
>
The limit is already checked in the xp_aligned_validate_desc() function:
static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool,
struct xdp_desc *desc)
{
...
if (offset + len > pool->chunk_size)
return false;
...
}
and pool->chunk_size can't be more than PAGE_SIZE:
static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
{
u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
...
if (chunk_size < XDP_UMEM_MIN_CHUNK_SIZE || chunk_size > PAGE_SIZE) {
/* Strictly speaking we could support this, if:
* - huge pages, or*
* - using an IOMMU, or
* - making sure the memory area is consecutive
* but for now, we simply say "computer says no".
*/
return -EINVAL;
}
...
}
The problem is exactly the overflow.
>> In any case, you raised a good topic.
>>
>>>
>>>> Static Analysis Tools have no idea that nobody sends 4 Gb sized network
>>>> packets.
>>>>
>>>
>>> That's right. Static analyzer is only a tool, but in this case, the overflow
>>> highlighted by the static analyzer can be used for malicious purposes.
>>
>> +1
>>
>> Also I really do hope Infotecs stayed independent from the govs and
>> doesn't take part in any dual-purpose/gov-related projects.
>>
>> Thanks,
>> Olek
>>
On Tue, 7 Oct 2025 at 15:34, Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru> wrote:
>
> On 10/7/25 15:44, Magnus Karlsson wrote:
> > On Tue, 7 Oct 2025 at 14:11, Alexander Lobakin
> > <aleksander.lobakin@intel.com> wrote:
> >>
> >> From: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru>
> >> Date: Tue, 7 Oct 2025 11:19:19 +0000
> >>
> >>> On 10/6/25 18:19, Alexander Lobakin wrote:
> >>>> From: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru>
> >>>> Date: Mon, 6 Oct 2025 08:53:17 +0000
> >>>>
> >>>>> The desc->len value can be set up to U32_MAX. If umem tx_metadata_len
> >>>>
> >>>> In theory. Never in practice.
> >>>>
> >>>
> >>> Hi Alexander,
> >>> Thank you for the review.
> >>>
> >>> It seems to me that this problem should be considered not from the point of view of practical use,
> >>> but from the point of view of security. An attacker can set any length of the packet in the descriptor
> >>> from the user space and descriptor validation will pass.
> >>>
> >>>
> >>>>> option is also set, then the value of the expression
> >>>>> 'desc->len + pool->tx_metadata_len' can overflow and validation
> >>>>> of the incorrect descriptor will be successfully passed.
> >>>>> This can lead to a subsequent chain of arithmetic overflows
> >>>>> in the xsk_build_skb() function and incorrect sk_buff allocation.
> >>>>>
> >>>>> Found by InfoTeCS on behalf of Linux Verification Center
> >>>>> (linuxtesting.org) with SVACE.
> >>>>
> >>>> I think the general rule for sending fixes is that a fix must fix a real
> >>>> bug which can be reproduced in real life scenarios.
> >>>
> >>> I agree with that, so I make a test program (PoC). Something like that:
> >>>
> >>> struct xdp_umem_reg umem_reg;
> >>> umem_reg.addr = (__u64)(void *)umem;
> >>> ...
> >>> umem_reg.chunk_size = 4096;
> >>> umem_reg.tx_metadata_len = 16;
> >>> umem_reg.flags = XDP_UMEM_TX_METADATA_LEN;
> >>> setsockopt(sfd, SOL_XDP, XDP_UMEM_REG, &umem_reg, sizeof(umem_reg));
> >>> ...
> >>>
> >>> xsk_ring_prod__reserve(tq, batch_size, &idx);
> >>>
> >>> for (i = 0; i < nr_packets; ++i) {
> >>> struct xdp_desc *tx_desc = xsk_ring_prod__tx_desc(tq, idx + i);
> >>> tx_desc->addr = packets[i].addr;
> >>> tx_desc->addr += umem->tx_metadata_len;
> >>> tx_desc->options = XDP_TX_METADATA;
> >>> tx_desc->len = UINT32_MAX;
> >>> }
> >>>
> >>> xsk_ring_prod__submit(tq, nr_packets);
> >>> ...
> >>> sendto(sfd, NULL, 0, MSG_DONTWAIT, NULL, 0);
> >>>
> >>> Since the check of an invalid descriptor has passed, kernel try to allocate
> >>> a skb with size of 'hr + len + tr' in the sock_alloc_send_pskb() function
> >>> and this is where the next overflow occurs.
> >>> skb allocates with a size of 63. Next the skb_put() is called, which adds U32_MAX to skb->tail and skb->end.
> >>> Next the skb_store_bits() tries to copy -1 bytes, but fails.
> >>>
> >>> __xsk_generic_xmit
> >>> xsk_build_skb
> >>> len = desc->len; // from descriptor
> >>> sock_alloc_send_skb(..., hr + len + tr, ...) // the next overflow
> >>> sock_alloc_send_pskb
> >>> alloc_skb_with_frags
> >>> skb_put(skb, len) // len casts to int
> >>> skb_store_bits(skb, 0, buffer, len)
> >>
> >> Oh, so you actually have a repro for this. This is good. I suggest you
> >> resubmitting the patch and include this repro in the commit message, so
> >> that it will be clear that it's actually possible to trigger the problem
> >> in the kernel using a malicious/broken userspace application.
> >>
>
> I'll add the repro from this e-mail in the next patch version,
> the full source is too long.
>
> >> (also pls remove those double `@@` from the subject next time)
> >>
>
> ok
>
> >> I'd also like to hear from Maciej and/or others what they think about
> >> this problem (that the userspace can set packet len to U32_MAX). Should
> >> we just go with this proposed u64 propagation or maybe we need to limit
> >> the maximum length which could be sent from the userspace?
> >
> > I prefer that we do not set a limit on it and go with the proposed
> > solution since I do not know what a future proof size limit would be.
> > Somebody could come up with a new virtual device that can send really
> > large packets, who knows.
> >
>
> The limit is already checked in the xp_aligned_validate_desc() function:
What I meant was, let us not introduce a new limit. I like your
proposed solution.
> static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool,
> struct xdp_desc *desc)
> {
>
> ...
> if (offset + len > pool->chunk_size)
> return false;
> ...
> }
>
> and pool->chunk_size can't be more than PAGE_SIZE:
>
> static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
> {
> u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
> ...
>
> if (chunk_size < XDP_UMEM_MIN_CHUNK_SIZE || chunk_size > PAGE_SIZE) {
> /* Strictly speaking we could support this, if:
> * - huge pages, or*
> * - using an IOMMU, or
> * - making sure the memory area is consecutive
> * but for now, we simply say "computer says no".
> */
>
> return -EINVAL;
> }
> ...
> }
>
> The problem is exactly the overflow.
>
> >> In any case, you raised a good topic.
> >>
> >>>
> >>>> Static Analysis Tools have no idea that nobody sends 4 Gb sized network
> >>>> packets.
> >>>>
> >>>
> >>> That's right. Static analyzer is only a tool, but in this case, the overflow
> >>> highlighted by the static analyzer can be used for malicious purposes.
> >>
> >> +1
> >>
> >> Also I really do hope Infotecs stayed independent from the govs and
> >> doesn't take part in any dual-purpose/gov-related projects.
> >>
> >> Thanks,
> >> Olek
> >>
>
From: Magnus Karlsson <magnus.karlsson@gmail.com>
Date: Tue, 7 Oct 2025 15:52:52 +0200
> On Tue, 7 Oct 2025 at 15:34, Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru> wrote:
>>
>> On 10/7/25 15:44, Magnus Karlsson wrote:
>>> On Tue, 7 Oct 2025 at 14:11, Alexander Lobakin
>>> <aleksander.lobakin@intel.com> wrote:
>>>>
>>>> From: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru>
>>>> Date: Tue, 7 Oct 2025 11:19:19 +0000
>>>>
>>>>> On 10/6/25 18:19, Alexander Lobakin wrote:
>>>>>> From: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru>
>>>>>> Date: Mon, 6 Oct 2025 08:53:17 +0000
>>>>>>
>>>>>>> The desc->len value can be set up to U32_MAX. If umem tx_metadata_len
>>>>>>
>>>>>> In theory. Never in practice.
>>>>>>
>>>>>
>>>>> Hi Alexander,
>>>>> Thank you for the review.
>>>>>
>>>>> It seems to me that this problem should be considered not from the point of view of practical use,
>>>>> but from the point of view of security. An attacker can set any length of the packet in the descriptor
>>>>> from the user space and descriptor validation will pass.
>>>>>
>>>>>
>>>>>>> option is also set, then the value of the expression
>>>>>>> 'desc->len + pool->tx_metadata_len' can overflow and validation
>>>>>>> of the incorrect descriptor will be successfully passed.
>>>>>>> This can lead to a subsequent chain of arithmetic overflows
>>>>>>> in the xsk_build_skb() function and incorrect sk_buff allocation.
>>>>>>>
>>>>>>> Found by InfoTeCS on behalf of Linux Verification Center
>>>>>>> (linuxtesting.org) with SVACE.
>>>>>>
>>>>>> I think the general rule for sending fixes is that a fix must fix a real
>>>>>> bug which can be reproduced in real life scenarios.
>>>>>
>>>>> I agree with that, so I make a test program (PoC). Something like that:
>>>>>
>>>>> struct xdp_umem_reg umem_reg;
>>>>> umem_reg.addr = (__u64)(void *)umem;
>>>>> ...
>>>>> umem_reg.chunk_size = 4096;
>>>>> umem_reg.tx_metadata_len = 16;
>>>>> umem_reg.flags = XDP_UMEM_TX_METADATA_LEN;
>>>>> setsockopt(sfd, SOL_XDP, XDP_UMEM_REG, &umem_reg, sizeof(umem_reg));
>>>>> ...
>>>>>
>>>>> xsk_ring_prod__reserve(tq, batch_size, &idx);
>>>>>
>>>>> for (i = 0; i < nr_packets; ++i) {
>>>>> struct xdp_desc *tx_desc = xsk_ring_prod__tx_desc(tq, idx + i);
>>>>> tx_desc->addr = packets[i].addr;
>>>>> tx_desc->addr += umem->tx_metadata_len;
>>>>> tx_desc->options = XDP_TX_METADATA;
>>>>> tx_desc->len = UINT32_MAX;
>>>>> }
>>>>>
>>>>> xsk_ring_prod__submit(tq, nr_packets);
>>>>> ...
>>>>> sendto(sfd, NULL, 0, MSG_DONTWAIT, NULL, 0);
>>>>>
>>>>> Since the check of an invalid descriptor has passed, kernel try to allocate
>>>>> a skb with size of 'hr + len + tr' in the sock_alloc_send_pskb() function
>>>>> and this is where the next overflow occurs.
>>>>> skb allocates with a size of 63. Next the skb_put() is called, which adds U32_MAX to skb->tail and skb->end.
>>>>> Next the skb_store_bits() tries to copy -1 bytes, but fails.
>>>>>
>>>>> __xsk_generic_xmit
>>>>> xsk_build_skb
>>>>> len = desc->len; // from descriptor
>>>>> sock_alloc_send_skb(..., hr + len + tr, ...) // the next overflow
>>>>> sock_alloc_send_pskb
>>>>> alloc_skb_with_frags
>>>>> skb_put(skb, len) // len casts to int
>>>>> skb_store_bits(skb, 0, buffer, len)
>>>>
>>>> Oh, so you actually have a repro for this. This is good. I suggest you
>>>> resubmitting the patch and include this repro in the commit message, so
>>>> that it will be clear that it's actually possible to trigger the problem
>>>> in the kernel using a malicious/broken userspace application.
>>>>
>>
>> I'll add the repro from this e-mail in the next patch version,
>> the full source is too long.
>>
>>>> (also pls remove those double `@@` from the subject next time)
>>>>
>>
>> ok
>>
>>>> I'd also like to hear from Maciej and/or others what they think about
>>>> this problem (that the userspace can set packet len to U32_MAX). Should
>>>> we just go with this proposed u64 propagation or maybe we need to limit
>>>> the maximum length which could be sent from the userspace?
>>>
>>> I prefer that we do not set a limit on it and go with the proposed
>>> solution since I do not know what a future proof size limit would be.
>>> Somebody could come up with a new virtual device that can send really
>>> large packets, who knows.
>>>
>>
>> The limit is already checked in the xp_aligned_validate_desc() function:
>
> What I meant was, let us not introduce a new limit. I like your
> proposed solution.
[to Ilia]
The netdev and XSk maintainers prefer to fix this themselves after a
quick discussion.
Let us take over this topic. Thanks for finding this.
Olek
© 2016 - 2026 Red Hat, Inc.