[v1] net: atlantic: fix fragment overflow handling in RX path

[PATCH net] net: atlantic: fix fragment overflow handling in RX path

Posted by jiefeng.z.zhang@gmail.com 2 months, 3 weeks ago

From: Jiefeng Zhang <jiefeng.z.zhang@gmail.com>

The atlantic driver can receive packets with more than MAX_SKB_FRAGS (17)
fragments when handling large multi-descriptor packets. This causes an
out-of-bounds write in skb_add_rx_frag_netmem() leading to kernel panic.

The issue occurs because the driver doesn't check the total number of
fragments before calling skb_add_rx_frag(). When a packet requires more
than MAX_SKB_FRAGS fragments, the fragment index exceeds the array bounds.

Add a check in __aq_ring_rx_clean() to ensure the total number of fragments
(including the initial header fragment and subsequent descriptor fragments)
does not exceed MAX_SKB_FRAGS. If it does, drop the packet gracefully
and increment the error counter.

Signed-off-by: Jiefeng Zhang <jiefeng.z.zhang@gmail.com>
---
 .../net/ethernet/aquantia/atlantic/aq_ring.c  | 26 ++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
index f21de0c21e52..51e0c6cc71d7 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
@@ -538,6 +538,7 @@ static int __aq_ring_rx_clean(struct aq_ring_s *self, struct napi_struct *napi,
 		bool is_ptp_ring = aq_ptp_ring(self->aq_nic, self);
 		struct aq_ring_buff_s *buff_ = NULL;
 		struct sk_buff *skb = NULL;
+		unsigned int frag_cnt = 0U;
 		unsigned int next_ = 0U;
 		unsigned int i = 0U;
 		u16 hdr_len;
@@ -546,7 +547,6 @@ static int __aq_ring_rx_clean(struct aq_ring_s *self, struct napi_struct *napi,
 			continue;
 
 		if (!buff->is_eop) {
-			unsigned int frag_cnt = 0U;
 			buff_ = buff;
 			do {
 				bool is_rsc_completed = true;
@@ -628,6 +628,30 @@ static int __aq_ring_rx_clean(struct aq_ring_s *self, struct napi_struct *napi,
 						  aq_buf_vaddr(&buff->rxdata),
 						  AQ_CFG_RX_HDR_SIZE);
 
+		/* Check if total fragments exceed MAX_SKB_FRAGS limit.
+		 * The total fragment count consists of:
+		 * - One fragment from the first buffer if (buff->len > hdr_len)
+		 * - frag_cnt fragments from subsequent descriptors
+		 * If the total exceeds MAX_SKB_FRAGS (17), we must drop the
+		 * packet to prevent an out-of-bounds write in skb_add_rx_frag().
+		 */
+		if (unlikely(((buff->len - hdr_len) > 0 ? 1 : 0) + frag_cnt > MAX_SKB_FRAGS)) {
+			/* Drop packet: fragment count exceeds kernel limit */
+			if (!buff->is_eop) {
+				buff_ = buff;
+				do {
+					next_ = buff_->next;
+					buff_ = &self->buff_ring[next_];
+					buff_->is_cleaned = 1;
+				} while (!buff_->is_eop);
+			}
+			u64_stats_update_begin(&self->stats.rx.syncp);
+			++self->stats.rx.errors;
+			u64_stats_update_end(&self->stats.rx.syncp);
+			dev_kfree_skb_any(skb);
+			continue;
+		}
+
 		memcpy(__skb_put(skb, hdr_len), aq_buf_vaddr(&buff->rxdata),
 		       ALIGN(hdr_len, sizeof(long)));
 
-- 
2.39.5

Re: [PATCH net] net: atlantic: fix fragment overflow handling in RX path

Posted by Jakub Kicinski 2 months, 3 weeks ago

On Tue, 18 Nov 2025 15:04:02 +0800 jiefeng.z.zhang@gmail.com wrote:
> From: Jiefeng Zhang <jiefeng.z.zhang@gmail.com>
> 
> The atlantic driver can receive packets with more than MAX_SKB_FRAGS (17)
> fragments when handling large multi-descriptor packets. This causes an
> out-of-bounds write in skb_add_rx_frag_netmem() leading to kernel panic.
> 
> The issue occurs because the driver doesn't check the total number of
> fragments before calling skb_add_rx_frag(). When a packet requires more
> than MAX_SKB_FRAGS fragments, the fragment index exceeds the array bounds.
> 
> Add a check in __aq_ring_rx_clean() to ensure the total number of fragments
> (including the initial header fragment and subsequent descriptor fragments)
> does not exceed MAX_SKB_FRAGS. If it does, drop the packet gracefully
> and increment the error counter.

First off, some basic Linux mailing list savoir vivre:
 - please don't top post
 - please don't resubmit your code within 24h of previous posting
 - please wait for a discussion to close before you send another version

Quoting your response:

https://lore.kernel.org/all/CADEc0q6iLdpwYsyGAwH4qzST8G7asjdqgR6+ymXMy1k0wRwhNQ@mail.gmail.com/

> I have used git send-email to send my code.
> 
> As for the patch --The aquantia/atlantic driver supports a maximum of
> AQ_CFG_SKB_FRAGS_MAX (32U) fragments, while the kernel limits the
> maximum number of fragments to MAX_SKB_FRAGS (17).

Frag count limits in drivers are usually for Tx not Rx.
Again, why do you think this driver can generate more frags than 17?
-- 
pw-bot: cr

Re: [PATCH net] net: atlantic: fix fragment overflow handling in RX path

Posted by Jiefeng 2 months, 2 weeks ago

On Wed, Nov 19, 2025 at 4:24 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 18 Nov 2025 15:04:02 +0800 jiefeng.z.zhang@gmail.com wrote:
> > From: Jiefeng Zhang <jiefeng.z.zhang@gmail.com>
> >
> > The atlantic driver can receive packets with more than MAX_SKB_FRAGS (17)
> > fragments when handling large multi-descriptor packets. This causes an
> > out-of-bounds write in skb_add_rx_frag_netmem() leading to kernel panic.
> >
> > The issue occurs because the driver doesn't check the total number of
> > fragments before calling skb_add_rx_frag(). When a packet requires more
> > than MAX_SKB_FRAGS fragments, the fragment index exceeds the array bounds.
> >
> > Add a check in __aq_ring_rx_clean() to ensure the total number of fragments
> > (including the initial header fragment and subsequent descriptor fragments)
> > does not exceed MAX_SKB_FRAGS. If it does, drop the packet gracefully
> > and increment the error counter.
>
> First off, some basic Linux mailing list savoir vivre:
>  - please don't top post
>  - please don't resubmit your code within 24h of previous posting
>  - please wait for a discussion to close before you send another version
>
> Quoting your response:
>
> https://lore.kernel.org/all/CADEc0q6iLdpwYsyGAwH4qzST8G7asjdqgR6+ymXMy1k0wRwhNQ@mail.gmail.com/
>
> > I have used git send-email to send my code.
> >
> > As for the patch --The aquantia/atlantic driver supports a maximum of
> > AQ_CFG_SKB_FRAGS_MAX (32U) fragments, while the kernel limits the
> > maximum number of fragments to MAX_SKB_FRAGS (17).
>
> Frag count limits in drivers are usually for Tx not Rx.
> Again, why do you think this driver can generate more frags than 17?
> --
> pw-bot: cr

Thank you for the feedback. You're right that fragment limits are
usually for TX, but this is a special case in the RX path.

Also, I apologize for resubmitting the patch within 24 hours. The
initial submission had formatting issues, so I resubmitted it using
git send-email to ensure proper formatting. I understand the mailing
list etiquette and will wait for discussion before sending another
version in the future.

The atlantic hardware supports multi-descriptor packet reception
(RSC). When a large packet arrives, the hardware splits it across
multiple descriptors, where each descriptor can hold up to 2048 bytes
(AQ_CFG_RX_FRAME_MAX).

There is a logic bug in the code. The code already counts fragments in
the multi-descriptor loop
(frag_cnt at aq_ring.c:559), but the check at aq_ring.c:568 only considers
frag_cnt, not the additional fragment from the first buffer
(if buff->len > hdr_len). The actual fragment addition happens later
(aq_ring.c:634-667):
- One fragment from the first buffer (if hdr_len < buff->len)
- Plus frag_cnt fragments from subsequent descriptors

This can exceed MAX_SKB_FRAGS (17) in edge cases:
- If frag_cnt = 17 (the check passes)
- And the first buffer has a fragment (buff->len > hdr_len)
- Then total = 1 + 17 = 18 > MAX_SKB_FRAGS

While the hardware MTU limit is 16334 bytes (B0/ATL2), which should
only need ~8 fragments, there are edge cases like LRO aggregation
or hardware anomalies that could produce more descriptors.

The panic occurred because skb_add_rx_frag() was called with an index
>= MAX_SKB_FRAGS, causing an out-of-bounds write. The fix ensures
we check the total fragment count (first buffer fragment + frag_cnt)
before calling skb_add_rx_frag().

And I have encountered this crash in production with an
Aquantia(AQtion AQC113) 10G NIC[Antigua 10G]:

```
<4>[175432.612171] RIP: 0010:skb_add_rx_frag_netmem+0x29/0xd0
<4>[175432.612193] Code: 90 f3 0f 1e fa 0f 1f 44 00 00 48 89 f8 41 89
ca 48 89 d7 48 63 ce 8b 90 c0 00 00 00 48 c1 e1 04 48 01 ca 48 03 90
c8 00 00 00 <48> 89 7a 30 44 89 52 3c 44 89 42 38 40 f6 c7 01 75 74 48
89 fa 83
<4>[175432.612212] RSP: 0018:ffffa9bec02a8d50 EFLAGS: 00010287
<4>[175432.612223] RAX: ffff925b22e80a00 RBX: ffff925ad38d2700 RCX:
fffffffe0a0c8000
<4>[175432.612234] RDX: ffff9258ea95bac0 RSI: ffff925ae0a0c800 RDI:
0000000000037a40
<4>[175432.612244] RBP: 0000000000000024 R08: 0000000000000000 R09:
0000000000000021
<4>[175432.612254] R10: 0000000000000848 R11: 0000000000000000 R12:
ffffa9bec02a8e24
<4>[175432.612264] R13: ffff925ad8615570 R14: 0000000000000000 R15:
ffff925b22e80a00
<4>[175432.612274] FS: 0000000000000000(0000)
GS:ffff925e47880000(0000) knlGS:0000000000000000
<4>[175432.612287] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[175432.612296] CR2: ffff9258ea95baf0 CR3: 0000000166022004 CR4:
0000000000f72ef0
<4>[175432.612307] PKRU: 55555554
<4>[175432.612314] Call Trace:
<4>[175432.612323] <IRQ>
<4>[175432.612334] aq_ring_rx_clean+0x175/0xe60 [atlantic]
<4>[175432.612398] ? aq_ring_rx_clean+0x14d/0xe60 [atlantic]
<4>[175432.612455] ? aq_ring_tx_clean+0xdf/0x190 [atlantic]
<4>[175432.612508] ? kmem_cache_free+0x348/0x450
<4>[175432.612525] ? aq_vec_poll+0x81/0x1d0 [atlantic]
<4>[175432.612575] ? __napi_poll+0x28/0x1c0
<4>[175432.612587] ? net_rx_action+0x337/0x420
```

Re: [PATCH net] net: atlantic: fix fragment overflow handling in RX path

Posted by Jakub Kicinski 2 months, 2 weeks ago

On Wed, 19 Nov 2025 16:38:13 +0800 Jiefeng wrote:
> And I have encountered this crash in production with an
> Aquantia(AQtion AQC113) 10G NIC[Antigua 10G]:

Ah you're actually seeing a crash! Thanks a lot for the additional info,
I thought this is something you found with static code analysis!
Please include the stack trace and more info in the commit message,
makes it easier for others encountering the crash to compare.
(Drop the timestamps from the crash lines, tho, it's not important)

> The atlantic hardware supports multi-descriptor packet reception
> (RSC). When a large packet arrives, the hardware splits it across
> multiple descriptors, where each descriptor can hold up to 2048 bytes
> (AQ_CFG_RX_FRAME_MAX).
> 
> There is a logic bug in the code. The code already counts fragments in
> the multi-descriptor loop
> (frag_cnt at aq_ring.c:559), but the check at aq_ring.c:568 only considers
> frag_cnt, not the additional fragment from the first buffer
> (if buff->len > hdr_len). The actual fragment addition happens later
> (aq_ring.c:634-667):
> - One fragment from the first buffer (if hdr_len < buff->len)
> - Plus frag_cnt fragments from subsequent descriptors
> 
> This can exceed MAX_SKB_FRAGS (17) in edge cases:
> - If frag_cnt = 17 (the check passes)
> - And the first buffer has a fragment (buff->len > hdr_len)
> - Then total = 1 + 17 = 18 > MAX_SKB_FRAGS

Got it, would it make more sense to fix the existing check?
(assume there will be an extra frag if buff->len > AQ_CFG_RX_HDR_SIZE)

Or fix adding the zeroth frag? (if frag_cnt == max do not extract the
zeroth frag). Extracting the zeroth frag is just to make SW GRO/skb
freeing slightly faster, it's not necessary for correctness.

> While the hardware MTU limit is 16334 bytes (B0/ATL2), which should
> only need ~8 fragments, there are edge cases like LRO aggregation
> or hardware anomalies that could produce more descriptors.
> 
> The panic occurred because skb_add_rx_frag() was called with an index
> >= MAX_SKB_FRAGS, causing an out-of-bounds write. The fix ensures  
> we check the total fragment count (first buffer fragment + frag_cnt)
> before calling skb_add_rx_frag().

Re: [PATCH net] net: atlantic: fix fragment overflow handling in RX path

Posted by Jiefeng 2 months, 2 weeks ago

On Thu, Nov 20, 2025 at 11:03 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 19 Nov 2025 16:38:13 +0800 Jiefeng wrote:
> > And I have encountered this crash in production with an
> > Aquantia(AQtion AQC113) 10G NIC[Antigua 10G]:
>
> Ah you're actually seeing a crash! Thanks a lot for the additional info,
> I thought this is something you found with static code analysis!
> Please include the stack trace and more info in the commit message,
> makes it easier for others encountering the crash to compare.
> (Drop the timestamps from the crash lines, tho, it's not important)
>

Thank you for the feedback! I've updated the patch to v2 based on your
suggestion to skip extracting the zeroth fragment when frag_cnt ==
MAX_SKB_FRAGS.
This approach is simpler and aligns with your comment that extracting the
zeroth fragment is just a performance optimization, not necessary for
correctness.

I've also included the stack trace from production (without timestamps) in
the commit message:

The fix adds a check to skip extracting the zeroth fragment when
frag_cnt == MAX_SKB_FRAGS, preventing the fragment overflow.

Please review the v2 patch.

Re: [PATCH net] net: atlantic: fix fragment overflow handling in RX path

Posted by Jiefeng 2 months, 2 weeks ago

On Thu, Nov 20, 2025 at 9:06 PM Jiefeng <jiefeng.z.zhang@gmail.com> wrote:
>
> On Thu, Nov 20, 2025 at 11:03 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Wed, 19 Nov 2025 16:38:13 +0800 Jiefeng wrote:
> > > And I have encountered this crash in production with an
> > > Aquantia(AQtion AQC113) 10G NIC[Antigua 10G]:
> >
> > Ah you're actually seeing a crash! Thanks a lot for the additional info,
> > I thought this is something you found with static code analysis!
> > Please include the stack trace and more info in the commit message,
> > makes it easier for others encountering the crash to compare.
> > (Drop the timestamps from the crash lines, tho, it's not important)
> >
>
> Thank you for the feedback! I've updated the patch to v2 based on your
> suggestion to skip extracting the zeroth fragment when frag_cnt ==
> MAX_SKB_FRAGS.
> This approach is simpler and aligns with your comment that extracting the
> zeroth fragment is just a performance optimization, not necessary for
> correctness.
>
> I've also included the stack trace from production (without timestamps) in
> the commit message:
>
> The fix adds a check to skip extracting the zeroth fragment when
> frag_cnt == MAX_SKB_FRAGS, preventing the fragment overflow.
>
> Please review the v2 patch.

Hi, I've reconsidered the two approaches and I
think fixing the existing check (assuming there will be an extra frag if
buff->len > AQ_CFG_RX_HDR_SIZE) makes more sense. This approach:

1. Prevents the overflow earlier in the code path
2. Ensures data completeness (all fragments are accounted for)
3. Avoids potential data loss from skipping the zeroth fragment

If you agree, I'll submit a v3 patch based on this approach. The fix
will modify the existing check to include the potential zeroth
fragment in the fragment count calculation.

Please let me know if this approach is acceptable.

Re: [PATCH net] net: atlantic: fix fragment overflow handling in RX path

Posted by Jakub Kicinski 2 months, 2 weeks ago

On Sat, 22 Nov 2025 09:36:29 +0800 Jiefeng wrote:
> > Thank you for the feedback! I've updated the patch to v2 based on your
> > suggestion to skip extracting the zeroth fragment when frag_cnt ==
> > MAX_SKB_FRAGS.
> > This approach is simpler and aligns with your comment that extracting the
> > zeroth fragment is just a performance optimization, not necessary for
> > correctness.
> >
> > I've also included the stack trace from production (without timestamps) in
> > the commit message:
> >
> > The fix adds a check to skip extracting the zeroth fragment when
> > frag_cnt == MAX_SKB_FRAGS, preventing the fragment overflow.
> >
> > Please review the v2 patch.  
> 
> Hi, I've reconsidered the two approaches and I
> think fixing the existing check (assuming there will be an extra frag if
> buff->len > AQ_CFG_RX_HDR_SIZE) makes more sense. This approach:
> 
> 1. Prevents the overflow earlier in the code path
> 2. Ensures data completeness (all fragments are accounted for)
> 3. Avoids potential data loss from skipping the zeroth fragment
> 
> If you agree, I'll submit a v3 patch based on this approach. The fix
> will modify the existing check to include the potential zeroth
> fragment in the fragment count calculation.
> 
> Please let me know if this approach is acceptable.

Right, v2 is not correct. You'd need to calculate hdr_len earlier,
already taking into account whether there is space for the zeroth
frag. And if not - you can just allocate napi_alloc_skb() with enough
space, and copy the full buf. This would avoid the data loss.

Re: [PATCH net] net: atlantic: fix fragment overflow handling in RX path

Posted by Jiefeng 2 months, 2 weeks ago

On Sat, Nov 22, 2025 at 10:01 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sat, 22 Nov 2025 09:36:29 +0800 Jiefeng wrote:
> > > Thank you for the feedback! I've updated the patch to v2 based on your
> > > suggestion to skip extracting the zeroth fragment when frag_cnt ==
> > > MAX_SKB_FRAGS.
> > > This approach is simpler and aligns with your comment that extracting the
> > > zeroth fragment is just a performance optimization, not necessary for
> > > correctness.
> > >
> > > I've also included the stack trace from production (without timestamps) in
> > > the commit message:
> > >
> > > The fix adds a check to skip extracting the zeroth fragment when
> > > frag_cnt == MAX_SKB_FRAGS, preventing the fragment overflow.
> > >
> > > Please review the v2 patch.
> >
> > Hi, I've reconsidered the two approaches and I
> > think fixing the existing check (assuming there will be an extra frag if
> > buff->len > AQ_CFG_RX_HDR_SIZE) makes more sense. This approach:
> >
> > 1. Prevents the overflow earlier in the code path
> > 2. Ensures data completeness (all fragments are accounted for)
> > 3. Avoids potential data loss from skipping the zeroth fragment
> >
> > If you agree, I'll submit a v3 patch based on this approach. The fix
> > will modify the existing check to include the potential zeroth
> > fragment in the fragment count calculation.
> >
> > Please let me know if this approach is acceptable.
>
> Right, v2 is not correct. You'd need to calculate hdr_len earlier,
> already taking into account whether there is space for the zeroth
> frag. And if not - you can just allocate napi_alloc_skb() with enough
> space, and copy the full buf. This would avoid the data loss.

Thank you for your feedback! Based on your first suggestion to "fix the
existing check (assume there will be an extra frag if buff->len >
AQ_CFG_RX_HDR_SIZE)", I've implemented the changes and submitted the v3
version of the patch.

The v3 patch is ready for review. Please let me know if you have any
suggestions for improvement.

Thank you!