The CPU receives frames from the MAC through conventional DMA: the CPU
allocates buffers for the MAC, then the MAC fills them and returns
ownership to the CPU. For each hardware RX queue, the CPU and MAC
coordinate through a shared ring array of DMA descriptors: one
descriptor per DMA buffer. Each descriptor includes the buffer's
physical address and a status flag ("OWN") indicating which side owns
the buffer: OWN=0 for CPU, OWN=1 for MAC. The CPU is only allowed to set
the flag and the MAC is only allowed to clear it, and both must move
through the ring in sequence: thus the ring is used for both
"submissions" and "completions."
In the stmmac driver, stmmac_rx() bookmarks its position in the ring
with the `cur_rx` index. The main receive loop in that function checks
for rx_descs[cur_rx].own=0, gives the corresponding buffer to the
network stack (NULLing the pointer), and increments `cur_rx` modulo the
ring size. After the loop exits, stmmac_rx_refill(), which bookmarks its
position with `dirty_rx`, allocates fresh buffers and rearms the
descriptors (setting OWN=1). If it fails any allocation, it simply stops
early (leaving OWN=0) and will retry where it left off when next called.
This means descriptors have a three-stage lifecycle (terms my own):
- `empty` (OWN=1, buffer valid)
- `full` (OWN=0, buffer valid and populated)
- `dirty` (OWN=0, buffer NULL)
But because stmmac_rx() only checks OWN, it confuses `full`/`dirty`. In
the past (see 'Fixes:'), there was a bug where the loop could cycle
`cur_rx` all the way back to the first descriptor it dirtied, resulting
in a NULL dereference when mistaken for `full`. The aforementioned
commit resolved that *specific* failure by capping the loop's iteration
limit at `dma_rx_size - 1`, but this is only a partial fix: if the
previous stmmac_rx_refill() didn't complete, then there are leftover
`dirty` descriptors that the loop might encounter without needing to
cycle fully around. The current code therefore panics (see 'Closes:')
when stmmac_rx_refill() is memory-starved long enough for `cur_rx` to
catch up to `dirty_rx`.
Fix this by further tightening the clamp from `dma_rx_size - 1` to
`dma_rx_size - stmmac_rx_dirty() - 1`, subtracting any remnant dirty
entries and limiting the loop so that `cur_rx` cannot catch back up to
`dirty_rx`. This carries no risk of arithmetic underflow: since the
maximum possible return value of stmmac_rx_dirty() is `dma_rx_size - 1`,
the worst the clamp can do is prevent the loop from running at all.
Fixes: b6cb4541853c7 ("net: stmmac: avoid rx queue overrun")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221010
Cc: stable@vger.kernel.org
Signed-off-by: Sam Edwards <CFSworks@gmail.com>
---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 6827c99bde8c..f98b070073c0 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -5609,7 +5609,8 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
dma_dir = page_pool_get_dma_dir(rx_q->page_pool);
bufsz = DIV_ROUND_UP(priv->dma_conf.dma_buf_sz, PAGE_SIZE) * PAGE_SIZE;
- limit = min(priv->dma_conf.dma_rx_size - 1, (unsigned int)limit);
+ limit = min(priv->dma_conf.dma_rx_size - stmmac_rx_dirty(priv, queue) - 1,
+ (unsigned int)limit);
if (netif_msg_rx_status(priv)) {
void *rx_head;
--
2.52.0
On Thu, Mar 19, 2026 at 11:40:30AM -0700, Sam Edwards wrote:
> The CPU receives frames from the MAC through conventional DMA: the CPU
> allocates buffers for the MAC, then the MAC fills them and returns
> ownership to the CPU. For each hardware RX queue, the CPU and MAC
> coordinate through a shared ring array of DMA descriptors: one
> descriptor per DMA buffer. Each descriptor includes the buffer's
> physical address and a status flag ("OWN") indicating which side owns
> the buffer: OWN=0 for CPU, OWN=1 for MAC. The CPU is only allowed to set
> the flag and the MAC is only allowed to clear it, and both must move
> through the ring in sequence: thus the ring is used for both
> "submissions" and "completions."
>
> In the stmmac driver, stmmac_rx() bookmarks its position in the ring
> with the `cur_rx` index. The main receive loop in that function checks
> for rx_descs[cur_rx].own=0, gives the corresponding buffer to the
> network stack (NULLing the pointer), and increments `cur_rx` modulo the
> ring size. After the loop exits, stmmac_rx_refill(), which bookmarks its
> position with `dirty_rx`, allocates fresh buffers and rearms the
> descriptors (setting OWN=1). If it fails any allocation, it simply stops
> early (leaving OWN=0) and will retry where it left off when next called.
>
> This means descriptors have a three-stage lifecycle (terms my own):
> - `empty` (OWN=1, buffer valid)
> - `full` (OWN=0, buffer valid and populated)
> - `dirty` (OWN=0, buffer NULL)
>
> But because stmmac_rx() only checks OWN, it confuses `full`/`dirty`. In
> the past (see 'Fixes:'), there was a bug where the loop could cycle
> `cur_rx` all the way back to the first descriptor it dirtied, resulting
> in a NULL dereference when mistaken for `full`. The aforementioned
> commit resolved that *specific* failure by capping the loop's iteration
> limit at `dma_rx_size - 1`, but this is only a partial fix: if the
> previous stmmac_rx_refill() didn't complete, then there are leftover
> `dirty` descriptors that the loop might encounter without needing to
> cycle fully around. The current code therefore panics (see 'Closes:')
> when stmmac_rx_refill() is memory-starved long enough for `cur_rx` to
> catch up to `dirty_rx`.
>
> Fix this by further tightening the clamp from `dma_rx_size - 1` to
> `dma_rx_size - stmmac_rx_dirty() - 1`, subtracting any remnant dirty
> entries and limiting the loop so that `cur_rx` cannot catch back up to
> `dirty_rx`. This carries no risk of arithmetic underflow: since the
> maximum possible return value of stmmac_rx_dirty() is `dma_rx_size - 1`,
> the worst the clamp can do is prevent the loop from running at all.
Isn't the limit equivalent to:
space = CIRC_SPACE(rx_q->cur_rx, rx_q->dirty_rx,
priv->dma_conf.dma_rx_size);
limit = min_t(unsigned int, space, limit);
(Think of the "full" and "empty" cases as "space" which can be
consumed to provide entries for the refiller to action.)
I think the same applies for patch 2 - when the above returns zero
it means we have no entries in the ring that aren't due for refill.
Have you checked whether stmmac_rx_zc() also buggy in this respect?
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
On Thu, Mar 19, 2026 at 2:59 PM Russell King (Oracle) <linux@armlinux.org.uk> wrote: Hi Russell, > Isn't the limit equivalent to: > > space = CIRC_SPACE(rx_q->cur_rx, rx_q->dirty_rx, > priv->dma_conf.dma_rx_size); > limit = min_t(unsigned int, space, limit); > I had to think about the edge cases for a few minutes -- yes, you're right. However... > (Think of the "full" and "empty" cases as "space" which can be > consumed to provide entries for the refiller to action.) ...this seems like a double negative: We're looking at the ring from the refiller's perspective (i.e. considering "dirty" the meaningful state), and *then* looking at the "unused space" (i.e. "not dirty"). If I used it, I'd at least add a comment explaining that CIRC_SPACE is, counterintuitively, not measuring "space." After the bugs are fixed, I'll follow up with a net-next patch [1] that removes the `min` altogether and makes the loop check for 'dirty' directly, so my focus here is "minimal, obviously correct" and not necessarily "tidiest." Would it be acceptable if I simplify stmmac_rx_dirty() to use CIRC_CNT() then, instead of using CIRC_SPACE() now? > I think the same applies for patch 2 - when the above returns zero > it means we have no entries in the ring that aren't due for refill. I'm not completely satisfied with the `stmmac_rx_dirty() == dma_rx_size - 1` expression either. One might argue we should keep trying until dirty==0, because tolerating *any* dirtiness risks not having enough buffers for the next traffic burst, causing avoidable rx drops. But that's a discussion for patch 2. :) > Have you checked whether stmmac_rx_zc() also buggy in this respect? I skimmed it. I saw that it checks the return status of stmmac_rx_refill_zc() and on failure, returns the NAPI budget to keep polling -- which told me that the ZC author(s) at least considered these problems. Looking at it more thoroughly now, there are a few code smells around the unconditional `dirty = 0;`, and `dirty` being used as the "budget" for stmmac_rx_refill_zc() (why have a budget at all then?) so it looks like something's there, but it's a separate bug if so. Best, Sam [1] https://lore.kernel.org/netdev/20260316021009.262358-4-CFSworks@gmail.com/
© 2016 - 2026 Red Hat, Inc.