[PATCH] wifi: ath12k: fix ring-buffer corruption

Johan Hovold posted 1 patch 9 months ago
[PATCH] wifi: ath12k: fix ring-buffer corruption
Posted by Johan Hovold 9 months ago
Users of the Lenovo ThinkPad X13s have reported that Wi-Fi sometimes
breaks and the log fills up with errors like:

    ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1484, expected 1492
    ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1460, expected 1484

which based on a quick look at the ath11k driver seemed to indicate some
kind of ring-buffer corruption.

Miaoqing Pan tracked it down to the host seeing the updated destination
ring head pointer before the updated descriptor, and the error handling
for that in turn leaves the ring buffer in an inconsistent state.

While this has not yet been observed with ath12k, the ring-buffer
implementation is very similar to the ath11k one and it suffers from the
same bugs.

Add the missing memory barrier to make sure that the descriptor is read
after the head pointer to address the root cause of the corruption while
fixing up the error handling in case there are ever any (ordering) bugs
on the device side.

Note that the READ_ONCE() are only needed to avoid compiler mischief in
case the ring-buffer helpers are ever inlined.

Tested-on: WCN7850 hw2.0 WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3

Fixes: d889913205cf ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices")
Cc: stable@vger.kernel.org	# 6.3
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218623
Link: https://lore.kernel.org/20250310010217.3845141-3-quic_miaoqing@quicinc.com
Cc: Miaoqing Pan <quic_miaoqing@quicinc.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
---
 drivers/net/wireless/ath/ath12k/ce.c  | 11 +++++------
 drivers/net/wireless/ath/ath12k/hal.c |  4 ++--
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/net/wireless/ath/ath12k/ce.c b/drivers/net/wireless/ath/ath12k/ce.c
index be0d669d31fc..740586fe49d1 100644
--- a/drivers/net/wireless/ath/ath12k/ce.c
+++ b/drivers/net/wireless/ath/ath12k/ce.c
@@ -343,11 +343,10 @@ static int ath12k_ce_completed_recv_next(struct ath12k_ce_pipe *pipe,
 		goto err;
 	}
 
+	/* Make sure descriptor is read after the head pointer. */
+	dma_rmb();
+
 	*nbytes = ath12k_hal_ce_dst_status_get_length(desc);
-	if (*nbytes == 0) {
-		ret = -EIO;
-		goto err;
-	}
 
 	*skb = pipe->dest_ring->skb[sw_index];
 	pipe->dest_ring->skb[sw_index] = NULL;
@@ -380,8 +379,8 @@ static void ath12k_ce_recv_process_cb(struct ath12k_ce_pipe *pipe)
 		dma_unmap_single(ab->dev, ATH12K_SKB_RXCB(skb)->paddr,
 				 max_nbytes, DMA_FROM_DEVICE);
 
-		if (unlikely(max_nbytes < nbytes)) {
-			ath12k_warn(ab, "rxed more than expected (nbytes %d, max %d)",
+		if (unlikely(max_nbytes < nbytes || nbytes == 0)) {
+			ath12k_warn(ab, "unexpected rx length (nbytes %d, max %d)",
 				    nbytes, max_nbytes);
 			dev_kfree_skb_any(skb);
 			continue;
diff --git a/drivers/net/wireless/ath/ath12k/hal.c b/drivers/net/wireless/ath/ath12k/hal.c
index cd59ff8e6c7b..91d5126ca149 100644
--- a/drivers/net/wireless/ath/ath12k/hal.c
+++ b/drivers/net/wireless/ath/ath12k/hal.c
@@ -1962,7 +1962,7 @@ u32 ath12k_hal_ce_dst_status_get_length(struct hal_ce_srng_dst_status_desc *desc
 {
 	u32 len;
 
-	len = le32_get_bits(desc->flags, HAL_CE_DST_STATUS_DESC_FLAGS_LEN);
+	len = le32_get_bits(READ_ONCE(desc->flags), HAL_CE_DST_STATUS_DESC_FLAGS_LEN);
 	desc->flags &= ~cpu_to_le32(HAL_CE_DST_STATUS_DESC_FLAGS_LEN);
 
 	return len;
@@ -2132,7 +2132,7 @@ void ath12k_hal_srng_access_begin(struct ath12k_base *ab, struct hal_srng *srng)
 		srng->u.src_ring.cached_tp =
 			*(volatile u32 *)srng->u.src_ring.tp_addr;
 	else
-		srng->u.dst_ring.cached_hp = *srng->u.dst_ring.hp_addr;
+		srng->u.dst_ring.cached_hp = READ_ONCE(*srng->u.dst_ring.hp_addr);
 }
 
 /* Update cached ring head/tail pointers to HW. ath12k_hal_srng_access_begin()
-- 
2.48.1
Re: [PATCH] wifi: ath12k: fix ring-buffer corruption
Posted by Remi Pommarel 6 months, 4 weeks ago
Hi Johan,

On Fri, Mar 21, 2025 at 10:52:19AM +0100, Johan Hovold wrote:
> Users of the Lenovo ThinkPad X13s have reported that Wi-Fi sometimes
> breaks and the log fills up with errors like:
> 
>     ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1484, expected 1492
>     ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1460, expected 1484
> 
> which based on a quick look at the ath11k driver seemed to indicate some
> kind of ring-buffer corruption.
> 
> Miaoqing Pan tracked it down to the host seeing the updated destination
> ring head pointer before the updated descriptor, and the error handling
> for that in turn leaves the ring buffer in an inconsistent state.
> 
> While this has not yet been observed with ath12k, the ring-buffer
> implementation is very similar to the ath11k one and it suffers from the
> same bugs.

Thanks for the fix. We have actually seen reports that could be related
to this issue with ath12k. I know that this series has already been
applied, yet I do have a couple of questions on how you fixed that, if
you don't mind. Answers would be much appreciated and would help me
understand whether the mentioned reports are actually linked to this.

> 
> Add the missing memory barrier to make sure that the descriptor is read
> after the head pointer to address the root cause of the corruption while
> fixing up the error handling in case there are ever any (ordering) bugs
> on the device side.

Just as a personal note, drivers doing this kind of ring-buffer
communication generally seem to use MMIO to store the ring indices, with
readl() providing sufficient synchronization to avoid this kind of
issue.

> 
> Note that the READ_ONCE() are only needed to avoid compiler mischief in
> case the ring-buffer helpers are ever inlined.
> 
> Tested-on: WCN7850 hw2.0 WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
> 
> Fixes: d889913205cf ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices")
> Cc: stable@vger.kernel.org	# 6.3
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=218623
> Link: https://lore.kernel.org/20250310010217.3845141-3-quic_miaoqing@quicinc.com
> Cc: Miaoqing Pan <quic_miaoqing@quicinc.com>
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> ---
>  drivers/net/wireless/ath/ath12k/ce.c  | 11 +++++------
>  drivers/net/wireless/ath/ath12k/hal.c |  4 ++--
>  2 files changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/wireless/ath/ath12k/ce.c b/drivers/net/wireless/ath/ath12k/ce.c
> index be0d669d31fc..740586fe49d1 100644
> --- a/drivers/net/wireless/ath/ath12k/ce.c
> +++ b/drivers/net/wireless/ath/ath12k/ce.c
> @@ -343,11 +343,10 @@ static int ath12k_ce_completed_recv_next(struct ath12k_ce_pipe *pipe,
>  		goto err;
>  	}
>  
> +	/* Make sure descriptor is read after the head pointer. */
> +	dma_rmb();
> +

That does not seem to be the only place descriptor is read just after
the head pointer, ath12k_dp_rx_process{,err,reo_status,wbm_err} seem to
also suffer the same sickness.

Why not move the dma_rmb() in ath12k_hal_srng_access_begin() as below,
that would look to me as a good place to do it.

@@ -2133,6 +2133,9 @@ void ath12k_hal_srng_access_begin(struct ath12k_base *ab, struct hal_srng *srng)
                        *(volatile u32 *)srng->u.src_ring.tp_addr;
        else
                srng->u.dst_ring.cached_hp = *srng->u.dst_ring.hp_addr;
+
+       /* Make sure descriptors are read after the head pointer. */
+       dma_rmb();
 }

 /* Update cached ring head/tail pointers to HW. ath12k_hal_srng_access_begin()

This should ensure the issue does not happen anywhere not just for
ath12k_ce_recv_process_cb().

Note that ath12k_hal_srng_dst_get_next_entry() does not need a barrier
as it uses cached_hp from ath12k_hal_srng_access_begin().

>  	*nbytes = ath12k_hal_ce_dst_status_get_length(desc);
> -	if (*nbytes == 0) {
> -		ret = -EIO;
> -		goto err;
> -	}
>  
>  	*skb = pipe->dest_ring->skb[sw_index];
>  	pipe->dest_ring->skb[sw_index] = NULL;
> @@ -380,8 +379,8 @@ static void ath12k_ce_recv_process_cb(struct ath12k_ce_pipe *pipe)
>  		dma_unmap_single(ab->dev, ATH12K_SKB_RXCB(skb)->paddr,
>  				 max_nbytes, DMA_FROM_DEVICE);
>  
> -		if (unlikely(max_nbytes < nbytes)) {
> -			ath12k_warn(ab, "rxed more than expected (nbytes %d, max %d)",
> +		if (unlikely(max_nbytes < nbytes || nbytes == 0)) {
> +			ath12k_warn(ab, "unexpected rx length (nbytes %d, max %d)",
>  				    nbytes, max_nbytes);
>  			dev_kfree_skb_any(skb);
>  			continue;
> diff --git a/drivers/net/wireless/ath/ath12k/hal.c b/drivers/net/wireless/ath/ath12k/hal.c
> index cd59ff8e6c7b..91d5126ca149 100644
> --- a/drivers/net/wireless/ath/ath12k/hal.c
> +++ b/drivers/net/wireless/ath/ath12k/hal.c
> @@ -1962,7 +1962,7 @@ u32 ath12k_hal_ce_dst_status_get_length(struct hal_ce_srng_dst_status_desc *desc
>  {
>  	u32 len;
>  
> -	len = le32_get_bits(desc->flags, HAL_CE_DST_STATUS_DESC_FLAGS_LEN);
> +	len = le32_get_bits(READ_ONCE(desc->flags), HAL_CE_DST_STATUS_DESC_FLAGS_LEN);
>  	desc->flags &= ~cpu_to_le32(HAL_CE_DST_STATUS_DESC_FLAGS_LEN);
>  
>  	return len;
> @@ -2132,7 +2132,7 @@ void ath12k_hal_srng_access_begin(struct ath12k_base *ab, struct hal_srng *srng)
>  		srng->u.src_ring.cached_tp =
>  			*(volatile u32 *)srng->u.src_ring.tp_addr;
>  	else
> -		srng->u.dst_ring.cached_hp = *srng->u.dst_ring.hp_addr;
> +		srng->u.dst_ring.cached_hp = READ_ONCE(*srng->u.dst_ring.hp_addr);

dma_rmb() acting also as a compiler barrier why the need for both
READ_ONCE() ?

>  }
>  
>  /* Update cached ring head/tail pointers to HW. ath12k_hal_srng_access_begin()
> -- 
> 2.48.1

Regards,

-- 
Remi
Re: [PATCH] wifi: ath12k: fix ring-buffer corruption
Posted by Johan Hovold 6 months, 3 weeks ago
On Thu, May 22, 2025 at 05:11:21PM +0200, Remi Pommarel wrote:
> On Fri, Mar 21, 2025 at 10:52:19AM +0100, Johan Hovold wrote:
> > Users of the Lenovo ThinkPad X13s have reported that Wi-Fi sometimes
> > breaks and the log fills up with errors like:
> > 
> >     ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1484, expected 1492
> >     ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1460, expected 1484
> > 
> > which based on a quick look at the ath11k driver seemed to indicate some
> > kind of ring-buffer corruption.
> > 
> > Miaoqing Pan tracked it down to the host seeing the updated destination
> > ring head pointer before the updated descriptor, and the error handling
> > for that in turn leaves the ring buffer in an inconsistent state.
> > 
> > While this has not yet been observed with ath12k, the ring-buffer
> > implementation is very similar to the ath11k one and it suffers from the
> > same bugs.

> > Note that the READ_ONCE() are only needed to avoid compiler mischief in
> > case the ring-buffer helpers are ever inlined.

> > @@ -343,11 +343,10 @@ static int ath12k_ce_completed_recv_next(struct ath12k_ce_pipe *pipe,
> >  		goto err;
> >  	}
> >  
> > +	/* Make sure descriptor is read after the head pointer. */
> > +	dma_rmb();
> > +
> 
> That does not seem to be the only place descriptor is read just after
> the head pointer, ath12k_dp_rx_process{,err,reo_status,wbm_err} seem to
> also suffer the same sickness.

Indeed, I only started with the corruption issues that users were
reporting (with ath11k) and was gonna follow up with further fixes once
the initial ones were merged (and when I could find more time).

> Why not move the dma_rmb() in ath12k_hal_srng_access_begin() as below,
> that would look to me as a good place to do it.
> 
> @@ -2133,6 +2133,9 @@ void ath12k_hal_srng_access_begin(struct ath12k_base *ab, struct hal_srng *srng)
>                         *(volatile u32 *)srng->u.src_ring.tp_addr;
>         else
>                 srng->u.dst_ring.cached_hp = *srng->u.dst_ring.hp_addr;
> +
> +       /* Make sure descriptors are read after the head pointer. */
> +       dma_rmb();
>  }
> 
> This should ensure the issue does not happen anywhere not just for
> ath12k_ce_recv_process_cb().

We only need the read barrier for dest rings so the barrier would go in
the else branch, but I prefer keeping it in the caller so that it is
more obvious when it is needed and so that we can skip the barrier when
the ring is empty (e.g. as done above).
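
To make the caller-side placement concrete, here is a rough userspace sketch rather than driver code. Every name in it (demo_ring, demo_access_begin(), demo_consume(), DEMO_RING_SIZE) is hypothetical, and the C11 acquire fence is only an assumed stand-in for dma_rmb(): it orders accesses between threads, not against device DMA. The point it tries to show is that the empty-ring path returns before any barrier is issued.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8u	/* hypothetical ring size */

struct demo_desc {
	uint32_t len;
};

struct demo_ring {
	_Atomic uint32_t hp;	/* head pointer, written by the producer */
	uint32_t tp;		/* tail pointer, owned by the consumer */
	uint32_t cached_hp;	/* snapshot taken by demo_access_begin() */
	struct demo_desc desc[DEMO_RING_SIZE];
};

/* Rough analogue of the access-begin helper for a destination ring. */
static void demo_access_begin(struct demo_ring *r)
{
	r->cached_hp = atomic_load_explicit(&r->hp, memory_order_relaxed);
}

/* Returns true and stores the length if an entry was consumed. */
static bool demo_consume(struct demo_ring *r, uint32_t *len)
{
	assert(r && len);

	if (r->tp == r->cached_hp)
		return false;	/* ring empty: no descriptor read, no barrier */

	/*
	 * Assumed stand-in for dma_rmb(): the descriptor read below must
	 * not be reordered before the head-pointer load.
	 */
	atomic_thread_fence(memory_order_acquire);

	*len = r->desc[r->tp].len;
	r->tp = (r->tp + 1) % DEMO_RING_SIZE;
	return true;
}
```

A caller would invoke demo_access_begin() once and then loop on demo_consume() until it returns false, so the fence is only paid for entries that are actually read.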

I've gone through and reviewed the remaining call sites now and will
send a follow-on fix for them.

> Note that ath12k_hal_srng_dst_get_next_entry() does not need a barrier
> as it uses cached_hp from ath12k_hal_srng_access_begin().

Yeah, it's only needed before accessing the descriptor fields.

> > @@ -1962,7 +1962,7 @@ u32 ath12k_hal_ce_dst_status_get_length(struct hal_ce_srng_dst_status_desc *desc
> >  {
> >  	u32 len;
> >  
> > -	len = le32_get_bits(desc->flags, HAL_CE_DST_STATUS_DESC_FLAGS_LEN);
> > +	len = le32_get_bits(READ_ONCE(desc->flags), HAL_CE_DST_STATUS_DESC_FLAGS_LEN);
> >  	desc->flags &= ~cpu_to_le32(HAL_CE_DST_STATUS_DESC_FLAGS_LEN);
> >  
> >  	return len;
> > @@ -2132,7 +2132,7 @@ void ath12k_hal_srng_access_begin(struct ath12k_base *ab, struct hal_srng *srng)
> >  		srng->u.src_ring.cached_tp =
> >  			*(volatile u32 *)srng->u.src_ring.tp_addr;
> >  	else
> > -		srng->u.dst_ring.cached_hp = *srng->u.dst_ring.hp_addr;
> > +		srng->u.dst_ring.cached_hp = READ_ONCE(*srng->u.dst_ring.hp_addr);
> 
> dma_rmb() acting also as a compiler barrier why the need for both
> READ_ONCE() ?

Yeah, I was being overly cautious here and it should be fine with plain
accesses when reading the descriptor after the barrier, but the memory
model seems to require READ_ONCE() when fetching the head pointer.
Currently, hp_addr is marked as volatile so READ_ONCE() could be
dropped for that reason, but I'd rather keep it here explicitly (e.g. in
case someone decides to drop the volatile).
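
As a hedged illustration of the head-pointer fetch, the sketch below approximates READ_ONCE() for a 32-bit value with a volatile cast in userspace C. DEMO_READ_ONCE_U32() and demo_entries_ready() are hypothetical names, not kernel or driver API, and the helper assumes both indices are smaller than the ring size. It shows the head pointer being loaded exactly once into a local snapshot, which is then used for all further arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace approximation of READ_ONCE() for a u32. */
#define DEMO_READ_ONCE_U32(x) (*(const volatile uint32_t *)&(x))

/*
 * Number of entries between the consumer's tail and the producer's head.
 * The head pointer is fetched once; the compiler may not refetch it or
 * merge the load with other accesses to the same location.
 */
static uint32_t demo_entries_ready(const uint32_t *hp_addr, uint32_t tp,
				   uint32_t ring_size)
{
	uint32_t hp;

	assert(hp_addr && ring_size > 0);

	hp = DEMO_READ_ONCE_U32(*hp_addr);

	return (hp + ring_size - tp) % ring_size;
}
```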

Johan
Re: [PATCH] wifi: ath12k: fix ring-buffer corruption
Posted by Remi Pommarel 6 months, 3 weeks ago
On Mon, May 26, 2025 at 01:35:02PM +0200, Johan Hovold wrote:
> On Thu, May 22, 2025 at 05:11:21PM +0200, Remi Pommarel wrote:
> > On Fri, Mar 21, 2025 at 10:52:19AM +0100, Johan Hovold wrote:
> > > Users of the Lenovo ThinkPad X13s have reported that Wi-Fi sometimes
> > > breaks and the log fills up with errors like:
> > > 
> > >     ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1484, expected 1492
> > >     ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1460, expected 1484
> > > 
> > > which based on a quick look at the ath11k driver seemed to indicate some
> > > kind of ring-buffer corruption.
> > > 
> > > Miaoqing Pan tracked it down to the host seeing the updated destination
> > > ring head pointer before the updated descriptor, and the error handling
> > > for that in turn leaves the ring buffer in an inconsistent state.
> > > 
> > > While this has not yet been observed with ath12k, the ring-buffer
> > > implementation is very similar to the ath11k one and it suffers from the
> > > same bugs.
> 
> > > Note that the READ_ONCE() are only needed to avoid compiler mischief in
> > > case the ring-buffer helpers are ever inlined.
> 
> > > @@ -343,11 +343,10 @@ static int ath12k_ce_completed_recv_next(struct ath12k_ce_pipe *pipe,
> > >  		goto err;
> > >  	}
> > >  
> > > +	/* Make sure descriptor is read after the head pointer. */
> > > +	dma_rmb();
> > > +
> > 
> > That does not seem to be the only place descriptor is read just after
> > the head pointer, ath12k_dp_rx_process{,err,reo_status,wbm_err} seem to
> > also suffer the same sickness.
> 
> Indeed, I only started with the corruption issues that users were
> reporting (with ath11k) and was gonna follow up with further fixes once
> the initial ones were merged (and when I could find more time).
> 
> > Why not move the dma_rmb() in ath12k_hal_srng_access_begin() as below,
> > that would look to me as a good place to do it.
> > 
> > @@ -2133,6 +2133,9 @@ void ath12k_hal_srng_access_begin(struct ath12k_base *ab, struct hal_srng *srng)
> >                         *(volatile u32 *)srng->u.src_ring.tp_addr;
> >         else
> >                 srng->u.dst_ring.cached_hp = *srng->u.dst_ring.hp_addr;
> > +
> > +       /* Make sure descriptors are read after the head pointer. */
> > +       dma_rmb();
> >  }
> > 
> > This should ensure the issue does not happen anywhere not just for
> > ath12k_ce_recv_process_cb().
> 
> We only need the read barrier for dest rings so the barrier would go in
> the else branch, but I prefer keeping it in the caller so that it is
> more obvious when it is needed and so that we can skip the barrier when
> the ring is empty (e.g. as done above).

Thanks for taking time to clarify this.

Yes, sorry, I messed up when writing the patch by hand; internally I test
with the dma_rmb() in the else branch. I tend to prefer having it in
ath12k_hal_srng_access_begin() so that the caller does not have to take
care of the barrier itself. Leaving it to the caller seems a little bit
risky to me if further refactoring (or adding other ring processing) is
done in the future; the barrier could easily be forgotten, don't you
think?
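
For comparison, the barrier-in-helper variant could look roughly like the userspace sketch below. It is an illustration under assumptions, not the driver function: demo_srng, demo_ring_dir and demo_srng_access_begin() are hypothetical names, and the C11 acquire fence merely stands in for dma_rmb(). The barrier sits in the destination-ring branch only, so every caller gets the ordering without having to remember it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

enum demo_ring_dir {
	DEMO_RING_SRC,
	DEMO_RING_DST,
};

struct demo_srng {
	enum demo_ring_dir dir;
	_Atomic uint32_t hw_tp;	/* src ring: tail pointer written by the peer */
	_Atomic uint32_t hw_hp;	/* dst ring: head pointer written by the peer */
	uint32_t cached_tp;
	uint32_t cached_hp;
};

static void demo_srng_access_begin(struct demo_srng *srng)
{
	assert(srng);

	if (srng->dir == DEMO_RING_SRC) {
		srng->cached_tp = atomic_load_explicit(&srng->hw_tp,
						       memory_order_relaxed);
	} else {
		srng->cached_hp = atomic_load_explicit(&srng->hw_hp,
						       memory_order_relaxed);
		/*
		 * Assumed stand-in for dma_rmb(): descriptors are read
		 * after the head pointer, for every caller of the helper.
		 */
		atomic_thread_fence(memory_order_acquire);
	}
}
```

The trade-off is that the fence is now issued even when the destination ring turns out to be empty.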

> 
> I've gone through and reviewed the remaining call sites now and will
> send a follow-on fix for them.
> 
> > Note that ath12k_hal_srng_dst_get_next_entry() does not need a barrier
> > as it uses cached_hp from ath12k_hal_srng_access_begin().
> 
> Yeah, it's only needed before accessing the descriptor fields.
> 
> > > @@ -1962,7 +1962,7 @@ u32 ath12k_hal_ce_dst_status_get_length(struct hal_ce_srng_dst_status_desc *desc
> > >  {
> > >  	u32 len;
> > >  
> > > -	len = le32_get_bits(desc->flags, HAL_CE_DST_STATUS_DESC_FLAGS_LEN);
> > > +	len = le32_get_bits(READ_ONCE(desc->flags), HAL_CE_DST_STATUS_DESC_FLAGS_LEN);
> > >  	desc->flags &= ~cpu_to_le32(HAL_CE_DST_STATUS_DESC_FLAGS_LEN);
> > >  
> > >  	return len;
> > > @@ -2132,7 +2132,7 @@ void ath12k_hal_srng_access_begin(struct ath12k_base *ab, struct hal_srng *srng)
> > >  		srng->u.src_ring.cached_tp =
> > >  			*(volatile u32 *)srng->u.src_ring.tp_addr;
> > >  	else
> > > -		srng->u.dst_ring.cached_hp = *srng->u.dst_ring.hp_addr;
> > > +		srng->u.dst_ring.cached_hp = READ_ONCE(*srng->u.dst_ring.hp_addr);
> > 
> > dma_rmb() acting also as a compiler barrier why the need for both
> > READ_ONCE() ?
> 
> Yeah, I was being overly cautious here and it should be fine with plain
> accesses when reading the descriptor after the barrier, but the memory
> model seems to require READ_ONCE() when fetching the head pointer.
> Currently, hp_addr is marked as volatile so READ_ONCE() could be
> dropped for that reason, but I'd rather keep it here explicitly (e.g. in
> case someone decides to drop the volatile).

Yes actually after more thinking, the READ_ONCE for fetching hp does make
sense and is also in the patch I am currently testing.

Also for source rings don't we need a dma_wmb()/WRITE_ONCE before
modifying the tail pointer (see ath12k_hal_srng_access_end()) for quite
the same reason (updates of the descriptor have to be visible before
write to tail pointer) ?

Thanks

-- 
Remi
Re: [PATCH] wifi: ath12k: fix ring-buffer corruption
Posted by Johan Hovold 6 months, 3 weeks ago
On Mon, May 26, 2025 at 02:58:51PM +0200, Remi Pommarel wrote:
> On Mon, May 26, 2025 at 01:35:02PM +0200, Johan Hovold wrote:
> > On Thu, May 22, 2025 at 05:11:21PM +0200, Remi Pommarel wrote:

> > > Why not move the dma_rmb() in ath12k_hal_srng_access_begin() as below,
> > > that would look to me as a good place to do it.

> > We only need the read barrier for dest rings so the barrier would go in
> > the else branch, but I prefer keeping it in the caller so that it is
> > more obvious when it is needed and so that we can skip the barrier when
> > the ring is empty (e.g. as done above).
> 
> Thanks for taking time to clarify this.
> 
> Yes, sorry, I messed up when writing the patch by hand; internally I test
> with the dma_rmb() in the else branch. I tend to prefer having it in
> ath12k_hal_srng_access_begin() so that the caller does not have to take
> care of the barrier itself. Leaving it to the caller seems a little bit
> risky to me if further refactoring (or adding other ring processing) is
> done in the future; the barrier could easily be forgotten, don't you
> think?

Yeah, that would be the argument for putting it in the helper. Big hammer
vs adding it where needed after reviewing the code.

There actually is a new ring being added for 6.16-rc1 I noticed after I
posted the latest series. That would require a follow-up fix with the
barrier-in-caller approach.

> > > dma_rmb() acting also as a compiler barrier why the need for both
> > > READ_ONCE() ?
> > 
> > Yeah, I was being overly cautious here and it should be fine with plain
> > accesses when reading the descriptor after the barrier, but the memory
> > model seems to require READ_ONCE() when fetching the head pointer.
> > Currently, hp_addr is marked as volatile so READ_ONCE() could be
> > dropped for that reason, but I'd rather keep it here explicitly (e.g. in
> > case someone decides to drop the volatile).
> 
> Yes actually after more thinking, the READ_ONCE for fetching hp does make
> sense and is also in the patch I am currently testing.
> 
> Also for source rings don't we need a dma_wmb()/WRITE_ONCE before
> modifying the tail pointer (see ath12k_hal_srng_access_end()) for quite
> the same reason (updates of the descriptor have to be visible before
> write to tail pointer) ?

Yep, the source rings need explicit barriers for the LMAC case, but
there are further issues here too.

And that may also suggest adding the barriers in the start/end helpers
for consistency (i.e. use the big hammer).
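
The write side of that ordering can be sketched in userspace as follows. This is a minimal illustration with hypothetical names (demo_src_ring, demo_src_post(), DEMO_SRC_RING_SIZE); the C11 release fence is an assumed stand-in for dma_wmb(), the relaxed atomic store for WRITE_ONCE(), and the ring-full check is deliberately left out. The descriptor is filled in first and only then is the updated index published.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_SRC_RING_SIZE 4u	/* hypothetical ring size */

struct demo_src_ring {
	uint32_t desc[DEMO_SRC_RING_SIZE];	/* descriptors filled in by the host */
	uint32_t next;				/* host-private write index */
	_Atomic uint32_t published;		/* index polled by the consumer */
};

/* Post one descriptor; the ring-full check is omitted in this sketch. */
static void demo_src_post(struct demo_src_ring *r, uint32_t payload)
{
	assert(r);

	r->desc[r->next] = payload;
	r->next = (r->next + 1) % DEMO_SRC_RING_SIZE;

	/*
	 * Assumed stand-in for dma_wmb(): the descriptor update must be
	 * visible before the new index is.
	 */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&r->published, r->next, memory_order_relaxed);
}
```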

I'll try to find some more time to fix the remaining bits next week.

Johan
Re: [PATCH] wifi: ath12k: fix ring-buffer corruption
Posted by Jeff Johnson 7 months ago
On Fri, 21 Mar 2025 10:52:19 +0100, Johan Hovold wrote:
> Users of the Lenovo ThinkPad X13s have reported that Wi-Fi sometimes
> breaks and the log fills up with errors like:
> 
>     ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1484, expected 1492
>     ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1460, expected 1484
> 
> which based on a quick look at the ath11k driver seemed to indicate some
> kind of ring-buffer corruption.
> 
> [...]

Applied, thanks!

[1/1] wifi: ath12k: fix ring-buffer corruption
      commit: 6b67d2cf14ea997061f61e9c8afd4e1c0f22acb9

Best regards,
-- 
Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Re: [PATCH] wifi: ath12k: fix ring-buffer corruption
Posted by Miaoqing Pan 7 months, 2 weeks ago

On 3/21/2025 5:52 PM, Johan Hovold wrote:
> Users of the Lenovo ThinkPad X13s have reported that Wi-Fi sometimes
> breaks and the log fills up with errors like:
> 
>      ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1484, expected 1492
>      ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1460, expected 1484
> 
> which based on a quick look at the ath11k driver seemed to indicate some
> kind of ring-buffer corruption.
> 
> Miaoqing Pan tracked it down to the host seeing the updated destination
> ring head pointer before the updated descriptor, and the error handling
> for that in turn leaves the ring buffer in an inconsistent state.
> 
> While this has not yet been observed with ath12k, the ring-buffer
> implementation is very similar to the ath11k one and it suffers from the
> same bugs.
> 
> Add the missing memory barrier to make sure that the descriptor is read
> after the head pointer to address the root cause of the corruption while
> fixing up the error handling in case there are ever any (ordering) bugs
> on the device side.
> 
> Note that the READ_ONCE() are only needed to avoid compiler mischief in
> case the ring-buffer helpers are ever inlined.
> 
> Tested-on: WCN7850 hw2.0 WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
> 
> Fixes: d889913205cf ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices")
> Cc: stable@vger.kernel.org	# 6.3
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=218623
> Link: https://lore.kernel.org/20250310010217.3845141-3-quic_miaoqing@quicinc.com
> Cc: Miaoqing Pan <quic_miaoqing@quicinc.com>
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> ---
>   drivers/net/wireless/ath/ath12k/ce.c  | 11 +++++------
>   drivers/net/wireless/ath/ath12k/hal.c |  4 ++--
>   2 files changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/wireless/ath/ath12k/ce.c b/drivers/net/wireless/ath/ath12k/ce.c
> index be0d669d31fc..740586fe49d1 100644
> --- a/drivers/net/wireless/ath/ath12k/ce.c
> +++ b/drivers/net/wireless/ath/ath12k/ce.c
> @@ -343,11 +343,10 @@ static int ath12k_ce_completed_recv_next(struct ath12k_ce_pipe *pipe,
>   		goto err;
>   	}
>   
> +	/* Make sure descriptor is read after the head pointer. */
> +	dma_rmb();
> +
>   	*nbytes = ath12k_hal_ce_dst_status_get_length(desc);
> -	if (*nbytes == 0) {
> -		ret = -EIO;
> -		goto err;
> -	}
>   
>   	*skb = pipe->dest_ring->skb[sw_index];
>   	pipe->dest_ring->skb[sw_index] = NULL;
> @@ -380,8 +379,8 @@ static void ath12k_ce_recv_process_cb(struct ath12k_ce_pipe *pipe)
>   		dma_unmap_single(ab->dev, ATH12K_SKB_RXCB(skb)->paddr,
>   				 max_nbytes, DMA_FROM_DEVICE);
>   
> -		if (unlikely(max_nbytes < nbytes)) {
> -			ath12k_warn(ab, "rxed more than expected (nbytes %d, max %d)",
> +		if (unlikely(max_nbytes < nbytes || nbytes == 0)) {
> +			ath12k_warn(ab, "unexpected rx length (nbytes %d, max %d)",
>   				    nbytes, max_nbytes);
>   			dev_kfree_skb_any(skb);
>   			continue;
> diff --git a/drivers/net/wireless/ath/ath12k/hal.c b/drivers/net/wireless/ath/ath12k/hal.c
> index cd59ff8e6c7b..91d5126ca149 100644
> --- a/drivers/net/wireless/ath/ath12k/hal.c
> +++ b/drivers/net/wireless/ath/ath12k/hal.c
> @@ -1962,7 +1962,7 @@ u32 ath12k_hal_ce_dst_status_get_length(struct hal_ce_srng_dst_status_desc *desc
>   {
>   	u32 len;
>   
> -	len = le32_get_bits(desc->flags, HAL_CE_DST_STATUS_DESC_FLAGS_LEN);
> +	len = le32_get_bits(READ_ONCE(desc->flags), HAL_CE_DST_STATUS_DESC_FLAGS_LEN);
>   	desc->flags &= ~cpu_to_le32(HAL_CE_DST_STATUS_DESC_FLAGS_LEN);
>   
>   	return len;
> @@ -2132,7 +2132,7 @@ void ath12k_hal_srng_access_begin(struct ath12k_base *ab, struct hal_srng *srng)
>   		srng->u.src_ring.cached_tp =
>   			*(volatile u32 *)srng->u.src_ring.tp_addr;
>   	else
> -		srng->u.dst_ring.cached_hp = *srng->u.dst_ring.hp_addr;
> +		srng->u.dst_ring.cached_hp = READ_ONCE(*srng->u.dst_ring.hp_addr);
>   }
>   
>   /* Update cached ring head/tail pointers to HW. ath12k_hal_srng_access_begin()

Reviewed-by: Miaoqing Pan <quic_miaoqing@quicinc.com>