[PATCH net V2 10/11] net/mlx5e: Update and set Xon/Xoff upon port speed set

Mark Bloch posted 11 patches 1 month, 1 week ago
Posted by Mark Bloch 1 month, 1 week ago
From: Alexei Lazar <alazar@nvidia.com>

Xon/Xoff sizes are derived from calculations that include
the port speed.
These settings need to be updated and applied whenever the
port speed is changed.
The port speed is typically set after the physical link goes down
and is negotiated as part of the link-up process between the two
connected interfaces.
Update the Xon/Xoff parameters at the point where the new
negotiated speed is established.

Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration")
Signed-off-by: Alexei Lazar <alazar@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 15eded36b872..e680673ffb72 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -139,6 +139,8 @@ void mlx5e_update_carrier(struct mlx5e_priv *priv)
 	if (up) {
 		netdev_info(priv->netdev, "Link up\n");
 		netif_carrier_on(priv->netdev);
+		mlx5e_port_manual_buffer_config(priv, 0, priv->netdev->mtu,
+						NULL, NULL, NULL);
 	} else {
 		netdev_info(priv->netdev, "Link down\n");
 		netif_carrier_off(priv->netdev);
-- 
2.34.1
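
The commit message says the Xon/Xoff sizes depend on the port speed but does not show the calculation. The sketch below only illustrates that dependency; it is not the driver's code. The helper name `sketch_xoff_bytes`, the 40 Gb/s floor, and every constant in the formula are assumptions chosen for the example, not values taken from this patch or thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed floor on the speed used in the calculation, in Mb/s (illustrative). */
#define SKETCH_MIN_SPEED_MBPS 40000u

/*
 * Hypothetical helper: estimate the Xoff headroom in bytes.
 *
 * The first term models data still in flight after a pause is requested,
 * so it grows with link speed and cable length. The second term models
 * frames already being transmitted or received, so it grows with the MTU.
 * All constants are placeholders for illustration only.
 */
uint32_t sketch_xoff_bytes(uint32_t speed_mbps, uint32_t cable_len_m,
			   uint32_t mtu)
{
	uint64_t in_flight;
	uint64_t mtu_term;

	assert(mtu > 0);

	/* Never size the headroom for less than the assumed minimum speed. */
	if (speed_mbps < SKETCH_MIN_SPEED_MBPS)
		speed_mbps = SKETCH_MIN_SPEED_MBPS;

	in_flight = (301 + (uint64_t)216 * cable_len_m / 100) * speed_mbps / 1000;
	mtu_term = (uint64_t)272 * mtu / 100;

	return (uint32_t)(in_flight + mtu_term);
}
```

Under these assumed constants, a value computed for a 100 Gb/s link is much smaller than the value for 400 Gb/s, so a threshold left over from before a speed change could be undersized. That is the motivation the commit message gives for recomputing on link up.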
Re: [PATCH net V2 10/11] net/mlx5e: Update and set Xon/Xoff upon port speed set
Posted by Jakub Kicinski 3 weeks, 1 day ago
On Mon, 25 Aug 2025 17:34:33 +0300 Mark Bloch wrote:
> Xon/Xoff sizes are derived from calculations that include
> the port speed.
> These settings need to be updated and applied whenever the
> port speed is changed.
> The port speed is typically set after the physical link goes down
> and is negotiated as part of the link-up process between the two
> connected interfaces.
> Xon/Xoff parameters being updated at the point where the new
> negotiated speed is established.

Hi, this is breaking dual host CX7 w/ 28.45.1300 (but I think most
older FW versions, too). Looks like the host is not receiving any
mcast (ping within a subnet doesn't work because the host receives
no ndisc), and most traffic slows down to a trickle.
Lots of rx_prio0_buf_discard increments.

Please TAL ASAP, this change went to LTS last week.
Re: [PATCH net V2 10/11] net/mlx5e: Update and set Xon/Xoff upon port speed set
Posted by Jakub Kicinski 3 weeks, 1 day ago
On Wed, 10 Sep 2025 17:00:11 -0700 Jakub Kicinski wrote:
> On Mon, 25 Aug 2025 17:34:33 +0300 Mark Bloch wrote:
> > Xon/Xoff sizes are derived from calculations that include
> > the port speed.
> > These settings need to be updated and applied whenever the
> > port speed is changed.
> > The port speed is typically set after the physical link goes down
> > and is negotiated as part of the link-up process between the two
> > connected interfaces.
> > Xon/Xoff parameters being updated at the point where the new
> > negotiated speed is established.  
> 
> Hi, this is breaking dual host CX7 w/ 28.45.1300 (but I think most
> older FW versions, too). Looks like the host is not receiving any
> mcast (ping within a subnet doesn't work because the host receives
> no ndisc), and most traffic slows down to a trickle.
> Lost of rx_prio0_buf_discard increments.
> 
> Please TAL ASAP, this change went to LTS last week.

Any news on this? I heard that it also breaks DCB/QoS configuration
on 6.12.45 LTS.
Re: [PATCH net V2 10/11] net/mlx5e: Update and set Xon/Xoff upon port speed set
Posted by Mark Bloch 3 weeks, 1 day ago

On 11/09/2025 16:47, Jakub Kicinski wrote:
> On Wed, 10 Sep 2025 17:00:11 -0700 Jakub Kicinski wrote:
>> On Mon, 25 Aug 2025 17:34:33 +0300 Mark Bloch wrote:
>>> Xon/Xoff sizes are derived from calculations that include
>>> the port speed.
>>> These settings need to be updated and applied whenever the
>>> port speed is changed.
>>> The port speed is typically set after the physical link goes down
>>> and is negotiated as part of the link-up process between the two
>>> connected interfaces.
>>> Xon/Xoff parameters being updated at the point where the new
>>> negotiated speed is established.  
>>
>> Hi, this is breaking dual host CX7 w/ 28.45.1300 (but I think most
>> older FW versions, too). Looks like the host is not receiving any
>> mcast (ping within a subnet doesn't work because the host receives
>> no ndisc), and most traffic slows down to a trickle.
>> Lost of rx_prio0_buf_discard increments.
>>
>> Please TAL ASAP, this change went to LTS last week.
> 
> Any news on this? I heard that it also breaks DCB/QoS configuration
> on 6.12.45 LTS.

Hi Jakub,

We are looking into this; once we have anything I'll update.
Just to make sure, reverting this one commit solves the
issue you are seeing?

Mark
Re: [PATCH net V2 10/11] net/mlx5e: Update and set Xon/Xoff upon port speed set
Posted by Jakub Kicinski 3 weeks, 1 day ago
On Thu, 11 Sep 2025 17:25:22 +0300 Mark Bloch wrote:
> On 11/09/2025 16:47, Jakub Kicinski wrote:
> > On Wed, 10 Sep 2025 17:00:11 -0700 Jakub Kicinski wrote:  
> >> Hi, this is breaking dual host CX7 w/ 28.45.1300 (but I think most
> >> older FW versions, too). Looks like the host is not receiving any
> >> mcast (ping within a subnet doesn't work because the host receives
> >> no ndisc), and most traffic slows down to a trickle.
> >> Lost of rx_prio0_buf_discard increments.
> >>
> >> Please TAL ASAP, this change went to LTS last week.  
> > 
> > Any news on this? I heard that it also breaks DCB/QoS configuration
> > on 6.12.45 LTS.  
> 
> We are looking into this, once we have anything I'll update.
> Just to make sure, reverting this is one commit solves the
> issue you are seeing?

It did for me, but Daniel (who is working on the PSP series)
mentioned that he had reverted all three to get net-next working:

  net/mlx5e: Set local Xoff after FW update
  net/mlx5e: Update and set Xon/Xoff upon port speed set
  net/mlx5e: Update and set Xon/Xoff upon MTU set
Re: [PATCH net V2 10/11] net/mlx5e: Update and set Xon/Xoff upon port speed set
Posted by Tariq Toukan 2 weeks, 4 days ago

On 11/09/2025 17:36, Jakub Kicinski wrote:
> On Thu, 11 Sep 2025 17:25:22 +0300 Mark Bloch wrote:
>> On 11/09/2025 16:47, Jakub Kicinski wrote:
>>> On Wed, 10 Sep 2025 17:00:11 -0700 Jakub Kicinski wrote:
>>>> Hi, this is breaking dual host CX7 w/ 28.45.1300 (but I think most
>>>> older FW versions, too). Looks like the host is not receiving any
>>>> mcast (ping within a subnet doesn't work because the host receives
>>>> no ndisc), and most traffic slows down to a trickle.
>>>> Lost of rx_prio0_buf_discard increments.
>>>>
>>>> Please TAL ASAP, this change went to LTS last week.
>>>
>>> Any news on this? I heard that it also breaks DCB/QoS configuration
>>> on 6.12.45 LTS.
>>
>> We are looking into this, once we have anything I'll update.
>> Just to make sure, reverting this is one commit solves the
>> issue you are seeing?
> 
> It did for me, but Daniel (who is working on the PSP series)
> mentioned that he had reverted all three to get net-next working:
> 
>    net/mlx5e: Set local Xoff after FW update
>    net/mlx5e: Update and set Xon/Xoff upon port speed set
>    net/mlx5e: Update and set Xon/Xoff upon MTU set
> 

Hi Jakub,

Thanks for reporting.
We're investigating and will update soon.

Regards,
Tariq
Re: [PATCH net V2 10/11] net/mlx5e: Update and set Xon/Xoff upon port speed set
Posted by Tariq Toukan 2 weeks, 2 days ago

On 15/09/2025 10:38, Tariq Toukan wrote:
> 
> 
> On 11/09/2025 17:36, Jakub Kicinski wrote:
>> On Thu, 11 Sep 2025 17:25:22 +0300 Mark Bloch wrote:
>>> On 11/09/2025 16:47, Jakub Kicinski wrote:
>>>> On Wed, 10 Sep 2025 17:00:11 -0700 Jakub Kicinski wrote:
>>>>> Hi, this is breaking dual host CX7 w/ 28.45.1300 (but I think most
>>>>> older FW versions, too). Looks like the host is not receiving any
>>>>> mcast (ping within a subnet doesn't work because the host receives
>>>>> no ndisc), and most traffic slows down to a trickle.
>>>>> Lost of rx_prio0_buf_discard increments.
>>>>>
>>>>> Please TAL ASAP, this change went to LTS last week.
>>>>
>>>> Any news on this? I heard that it also breaks DCB/QoS configuration
>>>> on 6.12.45 LTS.
>>>
>>> We are looking into this, once we have anything I'll update.
>>> Just to make sure, reverting this is one commit solves the
>>> issue you are seeing?
>>
>> It did for me, but Daniel (who is working on the PSP series)
>> mentioned that he had reverted all three to get net-next working:
>>
>>    net/mlx5e: Set local Xoff after FW update
>>    net/mlx5e: Update and set Xon/Xoff upon port speed set
>>    net/mlx5e: Update and set Xon/Xoff upon MTU set
>>
> 
> Hi Jakub,
> 
> Thanks for reporting.
> We're investigating and will update soon.
> 
> Regards,
> Tariq
> 

Hi,

We prefer reverting the single patch [1] for now. We'll submit a fixed 
version later.

Regarding the other two patches [2], initial testing showed no issues.
Can you/Daniel share more info? What issues do you see, and what are the
repro steps?

Thanks,
Tariq

[1]
net/mlx5e: Update and set Xon/Xoff upon port speed set

[2]
net/mlx5e: Set local Xoff after FW update
net/mlx5e: Update and set Xon/Xoff upon MTU set

Re: [PATCH net V2 10/11] net/mlx5e: Update and set Xon/Xoff upon port speed set
Posted by Daniel Zahka 2 weeks, 2 days ago

On 9/17/25 6:39 AM, Tariq Toukan wrote:
>
>
> On 15/09/2025 10:38, Tariq Toukan wrote:
>>
>>
>> On 11/09/2025 17:36, Jakub Kicinski wrote:
>>> On Thu, 11 Sep 2025 17:25:22 +0300 Mark Bloch wrote:
>>>> On 11/09/2025 16:47, Jakub Kicinski wrote:
>>>>> On Wed, 10 Sep 2025 17:00:11 -0700 Jakub Kicinski wrote:
>>>>>> Hi, this is breaking dual host CX7 w/ 28.45.1300 (but I think most
>>>>>> older FW versions, too). Looks like the host is not receiving any
>>>>>> mcast (ping within a subnet doesn't work because the host receives
>>>>>> no ndisc), and most traffic slows down to a trickle.
>>>>>> Lost of rx_prio0_buf_discard increments.
>>>>>>
>>>>>> Please TAL ASAP, this change went to LTS last week.
>>>>>
>>>>> Any news on this? I heard that it also breaks DCB/QoS configuration
>>>>> on 6.12.45 LTS.
>>>>
>>>> We are looking into this, once we have anything I'll update.
>>>> Just to make sure, reverting this is one commit solves the
>>>> issue you are seeing?
>>>
>>> It did for me, but Daniel (who is working on the PSP series)
>>> mentioned that he had reverted all three to get net-next working:
>>>
>>>    net/mlx5e: Set local Xoff after FW update
>>>    net/mlx5e: Update and set Xon/Xoff upon port speed set
>>>    net/mlx5e: Update and set Xon/Xoff upon MTU set
>>>
>>
>> Hi Jakub,
>>
>> Thanks for reporting.
>> We're investigating and will update soon.
>>
>> Regards,
>> Tariq
>>
>
> Hi,
>
> We prefer reverting the single patch [1] for now. We'll submit a fixed 
> version later.
>
> Regarding the other two patches [2], initial testing showed no issues.
> Can you/Daniel share more info? What issues you see, and the repro steps.
>
> Thanks,
> Tariq
>
> [1]
> net/mlx5e: Update and set Xon/Xoff upon port speed set
>
> [2]
> net/mlx5e: Set local Xoff after FW update
> net/mlx5e: Update and set Xon/Xoff upon MTU set
>

Hello Tariq,

My notes for the situation were that I was running a vanilla net-next 
kernel on a dual host, CX7 system, with the 28.45.1300 FW at commit:

deb105f49879 net: phy: marvell: Fix 88e1510 downshift counter errata

and I was having the issues that Jakub described. No ping working in a 
subnet. Extremely slow bandwidth on a large transfer. My notes say that 
reverting just [1] (from your message) did not fix the problem, but then 
also reverting both patches in [2] restored normal behavior.

However, I did attempt to reproduce again on the same system this 
morning, and now I'm seeing that reverting just [1] is sufficient to fix 
the issues.
Re: [PATCH net V2 10/11] net/mlx5e: Update and set Xon/Xoff upon port speed set
Posted by Tariq Toukan 2 weeks, 2 days ago

On 17/09/2025 16:00, Daniel Zahka wrote:
> 
> 
> On 9/17/25 6:39 AM, Tariq Toukan wrote:
>>
>>
>> On 15/09/2025 10:38, Tariq Toukan wrote:
>>>
>>>
>>> On 11/09/2025 17:36, Jakub Kicinski wrote:
>>>> On Thu, 11 Sep 2025 17:25:22 +0300 Mark Bloch wrote:
>>>>> On 11/09/2025 16:47, Jakub Kicinski wrote:
>>>>>> On Wed, 10 Sep 2025 17:00:11 -0700 Jakub Kicinski wrote:
>>>>>>> Hi, this is breaking dual host CX7 w/ 28.45.1300 (but I think most
>>>>>>> older FW versions, too). Looks like the host is not receiving any
>>>>>>> mcast (ping within a subnet doesn't work because the host receives
>>>>>>> no ndisc), and most traffic slows down to a trickle.
>>>>>>> Lost of rx_prio0_buf_discard increments.
>>>>>>>
>>>>>>> Please TAL ASAP, this change went to LTS last week.
>>>>>>
>>>>>> Any news on this? I heard that it also breaks DCB/QoS configuration
>>>>>> on 6.12.45 LTS.
>>>>>
>>>>> We are looking into this, once we have anything I'll update.
>>>>> Just to make sure, reverting this is one commit solves the
>>>>> issue you are seeing?
>>>>
>>>> It did for me, but Daniel (who is working on the PSP series)
>>>> mentioned that he had reverted all three to get net-next working:
>>>>
>>>>    net/mlx5e: Set local Xoff after FW update
>>>>    net/mlx5e: Update and set Xon/Xoff upon port speed set
>>>>    net/mlx5e: Update and set Xon/Xoff upon MTU set
>>>>
>>>
>>> Hi Jakub,
>>>
>>> Thanks for reporting.
>>> We're investigating and will update soon.
>>>
>>> Regards,
>>> Tariq
>>>
>>
>> Hi,
>>
>> We prefer reverting the single patch [1] for now. We'll submit a fixed 
>> version later.
>>
>> Regarding the other two patches [2], initial testing showed no issues.
>> Can you/Daniel share more info? What issues you see, and the repro steps.
>>
>> Thanks,
>> Tariq
>>
>> [1]
>> net/mlx5e: Update and set Xon/Xoff upon port speed set
>>
>> [2]
>> net/mlx5e: Set local Xoff after FW update
>> net/mlx5e: Update and set Xon/Xoff upon MTU set
>>
> 
> Hello Tariq,
> 
> My notes for the situation were that I was running a vanilla net-next 
> kernel on a dual host, CX7 system, with the 28.45.1300 FW at commit:
> 
> deb105f49879 net: phy: marvell: Fix 88e1510 downshift counter errata
> 
> and I was having the issues that Jakub described. No ping working in a 
> subnet. Extremely slow bandwidth on a large transfer. My notes say that 
> reverting just [1] (from your message) did not fix the problem, but then 
> reverting [2] and [3] restored normal behavior.
> 
> However, I did attempt to reproduce again on the same system this 
> morning, and now I'm seeing that reverting just [1] is sufficient to fix 
> the issues.

I see.
For now, I'll submit a revert only for [1].

Let us know of any related issue you still hit after the revert.

Thanks.