[PATCH net v2 1/4] octeon_ep: fix race conditions in ndo_get_stats64

There is a newer version of this series
Posted by Shinas Rasheed 1 year ago
ndo_get_stats64() can race with ndo_stop(), which frees input and
output queue resources. Call synchronize_net() to avoid such races.

Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops")
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
---
V2:
  - Changed sync mechanism to fix race conditions from using an atomic
    set_bit ops to a much simpler synchronize_net()

V1: https://lore.kernel.org/all/20241203072130.2316913-2-srasheed@marvell.com/

 drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
index 549436efc204..941bbaaa67b5 100644
--- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
+++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
@@ -757,6 +757,7 @@ static int octep_stop(struct net_device *netdev)
 {
 	struct octep_device *oct = netdev_priv(netdev);
 
+	synchronize_net();
 	netdev_info(netdev, "Stopping the device ...\n");
 
 	octep_ctrl_net_set_link_status(oct, OCTEP_CTRL_NET_INVALID_VFID, false,
-- 
2.25.1
Re: [PATCH net v2 1/4] octeon_ep: fix race conditions in ndo_get_stats64
Posted by Larysa Zaremba 1 year ago
On Sun, Dec 15, 2024 at 11:58:39PM -0800, Shinas Rasheed wrote:
> ndo_get_stats64() can race with ndo_stop(), which frees input and
> output queue resources. Call synchronize_net() to avoid such races.
> 
> Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops")
> Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
> ---
> V2:
>   - Changed sync mechanism to fix race conditions from using an atomic
>     set_bit ops to a much simpler synchronize_net()
> 
> V1: https://lore.kernel.org/all/20241203072130.2316913-2-srasheed@marvell.com/
> 
>  drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> index 549436efc204..941bbaaa67b5 100644
> --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> @@ -757,6 +757,7 @@ static int octep_stop(struct net_device *netdev)
>  {
>  	struct octep_device *oct = netdev_priv(netdev);
>  
> +	synchronize_net();

You should have explained in the commit message that this synchronize_net()
is there for the __LINK_STATE_START flag; this is not obvious. Also, is
octep_get_stats64() called from an RCU-safe context?

>  	netdev_info(netdev, "Stopping the device ...\n");
>  
>  	octep_ctrl_net_set_link_status(oct, OCTEP_CTRL_NET_INVALID_VFID, false,
> -- 
> 2.25.1
> 
>
Re: [PATCH net v2 1/4] octeon_ep: fix race conditions in ndo_get_stats64
Posted by Larysa Zaremba 1 year ago
On Mon, Dec 16, 2024 at 03:30:12PM +0100, Larysa Zaremba wrote:
> On Sun, Dec 15, 2024 at 11:58:39PM -0800, Shinas Rasheed wrote:
> > ndo_get_stats64() can race with ndo_stop(), which frees input and
> > output queue resources. Call synchronize_net() to avoid such races.
> > 
> > Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops")
> > Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
> > ---
> > V2:
> >   - Changed sync mechanism to fix race conditions from using an atomic
> >     set_bit ops to a much simpler synchronize_net()
> > 
> > V1: https://lore.kernel.org/all/20241203072130.2316913-2-srasheed@marvell.com/
> > 
> >  drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > index 549436efc204..941bbaaa67b5 100644
> > --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > @@ -757,6 +757,7 @@ static int octep_stop(struct net_device *netdev)
> >  {
> >  	struct octep_device *oct = netdev_priv(netdev);
> >  
> > +	synchronize_net();
> 
> You should have explained in the commit message that this synchronize_net()
> is there for the __LINK_STATE_START flag; this is not obvious. Also, is
> octep_get_stats64() called from an RCU-safe context?
>

Now I see that in the !netif_running() case you do not bail out of
octep_get_stats64() fully (or at all after the second patch). So, could you
explain how you are utilizing RCU here?

> >  	netdev_info(netdev, "Stopping the device ...\n");
> >  
> >  	octep_ctrl_net_set_link_status(oct, OCTEP_CTRL_NET_INVALID_VFID, false,
> > -- 
> > 2.25.1
> > 
> > 
>