[PATCH v2] ntb: Add mutex to make link_event_callback executed linearly.

fuyuanli posted 1 patch 1 month, 1 week ago
drivers/ntb/ntb_transport.c | 7 +++++++
1 file changed, 7 insertions(+)
Posted by fuyuanli 1 month, 1 week ago
Since the CPU selected by schedule_work() is not deterministic, multiple
link_event callbacks may run at the same time. For example, the peer's
link may come up and then go down again before the local link_work has
completed. If link_cleanup is then queued on another CPU's workqueue,
link_work and link_cleanup may run concurrently. Add a mutex so that the
two work items are serialized.

Signed-off-by: fuyuanli <fuyuanli@didiglobal.com>
---
v2:
1) use guard() instead of lock & unlock functions.

v1:
Link: https://lore.kernel.org/all/aKiBi4ZDlbgzed%2Fz@didi-ThinkCentre-M930t-N000/
---
 drivers/ntb/ntb_transport.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 4f775c3e218f..eb875e3db2e3 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -59,6 +59,7 @@
 #include <linux/slab.h>
 #include <linux/types.h>
 #include <linux/uaccess.h>
+#include <linux/mutex.h>
 #include "linux/ntb.h"
 #include "linux/ntb_transport.h"
 
@@ -241,6 +242,9 @@ struct ntb_transport_ctx {
 	struct work_struct link_cleanup;
 
 	struct dentry *debugfs_node_dir;
+
+	/* Serialize execution of the link event work items */
+	struct mutex link_event_lock;
 };
 
 enum {
@@ -1024,6 +1028,7 @@ static void ntb_transport_link_cleanup_work(struct work_struct *work)
 	struct ntb_transport_ctx *nt =
 		container_of(work, struct ntb_transport_ctx, link_cleanup);
 
+	guard(mutex)(&nt->link_event_lock);
 	ntb_transport_link_cleanup(nt);
 }
 
@@ -1047,6 +1052,8 @@ static void ntb_transport_link_work(struct work_struct *work)
 	u32 val;
 	int rc = 0, i, spad;
 
+	guard(mutex)(&nt->link_event_lock);
+
 	/* send the local info, in the opposite order of the way we read it */
 
 	if (nt->use_msi) {
-- 
2.34.1
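
For readers unfamiliar with guard(): the v2 change relies on scope-based
cleanup, where the lock is released automatically on every path out of the
function. The kernel's guard() helpers are built on the compiler's cleanup
attribute. Below is a minimal userspace sketch of that mechanism, not the
kernel implementation. The names struct demo_lock, DEMO_GUARD and
demo_link_work are hypothetical, and the toy lock only records whether it
is held; it does no real blocking. It assumes GCC or Clang.

```c
#include <stdbool.h>

/* Toy lock for illustration only: it tracks ownership, nothing more. */
struct demo_lock {
	bool held;
	int acquisitions;
};

static struct demo_lock *demo_lock_acquire(struct demo_lock *lock)
{
	lock->held = true;
	lock->acquisitions++;
	return lock;
}

/* Cleanup callback: runs when the guard variable goes out of scope. */
static void demo_lock_release(struct demo_lock **guard)
{
	(*guard)->held = false;
}

/* Hypothetical userspace imitation of guard(mutex)(lock). */
#define DEMO_GUARD(lock) \
	struct demo_lock *demo_guard \
		__attribute__((cleanup(demo_lock_release), unused)) = \
		demo_lock_acquire(lock)

/* Hypothetical handler with an early return, shaped like a work function. */
static int demo_link_work(struct demo_lock *lock, bool link_is_up)
{
	DEMO_GUARD(lock);

	if (!link_is_up)
		return -1;	/* lock is released here, no explicit unlock */

	return 0;		/* and here */
}
```

Calling demo_link_work() with either argument leaves lock->held false
afterwards, which is why v2 needs no unlock call on any return path of
ntb_transport_link_work(). Separately, the hunks above do not include a
mutex_init() call for link_event_lock; a struct mutex normally needs one
before first use, so that may be worth checking against the full file.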
Re: [PATCH v2] ntb: Add mutex to make link_event_callback executed linearly.
Posted by yuanli fu 3 days, 13 hours ago
Hi Jon,

Just a gentle ping on this patch. Is there anything else needed from my side?
Thank you!

Best regards,
Yuanli Fu

Re: [PATCH v2] ntb: Add mutex to make link_event_callback executed linearly.
Posted by Logan Gunthorpe 1 month, 1 week ago

On 2025-08-25 03:15, fuyuanli wrote:
> Since the CPU selected by schedule_work() is not deterministic, multiple
> link_event callbacks may run at the same time. For example, the peer's
> link may come up and then go down again before the local link_work has
> completed. If link_cleanup is then queued on another CPU's workqueue,
> link_work and link_cleanup may run concurrently. Add a mutex so that the
> two work items are serialized.
> 
> Signed-off-by: fuyuanli <fuyuanli@didiglobal.com>

Looks good to me, thanks

Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Re: [PATCH v2] ntb: Add mutex to make link_event_callback executed linearly.
Posted by Dave Jiang 1 month, 1 week ago

On 8/25/25 2:15 AM, fuyuanli wrote:
> Since the CPU selected by schedule_work() is not deterministic, multiple
> link_event callbacks may run at the same time. For example, the peer's
> link may come up and then go down again before the local link_work has
> completed. If link_cleanup is then queued on another CPU's workqueue,
> link_work and link_cleanup may run concurrently. Add a mutex so that the
> two work items are serialized.
> 
> Signed-off-by: fuyuanli <fuyuanli@didiglobal.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

Re: [PATCH v2] ntb: Add mutex to make link_event_callback executed linearly.
Posted by yuanli fu 1 month ago
On Mon, Aug 25, 2025 at 23:06, Dave Jiang <dave.jiang@intel.com> wrote:
>
>
>
> On 8/25/25 2:15 AM, fuyuanli wrote:
> > Since the CPU selected by schedule_work() is not deterministic, multiple
> > link_event callbacks may run at the same time. For example, the peer's
> > link may come up and then go down again before the local link_work has
> > completed. If link_cleanup is then queued on another CPU's workqueue,
> > link_work and link_cleanup may run concurrently. Add a mutex so that the
> > two work items are serialized.
> >
> > Signed-off-by: fuyuanli <fuyuanli@didiglobal.com>
>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>

Hi Dave,

Hope you are doing well.

Just wanted to gently follow up on this patch which you had acked before.
Is there anything else I can do to help get this merged? Perhaps it needs
a rebase on a different tree?

Thanks for your time and all your work!

Best regards,
Yuanli Fu


Re: [PATCH v2] ntb: Add mutex to make link_event_callback executed linearly.
Posted by Dave Jiang 1 month ago

On 9/2/25 7:20 PM, yuanli fu wrote:
> On Mon, Aug 25, 2025 at 23:06, Dave Jiang <dave.jiang@intel.com> wrote:
>>
>>
>>
>> On 8/25/25 2:15 AM, fuyuanli wrote:
>>> Since the CPU selected by schedule_work() is not deterministic, multiple
>>> link_event callbacks may run at the same time. For example, the peer's
>>> link may come up and then go down again before the local link_work has
>>> completed. If link_cleanup is then queued on another CPU's workqueue,
>>> link_work and link_cleanup may run concurrently. Add a mutex so that the
>>> two work items are serialized.
>>>
>>> Signed-off-by: fuyuanli <fuyuanli@didiglobal.com>
>>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> 
> Hi Dave,
> 
> Hope you are doing well.
> 
> Just wanted to gently follow up on this patch which you had acked before.
> Is there anything else I can do to help get this merged? Perhaps it needs
> a rebase on a different tree?

Jon will merge it when he has a chance.

> 
> Thanks for your time and all your work!
> 
> Best regards,
> Yuanli Fu
> 
> 