[PATCH] rpmsg: glink: fix deadlock in endpoint destroy during driver detach

Vishnu Santhosh posted 1 patch 4 days ago
drivers/rpmsg/qcom_glink_native.c | 3 ---
1 file changed, 3 deletions(-)
Re: [PATCH] rpmsg: glink: fix deadlock in endpoint destroy during driver detach
Posted by Jie Gan 3 days, 7 hours ago

On 6/4/2026 4:42 PM, Vishnu Santhosh wrote:
> During driver detach, the driver core holds the device mutex throughout
> the driver's remove callback chain.  When the rpmsg endpoint is
> destroyed as part of that teardown, the GLINK endpoint destroy
> implementation attempts to unregister the underlying rpmsg device.
> That unregistration calls device_del(), which tries to re-acquire the
> same device mutex already held higher up the stack, causing rmmod to
> hang indefinitely.
> 
> The deadlock manifests with the following call chain:
> 
> [<0>] device_del+0x44/0x414  <- tries to acquire same mutex
> [<0>] device_unregister+0x18/0x34
> [<0>] rpmsg_unregister_device+0x28/0x4c
> [<0>] qcom_glink_remove_rpmsg_device+0x70/0xc0
> [<0>] qcom_glink_destroy_ept+0x58/0xbc
> [<0>] rpmsg_dev_remove+0x50/0x60
> [<0>] device_remove+0x4c/0x80
> [<0>] device_release_driver_internal+0x1cc/0x228  <- acquires device mutex
> [<0>] driver_detach+0x4c/0x98
> [<0>] bus_remove_driver+0x6c/0xbc
> [<0>] driver_unregister+0x30/0x60
> [<0>] unregister_rpmsg_driver+0x10/0x1c
> [<0>] fastrpc_exit+0x28/0x38 [fastrpc]
> [<0>] __arm64_sys_delete_module+0x1b8/0x294
> [<0>] invoke_syscall+0x48/0x10c
> [<0>] el0_svc_common.constprop.0+0xc0/0xe0
> [<0>] do_el0_svc+0x1c/0x28
> [<0>] el0_svc+0x34/0x108
> [<0>] el0t_64_sync_handler+0xa0/0xe4
> [<0>] el0t_64_sync+0x198/0x19c
> 
> The rpmsg device unregistration inside endpoint destroy is redundant.
> In both contexts where endpoint destruction is triggered:
> 
> - Driver detach path: the driver core already tears down the rpmsg
>    device.
> 
> - Channel close path: the rpmsg device is already unregistered before
>    endpoint destruction is reached.
> 
> Remove the redundant unregistration to fix the deadlock.
> 
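
For anyone reading along, the relock described above can be modelled in user
space. This is only a sketch, not kernel code: struct fake_device and the
fake_*() helpers are hypothetical stand-ins for struct device,
device_release_driver_internal(), the remove callback and device_del(). It
also assumes a POSIX error-checking mutex, which reports EDEADLK on a relock by
the owner; the kernel's device mutex does not do that, it simply blocks, which
is the rmmod hang in the trace.

```c
/* User-space model of the hang; all names here are hypothetical. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Stand-in for struct device and its mutex. */
struct fake_device {
	pthread_mutex_t lock;
};

static void fake_device_init(struct fake_device *dev)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	assert(ret == 0);
	/* ERRORCHECK turns an owner relock into EDEADLK instead of a hang. */
	ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(ret == 0);
	ret = pthread_mutex_init(&dev->lock, &attr);
	assert(ret == 0);
	pthread_mutexattr_destroy(&attr);
}

/* Models device_del(): it needs the device lock for itself. */
static int fake_device_del(struct fake_device *dev)
{
	int ret = pthread_mutex_lock(&dev->lock);

	if (ret)
		return ret;
	pthread_mutex_unlock(&dev->lock);
	return 0;
}

/*
 * Models the remove callback reaching endpoint destroy.
 * unregister_in_destroy != 0 mirrors the code before the patch,
 * 0 mirrors the code with the three lines removed.
 */
static int fake_remove(struct fake_device *dev, int unregister_in_destroy)
{
	return unregister_in_destroy ? fake_device_del(dev) : 0;
}

/* Models the detach path: take the lock, then run the remove callback. */
static int fake_release_driver(struct fake_device *dev,
			       int unregister_in_destroy)
{
	int ret;

	pthread_mutex_lock(&dev->lock);
	ret = fake_remove(dev, unregister_in_destroy);
	pthread_mutex_unlock(&dev->lock);
	return ret;
}
```

After fake_device_init(), fake_release_driver(&dev, 1) returns EDEADLK, while
fake_release_driver(&dev, 0) returns 0, which is the behaviour the patch
restores on the detach path.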

Fixes: a53e356df548 ("rpmsg: glink: fix rpmsg device leak")

Thanks,
Jie

> Co-developed-by: Deepak Kumar Singh <deepak.singh@oss.qualcomm.com>
> Signed-off-by: Deepak Kumar Singh <deepak.singh@oss.qualcomm.com>
> Signed-off-by: Vishnu Santhosh <vishnu.santhosh@oss.qualcomm.com>
> ---
>   drivers/rpmsg/qcom_glink_native.c | 3 ---
>   1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/rpmsg/qcom_glink_native.c b/drivers/rpmsg/qcom_glink_native.c
> index 401a4ece0c9777398837d4427746fae0a5003e88..ab7ff3d2f56bf797592fc4227ce5b730bce72226 100644
> --- a/drivers/rpmsg/qcom_glink_native.c
> +++ b/drivers/rpmsg/qcom_glink_native.c
> @@ -1418,9 +1418,6 @@ static void qcom_glink_destroy_ept(struct rpmsg_endpoint *ept)
>   	channel->ept.cb = NULL;
>   	spin_unlock_irqrestore(&channel->recv_lock, flags);
>   
> -	/* Decouple the potential rpdev from the channel */
> -	qcom_glink_remove_rpmsg_device(glink, channel);
> -
>   	qcom_glink_send_close_req(glink, channel);
>   }
>   
> 
> ---
> base-commit: ba3e43a9e601636f5edb54e259a74f96ca3b8fd8
> change-id: 20260416-rpmsg-glink-fix-deadlock-destroy-ept-5cc7aac522a0
> 
> Best regards,

Re: [PATCH] rpmsg: glink: fix deadlock in endpoint destroy during driver detach
Posted by Dmitry Baryshkov 12 hours ago

On Fri, Jun 05, 2026 at 09:17:19AM +0800, Jie Gan wrote:
> Fixes: a53e356df548 ("rpmsg: glink: fix rpmsg device leak")
> 

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>


-- 
With best wishes
Dmitry