[PATCH v2] misc: fastrpc: Fix channel resource access in device_open

Ekansh Gupta posted 1 patch 3 months, 3 weeks ago
drivers/misc/fastrpc.c | 32 ++++++++++++++++----------------
1 file changed, 16 insertions(+), 16 deletions(-)
[PATCH v2] misc: fastrpc: Fix channel resource access in device_open
Posted by Ekansh Gupta 3 months, 3 weeks ago
During rpmsg_probe, fastrpc device nodes are created first, then
channel specific resources are initialized, followed by
of_platform_populate, which triggers context bank probing. This
sequence can cause issues as applications might open the device
node before channel resources are initialized or the session is
available, leading to problems. For example, spin_lock is initialized
after the device node creation, but it is used in device_open,
potentially before initialization. Move device registration after
channel resource initialization in fastrpc_rpmsg_probe.

Fixes: f6f9279f2bf0e ("misc: fastrpc: Add Qualcomm fastrpc basic driver model")
Cc: stable@kernel.org
Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
---
Patch v1: https://lore.kernel.org/all/20250517072432.1331803-1-ekansh.gupta@oss.qualcomm.com/
Changes in v2:
  - Moved device registration after channel resource initialization
    to resolve the problem.
  - Modified commit text accordingly.

 drivers/misc/fastrpc.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
index 378923594f02..f9a2ab82d823 100644
--- a/drivers/misc/fastrpc.c
+++ b/drivers/misc/fastrpc.c
@@ -2326,6 +2326,22 @@ static int fastrpc_rpmsg_probe(struct rpmsg_device *rpdev)
 	secure_dsp = !(of_property_read_bool(rdev->of_node, "qcom,non-secure-domain"));
 	data->secure = secure_dsp;
 
+	kref_init(&data->refcount);
+
+	dev_set_drvdata(&rpdev->dev, data);
+	rdev->dma_mask = &data->dma_mask;
+	dma_set_mask_and_coherent(rdev, DMA_BIT_MASK(32));
+	INIT_LIST_HEAD(&data->users);
+	INIT_LIST_HEAD(&data->invoke_interrupted_mmaps);
+	spin_lock_init(&data->lock);
+	idr_init(&data->ctx_idr);
+	data->domain_id = domain_id;
+	data->rpdev = rpdev;
+
+	err = of_platform_populate(rdev->of_node, NULL, NULL, rdev);
+	if (err)
+		goto err_free_data;
+
 	switch (domain_id) {
 	case ADSP_DOMAIN_ID:
 	case MDSP_DOMAIN_ID:
@@ -2353,22 +2369,6 @@ static int fastrpc_rpmsg_probe(struct rpmsg_device *rpdev)
 		goto err_free_data;
 	}
 
-	kref_init(&data->refcount);
-
-	dev_set_drvdata(&rpdev->dev, data);
-	rdev->dma_mask = &data->dma_mask;
-	dma_set_mask_and_coherent(rdev, DMA_BIT_MASK(32));
-	INIT_LIST_HEAD(&data->users);
-	INIT_LIST_HEAD(&data->invoke_interrupted_mmaps);
-	spin_lock_init(&data->lock);
-	idr_init(&data->ctx_idr);
-	data->domain_id = domain_id;
-	data->rpdev = rpdev;
-
-	err = of_platform_populate(rdev->of_node, NULL, NULL, rdev);
-	if (err)
-		goto err_deregister_fdev;
-
 	return 0;
 
 err_deregister_fdev:
-- 
2.34.1
Re: [PATCH v2] misc: fastrpc: Fix channel resource access in device_open
Posted by Dmitry Baryshkov 3 months, 2 weeks ago
On Thu, Jun 19, 2025 at 10:40:26AM +0530, Ekansh Gupta wrote:
> During rpmsg_probe, fastrpc device nodes are created first, then
> channel specific resources are initialized, followed by
> of_platform_populate, which triggers context bank probing. This
> sequence can cause issues as applications might open the device
> node before channel resources are initialized or the session is
> available, leading to problems. For example, spin_lock is initialized
> after the device node creation, but it is used in device_open,
> potentially before initialization. Move device registration after
> channel resource initialization in fastrpc_rpmsg_probe.

You've moved device init, however there is still a possibility for the
context devices to be created, but not bound to the driver (because all
the probings are async). I think instead we should drop the extra
platform driver layer and create and set up corresponding devices
manually. For example, see how it is handled in
host1x_memory_context_list_init(). That function uses iommu-maps, but we
can use OF nodes and iommus instead.

> 
> Fixes: f6f9279f2bf0e ("misc: fastrpc: Add Qualcomm fastrpc basic driver model")
> Cc: stable@kernel.org
> Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
> ---
> Patch v1: https://lore.kernel.org/all/20250517072432.1331803-1-ekansh.gupta@oss.qualcomm.com/
> Changes in v2:
>   - Moved device registration after channel resource initialization
>     to resolve the problem.
>   - Modified commit text accordingly.
> 
>  drivers/misc/fastrpc.c | 32 ++++++++++++++++----------------
>  1 file changed, 16 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
> index 378923594f02..f9a2ab82d823 100644
> --- a/drivers/misc/fastrpc.c
> +++ b/drivers/misc/fastrpc.c
> @@ -2326,6 +2326,22 @@ static int fastrpc_rpmsg_probe(struct rpmsg_device *rpdev)
>  	secure_dsp = !(of_property_read_bool(rdev->of_node, "qcom,non-secure-domain"));
>  	data->secure = secure_dsp;
>  
> +	kref_init(&data->refcount);
> +
> +	dev_set_drvdata(&rpdev->dev, data);
> +	rdev->dma_mask = &data->dma_mask;
> +	dma_set_mask_and_coherent(rdev, DMA_BIT_MASK(32));
> +	INIT_LIST_HEAD(&data->users);
> +	INIT_LIST_HEAD(&data->invoke_interrupted_mmaps);
> +	spin_lock_init(&data->lock);
> +	idr_init(&data->ctx_idr);
> +	data->domain_id = domain_id;
> +	data->rpdev = rpdev;
> +
> +	err = of_platform_populate(rdev->of_node, NULL, NULL, rdev);
> +	if (err)
> +		goto err_free_data;
> +
>  	switch (domain_id) {
>  	case ADSP_DOMAIN_ID:
>  	case MDSP_DOMAIN_ID:
> @@ -2353,22 +2369,6 @@ static int fastrpc_rpmsg_probe(struct rpmsg_device *rpdev)
>  		goto err_free_data;
>  	}
>  
> -	kref_init(&data->refcount);
> -
> -	dev_set_drvdata(&rpdev->dev, data);
> -	rdev->dma_mask = &data->dma_mask;
> -	dma_set_mask_and_coherent(rdev, DMA_BIT_MASK(32));
> -	INIT_LIST_HEAD(&data->users);
> -	INIT_LIST_HEAD(&data->invoke_interrupted_mmaps);
> -	spin_lock_init(&data->lock);
> -	idr_init(&data->ctx_idr);
> -	data->domain_id = domain_id;
> -	data->rpdev = rpdev;
> -
> -	err = of_platform_populate(rdev->of_node, NULL, NULL, rdev);
> -	if (err)
> -		goto err_deregister_fdev;
> -
>  	return 0;
>  
>  err_deregister_fdev:
> -- 
> 2.34.1
> 

-- 
With best wishes
Dmitry
Re: [PATCH v2] misc: fastrpc: Fix channel resource access in device_open
Posted by Greg KH 3 months, 2 weeks ago
On Tue, Jun 24, 2025 at 04:27:21PM +0300, Dmitry Baryshkov wrote:
> On Thu, Jun 19, 2025 at 10:40:26AM +0530, Ekansh Gupta wrote:
> > During rpmsg_probe, fastrpc device nodes are created first, then
> > channel specific resources are initialized, followed by
> > of_platform_populate, which triggers context bank probing. This
> > sequence can cause issues as applications might open the device
> > node before channel resources are initialized or the session is
> > available, leading to problems. For example, spin_lock is initialized
> > after the device node creation, but it is used in device_open,
> > potentially before initialization. Move device registration after
> > channel resource initialization in fastrpc_rpmsg_probe.
> 
> You've moved device init, however there is still a possibility for the
> context devices to be created, but not bound to the driver (because all
> the probings are async). I think instead we should drop the extra
> platform driver layer and create and set up corresponding devices
> manually. For example, see how it is handled in
> host1x_memory_context_list_init(). That function uses iommu-maps, but we
> can use OF nodes and iommus instead.

Is this a real platform device?  If so, why do you need a second
platform driver, what makes this so unique?  If this isn't a platform
device, then why not just use the faux bus instead?

It seems that "number of sessions" is a DT property, is that something
that is really defined by the hardware?  Or is it just a virtual thing
that people are abusing in the DT?

And if you really have all these sessions, why not make them real
devices, wouldn't that make things simpler?

thanks,

greg k-h
Re: [PATCH v2] misc: fastrpc: Fix channel resource access in device_open
Posted by Greg KH 3 months, 2 weeks ago
On Tue, Jun 24, 2025 at 04:36:35PM +0100, Greg KH wrote:
> On Tue, Jun 24, 2025 at 04:27:21PM +0300, Dmitry Baryshkov wrote:
> > On Thu, Jun 19, 2025 at 10:40:26AM +0530, Ekansh Gupta wrote:
> > > During rpmsg_probe, fastrpc device nodes are created first, then
> > > channel specific resources are initialized, followed by
> > > of_platform_populate, which triggers context bank probing. This
> > > sequence can cause issues as applications might open the device
> > > node before channel resources are initialized or the session is
> > > available, leading to problems. For example, spin_lock is initialized
> > > after the device node creation, but it is used in device_open,
> > > potentially before initialization. Move device registration after
> > > channel resource initialization in fastrpc_rpmsg_probe.
> > 
> > You've moved device init, however there is still a possibility for the
> > context devices to be created, but not bound to the driver (because all
> > the probings are async). I think instead we should drop the extra
> > platform driver layer and create and set up corresponding devices
> > manually. For example, see how it is handled in
> > host1x_memory_context_list_init(). That function uses iommu-maps, but we
> > can use OF nodes and iommus instead.
> 
> Is this a real platform device?  If so, why do you need a second
> platform driver, what makes this so unique?  If this isn't a platform
> device, then why not just use the faux bus instead?
> 
> It seems that "number of sessions" is a DT property, is that something
> that is really defined by the hardware?  Or is it just a virtual thing
> that people are abusing in the DT?
> 
> And if you really have all these sessions, why not make them real
> devices, wouldn't that make things simpler?

Oh wait, these are "fake" platform devices under the parent (i.e. real)
platform device.  That's not good, please don't do that, use the faux
bus code now instead to properly handle this.  Attempting to create a
device when open() is called is really really odd...

thanks,

greg k-h
Re: [PATCH v2] misc: fastrpc: Fix channel resource access in device_open
Posted by Dmitry Baryshkov 3 months, 2 weeks ago
On Tue, Jun 24, 2025 at 04:38:25PM +0100, Greg KH wrote:
> On Tue, Jun 24, 2025 at 04:36:35PM +0100, Greg KH wrote:
> > On Tue, Jun 24, 2025 at 04:27:21PM +0300, Dmitry Baryshkov wrote:
> > > On Thu, Jun 19, 2025 at 10:40:26AM +0530, Ekansh Gupta wrote:
> > > > During rpmsg_probe, fastrpc device nodes are created first, then
> > > > channel specific resources are initialized, followed by
> > > > of_platform_populate, which triggers context bank probing. This
> > > > sequence can cause issues as applications might open the device
> > > > node before channel resources are initialized or the session is
> > > > available, leading to problems. For example, spin_lock is initialized
> > > > after the device node creation, but it is used in device_open,
> > > > potentially before initialization. Move device registration after
> > > > channel resource initialization in fastrpc_rpmsg_probe.
> > > 
> > > You've moved device init, however there is still a possibility for the
> > > context devices to be created, but not bound to the driver (because all
> > > the probings are async). I think instead we should drop the extra
> > > platform driver layer and create and set up corresponding devices
> > > manually. For example, see how it is handled in
> > > host1x_memory_context_list_init(). That function uses iommu-maps, but we
> > > can use OF nodes and iommus instead.
> > 
> > Is this a real platform device?  If so, why do you need a second
> > platform driver, what makes this so unique?  If this isn't a platform
> > device, then why not just use the faux bus instead?
> > 
> > It seems that "number of sessions" is a DT property, is that something
> > that is really defined by the hardware?  Or is it just a virtual thing
> > that people are abusing in the DT?

Purely software value.

> > 
> > And if you really have all these sessions, why not make them real
> > devices, wouldn't that make things simpler?
> 
> Oh wait, these are "fake" platform devices under the parent (i.e. real)
> platform device.  That's not good, please don't do that, use the faux
> bus code now instead to properly handle this.  Attempting to create a
> device when open() is called is really really odd...

The driver doesn't created devices during open(). It creates them
earlier, then another driver probes an populates the data. I suggest to
follow Tegra approach, remove the sub-driver completely and instead of
calling of_platform_populate() create necessary devices manually and set
corresponding IOMMU configuration from the main driver's probe path.

-- 
With best wishes
Dmitry
Re: [PATCH v2] misc: fastrpc: Fix channel resource access in device_open
Posted by Ekansh Gupta 3 months, 2 weeks ago

On 6/25/2025 5:15 AM, Dmitry Baryshkov wrote:
> On Tue, Jun 24, 2025 at 04:38:25PM +0100, Greg KH wrote:
>> On Tue, Jun 24, 2025 at 04:36:35PM +0100, Greg KH wrote:
>>> On Tue, Jun 24, 2025 at 04:27:21PM +0300, Dmitry Baryshkov wrote:
>>>> On Thu, Jun 19, 2025 at 10:40:26AM +0530, Ekansh Gupta wrote:
>>>>> During rpmsg_probe, fastrpc device nodes are created first, then
>>>>> channel specific resources are initialized, followed by
>>>>> of_platform_populate, which triggers context bank probing. This
>>>>> sequence can cause issues as applications might open the device
>>>>> node before channel resources are initialized or the session is
>>>>> available, leading to problems. For example, spin_lock is initialized
>>>>> after the device node creation, but it is used in device_open,
>>>>> potentially before initialization. Move device registration after
>>>>> channel resource initialization in fastrpc_rpmsg_probe.
>>>> You've moved device init, however there is still a possibility for the
>>>> context devices to be created, but not bound to the driver (because all
>>>> the probings are async). I think instead we should drop the extra
>>>> platform driver layer and create and set up corresponding devices
>>>> manually. For example, see how it is handled in
>>>> host1x_memory_context_list_init(). That function uses iommu-maps, but we
>>>> can use OF nodes and iommus instead.
>>> Is this a real platform device?  If so, why do you need a second
>>> platform driver, what makes this so unique?  If this isn't a platform
>>> device, then why not just use the faux bus instead?
>>>
>>> It seems that "number of sessions" is a DT property, is that something
>>> that is really defined by the hardware?  Or is it just a virtual thing
>>> that people are abusing in the DT?
> Purely software value.
>
>>> And if you really have all these sessions, why not make them real
>>> devices, wouldn't that make things simpler?
>> Oh wait, these are "fake" platform devices under the parent (i.e. real)
>> platform device.  That's not good, please don't do that, use the faux
>> bus code now instead to properly handle this.  Attempting to create a
>> device when open() is called is really really odd...
> The driver doesn't created devices during open(). It creates them
> earlier, then another driver probes an populates the data. I suggest to
> follow Tegra approach, remove the sub-driver completely and instead of
> calling of_platform_populate() create necessary devices manually and set
> corresponding IOMMU configuration from the main driver's probe path.
Thanks for the suggestions. I'm checking this approach.
>
Re: [PATCH v2] misc: fastrpc: Fix channel resource access in device_open
Posted by Greg KH 3 months, 2 weeks ago
On Wed, Jun 25, 2025 at 02:45:27AM +0300, Dmitry Baryshkov wrote:
> On Tue, Jun 24, 2025 at 04:38:25PM +0100, Greg KH wrote:
> > On Tue, Jun 24, 2025 at 04:36:35PM +0100, Greg KH wrote:
> > > On Tue, Jun 24, 2025 at 04:27:21PM +0300, Dmitry Baryshkov wrote:
> > > > On Thu, Jun 19, 2025 at 10:40:26AM +0530, Ekansh Gupta wrote:
> > > > > During rpmsg_probe, fastrpc device nodes are created first, then
> > > > > channel specific resources are initialized, followed by
> > > > > of_platform_populate, which triggers context bank probing. This
> > > > > sequence can cause issues as applications might open the device
> > > > > node before channel resources are initialized or the session is
> > > > > available, leading to problems. For example, spin_lock is initialized
> > > > > after the device node creation, but it is used in device_open,
> > > > > potentially before initialization. Move device registration after
> > > > > channel resource initialization in fastrpc_rpmsg_probe.
> > > > 
> > > > You've moved device init, however there is still a possibility for the
> > > > context devices to be created, but not bound to the driver (because all
> > > > the probings are async). I think instead we should drop the extra
> > > > platform driver layer and create and set up corresponding devices
> > > > manually. For example, see how it is handled in
> > > > host1x_memory_context_list_init(). That function uses iommu-maps, but we
> > > > can use OF nodes and iommus instead.
> > > 
> > > Is this a real platform device?  If so, why do you need a second
> > > platform driver, what makes this so unique?  If this isn't a platform
> > > device, then why not just use the faux bus instead?
> > > 
> > > It seems that "number of sessions" is a DT property, is that something
> > > that is really defined by the hardware?  Or is it just a virtual thing
> > > that people are abusing in the DT?
> 
> Purely software value.
> 
> > > 
> > > And if you really have all these sessions, why not make them real
> > > devices, wouldn't that make things simpler?
> > 
> > Oh wait, these are "fake" platform devices under the parent (i.e. real)
> > platform device.  That's not good, please don't do that, use the faux
> > bus code now instead to properly handle this.  Attempting to create a
> > device when open() is called is really really odd...
> 
> The driver doesn't created devices during open(). It creates them
> earlier, then another driver probes an populates the data. I suggest to
> follow Tegra approach, remove the sub-driver completely and instead of
> calling of_platform_populate() create necessary devices manually and set
> corresponding IOMMU configuration from the main driver's probe path.

That sounds much more reasonable.

thanks,

greg k-h