[PATCH V2] remoteproc: core: Clear table_sz when rproc_shutdown

Peng Fan (OSS) posted 1 patch 8 months, 3 weeks ago
drivers/remoteproc/remoteproc_core.c | 3 +++
1 file changed, 3 insertions(+)
[PATCH V2] remoteproc: core: Clear table_sz when rproc_shutdown
Posted by Peng Fan (OSS) 8 months, 3 weeks ago
From: Peng Fan <peng.fan@nxp.com>

There is case as below could trigger kernel dump:
Use U-Boot to start remote processor(rproc) with resource table
published to a fixed address by rproc. After Kernel boots up,
stop the rproc, load a new firmware which doesn't have resource table
,and start rproc.

When starting rproc with a firmware not have resource table,
`memcpy(loaded_table, rproc->cached_table, rproc->table_sz)` will
trigger dump, because rproc->cache_table is set to NULL during the last
stop operation, but rproc->table_sz is still valid.

This issue is found on i.MX8MP and i.MX9.

Dump as below:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
Mem abort info:
  ESR = 0x0000000096000004
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x04: level 0 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=000000010af63000
[0000000000000000] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 2 UID: 0 PID: 1060 Comm: sh Not tainted 6.14.0-rc7-next-20250317-dirty #38
Hardware name: NXP i.MX8MPlus EVK board (DT)
pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __pi_memcpy_generic+0x110/0x22c
lr : rproc_start+0x88/0x1e0
Call trace:
 __pi_memcpy_generic+0x110/0x22c (P)
 rproc_boot+0x198/0x57c
 state_store+0x40/0x104
 dev_attr_store+0x18/0x2c
 sysfs_kf_write+0x7c/0x94
 kernfs_fop_write_iter+0x120/0x1cc
 vfs_write+0x240/0x378
 ksys_write+0x70/0x108
 __arm64_sys_write+0x1c/0x28
 invoke_syscall+0x48/0x10c
 el0_svc_common.constprop.0+0xc0/0xe0
 do_el0_svc+0x1c/0x28
 el0_svc+0x30/0xcc
 el0t_64_sync_handler+0x10c/0x138
 el0t_64_sync+0x198/0x19c

Clear rproc->table_sz to address the issue.

While at here, also clear rproc->table_sz when rproc_fw_boot and
rproc_detach.

Fixes: 9dc9507f1880 ("remoteproc: Properly deal with the resource table when detaching")
Signed-off-by: Peng Fan <peng.fan@nxp.com>
---

V2:
 Clear table_sz when rproc_fw_boot and rproc_detach per Arnaud

 drivers/remoteproc/remoteproc_core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index c2cf0d277729..1efa53d4e0c3 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1442,6 +1442,7 @@ static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
 	kfree(rproc->cached_table);
 	rproc->cached_table = NULL;
 	rproc->table_ptr = NULL;
+	rproc->table_sz = 0;
 unprepare_rproc:
 	/* release HW resources if needed */
 	rproc_unprepare_device(rproc);
@@ -2025,6 +2026,7 @@ int rproc_shutdown(struct rproc *rproc)
 	kfree(rproc->cached_table);
 	rproc->cached_table = NULL;
 	rproc->table_ptr = NULL;
+	rproc->table_sz = 0;
 out:
 	mutex_unlock(&rproc->lock);
 	return ret;
@@ -2091,6 +2093,7 @@ int rproc_detach(struct rproc *rproc)
 	kfree(rproc->cached_table);
 	rproc->cached_table = NULL;
 	rproc->table_ptr = NULL;
+	rproc->table_sz = 0;
 out:
 	mutex_unlock(&rproc->lock);
 	return ret;
-- 
2.37.1
Re: [PATCH V2] remoteproc: core: Clear table_sz when rproc_shutdown
Posted by Mathieu Poirier 8 months, 3 weeks ago
Hi,

On Wed, Mar 26, 2025 at 10:02:14AM +0800, Peng Fan (OSS) wrote:
> From: Peng Fan <peng.fan@nxp.com>
> 
> There is case as below could trigger kernel dump:
> Use U-Boot to start remote processor(rproc) with resource table
> published to a fixed address by rproc. After Kernel boots up,
> stop the rproc, load a new firmware which doesn't have resource table
> ,and start rproc.
>

If a firwmare image doesn't have a resouce table, rproc_elf_load_rsc_table()
will return an error [1], rproc_fw_boot() will exit prematurely [2] and the
remote processor won't be started.  What am I missing?

[1]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_elf_loader.c#L338
[2]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_core.c#L1411 

> When starting rproc with a firmware not have resource table,
> `memcpy(loaded_table, rproc->cached_table, rproc->table_sz)` will
> trigger dump, because rproc->cache_table is set to NULL during the last
> stop operation, but rproc->table_sz is still valid.
> 
> This issue is found on i.MX8MP and i.MX9.
> 
> Dump as below:
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> Mem abort info:
>   ESR = 0x0000000096000004
>   EC = 0x25: DABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
>   FSC = 0x04: level 0 translation fault
> Data abort info:
>   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> user pgtable: 4k pages, 48-bit VAs, pgdp=000000010af63000
> [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 2 UID: 0 PID: 1060 Comm: sh Not tainted 6.14.0-rc7-next-20250317-dirty #38
> Hardware name: NXP i.MX8MPlus EVK board (DT)
> pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : __pi_memcpy_generic+0x110/0x22c
> lr : rproc_start+0x88/0x1e0
> Call trace:
>  __pi_memcpy_generic+0x110/0x22c (P)
>  rproc_boot+0x198/0x57c
>  state_store+0x40/0x104
>  dev_attr_store+0x18/0x2c
>  sysfs_kf_write+0x7c/0x94
>  kernfs_fop_write_iter+0x120/0x1cc
>  vfs_write+0x240/0x378
>  ksys_write+0x70/0x108
>  __arm64_sys_write+0x1c/0x28
>  invoke_syscall+0x48/0x10c
>  el0_svc_common.constprop.0+0xc0/0xe0
>  do_el0_svc+0x1c/0x28
>  el0_svc+0x30/0xcc
>  el0t_64_sync_handler+0x10c/0x138
>  el0t_64_sync+0x198/0x19c
> 
> Clear rproc->table_sz to address the issue.
> 
> While at here, also clear rproc->table_sz when rproc_fw_boot and
> rproc_detach.
> 
> Fixes: 9dc9507f1880 ("remoteproc: Properly deal with the resource table when detaching")
> Signed-off-by: Peng Fan <peng.fan@nxp.com>
> ---
> 
> V2:
>  Clear table_sz when rproc_fw_boot and rproc_detach per Arnaud
> 
>  drivers/remoteproc/remoteproc_core.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index c2cf0d277729..1efa53d4e0c3 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1442,6 +1442,7 @@ static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
>  	kfree(rproc->cached_table);
>  	rproc->cached_table = NULL;
>  	rproc->table_ptr = NULL;
> +	rproc->table_sz = 0;
>  unprepare_rproc:
>  	/* release HW resources if needed */
>  	rproc_unprepare_device(rproc);
> @@ -2025,6 +2026,7 @@ int rproc_shutdown(struct rproc *rproc)
>  	kfree(rproc->cached_table);
>  	rproc->cached_table = NULL;
>  	rproc->table_ptr = NULL;
> +	rproc->table_sz = 0;
>  out:
>  	mutex_unlock(&rproc->lock);
>  	return ret;
> @@ -2091,6 +2093,7 @@ int rproc_detach(struct rproc *rproc)
>  	kfree(rproc->cached_table);
>  	rproc->cached_table = NULL;
>  	rproc->table_ptr = NULL;
> +	rproc->table_sz = 0;
>  out:
>  	mutex_unlock(&rproc->lock);
>  	return ret;
> -- 
> 2.37.1
>
Re: [PATCH V2] remoteproc: core: Clear table_sz when rproc_shutdown
Posted by Peng Fan 8 months, 3 weeks ago
On Thu, Mar 27, 2025 at 11:46:33AM -0600, Mathieu Poirier wrote:
>Hi,
>
>On Wed, Mar 26, 2025 at 10:02:14AM +0800, Peng Fan (OSS) wrote:
>> From: Peng Fan <peng.fan@nxp.com>
>> 
>> There is case as below could trigger kernel dump:
>> Use U-Boot to start remote processor(rproc) with resource table
>> published to a fixed address by rproc. After Kernel boots up,
>> stop the rproc, load a new firmware which doesn't have resource table
>> ,and start rproc.
>>
>
>If a firwmare image doesn't have a resouce table, rproc_elf_load_rsc_table()
>will return an error [1], rproc_fw_boot() will exit prematurely [2] and the
>remote processor won't be started.  What am I missing?

STM32 and i.MX use their own parse_fw implementation which allows no resource
table:
https://elixir.bootlin.com/linux/v6.13.7/source/drivers/remoteproc/stm32_rproc.c#L272
https://elixir.bootlin.com/linux/v6.13.7/source/drivers/remoteproc/imx_rproc.c#L598

Thanks,
Peng

>
>[1]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_elf_loader.c#L338
>[2]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_core.c#L1411 
>
>> When starting rproc with a firmware not have resource table,
>> `memcpy(loaded_table, rproc->cached_table, rproc->table_sz)` will
>> trigger dump, because rproc->cache_table is set to NULL during the last
>> stop operation, but rproc->table_sz is still valid.
>> 
>> This issue is found on i.MX8MP and i.MX9.
>> 
>> Dump as below:
>> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
>> Mem abort info:
>>   ESR = 0x0000000096000004
>>   EC = 0x25: DABT (current EL), IL = 32 bits
>>   SET = 0, FnV = 0
>>   EA = 0, S1PTW = 0
>>   FSC = 0x04: level 0 translation fault
>> Data abort info:
>>   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>>   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>>   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>> user pgtable: 4k pages, 48-bit VAs, pgdp=000000010af63000
>> [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
>> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>> Modules linked in:
>> CPU: 2 UID: 0 PID: 1060 Comm: sh Not tainted 6.14.0-rc7-next-20250317-dirty #38
>> Hardware name: NXP i.MX8MPlus EVK board (DT)
>> pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> pc : __pi_memcpy_generic+0x110/0x22c
>> lr : rproc_start+0x88/0x1e0
>> Call trace:
>>  __pi_memcpy_generic+0x110/0x22c (P)
>>  rproc_boot+0x198/0x57c
>>  state_store+0x40/0x104
>>  dev_attr_store+0x18/0x2c
>>  sysfs_kf_write+0x7c/0x94
>>  kernfs_fop_write_iter+0x120/0x1cc
>>  vfs_write+0x240/0x378
>>  ksys_write+0x70/0x108
>>  __arm64_sys_write+0x1c/0x28
>>  invoke_syscall+0x48/0x10c
>>  el0_svc_common.constprop.0+0xc0/0xe0
>>  do_el0_svc+0x1c/0x28
>>  el0_svc+0x30/0xcc
>>  el0t_64_sync_handler+0x10c/0x138
>>  el0t_64_sync+0x198/0x19c
>> 
>> Clear rproc->table_sz to address the issue.
>> 
>> While at here, also clear rproc->table_sz when rproc_fw_boot and
>> rproc_detach.
>> 
>> Fixes: 9dc9507f1880 ("remoteproc: Properly deal with the resource table when detaching")
>> Signed-off-by: Peng Fan <peng.fan@nxp.com>
>> ---
>> 
>> V2:
>>  Clear table_sz when rproc_fw_boot and rproc_detach per Arnaud
>> 
>>  drivers/remoteproc/remoteproc_core.c | 3 +++
>>  1 file changed, 3 insertions(+)
>> 
>> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
>> index c2cf0d277729..1efa53d4e0c3 100644
>> --- a/drivers/remoteproc/remoteproc_core.c
>> +++ b/drivers/remoteproc/remoteproc_core.c
>> @@ -1442,6 +1442,7 @@ static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
>>  	kfree(rproc->cached_table);
>>  	rproc->cached_table = NULL;
>>  	rproc->table_ptr = NULL;
>> +	rproc->table_sz = 0;
>>  unprepare_rproc:
>>  	/* release HW resources if needed */
>>  	rproc_unprepare_device(rproc);
>> @@ -2025,6 +2026,7 @@ int rproc_shutdown(struct rproc *rproc)
>>  	kfree(rproc->cached_table);
>>  	rproc->cached_table = NULL;
>>  	rproc->table_ptr = NULL;
>> +	rproc->table_sz = 0;
>>  out:
>>  	mutex_unlock(&rproc->lock);
>>  	return ret;
>> @@ -2091,6 +2093,7 @@ int rproc_detach(struct rproc *rproc)
>>  	kfree(rproc->cached_table);
>>  	rproc->cached_table = NULL;
>>  	rproc->table_ptr = NULL;
>> +	rproc->table_sz = 0;
>>  out:
>>  	mutex_unlock(&rproc->lock);
>>  	return ret;
>> -- 
>> 2.37.1
>> 
>
Re: [PATCH V2] remoteproc: core: Clear table_sz when rproc_shutdown
Posted by Mathieu Poirier 8 months, 3 weeks ago
On Fri, Mar 28, 2025 at 12:50:12PM +0800, Peng Fan wrote:
> On Thu, Mar 27, 2025 at 11:46:33AM -0600, Mathieu Poirier wrote:
> >Hi,
> >
> >On Wed, Mar 26, 2025 at 10:02:14AM +0800, Peng Fan (OSS) wrote:
> >> From: Peng Fan <peng.fan@nxp.com>
> >> 
> >> There is case as below could trigger kernel dump:
> >> Use U-Boot to start remote processor(rproc) with resource table
> >> published to a fixed address by rproc. After Kernel boots up,
> >> stop the rproc, load a new firmware which doesn't have resource table
> >> ,and start rproc.
> >>
> >
> >If a firwmare image doesn't have a resouce table, rproc_elf_load_rsc_table()
> >will return an error [1], rproc_fw_boot() will exit prematurely [2] and the
> >remote processor won't be started.  What am I missing?
> 
> STM32 and i.MX use their own parse_fw implementation which allows no resource
> table:
> https://elixir.bootlin.com/linux/v6.13.7/source/drivers/remoteproc/stm32_rproc.c#L272
> https://elixir.bootlin.com/linux/v6.13.7/source/drivers/remoteproc/imx_rproc.c#L598

Ok, that settles rproc_fw_boot() but there is also rproc_find_loaded_rsc_table()
that will return NULL if a resource table is not found and preventing the
memcpy() in rproc_start() from happening:

https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_core.c#L1288

> 
> Thanks,
> Peng
> 
> >
> >[1]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_elf_loader.c#L338
> >[2]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_core.c#L1411 
> >
> >> When starting rproc with a firmware not have resource table,
> >> `memcpy(loaded_table, rproc->cached_table, rproc->table_sz)` will
> >> trigger dump, because rproc->cache_table is set to NULL during the last
> >> stop operation, but rproc->table_sz is still valid.
> >> 
> >> This issue is found on i.MX8MP and i.MX9.
> >> 
> >> Dump as below:
> >> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> >> Mem abort info:
> >>   ESR = 0x0000000096000004
> >>   EC = 0x25: DABT (current EL), IL = 32 bits
> >>   SET = 0, FnV = 0
> >>   EA = 0, S1PTW = 0
> >>   FSC = 0x04: level 0 translation fault
> >> Data abort info:
> >>   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> >>   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> >>   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> >> user pgtable: 4k pages, 48-bit VAs, pgdp=000000010af63000
> >> [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
> >> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> >> Modules linked in:
> >> CPU: 2 UID: 0 PID: 1060 Comm: sh Not tainted 6.14.0-rc7-next-20250317-dirty #38
> >> Hardware name: NXP i.MX8MPlus EVK board (DT)
> >> pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >> pc : __pi_memcpy_generic+0x110/0x22c
> >> lr : rproc_start+0x88/0x1e0
> >> Call trace:
> >>  __pi_memcpy_generic+0x110/0x22c (P)
> >>  rproc_boot+0x198/0x57c
> >>  state_store+0x40/0x104
> >>  dev_attr_store+0x18/0x2c
> >>  sysfs_kf_write+0x7c/0x94
> >>  kernfs_fop_write_iter+0x120/0x1cc
> >>  vfs_write+0x240/0x378
> >>  ksys_write+0x70/0x108
> >>  __arm64_sys_write+0x1c/0x28
> >>  invoke_syscall+0x48/0x10c
> >>  el0_svc_common.constprop.0+0xc0/0xe0
> >>  do_el0_svc+0x1c/0x28
> >>  el0_svc+0x30/0xcc
> >>  el0t_64_sync_handler+0x10c/0x138
> >>  el0t_64_sync+0x198/0x19c
> >> 
> >> Clear rproc->table_sz to address the issue.
> >> 
> >> While at here, also clear rproc->table_sz when rproc_fw_boot and
> >> rproc_detach.
> >> 
> >> Fixes: 9dc9507f1880 ("remoteproc: Properly deal with the resource table when detaching")
> >> Signed-off-by: Peng Fan <peng.fan@nxp.com>
> >> ---
> >> 
> >> V2:
> >>  Clear table_sz when rproc_fw_boot and rproc_detach per Arnaud
> >> 
> >>  drivers/remoteproc/remoteproc_core.c | 3 +++
> >>  1 file changed, 3 insertions(+)
> >> 
> >> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> >> index c2cf0d277729..1efa53d4e0c3 100644
> >> --- a/drivers/remoteproc/remoteproc_core.c
> >> +++ b/drivers/remoteproc/remoteproc_core.c
> >> @@ -1442,6 +1442,7 @@ static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
> >>  	kfree(rproc->cached_table);
> >>  	rproc->cached_table = NULL;
> >>  	rproc->table_ptr = NULL;
> >> +	rproc->table_sz = 0;
> >>  unprepare_rproc:
> >>  	/* release HW resources if needed */
> >>  	rproc_unprepare_device(rproc);
> >> @@ -2025,6 +2026,7 @@ int rproc_shutdown(struct rproc *rproc)
> >>  	kfree(rproc->cached_table);
> >>  	rproc->cached_table = NULL;
> >>  	rproc->table_ptr = NULL;
> >> +	rproc->table_sz = 0;
> >>  out:
> >>  	mutex_unlock(&rproc->lock);
> >>  	return ret;
> >> @@ -2091,6 +2093,7 @@ int rproc_detach(struct rproc *rproc)
> >>  	kfree(rproc->cached_table);
> >>  	rproc->cached_table = NULL;
> >>  	rproc->table_ptr = NULL;
> >> +	rproc->table_sz = 0;
> >>  out:
> >>  	mutex_unlock(&rproc->lock);
> >>  	return ret;
> >> -- 
> >> 2.37.1
> >> 
> >
Re: [PATCH V2] remoteproc: core: Clear table_sz when rproc_shutdown
Posted by Peng Fan 8 months, 3 weeks ago
On Fri, Mar 28, 2025 at 08:14:41AM -0600, Mathieu Poirier wrote:
>On Fri, Mar 28, 2025 at 12:50:12PM +0800, Peng Fan wrote:
>> On Thu, Mar 27, 2025 at 11:46:33AM -0600, Mathieu Poirier wrote:
>> >Hi,
>> >
>> >On Wed, Mar 26, 2025 at 10:02:14AM +0800, Peng Fan (OSS) wrote:
>> >> From: Peng Fan <peng.fan@nxp.com>
>> >> 
>> >> There is case as below could trigger kernel dump:
>> >> Use U-Boot to start remote processor(rproc) with resource table
>> >> published to a fixed address by rproc. After Kernel boots up,
>> >> stop the rproc, load a new firmware which doesn't have resource table
>> >> ,and start rproc.
>> >>
>> >
>> >If a firwmare image doesn't have a resouce table, rproc_elf_load_rsc_table()
>> >will return an error [1], rproc_fw_boot() will exit prematurely [2] and the
>> >remote processor won't be started.  What am I missing?
>> 
>> STM32 and i.MX use their own parse_fw implementation which allows no resource
>> table:
>> https://elixir.bootlin.com/linux/v6.13.7/source/drivers/remoteproc/stm32_rproc.c#L272
>> https://elixir.bootlin.com/linux/v6.13.7/source/drivers/remoteproc/imx_rproc.c#L598
>
>Ok, that settles rproc_fw_boot() but there is also rproc_find_loaded_rsc_table()
>that will return NULL if a resource table is not found and preventing the
>memcpy() in rproc_start() from happening:
>
>https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_core.c#L1288


Sorry, I forgot to mention below code:
loaded_table is a valid pointer for i.MX, see
https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/imx_rproc.c#L666,

So loaded_table is valid, it is memcpy trigger kernel panic because table_sz is
not zero while cached_table is NULL.
	loaded_table = rproc_find_loaded_rsc_table(rproc, fw);
	if (loaded_table) {
		memcpy(loaded_table, rproc->cached_table, rproc->table_sz);
		rproc->table_ptr = loaded_table;
	}

Thanks,
Peng

>
>> 
>> Thanks,
>> Peng
>> 
>> >
>> >[1]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_elf_loader.c#L338
>> >[2]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_core.c#L1411 
>> >
>> >> When starting rproc with a firmware not have resource table,
>> >> `memcpy(loaded_table, rproc->cached_table, rproc->table_sz)` will
>> >> trigger dump, because rproc->cache_table is set to NULL during the last
>> >> stop operation, but rproc->table_sz is still valid.
>> >> 
>> >> This issue is found on i.MX8MP and i.MX9.
>> >> 
>> >> Dump as below:
>> >> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
>> >> Mem abort info:
>> >>   ESR = 0x0000000096000004
>> >>   EC = 0x25: DABT (current EL), IL = 32 bits
>> >>   SET = 0, FnV = 0
>> >>   EA = 0, S1PTW = 0
>> >>   FSC = 0x04: level 0 translation fault
>> >> Data abort info:
>> >>   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>> >>   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>> >>   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>> >> user pgtable: 4k pages, 48-bit VAs, pgdp=000000010af63000
>> >> [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
>> >> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>> >> Modules linked in:
>> >> CPU: 2 UID: 0 PID: 1060 Comm: sh Not tainted 6.14.0-rc7-next-20250317-dirty #38
>> >> Hardware name: NXP i.MX8MPlus EVK board (DT)
>> >> pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> >> pc : __pi_memcpy_generic+0x110/0x22c
>> >> lr : rproc_start+0x88/0x1e0
>> >> Call trace:
>> >>  __pi_memcpy_generic+0x110/0x22c (P)
>> >>  rproc_boot+0x198/0x57c
>> >>  state_store+0x40/0x104
>> >>  dev_attr_store+0x18/0x2c
>> >>  sysfs_kf_write+0x7c/0x94
>> >>  kernfs_fop_write_iter+0x120/0x1cc
>> >>  vfs_write+0x240/0x378
>> >>  ksys_write+0x70/0x108
>> >>  __arm64_sys_write+0x1c/0x28
>> >>  invoke_syscall+0x48/0x10c
>> >>  el0_svc_common.constprop.0+0xc0/0xe0
>> >>  do_el0_svc+0x1c/0x28
>> >>  el0_svc+0x30/0xcc
>> >>  el0t_64_sync_handler+0x10c/0x138
>> >>  el0t_64_sync+0x198/0x19c
>> >> 
>> >> Clear rproc->table_sz to address the issue.
>> >> 
>> >> While at here, also clear rproc->table_sz when rproc_fw_boot and
>> >> rproc_detach.
>> >> 
>> >> Fixes: 9dc9507f1880 ("remoteproc: Properly deal with the resource table when detaching")
>> >> Signed-off-by: Peng Fan <peng.fan@nxp.com>
>> >> ---
>> >> 
>> >> V2:
>> >>  Clear table_sz when rproc_fw_boot and rproc_detach per Arnaud
>> >> 
>> >>  drivers/remoteproc/remoteproc_core.c | 3 +++
>> >>  1 file changed, 3 insertions(+)
>> >> 
>> >> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
>> >> index c2cf0d277729..1efa53d4e0c3 100644
>> >> --- a/drivers/remoteproc/remoteproc_core.c
>> >> +++ b/drivers/remoteproc/remoteproc_core.c
>> >> @@ -1442,6 +1442,7 @@ static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
>> >>  	kfree(rproc->cached_table);
>> >>  	rproc->cached_table = NULL;
>> >>  	rproc->table_ptr = NULL;
>> >> +	rproc->table_sz = 0;
>> >>  unprepare_rproc:
>> >>  	/* release HW resources if needed */
>> >>  	rproc_unprepare_device(rproc);
>> >> @@ -2025,6 +2026,7 @@ int rproc_shutdown(struct rproc *rproc)
>> >>  	kfree(rproc->cached_table);
>> >>  	rproc->cached_table = NULL;
>> >>  	rproc->table_ptr = NULL;
>> >> +	rproc->table_sz = 0;
>> >>  out:
>> >>  	mutex_unlock(&rproc->lock);
>> >>  	return ret;
>> >> @@ -2091,6 +2093,7 @@ int rproc_detach(struct rproc *rproc)
>> >>  	kfree(rproc->cached_table);
>> >>  	rproc->cached_table = NULL;
>> >>  	rproc->table_ptr = NULL;
>> >> +	rproc->table_sz = 0;
>> >>  out:
>> >>  	mutex_unlock(&rproc->lock);
>> >>  	return ret;
>> >> -- 
>> >> 2.37.1
>> >> 
>> >
>
Re: [PATCH V2] remoteproc: core: Clear table_sz when rproc_shutdown
Posted by Mathieu Poirier 8 months, 2 weeks ago
On Sat, Mar 29, 2025 at 08:56:29PM +0800, Peng Fan wrote:
> On Fri, Mar 28, 2025 at 08:14:41AM -0600, Mathieu Poirier wrote:
> >On Fri, Mar 28, 2025 at 12:50:12PM +0800, Peng Fan wrote:
> >> On Thu, Mar 27, 2025 at 11:46:33AM -0600, Mathieu Poirier wrote:
> >> >Hi,
> >> >
> >> >On Wed, Mar 26, 2025 at 10:02:14AM +0800, Peng Fan (OSS) wrote:
> >> >> From: Peng Fan <peng.fan@nxp.com>
> >> >> 
> >> >> There is case as below could trigger kernel dump:
> >> >> Use U-Boot to start remote processor(rproc) with resource table
> >> >> published to a fixed address by rproc. After Kernel boots up,
> >> >> stop the rproc, load a new firmware which doesn't have resource table
> >> >> ,and start rproc.
> >> >>
> >> >
> >> >If a firwmare image doesn't have a resouce table, rproc_elf_load_rsc_table()
> >> >will return an error [1], rproc_fw_boot() will exit prematurely [2] and the
> >> >remote processor won't be started.  What am I missing?
> >> 
> >> STM32 and i.MX use their own parse_fw implementation which allows no resource
> >> table:
> >> https://elixir.bootlin.com/linux/v6.13.7/source/drivers/remoteproc/stm32_rproc.c#L272
> >> https://elixir.bootlin.com/linux/v6.13.7/source/drivers/remoteproc/imx_rproc.c#L598
> >
> >Ok, that settles rproc_fw_boot() but there is also rproc_find_loaded_rsc_table()
> >that will return NULL if a resource table is not found and preventing the
> >memcpy() in rproc_start() from happening:
> >
> >https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_core.c#L1288
> 
> 
> Sorry, I forgot to mention below code:
> loaded_table is a valid pointer for i.MX, see
> https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/imx_rproc.c#L666,

(SIGH)

The changelong for this patch says "... load a new firmware which doesn't have a
resource table..." and now you are telling me that @load_table is valid.  As
such I have to _guess_ that @priv->rsc_table is not null.  So which is it -
valid or not valid?  

If my assumption above is valid than fix that instead of hacking the remoteproc
core.  If my assumption is not valid the changelog and your justification for
this patch are wrong.  Either way I have spent way too much time on this patch
already and dropping it.  The same goes for your other patch [1] - resent it
when you will have properly address the work herein.   

[1]. [PATCH] remoteproc: imx_rproc: Add mutex protection for workqueue

> 
> So loaded_table is valid, it is memcpy trigger kernel panic because table_sz is
> not zero while cached_table is NULL.
> 	loaded_table = rproc_find_loaded_rsc_table(rproc, fw);
> 	if (loaded_table) {
> 		memcpy(loaded_table, rproc->cached_table, rproc->table_sz);
> 		rproc->table_ptr = loaded_table;
> 	}
> 
> Thanks,
> Peng
> 
> >
> >> 
> >> Thanks,
> >> Peng
> >> 
> >> >
> >> >[1]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_elf_loader.c#L338
> >> >[2]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_core.c#L1411 
> >> >
> >> >> When starting rproc with a firmware not have resource table,
> >> >> `memcpy(loaded_table, rproc->cached_table, rproc->table_sz)` will
> >> >> trigger dump, because rproc->cache_table is set to NULL during the last
> >> >> stop operation, but rproc->table_sz is still valid.
> >> >> 
> >> >> This issue is found on i.MX8MP and i.MX9.
> >> >> 
> >> >> Dump as below:
> >> >> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> >> >> Mem abort info:
> >> >>   ESR = 0x0000000096000004
> >> >>   EC = 0x25: DABT (current EL), IL = 32 bits
> >> >>   SET = 0, FnV = 0
> >> >>   EA = 0, S1PTW = 0
> >> >>   FSC = 0x04: level 0 translation fault
> >> >> Data abort info:
> >> >>   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> >> >>   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> >> >>   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> >> >> user pgtable: 4k pages, 48-bit VAs, pgdp=000000010af63000
> >> >> [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
> >> >> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> >> >> Modules linked in:
> >> >> CPU: 2 UID: 0 PID: 1060 Comm: sh Not tainted 6.14.0-rc7-next-20250317-dirty #38
> >> >> Hardware name: NXP i.MX8MPlus EVK board (DT)
> >> >> pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >> >> pc : __pi_memcpy_generic+0x110/0x22c
> >> >> lr : rproc_start+0x88/0x1e0
> >> >> Call trace:
> >> >>  __pi_memcpy_generic+0x110/0x22c (P)
> >> >>  rproc_boot+0x198/0x57c
> >> >>  state_store+0x40/0x104
> >> >>  dev_attr_store+0x18/0x2c
> >> >>  sysfs_kf_write+0x7c/0x94
> >> >>  kernfs_fop_write_iter+0x120/0x1cc
> >> >>  vfs_write+0x240/0x378
> >> >>  ksys_write+0x70/0x108
> >> >>  __arm64_sys_write+0x1c/0x28
> >> >>  invoke_syscall+0x48/0x10c
> >> >>  el0_svc_common.constprop.0+0xc0/0xe0
> >> >>  do_el0_svc+0x1c/0x28
> >> >>  el0_svc+0x30/0xcc
> >> >>  el0t_64_sync_handler+0x10c/0x138
> >> >>  el0t_64_sync+0x198/0x19c
> >> >> 
> >> >> Clear rproc->table_sz to address the issue.
> >> >> 
> >> >> While at here, also clear rproc->table_sz when rproc_fw_boot and
> >> >> rproc_detach.
> >> >> 
> >> >> Fixes: 9dc9507f1880 ("remoteproc: Properly deal with the resource table when detaching")
> >> >> Signed-off-by: Peng Fan <peng.fan@nxp.com>
> >> >> ---
> >> >> 
> >> >> V2:
> >> >>  Clear table_sz when rproc_fw_boot and rproc_detach per Arnaud
> >> >> 
> >> >>  drivers/remoteproc/remoteproc_core.c | 3 +++
> >> >>  1 file changed, 3 insertions(+)
> >> >> 
> >> >> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> >> >> index c2cf0d277729..1efa53d4e0c3 100644
> >> >> --- a/drivers/remoteproc/remoteproc_core.c
> >> >> +++ b/drivers/remoteproc/remoteproc_core.c
> >> >> @@ -1442,6 +1442,7 @@ static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
> >> >>  	kfree(rproc->cached_table);
> >> >>  	rproc->cached_table = NULL;
> >> >>  	rproc->table_ptr = NULL;
> >> >> +	rproc->table_sz = 0;
> >> >>  unprepare_rproc:
> >> >>  	/* release HW resources if needed */
> >> >>  	rproc_unprepare_device(rproc);
> >> >> @@ -2025,6 +2026,7 @@ int rproc_shutdown(struct rproc *rproc)
> >> >>  	kfree(rproc->cached_table);
> >> >>  	rproc->cached_table = NULL;
> >> >>  	rproc->table_ptr = NULL;
> >> >> +	rproc->table_sz = 0;
> >> >>  out:
> >> >>  	mutex_unlock(&rproc->lock);
> >> >>  	return ret;
> >> >> @@ -2091,6 +2093,7 @@ int rproc_detach(struct rproc *rproc)
> >> >>  	kfree(rproc->cached_table);
> >> >>  	rproc->cached_table = NULL;
> >> >>  	rproc->table_ptr = NULL;
> >> >> +	rproc->table_sz = 0;
> >> >>  out:
> >> >>  	mutex_unlock(&rproc->lock);
> >> >>  	return ret;
> >> >> -- 
> >> >> 2.37.1
> >> >> 
> >> >
> >
Re: [PATCH V2] remoteproc: core: Clear table_sz when rproc_shutdown
Posted by Peng Fan 8 months, 2 weeks ago
On Mon, Mar 31, 2025 at 09:40:41AM -0600, Mathieu Poirier wrote:
>On Sat, Mar 29, 2025 at 08:56:29PM +0800, Peng Fan wrote:
>> On Fri, Mar 28, 2025 at 08:14:41AM -0600, Mathieu Poirier wrote:
>> >On Fri, Mar 28, 2025 at 12:50:12PM +0800, Peng Fan wrote:
>> >> On Thu, Mar 27, 2025 at 11:46:33AM -0600, Mathieu Poirier wrote:
>> >> >Hi,
>> >> >
>> >> >On Wed, Mar 26, 2025 at 10:02:14AM +0800, Peng Fan (OSS) wrote:
>> >> >> From: Peng Fan <peng.fan@nxp.com>
>> >> >> 
>> >> >> There is case as below could trigger kernel dump:
>> >> >> Use U-Boot to start remote processor(rproc) with resource table
>> >> >> published to a fixed address by rproc. After Kernel boots up,
>> >> >> stop the rproc, load a new firmware which doesn't have resource table
>> >> >> ,and start rproc.
>> >> >>
>> >> >
>> >> >If a firwmare image doesn't have a resouce table, rproc_elf_load_rsc_table()
>> >> >will return an error [1], rproc_fw_boot() will exit prematurely [2] and the
>> >> >remote processor won't be started.  What am I missing?
>> >> 
>> >> STM32 and i.MX use their own parse_fw implementation which allows no resource
>> >> table:
>> >> https://elixir.bootlin.com/linux/v6.13.7/source/drivers/remoteproc/stm32_rproc.c#L272
>> >> https://elixir.bootlin.com/linux/v6.13.7/source/drivers/remoteproc/imx_rproc.c#L598
>> >
>> >Ok, that settles rproc_fw_boot() but there is also rproc_find_loaded_rsc_table()
>> >that will return NULL if a resource table is not found and preventing the
>> >memcpy() in rproc_start() from happening:
>> >
>> >https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_core.c#L1288
>> 
>> 
>> Sorry, I forgot to mention below code:
>> loaded_table is a valid pointer for i.MX, see
>> https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/imx_rproc.c#L666,
>
>(SIGH)
>
>The changelong for this patch says "... load a new firmware which doesn't have a
>resource table..." and now you are telling me that @load_table is valid.  As
>such I have to _guess_ that @priv->rsc_table is not null.  So which is it -
>valid or not valid?  

As wrote in commit log, bootloader kicks the m7 and m7 publishes a valid
resource table to a fixed address.

When linux boots up, first stop m7, then load a new firmware which does not
have resource table, then stop m7.

Even the new firmware does not have resource table, the imx_rproc driver
still returns a valid resource table address which is got from device tree
(rsrc_table) in imx DTS when the driver probe.

@priv->rsc_table is always valid even the firwmare does not have a valid
resource table. The TCM area is not writeable by Linux, so the firmware will
copy the resource table from TCM to DDR if the firmware has a resource table.

Hope this is clear.

>
>If my assumption above is valid than fix that instead of hacking the remoteproc
>core.

I just found V1 was picked up by Bjorn.
It is not hack, clearing table_sz in core code does not hurt, I think.

If my assumption is not valid the changelog and your justification for
>this patch are wrong.  Either way I have spent way too much time on this patch
>already and dropping it.  The same goes for your other patch [1] - resent it
>when you will have properly address the work herein.   

sure.

Thanks,
Peng

>
>[1]. [PATCH] remoteproc: imx_rproc: Add mutex protection for workqueue
>
>> 
>> So loaded_table is valid, it is memcpy trigger kernel panic because table_sz is
>> not zero while cached_table is NULL.
>> 	loaded_table = rproc_find_loaded_rsc_table(rproc, fw);
>> 	if (loaded_table) {
>> 		memcpy(loaded_table, rproc->cached_table, rproc->table_sz);
>> 		rproc->table_ptr = loaded_table;
>> 	}
>> 
>> Thanks,
>> Peng
>> 
>> >
>> >> 
>> >> Thanks,
>> >> Peng
>> >> 
>> >> >
>> >> >[1]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_elf_loader.c#L338
>> >> >[2]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_core.c#L1411 
>> >> >
>> >> >> When starting rproc with a firmware not have resource table,
>> >> >> `memcpy(loaded_table, rproc->cached_table, rproc->table_sz)` will
>> >> >> trigger dump, because rproc->cache_table is set to NULL during the last
>> >> >> stop operation, but rproc->table_sz is still valid.
>> >> >> 
>> >> >> This issue is found on i.MX8MP and i.MX9.
>> >> >> 
>> >> >> Dump as below:
>> >> >> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
>> >> >> Mem abort info:
>> >> >>   ESR = 0x0000000096000004
>> >> >>   EC = 0x25: DABT (current EL), IL = 32 bits
>> >> >>   SET = 0, FnV = 0
>> >> >>   EA = 0, S1PTW = 0
>> >> >>   FSC = 0x04: level 0 translation fault
>> >> >> Data abort info:
>> >> >>   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>> >> >>   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>> >> >>   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>> >> >> user pgtable: 4k pages, 48-bit VAs, pgdp=000000010af63000
>> >> >> [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
>> >> >> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>> >> >> Modules linked in:
>> >> >> CPU: 2 UID: 0 PID: 1060 Comm: sh Not tainted 6.14.0-rc7-next-20250317-dirty #38
>> >> >> Hardware name: NXP i.MX8MPlus EVK board (DT)
>> >> >> pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> >> >> pc : __pi_memcpy_generic+0x110/0x22c
>> >> >> lr : rproc_start+0x88/0x1e0
>> >> >> Call trace:
>> >> >>  __pi_memcpy_generic+0x110/0x22c (P)
>> >> >>  rproc_boot+0x198/0x57c
>> >> >>  state_store+0x40/0x104
>> >> >>  dev_attr_store+0x18/0x2c
>> >> >>  sysfs_kf_write+0x7c/0x94
>> >> >>  kernfs_fop_write_iter+0x120/0x1cc
>> >> >>  vfs_write+0x240/0x378
>> >> >>  ksys_write+0x70/0x108
>> >> >>  __arm64_sys_write+0x1c/0x28
>> >> >>  invoke_syscall+0x48/0x10c
>> >> >>  el0_svc_common.constprop.0+0xc0/0xe0
>> >> >>  do_el0_svc+0x1c/0x28
>> >> >>  el0_svc+0x30/0xcc
>> >> >>  el0t_64_sync_handler+0x10c/0x138
>> >> >>  el0t_64_sync+0x198/0x19c
>> >> >> 
>> >> >> Clear rproc->table_sz to address the issue.
>> >> >> 
>> >> >> While at here, also clear rproc->table_sz when rproc_fw_boot and
>> >> >> rproc_detach.
>> >> >> 
>> >> >> Fixes: 9dc9507f1880 ("remoteproc: Properly deal with the resource table when detaching")
>> >> >> Signed-off-by: Peng Fan <peng.fan@nxp.com>
>> >> >> ---
>> >> >> 
>> >> >> V2:
>> >> >>  Clear table_sz when rproc_fw_boot and rproc_detach per Arnaud
>> >> >> 
>> >> >>  drivers/remoteproc/remoteproc_core.c | 3 +++
>> >> >>  1 file changed, 3 insertions(+)
>> >> >> 
>> >> >> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
>> >> >> index c2cf0d277729..1efa53d4e0c3 100644
>> >> >> --- a/drivers/remoteproc/remoteproc_core.c
>> >> >> +++ b/drivers/remoteproc/remoteproc_core.c
>> >> >> @@ -1442,6 +1442,7 @@ static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
>> >> >>  	kfree(rproc->cached_table);
>> >> >>  	rproc->cached_table = NULL;
>> >> >>  	rproc->table_ptr = NULL;
>> >> >> +	rproc->table_sz = 0;
>> >> >>  unprepare_rproc:
>> >> >>  	/* release HW resources if needed */
>> >> >>  	rproc_unprepare_device(rproc);
>> >> >> @@ -2025,6 +2026,7 @@ int rproc_shutdown(struct rproc *rproc)
>> >> >>  	kfree(rproc->cached_table);
>> >> >>  	rproc->cached_table = NULL;
>> >> >>  	rproc->table_ptr = NULL;
>> >> >> +	rproc->table_sz = 0;
>> >> >>  out:
>> >> >>  	mutex_unlock(&rproc->lock);
>> >> >>  	return ret;
>> >> >> @@ -2091,6 +2093,7 @@ int rproc_detach(struct rproc *rproc)
>> >> >>  	kfree(rproc->cached_table);
>> >> >>  	rproc->cached_table = NULL;
>> >> >>  	rproc->table_ptr = NULL;
>> >> >> +	rproc->table_sz = 0;
>> >> >>  out:
>> >> >>  	mutex_unlock(&rproc->lock);
>> >> >>  	return ret;
>> >> >> -- 
>> >> >> 2.37.1
>> >> >> 
>> >> >
>> >
>
Re: [PATCH V2] remoteproc: core: Clear table_sz when rproc_shutdown
Posted by Mathieu Poirier 8 months, 2 weeks ago
On Tue, Apr 01, 2025 at 09:41:24AM +0800, Peng Fan wrote:
> On Mon, Mar 31, 2025 at 09:40:41AM -0600, Mathieu Poirier wrote:
> >On Sat, Mar 29, 2025 at 08:56:29PM +0800, Peng Fan wrote:
> >> On Fri, Mar 28, 2025 at 08:14:41AM -0600, Mathieu Poirier wrote:
> >> >On Fri, Mar 28, 2025 at 12:50:12PM +0800, Peng Fan wrote:
> >> >> On Thu, Mar 27, 2025 at 11:46:33AM -0600, Mathieu Poirier wrote:
> >> >> >Hi,
> >> >> >
> >> >> >On Wed, Mar 26, 2025 at 10:02:14AM +0800, Peng Fan (OSS) wrote:
> >> >> >> From: Peng Fan <peng.fan@nxp.com>
> >> >> >> 
> >> >> >> There is case as below could trigger kernel dump:
> >> >> >> Use U-Boot to start remote processor(rproc) with resource table
> >> >> >> published to a fixed address by rproc. After Kernel boots up,
> >> >> >> stop the rproc, load a new firmware which doesn't have resource table
> >> >> >> ,and start rproc.
> >> >> >>
> >> >> >
> >> >> >If a firwmare image doesn't have a resouce table, rproc_elf_load_rsc_table()
> >> >> >will return an error [1], rproc_fw_boot() will exit prematurely [2] and the
> >> >> >remote processor won't be started.  What am I missing?
> >> >> 
> >> >> STM32 and i.MX use their own parse_fw implementation which allows no resource
> >> >> table:
> >> >> https://elixir.bootlin.com/linux/v6.13.7/source/drivers/remoteproc/stm32_rproc.c#L272
> >> >> https://elixir.bootlin.com/linux/v6.13.7/source/drivers/remoteproc/imx_rproc.c#L598
> >> >
> >> >Ok, that settles rproc_fw_boot() but there is also rproc_find_loaded_rsc_table()
> >> >that will return NULL if a resource table is not found and preventing the
> >> >memcpy() in rproc_start() from happening:
> >> >
> >> >https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_core.c#L1288
> >> 
> >> 
> >> Sorry, I forgot to mention below code:
> >> loaded_table is a valid pointer for i.MX, see
> >> https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/imx_rproc.c#L666,
> >
> >(SIGH)
> >
> >The changelong for this patch says "... load a new firmware which doesn't have a
> >resource table..." and now you are telling me that @load_table is valid.  As
> >such I have to _guess_ that @priv->rsc_table is not null.  So which is it -
> >valid or not valid?  
> 
> As wrote in commit log, bootloader kicks the m7 and m7 publishes a valid
> resource table to a fixed address.
> 
> When linux boots up, first stop m7, then load a new firmware which does not
> have resource table, then stop m7.
> 
> Even the new firmware does not have resource table, the imx_rproc driver
> still returns a valid resource table address which is got from device tree
> (rsrc_table) in imx DTS when the driver probe.
> 
> @priv->rsc_table is always valid even the firwmare does not have a valid

And that is where the problem is - why can't that situation be fixed instead of
pushing it to the subsystem core?  Why can't you have code in
imx_rproc_elf_find_loaded_rsc_table() that checks if there is a valid resource
table at the address held by @priv->rsc_table and return NULL if there isn't?

The core is already checking if @loaded_table is valid in rproc_start(), why
can't that be used instead of adding yet another check?

> resource table. The TCM area is not writeable by Linux, so the firmware will
> copy the resource table from TCM to DDR if the firmware has a resource table.
> 
> Hope this is clear.

What is clear is that I spend 4 sessions on a 3-line patch, valuable time I
could have spent reviewing other peoples' patches.

> 
> >
> >If my assumption above is valid than fix that instead of hacking the remoteproc
> >core.
> 
> I just found V1 was picked up by Bjorn.

I am currently working with Bjorn on that.

> It is not hack, clearing table_sz in core code does not hurt, I think.

It is a hack for as long as you haven't provided a valid explanation for the
changes you are proposing.  

> 
> If my assumption is not valid the changelog and your justification for
> >this patch are wrong.  Either way I have spent way too much time on this patch
> >already and dropping it.  The same goes for your other patch [1] - resent it
> >when you will have properly address the work herein.   
> 

And yet you just sent a V2.

> sure.
> 
> Thanks,
> Peng
> 
> >
> >[1]. [PATCH] remoteproc: imx_rproc: Add mutex protection for workqueue
> >
> >> 
> >> So loaded_table is valid, it is memcpy trigger kernel panic because table_sz is
> >> not zero while cached_table is NULL.
> >> 	loaded_table = rproc_find_loaded_rsc_table(rproc, fw);
> >> 	if (loaded_table) {
> >> 		memcpy(loaded_table, rproc->cached_table, rproc->table_sz);
> >> 		rproc->table_ptr = loaded_table;
> >> 	}
> >> 
> >> Thanks,
> >> Peng
> >> 
> >> >
> >> >> 
> >> >> Thanks,
> >> >> Peng
> >> >> 
> >> >> >
> >> >> >[1]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_elf_loader.c#L338
> >> >> >[2]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_core.c#L1411 
> >> >> >
> >> >> >> When starting rproc with a firmware not have resource table,
> >> >> >> `memcpy(loaded_table, rproc->cached_table, rproc->table_sz)` will
> >> >> >> trigger dump, because rproc->cache_table is set to NULL during the last
> >> >> >> stop operation, but rproc->table_sz is still valid.
> >> >> >> 
> >> >> >> This issue is found on i.MX8MP and i.MX9.
> >> >> >> 
> >> >> >> Dump as below:
> >> >> >> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> >> >> >> Mem abort info:
> >> >> >>   ESR = 0x0000000096000004
> >> >> >>   EC = 0x25: DABT (current EL), IL = 32 bits
> >> >> >>   SET = 0, FnV = 0
> >> >> >>   EA = 0, S1PTW = 0
> >> >> >>   FSC = 0x04: level 0 translation fault
> >> >> >> Data abort info:
> >> >> >>   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> >> >> >>   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> >> >> >>   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> >> >> >> user pgtable: 4k pages, 48-bit VAs, pgdp=000000010af63000
> >> >> >> [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
> >> >> >> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> >> >> >> Modules linked in:
> >> >> >> CPU: 2 UID: 0 PID: 1060 Comm: sh Not tainted 6.14.0-rc7-next-20250317-dirty #38
> >> >> >> Hardware name: NXP i.MX8MPlus EVK board (DT)
> >> >> >> pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >> >> >> pc : __pi_memcpy_generic+0x110/0x22c
> >> >> >> lr : rproc_start+0x88/0x1e0
> >> >> >> Call trace:
> >> >> >>  __pi_memcpy_generic+0x110/0x22c (P)
> >> >> >>  rproc_boot+0x198/0x57c
> >> >> >>  state_store+0x40/0x104
> >> >> >>  dev_attr_store+0x18/0x2c
> >> >> >>  sysfs_kf_write+0x7c/0x94
> >> >> >>  kernfs_fop_write_iter+0x120/0x1cc
> >> >> >>  vfs_write+0x240/0x378
> >> >> >>  ksys_write+0x70/0x108
> >> >> >>  __arm64_sys_write+0x1c/0x28
> >> >> >>  invoke_syscall+0x48/0x10c
> >> >> >>  el0_svc_common.constprop.0+0xc0/0xe0
> >> >> >>  do_el0_svc+0x1c/0x28
> >> >> >>  el0_svc+0x30/0xcc
> >> >> >>  el0t_64_sync_handler+0x10c/0x138
> >> >> >>  el0t_64_sync+0x198/0x19c
> >> >> >> 
> >> >> >> Clear rproc->table_sz to address the issue.
> >> >> >> 
> >> >> >> While at here, also clear rproc->table_sz when rproc_fw_boot and
> >> >> >> rproc_detach.
> >> >> >> 
> >> >> >> Fixes: 9dc9507f1880 ("remoteproc: Properly deal with the resource table when detaching")
> >> >> >> Signed-off-by: Peng Fan <peng.fan@nxp.com>
> >> >> >> ---
> >> >> >> 
> >> >> >> V2:
> >> >> >>  Clear table_sz when rproc_fw_boot and rproc_detach per Arnaud
> >> >> >> 
> >> >> >>  drivers/remoteproc/remoteproc_core.c | 3 +++
> >> >> >>  1 file changed, 3 insertions(+)
> >> >> >> 
> >> >> >> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> >> >> >> index c2cf0d277729..1efa53d4e0c3 100644
> >> >> >> --- a/drivers/remoteproc/remoteproc_core.c
> >> >> >> +++ b/drivers/remoteproc/remoteproc_core.c
> >> >> >> @@ -1442,6 +1442,7 @@ static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
> >> >> >>  	kfree(rproc->cached_table);
> >> >> >>  	rproc->cached_table = NULL;
> >> >> >>  	rproc->table_ptr = NULL;
> >> >> >> +	rproc->table_sz = 0;
> >> >> >>  unprepare_rproc:
> >> >> >>  	/* release HW resources if needed */
> >> >> >>  	rproc_unprepare_device(rproc);
> >> >> >> @@ -2025,6 +2026,7 @@ int rproc_shutdown(struct rproc *rproc)
> >> >> >>  	kfree(rproc->cached_table);
> >> >> >>  	rproc->cached_table = NULL;
> >> >> >>  	rproc->table_ptr = NULL;
> >> >> >> +	rproc->table_sz = 0;
> >> >> >>  out:
> >> >> >>  	mutex_unlock(&rproc->lock);
> >> >> >>  	return ret;
> >> >> >> @@ -2091,6 +2093,7 @@ int rproc_detach(struct rproc *rproc)
> >> >> >>  	kfree(rproc->cached_table);
> >> >> >>  	rproc->cached_table = NULL;
> >> >> >>  	rproc->table_ptr = NULL;
> >> >> >> +	rproc->table_sz = 0;
> >> >> >>  out:
> >> >> >>  	mutex_unlock(&rproc->lock);
> >> >> >>  	return ret;
> >> >> >> -- 
> >> >> >> 2.37.1
> >> >> >> 
> >> >> >
> >> >
> >
Re: [PATCH V2] remoteproc: core: Clear table_sz when rproc_shutdown
Posted by Peng Fan 8 months, 2 weeks ago
On Tue, Apr 01, 2025 at 10:05:03AM -0600, Mathieu Poirier wrote:
>On Tue, Apr 01, 2025 at 09:41:24AM +0800, Peng Fan wrote:
>> On Mon, Mar 31, 2025 at 09:40:41AM -0600, Mathieu Poirier wrote:
>> >On Sat, Mar 29, 2025 at 08:56:29PM +0800, Peng Fan wrote:
>> >> On Fri, Mar 28, 2025 at 08:14:41AM -0600, Mathieu Poirier wrote:
>> >> >On Fri, Mar 28, 2025 at 12:50:12PM +0800, Peng Fan wrote:
>> >> >> On Thu, Mar 27, 2025 at 11:46:33AM -0600, Mathieu Poirier wrote:
>> >> >> >Hi,
>> >> >> >
>> >> >> >On Wed, Mar 26, 2025 at 10:02:14AM +0800, Peng Fan (OSS) wrote:
>> >> >> >> From: Peng Fan <peng.fan@nxp.com>
>> >> >> >> 
>> >> >> >> There is case as below could trigger kernel dump:
>> >> >> >> Use U-Boot to start remote processor(rproc) with resource table
>> >> >> >> published to a fixed address by rproc. After Kernel boots up,
>> >> >> >> stop the rproc, load a new firmware which doesn't have resource table
>> >> >> >> ,and start rproc.
>> >> >> >>
>> >> >> >
>> >> >> >If a firwmare image doesn't have a resouce table, rproc_elf_load_rsc_table()
>> >> >> >will return an error [1], rproc_fw_boot() will exit prematurely [2] and the
>> >> >> >remote processor won't be started.  What am I missing?
>> >> >> 
>> >> >> STM32 and i.MX use their own parse_fw implementation which allows no resource
>> >> >> table:
>> >> >> https://elixir.bootlin.com/linux/v6.13.7/source/drivers/remoteproc/stm32_rproc.c#L272
>> >> >> https://elixir.bootlin.com/linux/v6.13.7/source/drivers/remoteproc/imx_rproc.c#L598
>> >> >
>> >> >Ok, that settles rproc_fw_boot() but there is also rproc_find_loaded_rsc_table()
>> >> >that will return NULL if a resource table is not found and preventing the
>> >> >memcpy() in rproc_start() from happening:
>> >> >
>> >> >https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_core.c#L1288
>> >> 
>> >> 
>> >> Sorry, I forgot to mention below code:
>> >> loaded_table is a valid pointer for i.MX, see
>> >> https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/imx_rproc.c#L666,
>> >
>> >(SIGH)
>> >
>> >The changelong for this patch says "... load a new firmware which doesn't have a
>> >resource table..." and now you are telling me that @load_table is valid.  As
>> >such I have to _guess_ that @priv->rsc_table is not null.  So which is it -
>> >valid or not valid?  
>> 
>> As wrote in commit log, bootloader kicks the m7 and m7 publishes a valid
>> resource table to a fixed address.
>> 
>> When linux boots up, first stop m7, then load a new firmware which does not
>> have resource table, then stop m7.
>> 
>> Even the new firmware does not have resource table, the imx_rproc driver
>> still returns a valid resource table address which is got from device tree
>> (rsrc_table) in imx DTS when the driver probe.
>> 
>> @priv->rsc_table is always valid even the firwmare does not have a valid
>
>And that is where the problem is - why can't that situation be fixed instead of
>pushing it to the subsystem core?  Why can't you have code in
>imx_rproc_elf_find_loaded_rsc_table() that checks if there is a valid resource
>table at the address held by @priv->rsc_table and return NULL if there isn't?

It is ok address the issue in imx_rproc.c without touching core code.

priv->rsc_table contains valid resource table which is published when
m7 is kicked by bootloader, and m7 publishes a valid table to
priv->rsc_table.

It still contains valid content when linux stops m7.

To make it invalid when linux starts m7 with a firwmare(the elf image not has
resource table), need to clear the content of priv->rsc_table or
write some magic number when stop the m7 which was started by bootloader.

Then it is possible to check priv->rsc_table in
imx_rproc_elf_find_loaded_rsc_table.

The 2nd approach is to clear rproc->table_sz and rproc->table_ptr in
imx_rproc_parse_fw before rproc_elf_load_rsc_table.


>
>The core is already checking if @loaded_table is valid in rproc_start(), why
>can't that be used instead of adding yet another check?

Ah. I was thinking clear table_sz in rpoc_shutdown is an easy approach and
could benifit others in case other platforms meet similar issue in future.

If you think the current patch is not proper, I could do a v3 with the upper
2nd approach.

>
>> resource table. The TCM area is not writeable by Linux, so the firmware will
>> copy the resource table from TCM to DDR if the firmware has a resource table.
>> 
>> Hope this is clear.
>
>What is clear is that I spend 4 sessions on a 3-line patch, valuable time I
>could have spent reviewing other peoples' patches.

Sorry. Not intend to waste your time.

Thanks,
Peng
>
>> 
>> >
>> >If my assumption above is valid than fix that instead of hacking the remoteproc
>> >core.
>> 
>> I just found V1 was picked up by Bjorn.
>
>I am currently working with Bjorn on that.
>
>> It is not hack, clearing table_sz in core code does not hurt, I think.
>
>It is a hack for as long as you haven't provided a valid explanation for the
>changes you are proposing.  
>
>> 
>> If my assumption is not valid the changelog and your justification for
>> >this patch are wrong.  Either way I have spent way too much time on this patch
>> >already and dropping it.  The same goes for your other patch [1] - resent it
>> >when you will have properly address the work herein.   
>> 
>
>And yet you just sent a V2.
>
>> sure.
>> 
>> Thanks,
>> Peng
>> 
>> >
>> >[1]. [PATCH] remoteproc: imx_rproc: Add mutex protection for workqueue
>> >
>> >> 
>> >> So loaded_table is valid, it is memcpy trigger kernel panic because table_sz is
>> >> not zero while cached_table is NULL.
>> >> 	loaded_table = rproc_find_loaded_rsc_table(rproc, fw);
>> >> 	if (loaded_table) {
>> >> 		memcpy(loaded_table, rproc->cached_table, rproc->table_sz);
>> >> 		rproc->table_ptr = loaded_table;
>> >> 	}
>> >> 
>> >> Thanks,
>> >> Peng
>> >> 
>> >> >
>> >> >> 
>> >> >> Thanks,
>> >> >> Peng
>> >> >> 
>> >> >> >
>> >> >> >[1]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_elf_loader.c#L338
>> >> >> >[2]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_core.c#L1411 
>> >> >> >
>> >> >> >> When starting rproc with a firmware not have resource table,
>> >> >> >> `memcpy(loaded_table, rproc->cached_table, rproc->table_sz)` will
>> >> >> >> trigger dump, because rproc->cache_table is set to NULL during the last
>> >> >> >> stop operation, but rproc->table_sz is still valid.
>> >> >> >> 
>> >> >> >> This issue is found on i.MX8MP and i.MX9.
>> >> >> >> 
>> >> >> >> Dump as below:
>> >> >> >> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
>> >> >> >> Mem abort info:
>> >> >> >>   ESR = 0x0000000096000004
>> >> >> >>   EC = 0x25: DABT (current EL), IL = 32 bits
>> >> >> >>   SET = 0, FnV = 0
>> >> >> >>   EA = 0, S1PTW = 0
>> >> >> >>   FSC = 0x04: level 0 translation fault
>> >> >> >> Data abort info:
>> >> >> >>   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>> >> >> >>   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>> >> >> >>   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>> >> >> >> user pgtable: 4k pages, 48-bit VAs, pgdp=000000010af63000
>> >> >> >> [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
>> >> >> >> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>> >> >> >> Modules linked in:
>> >> >> >> CPU: 2 UID: 0 PID: 1060 Comm: sh Not tainted 6.14.0-rc7-next-20250317-dirty #38
>> >> >> >> Hardware name: NXP i.MX8MPlus EVK board (DT)
>> >> >> >> pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> >> >> >> pc : __pi_memcpy_generic+0x110/0x22c
>> >> >> >> lr : rproc_start+0x88/0x1e0
>> >> >> >> Call trace:
>> >> >> >>  __pi_memcpy_generic+0x110/0x22c (P)
>> >> >> >>  rproc_boot+0x198/0x57c
>> >> >> >>  state_store+0x40/0x104
>> >> >> >>  dev_attr_store+0x18/0x2c
>> >> >> >>  sysfs_kf_write+0x7c/0x94
>> >> >> >>  kernfs_fop_write_iter+0x120/0x1cc
>> >> >> >>  vfs_write+0x240/0x378
>> >> >> >>  ksys_write+0x70/0x108
>> >> >> >>  __arm64_sys_write+0x1c/0x28
>> >> >> >>  invoke_syscall+0x48/0x10c
>> >> >> >>  el0_svc_common.constprop.0+0xc0/0xe0
>> >> >> >>  do_el0_svc+0x1c/0x28
>> >> >> >>  el0_svc+0x30/0xcc
>> >> >> >>  el0t_64_sync_handler+0x10c/0x138
>> >> >> >>  el0t_64_sync+0x198/0x19c
>> >> >> >> 
>> >> >> >> Clear rproc->table_sz to address the issue.
>> >> >> >> 
>> >> >> >> While at here, also clear rproc->table_sz when rproc_fw_boot and
>> >> >> >> rproc_detach.
>> >> >> >> 
>> >> >> >> Fixes: 9dc9507f1880 ("remoteproc: Properly deal with the resource table when detaching")
>> >> >> >> Signed-off-by: Peng Fan <peng.fan@nxp.com>
>> >> >> >> ---
>> >> >> >> 
>> >> >> >> V2:
>> >> >> >>  Clear table_sz when rproc_fw_boot and rproc_detach per Arnaud
>> >> >> >> 
>> >> >> >>  drivers/remoteproc/remoteproc_core.c | 3 +++
>> >> >> >>  1 file changed, 3 insertions(+)
>> >> >> >> 
>> >> >> >> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
>> >> >> >> index c2cf0d277729..1efa53d4e0c3 100644
>> >> >> >> --- a/drivers/remoteproc/remoteproc_core.c
>> >> >> >> +++ b/drivers/remoteproc/remoteproc_core.c
>> >> >> >> @@ -1442,6 +1442,7 @@ static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
>> >> >> >>  	kfree(rproc->cached_table);
>> >> >> >>  	rproc->cached_table = NULL;
>> >> >> >>  	rproc->table_ptr = NULL;
>> >> >> >> +	rproc->table_sz = 0;
>> >> >> >>  unprepare_rproc:
>> >> >> >>  	/* release HW resources if needed */
>> >> >> >>  	rproc_unprepare_device(rproc);
>> >> >> >> @@ -2025,6 +2026,7 @@ int rproc_shutdown(struct rproc *rproc)
>> >> >> >>  	kfree(rproc->cached_table);
>> >> >> >>  	rproc->cached_table = NULL;
>> >> >> >>  	rproc->table_ptr = NULL;
>> >> >> >> +	rproc->table_sz = 0;
>> >> >> >>  out:
>> >> >> >>  	mutex_unlock(&rproc->lock);
>> >> >> >>  	return ret;
>> >> >> >> @@ -2091,6 +2093,7 @@ int rproc_detach(struct rproc *rproc)
>> >> >> >>  	kfree(rproc->cached_table);
>> >> >> >>  	rproc->cached_table = NULL;
>> >> >> >>  	rproc->table_ptr = NULL;
>> >> >> >> +	rproc->table_sz = 0;
>> >> >> >>  out:
>> >> >> >>  	mutex_unlock(&rproc->lock);
>> >> >> >>  	return ret;
>> >> >> >> -- 
>> >> >> >> 2.37.1
>> >> >> >> 
>> >> >> >
>> >> >
>> >
>
Re: [PATCH V2] remoteproc: core: Clear table_sz when rproc_shutdown
Posted by Bjorn Andersson 8 months, 2 weeks ago
On Wed, Apr 02, 2025 at 09:43:55AM +0800, Peng Fan wrote:
> On Tue, Apr 01, 2025 at 10:05:03AM -0600, Mathieu Poirier wrote:
> >On Tue, Apr 01, 2025 at 09:41:24AM +0800, Peng Fan wrote:
...
> >
> >The core is already checking if @loaded_table is valid in rproc_start(), why
> >can't that be used instead of adding yet another check?
> 
> Ah. I was thinking clear table_sz in rpoc_shutdown is an easy approach and
> could benifit others in case other platforms meet similar issue in future.
> 

I like the general idea of keeping things clean and avoid leaving stale
data behind.

But clearing table_sz during stop in order to hide the fact that the
future table_ptr will contain valid data that shouldn't be used, that's
just a bug waiting to show up again in the future.

Regards,
Bjorn
Re: [PATCH V2] remoteproc: core: Clear table_sz when rproc_shutdown
Posted by Peng Fan 8 months, 2 weeks ago
Hi Bjorn,


Thanks for replying this thread.

On Wed, Apr 02, 2025 at 08:48:58AM -0500, Bjorn Andersson wrote:
>On Wed, Apr 02, 2025 at 09:43:55AM +0800, Peng Fan wrote:
>> On Tue, Apr 01, 2025 at 10:05:03AM -0600, Mathieu Poirier wrote:
>> >On Tue, Apr 01, 2025 at 09:41:24AM +0800, Peng Fan wrote:
>...
>> >
>> >The core is already checking if @loaded_table is valid in rproc_start(), why
>> >can't that be used instead of adding yet another check?
>> 
>> Ah. I was thinking clear table_sz in rpoc_shutdown is an easy approach and
>> could benifit others in case other platforms meet similar issue in future.
>> 
>
>I like the general idea of keeping things clean and avoid leaving stale
>data behind.
>
>But clearing table_sz during stop in order to hide the fact that the
>future table_ptr will contain valid data that shouldn't be used, that's
>just a bug waiting to show up again in the future.

Agree.

Do you need me to post a fix for
commit efdde3d73ab25ce("remoteproc: core: Clear table_sz when rproc_shutdown")
by clearing table_sz in rproc_fw_boot and rproc_detach as did in this v2?

To i.MX, the above in-tree patch is ok, so all it fine, and this v2 patch
could be dropped.

But anyway, if you prefer a follow up fix, please let me know, I
could post a patch.

Thanks,
Peng

>
>Regards,
>Bjorn
>
Re: [PATCH V2] remoteproc: core: Clear table_sz when rproc_shutdown
Posted by Peng Fan 8 months, 1 week ago
On Thu, Apr 03, 2025 at 10:32:39PM +0800, Peng Fan wrote:
>Hi Bjorn,
>
>
>Thanks for replying this thread.
>
>On Wed, Apr 02, 2025 at 08:48:58AM -0500, Bjorn Andersson wrote:
>>On Wed, Apr 02, 2025 at 09:43:55AM +0800, Peng Fan wrote:
>>> On Tue, Apr 01, 2025 at 10:05:03AM -0600, Mathieu Poirier wrote:
>>> >On Tue, Apr 01, 2025 at 09:41:24AM +0800, Peng Fan wrote:
>>...
>>> >
>>> >The core is already checking if @loaded_table is valid in rproc_start(), why
>>> >can't that be used instead of adding yet another check?
>>> 
>>> Ah. I was thinking clear table_sz in rpoc_shutdown is an easy approach and
>>> could benifit others in case other platforms meet similar issue in future.
>>> 
>>
>>I like the general idea of keeping things clean and avoid leaving stale
>>data behind.
>>
>>But clearing table_sz during stop in order to hide the fact that the
>>future table_ptr will contain valid data that shouldn't be used, that's
>>just a bug waiting to show up again in the future.
>
>Agree.
>
>Do you need me to post a fix for
>commit efdde3d73ab25ce("remoteproc: core: Clear table_sz when rproc_shutdown")
>by clearing table_sz in rproc_fw_boot and rproc_detach as did in this v2?
>
>To i.MX, the above in-tree patch is ok, so all it fine, and this v2 patch
>could be dropped.
>
>But anyway, if you prefer a follow up fix, please let me know, I
>could post a patch.

Hi Bjorn, Mathieu,

 I will wait for one more week to see if any concerns or questions.
 Please raise if you have.

 If no, I suppose this thread is done and I will start my other work
 regarding rproc.

Thanks,
Peng

>
>Thanks,
>Peng
>
>>
>>Regards,
>>Bjorn
>>
>
Re: [PATCH V2] remoteproc: core: Clear table_sz when rproc_shutdown
Posted by Mathieu Poirier 8 months, 1 week ago
On Tue, 8 Apr 2025 at 09:02, Peng Fan <peng.fan@oss.nxp.com> wrote:
>
> On Thu, Apr 03, 2025 at 10:32:39PM +0800, Peng Fan wrote:
> >Hi Bjorn,
> >
> >
> >Thanks for replying this thread.
> >
> >On Wed, Apr 02, 2025 at 08:48:58AM -0500, Bjorn Andersson wrote:
> >>On Wed, Apr 02, 2025 at 09:43:55AM +0800, Peng Fan wrote:
> >>> On Tue, Apr 01, 2025 at 10:05:03AM -0600, Mathieu Poirier wrote:
> >>> >On Tue, Apr 01, 2025 at 09:41:24AM +0800, Peng Fan wrote:
> >>...
> >>> >
> >>> >The core is already checking if @loaded_table is valid in rproc_start(), why
> >>> >can't that be used instead of adding yet another check?
> >>>
> >>> Ah. I was thinking clear table_sz in rpoc_shutdown is an easy approach and
> >>> could benifit others in case other platforms meet similar issue in future.
> >>>
> >>
> >>I like the general idea of keeping things clean and avoid leaving stale
> >>data behind.
> >>
> >>But clearing table_sz during stop in order to hide the fact that the
> >>future table_ptr will contain valid data that shouldn't be used, that's
> >>just a bug waiting to show up again in the future.
> >
> >Agree.
> >
> >Do you need me to post a fix for
> >commit efdde3d73ab25ce("remoteproc: core: Clear table_sz when rproc_shutdown")
> >by clearing table_sz in rproc_fw_boot and rproc_detach as did in this v2?
> >
> >To i.MX, the above in-tree patch is ok, so all it fine, and this v2 patch
> >could be dropped.
> >
> >But anyway, if you prefer a follow up fix, please let me know, I
> >could post a patch.
>
> Hi Bjorn, Mathieu,
>
>  I will wait for one more week to see if any concerns or questions.
>  Please raise if you have.
>

I am working with Bjorn to get your patch reverted.  Once that has
happened you can send another patch.

>  If no, I suppose this thread is done and I will start my other work
>  regarding rproc.
>
> Thanks,
> Peng
>
> >
> >Thanks,
> >Peng
> >
> >>
> >>Regards,
> >>Bjorn
> >>
> >
Re: [PATCH V2] remoteproc: core: Clear table_sz when rproc_shutdown
Posted by Peng Fan 8 months, 1 week ago
On Tue, Apr 08, 2025 at 10:59:58AM -0600, Mathieu Poirier wrote:
>On Tue, 8 Apr 2025 at 09:02, Peng Fan <peng.fan@oss.nxp.com> wrote:
>>
>> On Thu, Apr 03, 2025 at 10:32:39PM +0800, Peng Fan wrote:
>> >Hi Bjorn,
>> >
>> >
>> >Thanks for replying this thread.
>> >
>> >On Wed, Apr 02, 2025 at 08:48:58AM -0500, Bjorn Andersson wrote:
>> >>On Wed, Apr 02, 2025 at 09:43:55AM +0800, Peng Fan wrote:
>> >>> On Tue, Apr 01, 2025 at 10:05:03AM -0600, Mathieu Poirier wrote:
>> >>> >On Tue, Apr 01, 2025 at 09:41:24AM +0800, Peng Fan wrote:
>> >>...
>> >>> >
>> >>> >The core is already checking if @loaded_table is valid in rproc_start(), why
>> >>> >can't that be used instead of adding yet another check?
>> >>>
>> >>> Ah. I was thinking clear table_sz in rpoc_shutdown is an easy approach and
>> >>> could benifit others in case other platforms meet similar issue in future.
>> >>>
>> >>
>> >>I like the general idea of keeping things clean and avoid leaving stale
>> >>data behind.
>> >>
>> >>But clearing table_sz during stop in order to hide the fact that the
>> >>future table_ptr will contain valid data that shouldn't be used, that's
>> >>just a bug waiting to show up again in the future.
>> >
>> >Agree.
>> >
>> >Do you need me to post a fix for
>> >commit efdde3d73ab25ce("remoteproc: core: Clear table_sz when rproc_shutdown")
>> >by clearing table_sz in rproc_fw_boot and rproc_detach as did in this v2?
>> >
>> >To i.MX, the above in-tree patch is ok, so all it fine, and this v2 patch
>> >could be dropped.
>> >
>> >But anyway, if you prefer a follow up fix, please let me know, I
>> >could post a patch.
>>
>> Hi Bjorn, Mathieu,
>>
>>  I will wait for one more week to see if any concerns or questions.
>>  Please raise if you have.
>>
>
>I am working with Bjorn to get your patch reverted.  Once that has
>happened you can send another patch.

ok, I am fine with this.

when get reverted, I need use another method to fix the issue.

I posted two approaches[1], but not get you reply. Since Bjorn raised
his concern on 1st approach, I think I need to use the 2nd approach without
touching the core code.
pasted here,
"The 2nd approach is to clear rproc->table_sz and rproc->table_ptr in
imx_rproc_parse_fw before rproc_elf_load_rsc_table.
"

Or a V3 of current patch with updated commit log.

Please suggest.

If you still have concern or things still not clear to you, please
let me know.

[1] https://lore.kernel.org/all/20250402014355.GA22575@nxa18884-linux/

Regards,
Peng

>
>>  If no, I suppose this thread is done and I will start my other work
>>  regarding rproc.
>>
>> Thanks,
>> Peng
>>
>> >
>> >Thanks,
>> >Peng
>> >
>> >>
>> >>Regards,
>> >>Bjorn
>> >>
>> >
>
Re: [PATCH V2] remoteproc: core: Clear table_sz when rproc_shutdown
Posted by Peng Fan 7 months, 2 weeks ago
On Wed, Apr 09, 2025 at 02:46:10PM +0800, Peng Fan wrote:
>On Tue, Apr 08, 2025 at 10:59:58AM -0600, Mathieu Poirier wrote:
>>On Tue, 8 Apr 2025 at 09:02, Peng Fan <peng.fan@oss.nxp.com> wrote:
>>>
>>> On Thu, Apr 03, 2025 at 10:32:39PM +0800, Peng Fan wrote:
>>> >Hi Bjorn,
>>> >
>>> >
>>> >Thanks for replying this thread.
>>> >
>>> >On Wed, Apr 02, 2025 at 08:48:58AM -0500, Bjorn Andersson wrote:
>>> >>On Wed, Apr 02, 2025 at 09:43:55AM +0800, Peng Fan wrote:
>>> >>> On Tue, Apr 01, 2025 at 10:05:03AM -0600, Mathieu Poirier wrote:
>>> >>> >On Tue, Apr 01, 2025 at 09:41:24AM +0800, Peng Fan wrote:
>>> >>...
>>> >>> >
>>> >>> >The core is already checking if @loaded_table is valid in rproc_start(), why
>>> >>> >can't that be used instead of adding yet another check?
>>> >>>
>>> >>> Ah. I was thinking clear table_sz in rpoc_shutdown is an easy approach and
>>> >>> could benifit others in case other platforms meet similar issue in future.
>>> >>>
>>> >>
>>> >>I like the general idea of keeping things clean and avoid leaving stale
>>> >>data behind.
>>> >>
>>> >>But clearing table_sz during stop in order to hide the fact that the
>>> >>future table_ptr will contain valid data that shouldn't be used, that's
>>> >>just a bug waiting to show up again in the future.
>>> >
>>> >Agree.
>>> >
>>> >Do you need me to post a fix for
>>> >commit efdde3d73ab25ce("remoteproc: core: Clear table_sz when rproc_shutdown")
>>> >by clearing table_sz in rproc_fw_boot and rproc_detach as did in this v2?
>>> >
>>> >To i.MX, the above in-tree patch is ok, so all it fine, and this v2 patch
>>> >could be dropped.
>>> >
>>> >But anyway, if you prefer a follow up fix, please let me know, I
>>> >could post a patch.
>>>
>>> Hi Bjorn, Mathieu,
>>>
>>>  I will wait for one more week to see if any concerns or questions.
>>>  Please raise if you have.
>>>
>>
>>I am working with Bjorn to get your patch reverted.  Once that has
>>happened you can send another patch.

It almost one month passed, I am not sure what status now.
I have patches for i.MX95 that are pending at my local.
I will wait for one more month until 6.16 merge window close, then
post new patches. If any concern, please raise.

Regards,
Peng


>
>ok, I am fine with this.
>
>when get reverted, I need use another method to fix the issue.
>
>I posted two approaches[1], but not get you reply. Since Bjorn raised
>his concern on 1st approach, I think I need to use the 2nd approach without
>touching the core code.
>pasted here,
>"The 2nd approach is to clear rproc->table_sz and rproc->table_ptr in
>imx_rproc_parse_fw before rproc_elf_load_rsc_table.
>"
>
>Or a V3 of current patch with updated commit log.
>
>Please suggest.
>
>If you still have concern or things still not clear to you, please
>let me know.
>
>[1] https://lore.kernel.org/all/20250402014355.GA22575@nxa18884-linux/
>
>Regards,
>Peng
>
>>
>>>  If no, I suppose this thread is done and I will start my other work
>>>  regarding rproc.
>>>
>>> Thanks,
>>> Peng
>>>
>>> >
>>> >Thanks,
>>> >Peng
>>> >
>>> >>
>>> >>Regards,
>>> >>Bjorn
>>> >>
>>> >
>>