 drivers/remoteproc/remoteproc_core.c    | 22 +++++----
 drivers/remoteproc/xlnx_r5_remoteproc.c | 66 ++++++++++++++++++++++++-
 2 files changed, 77 insertions(+), 11 deletions(-)
A remote processor can crash or hang during normal execution. The Linux
remoteproc framework supports different mechanisms to recover the
remote processor and re-establish RPMsg communication in such a case.

Crash reporting:

1) Using the debugfs node

The user can report a crash to the core framework via the debugfs node
using the following command:

echo 1 > /sys/kernel/debug/remoteproc/remoteproc0/crash

2) The remote processor notifies the host about the crash state and
crash reason via the resource table

This is a platform-specific method where the remote firmware contains a
vendor-specific resource to update the crash state and the crash
reason. The remote then notifies the host of the crash via a mailbox
notification. The host checks this resource on every mbox
notification and reports the crash to the core framework if needed.

Crash recovery mechanism:

There are two mechanisms available to recover the remote processor from
a crash: 1) boot recovery, 2) attach on recovery.

The remoteproc core framework chooses the proper mechanism based on the
rproc features set by the platform driver.

1) Boot recovery

This is the default mechanism to recover the remote processor.
In this method the core framework first stops the remote processor,
loads the firmware again and then starts the remote processor. This
method is supported on AMD-Xilinx platforms. The coredump callback in
the platform driver isn't implemented so far, but that shouldn't cause
the recovery to fail.

2) Attach on recovery

If the RPROC_ATTACH_ON_RECOVERY feature is enabled by the platform driver,
then the core framework will choose this method for recovery.

On the zynqmp platform, the following sequence of events is expected
during a remoteproc crash and attach on recovery:

a) rproc attach/detach flow is working, and RPMsg comm is established
b) Remote processor (RPU) crashes (crash not reported yet)
c) Platform management controller stops and reloads the ELF on the
   inactive remote processor before reboot
d) Platform management controller reboots the remote processor
e) Remote processor boots again and detects the previous crash (platform
   specific mechanism to detect the crash)
f) Remote processor reports the crash to Linux (host) and waits for
   the recovery.
g) Linux performs a full detach and reattach to the remote processor.
h) Normal RPMsg communication is established.

All RPMsg related resources must be destroyed and re-created
during recovery to establish successful RPMsg communication. To achieve
this, a complete rproc_detach followed by an rproc_attach call is needed.


Tanmay Shah (3):
  remoteproc: xlnx: enable boot recovery
  remoteproc: core: full attach detach during recovery
  remoteproc: xlnx: add crash detection mechanism

 drivers/remoteproc/remoteproc_core.c    | 22 +++++----
 drivers/remoteproc/xlnx_r5_remoteproc.c | 66 ++++++++++++++++++++++++-
 2 files changed, 77 insertions(+), 11 deletions(-)


base-commit: 5f4d69f0ef4f09cd926d193fe0df0c84d53e8252
-- 
2.34.1
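
The choice between the two recovery mechanisms described in the cover
letter can be sketched as a small user-space C model. This is only an
illustration under stated assumptions, not the kernel implementation:
struct model_rproc, MODEL_FEAT_ATTACH_ON_RECOVERY and the model_*
functions are hypothetical stand-ins for the kernel's rproc structure,
its attach-on-recovery feature flag and its recovery paths, and each
recovery "step" merely records its name in a trace string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the feature flag set by a platform driver. */
enum model_feature { MODEL_FEAT_ATTACH_ON_RECOVERY = 1u << 0 };

/* Hypothetical stand-in for the remote processor state. */
struct model_rproc {
    unsigned int features; /* set by the (modelled) platform driver */
    char trace[128];       /* comma-separated record of steps taken */
};

/* Append one step name to the trace. */
static void model_step(struct model_rproc *r, const char *name)
{
    assert(r != NULL && name != NULL);
    if (r->trace[0] != '\0')
        strncat(r->trace, ",", sizeof(r->trace) - strlen(r->trace) - 1);
    strncat(r->trace, name, sizeof(r->trace) - strlen(r->trace) - 1);
}

/* Boot recovery: stop the remote, load the firmware again, start it. */
static void model_boot_recovery(struct model_rproc *r)
{
    model_step(r, "stop");
    model_step(r, "load");
    model_step(r, "start");
}

/* Attach on recovery as proposed in this series: full detach, then attach. */
static void model_attach_recovery(struct model_rproc *r)
{
    model_step(r, "detach");
    model_step(r, "attach");
}

/* Pick the recovery mechanism from the features the driver has set. */
static void model_recover(struct model_rproc *r)
{
    r->trace[0] = '\0';
    if (r->features & MODEL_FEAT_ATTACH_ON_RECOVERY)
        model_attach_recovery(r);
    else
        model_boot_recovery(r);
}
```

With no feature set, model_recover() leaves "stop,load,start" in the
trace; with MODEL_FEAT_ATTACH_ON_RECOVERY set it leaves "detach,attach".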
Hi Tanmay,

On Mon, Oct 27, 2025 at 09:57:28PM -0700, Tanmay Shah wrote:
>Remote processor can crash or hang during normal execution. Linux
>remoteproc framework supports different mechanisms to recover the
>remote processor and re-establish the RPMsg communication in such case.
...
>Tanmay Shah (3):
>  remoteproc: xlnx: enable boot recovery
>  remoteproc: core: full attach detach during recovery
>  remoteproc: xlnx: add crash detection mechanism
>

I tested this on i.MX8QM-MEK and there are failures: the 1st test passes,
the 2nd test fails. Without this patch, I do not see failures.

root@imx8qmmek:~#
remoteproc remoteproc0: crash detected in imx-rproc: type watchdog
Partition3 reset!
remoteproc remoteproc0: handling crash #1 in imx-rproc
remoteproc remoteproc0: detached remote processor imx-rproc
rproc-virtio rproc-virtio.1.auto: assigned reserved memory node vdevbuffer@90400000
virtio_rpmsg_bus virtio0: rpmsg host is online
rproc-virtio rproc-virtio.1.auto: registered virtio0 (type 7)
rproc-virtio rproc-virtio.2.auto: assigned reserved memory node vdevbuffer@90400000
virtio_rpmsg_bus virtio1: rpmsg host is online
rproc-virtio rproc-virtio.2.auto: registered virtio1 (type 7)
remoteproc remoteproc0: remote processor imx-rproc is now attached
virtio_rpmsg_bus virtio1: creating channel rpmsg-openamp-demo-channel addr 0x1e

remoteproc remoteproc0: crash detected in imx-rproc: type watchdog
Partition3 reset!
remoteproc remoteproc0: handling crash #2 in imx-rproc
rproc-virtio rproc-virtio.1.auto: assigned reserved memory node vdevbuffer@90400000
virtio_rpmsg_bus virtio4: probe with driver virtio_rpmsg_bus failed with error -12
rproc-virtio rproc-virtio.1.auto: registered virtio4 (type 7)
rproc-virtio rproc-virtio.2.auto: assigned reserved memory node vdevbuffer@90400000
virtio_rpmsg_bus virtio5: probe with driver virtio_rpmsg_bus failed with error -12
rproc-virtio rproc-virtio.2.auto: registered virtio5 (type 7)
rproc-virtio rproc-virtio.5.auto: assigned reserved memory node vdevbuffer@90400000
virtio_rpmsg_bus virtio6: probe with driver virtio_rpmsg_bus failed with error -12
rproc-virtio rproc-virtio.5.auto: registered virtio6 (type 7)
rproc-virtio rproc-virtio.6.auto: assigned reserved memory node vdevbuffer@90400000
virtio_rpmsg_bus virtio7: probe with driver virtio_rpmsg_bus failed with error -12
rproc-virtio rproc-virtio.6.auto: registered virtio7 (type 7)
remoteproc remoteproc0: remote processor imx-rproc is now attached

Thanks,
Peng
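
For readers decoding the log above: the "error -12" reported by the
virtio_rpmsg_bus probe is a negated errno value, and errno 12 on Linux
is ENOMEM (out of memory). The thread does not identify which allocation
fails. The helper below is a minimal sketch for naming such return
codes; errno_name() is a hypothetical name introduced here and covers
only a few common values.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Map a kernel-style return code (zero or a negated errno) to a name.
 * Hypothetical helper for reading logs; only a few codes are covered.
 */
static const char *errno_name(int ret)
{
    assert(ret <= 0); /* kernel-style codes are zero or negative */

    switch (-ret) {
    case 0:
        return "success";
    case ENOMEM:
        return "ENOMEM";
    case EINVAL:
        return "EINVAL";
    case EBUSY:
        return "EBUSY";
    default:
        return "unknown";
    }
}
```

Calling errno_name(-12) on Linux yields "ENOMEM", matching the probe
failures reported for virtio4 through virtio7.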
On 10/28/25 10:24 PM, Peng Fan wrote:
> Hi Tanmay,
>
> On Mon, Oct 27, 2025 at 09:57:28PM -0700, Tanmay Shah wrote:
...
>> Tanmay Shah (3):
>>   remoteproc: xlnx: enable boot recovery
>>   remoteproc: core: full attach detach during recovery
>>   remoteproc: xlnx: add crash detection mechanism
>>
>
> I tested this on i.MX8QM-MEK and there are failures: the 1st test passes,
> the 2nd test fails. Without this patch, I do not see failures.
...
> remoteproc remoteproc0: handling crash #2 in imx-rproc
> rproc-virtio rproc-virtio.1.auto: assigned reserved memory node vdevbuffer@90400000
> virtio_rpmsg_bus virtio4: probe with driver virtio_rpmsg_bus failed with error -12
...
> remoteproc remoteproc0: remote processor imx-rproc is now attached
>

Hi Peng,

I don't understand why it should fail. The patch simply implements the
rproc_detach() -> rproc_attach() sequence.

In your case, does the detach -> attach sequence work when you do it via
sysfs? If that works, then crash recovery should work as well.

Could you give the steps you use to generate the crash?

Thanks,
Tanmay

> Thanks,
> Peng
On 10/28/25 11:15 PM, Tanmay Shah wrote:
>
>
> On 10/28/25 10:24 PM, Peng Fan wrote:
>> Hi Tanmay,
>>
>> On Mon, Oct 27, 2025 at 09:57:28PM -0700, Tanmay Shah wrote:
...
>> I tested this on i.MX8QM-MEK and there are failures: the 1st test
>> passes, the 2nd test fails. Without this patch, I do not see failures.
...
>> remoteproc remoteproc0: remote processor imx-rproc is now attached
>>
>
> Hi Peng,
>
> I don't understand why it should fail. The patch simply implements the
> rproc_detach() -> rproc_attach() sequence.
>

Hi Peng,

Thanks for testing the patch. I appreciate your quick response. I think
rproc_boot() should be used instead of rproc_attach(). That should
probably solve the issue you are facing. I will send v2 with this change
for you to try.

Thanks,
Tanmay

> In your case, does the detach -> attach sequence work when you do it via
> sysfs? If that works, then crash recovery should work as well.
>
> Could you give the steps you use to generate the crash?
>
> Thanks,
> Tanmay
>
>> Thanks,
>> Peng
>
Hi Tanmay,

On Wed, Oct 29, 2025 at 06:51:51PM -0500, Tanmay Shah wrote:
...
>> >
>>
>> Hi Peng,
>>
>> I don't understand why it should fail. The patch simply implements the
>> rproc_detach() -> rproc_attach() sequence.
>>
>
>Hi Peng,
>
>Thanks for testing the patch. I appreciate your quick response. I think
>rproc_boot() should be used instead of rproc_attach(). That should probably
>solve the issue you are facing. I will send v2 with this change for you to
>try.
>
>Thanks,
>Tanmay
>
>> In your case, does the detach -> attach sequence work when you do it via
>> sysfs? If that works, then crash recovery should work as well.

sysfs does not have an attach option; only start/stop/detach are there.

>>
>> Could you give the steps you use to generate the crash?

I have not looked into the details of why it fails on my side the 2nd time.

On my board, the M4 core uses a watchdog to reset itself and notify Linux;
the Linux-side imx_rproc driver then calls
"rproc_report_crash(priv->rproc, RPROC_WATCHDOG);"

I will debug the failures in a few days.

Thanks,
Peng

>>
>> Thanks,
>> Tanmay
>>
>> > Thanks,
>> > Peng
>> >
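
The reporting path Peng describes (watchdog reset, then the driver
reporting a crash of type watchdog) lines up with the "crash detected"
and "handling crash #N" messages in the earlier log. The user-space
sketch below models only how those two messages relate to the crash type
and a crash counter. It is an assumption-laden illustration: the
model_* names are hypothetical, only the watchdog type is taken from the
thread, and the real core framework emits the two messages at different
points rather than in one call.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Crash types for this sketch; only "watchdog" appears in the thread. */
enum model_crash_type { MODEL_MMUFAULT, MODEL_WATCHDOG, MODEL_FATAL_ERROR };

/* Hypothetical stand-in for the per-processor state used when reporting. */
struct model_crash_rproc {
    const char *name;       /* e.g. "imx-rproc" */
    unsigned int crash_cnt; /* number of crashes reported so far */
};

/* Translate a crash type into the word printed in the log. */
static const char *model_crash_to_string(enum model_crash_type type)
{
    switch (type) {
    case MODEL_MMUFAULT:
        return "mmufault";
    case MODEL_WATCHDOG:
        return "watchdog";
    case MODEL_FATAL_ERROR:
        return "fatal error";
    }
    return "unknown";
}

/*
 * Record one crash and format both log messages into buf.
 * Returns the value of snprintf(), i.e. the untruncated length.
 */
static int model_report_crash(struct model_crash_rproc *r,
                              enum model_crash_type type,
                              char *buf, size_t len)
{
    assert(r != NULL && buf != NULL);
    r->crash_cnt++;
    return snprintf(buf, len,
                    "crash detected in %s: type %s\n"
                    "handling crash #%u in %s\n",
                    r->name, model_crash_to_string(type),
                    r->crash_cnt, r->name);
}
```

Reporting MODEL_WATCHDOG twice for a processor named "imx-rproc"
produces messages numbered #1 and #2, as in the i.MX8QM-MEK log.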
On Thu, Oct 30, 2025 at 12:21:24PM +0800, Peng Fan wrote:
> Hi Tanmay,
>
> On Wed, Oct 29, 2025 at 06:51:51PM -0500, Tanmay Shah wrote:
> ...
> >> >
> >>
> >> Hi Peng,
> >>
> >> I don't understand why it should fail. The patch simply implements the
> >> rproc_detach() -> rproc_attach() sequence.
> >>
> >
> >Hi Peng,
> >
> >Thanks for testing the patch. I appreciate your quick response. I think
> >rproc_boot() should be used instead of rproc_attach(). That should probably
> >solve the issue you are facing. I will send v2 with this change for you to
> >try.
> >
> >Thanks,
> >Tanmay
> >
> >> In your case, does the detach -> attach sequence work when you do it via
> >> sysfs? If that works, then crash recovery should work as well.
>
> sysfs does not have an attach option; only start/stop/detach are there.
>
> >>
> >> Could you give the steps you use to generate the crash?
>
> I have not looked into the details of why it fails on my side the 2nd time.
>
> On my board, the M4 core uses a watchdog to reset itself and notify Linux;
> the Linux-side imx_rproc driver then calls
> "rproc_report_crash(priv->rproc, RPROC_WATCHDOG);"
>
> I will debug the failures in a few days.
>

So what is happening here - Peng, do you plan on providing more debugging
information? Tanmay - are you planning on sending a second revision?

> Thanks,
> Peng
>
> >>
> >> Thanks,
> >> Tanmay
> >>
> >> > Thanks,
> >> > Peng
> >> >
On 11/10/25 12:03 PM, Mathieu Poirier wrote:
> On Thu, Oct 30, 2025 at 12:21:24PM +0800, Peng Fan wrote:
>> Hi Tanmay,
>>
>> On Wed, Oct 29, 2025 at 06:51:51PM -0500, Tanmay Shah wrote:
>> ...
>>>>>
>>>>
>>>> Hi Peng,
>>>>
>>>> I don't understand why it should fail. The patch simply implements the
>>>> rproc_detach() -> rproc_attach() sequence.
>>>>
>>>
>>> Hi Peng,
>>>
>>> Thanks for testing the patch. I appreciate your quick response. I think
>>> rproc_boot() should be used instead of rproc_attach(). That should probably
>>> solve the issue you are facing. I will send v2 with this change for you to
>>> try.
>>>
>>> Thanks,
>>> Tanmay
>>>
>>>> In your case, does the detach -> attach sequence work when you do it via
>>>> sysfs? If that works, then crash recovery should work as well.
>>
>> sysfs does not have an attach option; only start/stop/detach are there.
>>
>>>>
>>>> Could you give the steps you use to generate the crash?
>>
>> I have not looked into the details of why it fails on my side the 2nd time.
>>
>> On my board, the M4 core uses a watchdog to reset itself and notify Linux;
>> the Linux-side imx_rproc driver then calls
>> "rproc_report_crash(priv->rproc, RPROC_WATCHDOG);"
>>
>> I will debug the failures in a few days.
>>
>
> So what is happening here - Peng, do you plan on providing more debugging
> information? Tanmay - are you planning on sending a second revision?
>

Mathieu,

I will be providing the v2, which will replace rproc_attach with
rproc_boot. I am testing it and so far have not seen any issues. I hope
that will resolve Peng's problem.

V2 will be posted sometime this week.

Thanks,
Tanmay

>> Thanks,
>> Peng
>>
>>>>
>>>> Thanks,
>>>> Tanmay
>>>>
>>>>> Thanks,
>>>>> Peng
>>>>
>>>