drivers/remoteproc/remoteproc_core.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
Current recovery operation does only virtio device reset, but do not
free and re-allocate all the resources. As third-party is booting the
remote processor during attach-detach, it is better to free and
re-allocate resoruces as resource table state might be unknown to linux
when remote processor boots and reports crash.
Signed-off-by: Tanmay Shah <tanmay.shah@amd.com>
---
Note: RFC patch for design discussion. Please do not merge.
drivers/remoteproc/remoteproc_core.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index 825672100528..4971508bc5b2 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1786,7 +1786,20 @@ static int rproc_attach_recovery(struct rproc *rproc)
if (ret)
return ret;
- return __rproc_attach(rproc);
+ /* clean up all acquired resources */
+ rproc_resource_cleanup(rproc);
+
+ /* release HW resources if needed */
+ rproc_unprepare_device(rproc);
+
+ rproc_disable_iommu(rproc);
+
+ /* Free the copy of the resource table */
+ kfree(rproc->cached_table);
+ rproc->cached_table = NULL;
+ rproc->table_ptr = NULL;
+
+ return rproc_attach(rproc);
}
static int rproc_boot_recovery(struct rproc *rproc)
base-commit: 56d030ea3330ab737fe6c05f89d52f56208b07ac
--
2.34.1
Good morning, On Thu, Oct 02, 2025 at 08:33:46AM -0700, Tanmay Shah wrote: > Current recovery operation does only virtio device reset, but do not > free and re-allocate all the resources. As third-party is booting the > remote processor during attach-detach, it is better to free and > re-allocate resoruces as resource table state might be unknown to linux > when remote processor boots and reports crash. 1) When referring to "third-party", should I assume boot loader? 2) Function rproc_attach_recovery() calls __rproc_detach(), which in turn calls rproc_reset_rsc_table_on_detach(). That function deals explicitly with the resource table. 3) The code in this patch mixes __rproc_detach() with rproc_attach(), something that is likely not a good idea. We either do __rproc_detach/__rproc_attach or rproc_detach/rproc_attach but I'd like to avoid the mix-and-match to keep the amount of possible states to a minimum. If I understand correctly, the main motivation for this patch is the management of the resource table. But as noted in (2), this should be taken care of. Am I missing some information? Thanks, Mathieu > > Signed-off-by: Tanmay Shah <tanmay.shah@amd.com> > --- > > Note: RFC patch for design discussion. Please do not merge. > > drivers/remoteproc/remoteproc_core.c | 15 ++++++++++++++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c > index 825672100528..4971508bc5b2 100644 > --- a/drivers/remoteproc/remoteproc_core.c > +++ b/drivers/remoteproc/remoteproc_core.c > @@ -1786,7 +1786,20 @@ static int rproc_attach_recovery(struct rproc *rproc) > if (ret) > return ret; > > - return __rproc_attach(rproc); > + /* clean up all acquired resources */ > + rproc_resource_cleanup(rproc); > + > + /* release HW resources if needed */ > + rproc_unprepare_device(rproc); > + > + rproc_disable_iommu(rproc); > + > + /* Free the copy of the resource table */ > + kfree(rproc->cached_table); > + rproc->cached_table = NULL; > + rproc->table_ptr = NULL; > + > + return rproc_attach(rproc); > } > > static int rproc_boot_recovery(struct rproc *rproc) > > base-commit: 56d030ea3330ab737fe6c05f89d52f56208b07ac > -- > 2.34.1 >
Hello, Please find my comments below: On 10/16/25 10:12 AM, Mathieu Poirier wrote: > Good morning, > > On Thu, Oct 02, 2025 at 08:33:46AM -0700, Tanmay Shah wrote: >> Current recovery operation does only virtio device reset, but do not >> free and re-allocate all the resources. As third-party is booting the >> remote processor during attach-detach, it is better to free and >> re-allocate resoruces as resource table state might be unknown to linux >> when remote processor boots and reports crash. > > 1) When referring to "third-party", should I assume boot loader? Here, "third-party" could be a bootloader or another core in a heterogeneous system. In my-case it is a platform management controller. > 2) Function rproc_attach_recovery() calls __rproc_detach(), which in turn calls > rproc_reset_rsc_table_on_detach(). That function deals explicitly with the > resource table. As per my understanding, rproc_reset_rsc_table_on_detach() will setup clean resource table, that sets vring addresses to 0xffffffff. Please let me know if this understanding is not correct. If we do not, call rproc_attach(), then correct vring addresses are not setup in the resource table for next attach to work. Because, rproc_handle_resources() and rproc_alloc_registered_carveouts() are not called as part __rproc_attach(). > 3) The code in this patch mixes __rproc_detach() with rproc_attach(), something > that is likely not a good idea. We either do __rproc_detach/__rproc_attach or > rproc_detach/rproc_attach but I'd like to avoid the mix-and-match to keep the > amount of possible states to a minimum. > I agree to this. I can find a way to call rproc_detach() and rproc_attach() sequentially, instead of __rproc_detach() and rproc_attach() calls. I might have to remove rproc_trigger_attach_recovery completely, but that is implementation details. We can work it out later, once we agree to the current problem & solution. > If I understand correctly, the main motivation for this patch is the management > of the resource table. But as noted in (2), this should be taken care of. Am I > missing some information? > The main motivation is to make the attach operation works during attach_recovery(). The __rproc_detach() works as expected, but attach doesn't work. After recovery, I am not able to strat RPMsg communication. Please let me know if I am missing something. Thanks, Tanmay > Thanks, > Mathieu > >> >> Signed-off-by: Tanmay Shah <tanmay.shah@amd.com> >> --- >> >> Note: RFC patch for design discussion. Please do not merge. >> >> drivers/remoteproc/remoteproc_core.c | 15 ++++++++++++++- >> 1 file changed, 14 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c >> index 825672100528..4971508bc5b2 100644 >> --- a/drivers/remoteproc/remoteproc_core.c >> +++ b/drivers/remoteproc/remoteproc_core.c >> @@ -1786,7 +1786,20 @@ static int rproc_attach_recovery(struct rproc *rproc) >> if (ret) >> return ret; >> >> - return __rproc_attach(rproc); >> + /* clean up all acquired resources */ >> + rproc_resource_cleanup(rproc); >> + >> + /* release HW resources if needed */ >> + rproc_unprepare_device(rproc); >> + >> + rproc_disable_iommu(rproc); >> + >> + /* Free the copy of the resource table */ >> + kfree(rproc->cached_table); >> + rproc->cached_table = NULL; >> + rproc->table_ptr = NULL; >> + >> + return rproc_attach(rproc); >> } >> >> static int rproc_boot_recovery(struct rproc *rproc) >> >> base-commit: 56d030ea3330ab737fe6c05f89d52f56208b07ac >> -- >> 2.34.1 >>
On Thu, Oct 16, 2025 at 11:12:26AM -0500, Tanmay Shah wrote: > > Hello, > > Please find my comments below: > > On 10/16/25 10:12 AM, Mathieu Poirier wrote: > > Good morning, > > > > On Thu, Oct 02, 2025 at 08:33:46AM -0700, Tanmay Shah wrote: > > > Current recovery operation does only virtio device reset, but do not > > > free and re-allocate all the resources. As third-party is booting the > > > remote processor during attach-detach, it is better to free and > > > re-allocate resoruces as resource table state might be unknown to linux > > > when remote processor boots and reports crash. > > > > 1) When referring to "third-party", should I assume boot loader? > > Here, "third-party" could be a bootloader or another core in a heterogeneous > system. In my-case it is a platform management controller. Ok > > > > 2) Function rproc_attach_recovery() calls __rproc_detach(), which in turn calls > > rproc_reset_rsc_table_on_detach(). That function deals explicitly with the > > resource table. > > As per my understanding, rproc_reset_rsc_table_on_detach() will setup clean > resource table, that sets vring addresses to 0xffffffff. Please let me know > if this understanding is not correct. > > If we do not, call rproc_attach(), then correct vring addresses are not > setup in the resource table for next attach to work. Because, > rproc_handle_resources() and rproc_alloc_registered_carveouts() are not > called as part __rproc_attach(). Your assessment is correct. When the clean_table was introduced, it was to address the detach->attach scenario. At that time the only recovery we supported was to stop and start again, which did not involved the clean_table. Re-attaching on crash was introduced later in a scenario that may not have included a resource table. > > > 3) The code in this patch mixes __rproc_detach() with rproc_attach(), something > > that is likely not a good idea. We either do __rproc_detach/__rproc_attach or > > rproc_detach/rproc_attach but I'd like to avoid the mix-and-match to keep the > > amount of possible states to a minimum. > > > > I agree to this. I can find a way to call rproc_detach() and rproc_attach() > sequentially, instead of __rproc_detach() and rproc_attach() calls. I might > have to remove rproc_trigger_attach_recovery completely, but that is > implementation details. We can work it out later, once we agree to the > current problem & solution. > Humm... You might just be able to call rproc_detach/rproc_attach from rproc_attach_recovery() if you enhance rproc_detach to be called in a CRASHED context [1]. Let's see what you find when trying this on real HW. [1]. https://elixir.bootlin.com/linux/v6.17.1/source/drivers/remoteproc/remoteproc_core.c#L2065 > > If I understand correctly, the main motivation for this patch is the management > > of the resource table. But as noted in (2), this should be taken care of. Am I > > missing some information? > > > > The main motivation is to make the attach operation works during > attach_recovery(). The __rproc_detach() works as expected, but attach > doesn't work. After recovery, I am not able to strat RPMsg communication. > > Please let me know if I am missing something. > > Thanks, > Tanmay > > > Thanks, > > Mathieu > > > > > > > > Signed-off-by: Tanmay Shah <tanmay.shah@amd.com> > > > --- > > > > > > Note: RFC patch for design discussion. Please do not merge. > > > > > > drivers/remoteproc/remoteproc_core.c | 15 ++++++++++++++- > > > 1 file changed, 14 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c > > > index 825672100528..4971508bc5b2 100644 > > > --- a/drivers/remoteproc/remoteproc_core.c > > > +++ b/drivers/remoteproc/remoteproc_core.c > > > @@ -1786,7 +1786,20 @@ static int rproc_attach_recovery(struct rproc *rproc) > > > if (ret) > > > return ret; > > > - return __rproc_attach(rproc); > > > + /* clean up all acquired resources */ > > > + rproc_resource_cleanup(rproc); > > > + > > > + /* release HW resources if needed */ > > > + rproc_unprepare_device(rproc); > > > + > > > + rproc_disable_iommu(rproc); > > > + > > > + /* Free the copy of the resource table */ > > > + kfree(rproc->cached_table); > > > + rproc->cached_table = NULL; > > > + rproc->table_ptr = NULL; > > > + > > > + return rproc_attach(rproc); > > > } > > > static int rproc_boot_recovery(struct rproc *rproc) > > > > > > base-commit: 56d030ea3330ab737fe6c05f89d52f56208b07ac > > > -- > > > 2.34.1 > > > >
On 10/17/25 10:35 AM, Mathieu Poirier wrote: > On Thu, Oct 16, 2025 at 11:12:26AM -0500, Tanmay Shah wrote: >> >> Hello, >> >> Please find my comments below: >> >> On 10/16/25 10:12 AM, Mathieu Poirier wrote: >>> Good morning, >>> >>> On Thu, Oct 02, 2025 at 08:33:46AM -0700, Tanmay Shah wrote: >>>> Current recovery operation does only virtio device reset, but do not >>>> free and re-allocate all the resources. As third-party is booting the >>>> remote processor during attach-detach, it is better to free and >>>> re-allocate resoruces as resource table state might be unknown to linux >>>> when remote processor boots and reports crash. >>> >>> 1) When referring to "third-party", should I assume boot loader? >> >> Here, "third-party" could be a bootloader or another core in a heterogeneous >> system. In my-case it is a platform management controller. > > Ok > >> >> >>> 2) Function rproc_attach_recovery() calls __rproc_detach(), which in turn calls >>> rproc_reset_rsc_table_on_detach(). That function deals explicitly with the >>> resource table. >> >> As per my understanding, rproc_reset_rsc_table_on_detach() will setup clean >> resource table, that sets vring addresses to 0xffffffff. Please let me know >> if this understanding is not correct. >> >> If we do not, call rproc_attach(), then correct vring addresses are not >> setup in the resource table for next attach to work. Because, >> rproc_handle_resources() and rproc_alloc_registered_carveouts() are not >> called as part __rproc_attach(). > > Your assessment is correct. When the clean_table was introduced, it was to > address the detach->attach scenario. At that time the only recovery we > supported was to stop and start again, which did not involved the clean_table. > Re-attaching on crash was introduced later in a scenario that may not have > included a resource table. > Okay that explains the current architecture. >> >>> 3) The code in this patch mixes __rproc_detach() with rproc_attach(), something >>> that is likely not a good idea. We either do __rproc_detach/__rproc_attach or >>> rproc_detach/rproc_attach but I'd like to avoid the mix-and-match to keep the >>> amount of possible states to a minimum. >>> >> >> I agree to this. I can find a way to call rproc_detach() and rproc_attach() >> sequentially, instead of __rproc_detach() and rproc_attach() calls. I might >> have to remove rproc_trigger_attach_recovery completely, but that is >> implementation details. We can work it out later, once we agree to the >> current problem & solution. >> > > Humm... You might just be able to call rproc_detach/rproc_attach from > rproc_attach_recovery() if you enhance rproc_detach to be called in a CRASHED > context [1]. Let's see what you find when trying this on real HW. > > [1]. https://elixir.bootlin.com/linux/v6.17.1/source/drivers/remoteproc/remoteproc_core.c#L2065 > > Thank You for the suggestion. Agreed. Since we are coming to an agreement on the final solution, I will send the actual series which also takes care of start/stop recovery on xlnx platform driver. I will implement above suggestion, and test on HW. Thanks, Tanmay >>> If I understand correctly, the main motivation for this patch is the management >>> of the resource table. But as noted in (2), this should be taken care of. Am I >>> missing some information? >>> >> >> The main motivation is to make the attach operation works during >> attach_recovery(). The __rproc_detach() works as expected, but attach >> doesn't work. After recovery, I am not able to strat RPMsg communication. >> >> Please let me know if I am missing something. >> >> Thanks, >> Tanmay >> >>> Thanks, >>> Mathieu >>> >>>> >>>> Signed-off-by: Tanmay Shah <tanmay.shah@amd.com> >>>> --- >>>> >>>> Note: RFC patch for design discussion. Please do not merge. >>>> >>>> drivers/remoteproc/remoteproc_core.c | 15 ++++++++++++++- >>>> 1 file changed, 14 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c >>>> index 825672100528..4971508bc5b2 100644 >>>> --- a/drivers/remoteproc/remoteproc_core.c >>>> +++ b/drivers/remoteproc/remoteproc_core.c >>>> @@ -1786,7 +1786,20 @@ static int rproc_attach_recovery(struct rproc *rproc) >>>> if (ret) >>>> return ret; >>>> - return __rproc_attach(rproc); >>>> + /* clean up all acquired resources */ >>>> + rproc_resource_cleanup(rproc); >>>> + >>>> + /* release HW resources if needed */ >>>> + rproc_unprepare_device(rproc); >>>> + >>>> + rproc_disable_iommu(rproc); >>>> + >>>> + /* Free the copy of the resource table */ >>>> + kfree(rproc->cached_table); >>>> + rproc->cached_table = NULL; >>>> + rproc->table_ptr = NULL; >>>> + >>>> + return rproc_attach(rproc); >>>> } >>>> static int rproc_boot_recovery(struct rproc *rproc) >>>> >>>> base-commit: 56d030ea3330ab737fe6c05f89d52f56208b07ac >>>> -- >>>> 2.34.1 >>>> >>
© 2016 - 2026 Red Hat, Inc.