[PATCH v3 21/24] pmdomain: core: Leave powered-on genpds on until late_initcall_sync

Ulf Hansson posted 24 patches 3 months, 1 week ago
Posted by Ulf Hansson 3 months, 1 week ago
Powering-off a genpd that was on during boot, before all of its consumer
devices have been probed, is certainly prone to problems.

As a step to improve this situation, let's prevent these genpds from being
powered-off until genpd_power_off_unused() gets called, which is a
late_initcall_sync().

Note that this still doesn't guarantee that all the consumer devices have
been probed before we allow the genpds to be powered off. Yet, this should
be a step in the right direction.

Suggested-by: Saravana Kannan <saravanak@google.com>
Tested-by: Hiago De Franco <hiago.franco@toradex.com> # Colibri iMX8X
Tested-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> # TI AM62A,Xilinx ZynqMP ZCU106
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 drivers/pmdomain/core.c   | 10 ++++++++--
 include/linux/pm_domain.h |  1 +
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
index 5cef6de60c72..18951ed6295d 100644
--- a/drivers/pmdomain/core.c
+++ b/drivers/pmdomain/core.c
@@ -931,11 +931,12 @@ static void genpd_power_off(struct generic_pm_domain *genpd, bool one_dev_on,
 	 * The domain is already in the "power off" state.
 	 * System suspend is in progress.
 	 * The domain is configured as always on.
+	 * The domain was on at boot and still need to stay on.
 	 * The domain has a subdomain being powered on.
 	 */
 	if (!genpd_status_on(genpd) || genpd->prepared_count > 0 ||
 	    genpd_is_always_on(genpd) || genpd_is_rpm_always_on(genpd) ||
-	    atomic_read(&genpd->sd_count) > 0)
+	    genpd->stay_on || atomic_read(&genpd->sd_count) > 0)
 		return;
 
 	/*
@@ -1346,8 +1347,12 @@ static int __init genpd_power_off_unused(void)
 	pr_info("genpd: Disabling unused power domains\n");
 	mutex_lock(&gpd_list_lock);
 
-	list_for_each_entry(genpd, &gpd_list, gpd_list_node)
+	list_for_each_entry(genpd, &gpd_list, gpd_list_node) {
+		genpd_lock(genpd);
+		genpd->stay_on = false;
+		genpd_unlock(genpd);
 		genpd_queue_power_off_work(genpd);
+	}
 
 	mutex_unlock(&gpd_list_lock);
 
@@ -2352,6 +2357,7 @@ int pm_genpd_init(struct generic_pm_domain *genpd,
 	INIT_WORK(&genpd->power_off_work, genpd_power_off_work_fn);
 	atomic_set(&genpd->sd_count, 0);
 	genpd->status = is_off ? GENPD_STATE_OFF : GENPD_STATE_ON;
+	genpd->stay_on = !is_off;
 	genpd->sync_state = GENPD_SYNC_STATE_OFF;
 	genpd->device_count = 0;
 	genpd->provider = NULL;
diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
index d68e07dadc99..99556589f45e 100644
--- a/include/linux/pm_domain.h
+++ b/include/linux/pm_domain.h
@@ -199,6 +199,7 @@ struct generic_pm_domain {
 	unsigned int performance_state;	/* Aggregated max performance state */
 	cpumask_var_t cpus;		/* A cpumask of the attached CPUs */
 	bool synced_poweroff;		/* A consumer needs a synced poweroff */
+	bool stay_on;			/* Stay powered-on during boot. */
 	enum genpd_sync_state sync_state; /* How sync_state is managed. */
 	int (*power_off)(struct generic_pm_domain *domain);
 	int (*power_on)(struct generic_pm_domain *domain);
-- 
2.43.0
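
To make the gating in this patch concrete, here is a minimal user-space
sketch in C of the stay_on logic. It is a model under stated assumptions,
not kernel code: struct model_genpd and the model_* functions are
hypothetical stand-ins, locking and the checks for devices still in use are
omitted, and the real genpd_power_off_unused() queues power-off work rather
than powering off directly.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct generic_pm_domain; not the kernel type. */
struct model_genpd {
	bool on;        /* models genpd->status == GENPD_STATE_ON */
	bool always_on; /* models genpd_is_always_on() */
	bool stay_on;   /* models the field added by this patch */
	int sd_count;   /* models the atomic sd_count */
};

/* Models pm_genpd_init(): a domain that is on at init is marked stay_on. */
void model_genpd_init(struct model_genpd *pd, bool is_off)
{
	pd->on = !is_off;
	pd->always_on = false;
	pd->stay_on = !is_off;
	pd->sd_count = 0;
}

/*
 * Models the early-return check in genpd_power_off().
 * Returns true only if the domain was actually powered off.
 */
bool model_genpd_power_off(struct model_genpd *pd)
{
	if (!pd->on || pd->always_on || pd->stay_on || pd->sd_count > 0)
		return false;

	pd->on = false;
	return true;
}

/*
 * Models genpd_power_off_unused(): clear stay_on for every domain, then
 * attempt the power-off (the kernel queues work here instead).
 */
void model_power_off_unused(struct model_genpd *pds, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		pds[i].stay_on = false;
		model_genpd_power_off(&pds[i]);
	}
}
```

In this model, a domain initialised with is_off == false refuses every
model_genpd_power_off() call until model_power_off_unused() has run, which
mirrors the intent of keeping boot-time powered-on genpds on until
late_initcall_sync.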
Re: [PATCH v3 21/24] pmdomain: core: Leave powered-on genpds on until late_initcall_sync
Posted by Marek Szyprowski 2 months, 4 weeks ago
On 01.07.2025 13:47, Ulf Hansson wrote:
> Powering-off a genpd that was on during boot, before all of its consumer
> devices have been probed, is certainly prone to problems.
>
> As a step to improve this situation, let's prevent these genpds from being
> powered-off until genpd_power_off_unused() gets called, which is a
> late_initcall_sync().
>
> Note that, this still doesn't guarantee that all the consumer devices has
> been probed before we allow to power-off the genpds. Yet, this should be a
> step in the right direction.
>
> Suggested-by: Saravana Kannan <saravanak@google.com>
> Tested-by: Hiago De Franco <hiago.franco@toradex.com> # Colibri iMX8X
> Tested-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> # TI AM62A,Xilinx ZynqMP ZCU106
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

This change has a side effect on some Exynos-based boards that have a
display and a bootloader configured to set up a splash screen on it.
Since today's linux-next, those boards fail to boot because of an
IOMMU page fault.

This happens because the display controller is enabled and configured to
scan out from the splash-screen buffer until the respective driver
resets it in its probe() function. This, however, doesn't work with the
IOMMU, which is probed earlier than the display controller driver, which
in turn causes an IOMMU page fault once the IOMMU driver gets attached.
This worked before applying this patch because the power domain of the
display controller was simply turned off early, effectively resetting
the display controller.

This has been discussed a bit recently: 
https://lore.kernel.org/all/544ad69cba52a9b87447e3ac1c7fa8c3@disroot.org/ 
and I can add a workaround for this issue in the bootloaders of those 
boards, but this is something that has to be somehow addressed in a 
generic way.

> ---
>   drivers/pmdomain/core.c   | 10 ++++++++--
>   include/linux/pm_domain.h |  1 +
>   2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
> index 5cef6de60c72..18951ed6295d 100644
> --- a/drivers/pmdomain/core.c
> +++ b/drivers/pmdomain/core.c
> @@ -931,11 +931,12 @@ static void genpd_power_off(struct generic_pm_domain *genpd, bool one_dev_on,
>   	 * The domain is already in the "power off" state.
>   	 * System suspend is in progress.
>   	 * The domain is configured as always on.
> +	 * The domain was on at boot and still need to stay on.
>   	 * The domain has a subdomain being powered on.
>   	 */
>   	if (!genpd_status_on(genpd) || genpd->prepared_count > 0 ||
>   	    genpd_is_always_on(genpd) || genpd_is_rpm_always_on(genpd) ||
> -	    atomic_read(&genpd->sd_count) > 0)
> +	    genpd->stay_on || atomic_read(&genpd->sd_count) > 0)
>   		return;
>   
>   	/*
> @@ -1346,8 +1347,12 @@ static int __init genpd_power_off_unused(void)
>   	pr_info("genpd: Disabling unused power domains\n");
>   	mutex_lock(&gpd_list_lock);
>   
> -	list_for_each_entry(genpd, &gpd_list, gpd_list_node)
> +	list_for_each_entry(genpd, &gpd_list, gpd_list_node) {
> +		genpd_lock(genpd);
> +		genpd->stay_on = false;
> +		genpd_unlock(genpd);
>   		genpd_queue_power_off_work(genpd);
> +	}
>   
>   	mutex_unlock(&gpd_list_lock);
>   
> @@ -2352,6 +2357,7 @@ int pm_genpd_init(struct generic_pm_domain *genpd,
>   	INIT_WORK(&genpd->power_off_work, genpd_power_off_work_fn);
>   	atomic_set(&genpd->sd_count, 0);
>   	genpd->status = is_off ? GENPD_STATE_OFF : GENPD_STATE_ON;
> +	genpd->stay_on = !is_off;
>   	genpd->sync_state = GENPD_SYNC_STATE_OFF;
>   	genpd->device_count = 0;
>   	genpd->provider = NULL;
> diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
> index d68e07dadc99..99556589f45e 100644
> --- a/include/linux/pm_domain.h
> +++ b/include/linux/pm_domain.h
> @@ -199,6 +199,7 @@ struct generic_pm_domain {
>   	unsigned int performance_state;	/* Aggregated max performance state */
>   	cpumask_var_t cpus;		/* A cpumask of the attached CPUs */
>   	bool synced_poweroff;		/* A consumer needs a synced poweroff */
> +	bool stay_on;			/* Stay powered-on during boot. */
>   	enum genpd_sync_state sync_state; /* How sync_state is managed. */
>   	int (*power_off)(struct generic_pm_domain *domain);
>   	int (*power_on)(struct generic_pm_domain *domain);

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Re: [PATCH v3 21/24] pmdomain: core: Leave powered-on genpds on until late_initcall_sync
Posted by Ulf Hansson 2 months, 4 weeks ago
On Thu, 10 Jul 2025 at 14:26, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>
> On 01.07.2025 13:47, Ulf Hansson wrote:
> > Powering-off a genpd that was on during boot, before all of its consumer
> > devices have been probed, is certainly prone to problems.
> >
> > As a step to improve this situation, let's prevent these genpds from being
> > powered-off until genpd_power_off_unused() gets called, which is a
> > late_initcall_sync().
> >
> > Note that, this still doesn't guarantee that all the consumer devices has
> > been probed before we allow to power-off the genpds. Yet, this should be a
> > step in the right direction.
> >
> > Suggested-by: Saravana Kannan <saravanak@google.com>
> > Tested-by: Hiago De Franco <hiago.franco@toradex.com> # Colibri iMX8X
> > Tested-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> # TI AM62A,Xilinx ZynqMP ZCU106
> > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
>
> This change has a side effect on some Exynos based boards, which have
> display and bootloader is configured to setup a splash screen on it.
> Since today's linux-next, those boards fails to boot, because of the
> IOMMU page fault.

Thanks for reporting, let's try to fix this as soon as possible then.

>
> This happens because the display controller is enabled and configured to
> perform the scanout from the spash-screen buffer until the respective
> driver will reset it in driver probe() function. This however doesn't
> work with IOMMU, which is being probed earlier than the display
> controller driver, what in turn causes IOMMU page fault once the IOMMU
> driver gets attached. This worked before applying this patch, because
> the power domain of display controller was simply turned off early
> effectively reseting the display controller.

I can certainly try to help to find a solution, but I believe I need
some more details of what is happening.

Perhaps you can point me to some relevant DTS file to start with?

>
> This has been discussed a bit recently:
> https://lore.kernel.org/all/544ad69cba52a9b87447e3ac1c7fa8c3@disroot.org/
> and I can add a workaround for this issue in the bootloaders of those
> boards, but this is something that has to be somehow addressed in a
> generic way.

It kind of sounds like there is a power-domain for the IOMMU that is
not described in DT, but I might have misunderstood the whole thing.

Let's see if we can work something out in the next few days, otherwise
we need to find another way to let some genpds for these platforms
opt out from this new behaviour.

Kind regards
Uffe

>
> > ---
> >   drivers/pmdomain/core.c   | 10 ++++++++--
> >   include/linux/pm_domain.h |  1 +
> >   2 files changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
> > index 5cef6de60c72..18951ed6295d 100644
> > --- a/drivers/pmdomain/core.c
> > +++ b/drivers/pmdomain/core.c
> > @@ -931,11 +931,12 @@ static void genpd_power_off(struct generic_pm_domain *genpd, bool one_dev_on,
> >        * The domain is already in the "power off" state.
> >        * System suspend is in progress.
> >        * The domain is configured as always on.
> > +      * The domain was on at boot and still need to stay on.
> >        * The domain has a subdomain being powered on.
> >        */
> >       if (!genpd_status_on(genpd) || genpd->prepared_count > 0 ||
> >           genpd_is_always_on(genpd) || genpd_is_rpm_always_on(genpd) ||
> > -         atomic_read(&genpd->sd_count) > 0)
> > +         genpd->stay_on || atomic_read(&genpd->sd_count) > 0)
> >               return;
> >
> >       /*
> > @@ -1346,8 +1347,12 @@ static int __init genpd_power_off_unused(void)
> >       pr_info("genpd: Disabling unused power domains\n");
> >       mutex_lock(&gpd_list_lock);
> >
> > -     list_for_each_entry(genpd, &gpd_list, gpd_list_node)
> > +     list_for_each_entry(genpd, &gpd_list, gpd_list_node) {
> > +             genpd_lock(genpd);
> > +             genpd->stay_on = false;
> > +             genpd_unlock(genpd);
> >               genpd_queue_power_off_work(genpd);
> > +     }
> >
> >       mutex_unlock(&gpd_list_lock);
> >
> > @@ -2352,6 +2357,7 @@ int pm_genpd_init(struct generic_pm_domain *genpd,
> >       INIT_WORK(&genpd->power_off_work, genpd_power_off_work_fn);
> >       atomic_set(&genpd->sd_count, 0);
> >       genpd->status = is_off ? GENPD_STATE_OFF : GENPD_STATE_ON;
> > +     genpd->stay_on = !is_off;
> >       genpd->sync_state = GENPD_SYNC_STATE_OFF;
> >       genpd->device_count = 0;
> >       genpd->provider = NULL;
> > diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
> > index d68e07dadc99..99556589f45e 100644
> > --- a/include/linux/pm_domain.h
> > +++ b/include/linux/pm_domain.h
> > @@ -199,6 +199,7 @@ struct generic_pm_domain {
> >       unsigned int performance_state; /* Aggregated max performance state */
> >       cpumask_var_t cpus;             /* A cpumask of the attached CPUs */
> >       bool synced_poweroff;           /* A consumer needs a synced poweroff */
> > +     bool stay_on;                   /* Stay powered-on during boot. */
> >       enum genpd_sync_state sync_state; /* How sync_state is managed. */
> >       int (*power_off)(struct generic_pm_domain *domain);
> >       int (*power_on)(struct generic_pm_domain *domain);
>
> Best regards
> --
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
>
Re: [PATCH v3 21/24] pmdomain: core: Leave powered-on genpds on until late_initcall_sync
Posted by Jon Hunter 2 months, 3 weeks ago
Hi Ulf,

On 10/07/2025 15:54, Ulf Hansson wrote:
> On Thu, 10 Jul 2025 at 14:26, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>
>> On 01.07.2025 13:47, Ulf Hansson wrote:
>>> Powering-off a genpd that was on during boot, before all of its consumer
>>> devices have been probed, is certainly prone to problems.
>>>
>>> As a step to improve this situation, let's prevent these genpds from being
>>> powered-off until genpd_power_off_unused() gets called, which is a
>>> late_initcall_sync().
>>>
>>> Note that, this still doesn't guarantee that all the consumer devices has
>>> been probed before we allow to power-off the genpds. Yet, this should be a
>>> step in the right direction.
>>>
>>> Suggested-by: Saravana Kannan <saravanak@google.com>
>>> Tested-by: Hiago De Franco <hiago.franco@toradex.com> # Colibri iMX8X
>>> Tested-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> # TI AM62A,Xilinx ZynqMP ZCU106
>>> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
>>
>> This change has a side effect on some Exynos based boards, which have
>> display and bootloader is configured to setup a splash screen on it.
>> Since today's linux-next, those boards fails to boot, because of the
>> IOMMU page fault.
> 
> Thanks for reporting, let's try to fix this as soon as possible then.
> 
>>
>> This happens because the display controller is enabled and configured to
>> perform the scanout from the spash-screen buffer until the respective
>> driver will reset it in driver probe() function. This however doesn't
>> work with IOMMU, which is being probed earlier than the display
>> controller driver, what in turn causes IOMMU page fault once the IOMMU
>> driver gets attached. This worked before applying this patch, because
>> the power domain of display controller was simply turned off early
>> effectively reseting the display controller.
> 
> I can certainly try to help to find a solution, but I believe I need
> some more details of what is happening.
> 
> Perhaps you can point me to some relevant DTS file to start with?
> 
>>
>> This has been discussed a bit recently:
>> https://lore.kernel.org/all/544ad69cba52a9b87447e3ac1c7fa8c3@disroot.org/
>> and I can add a workaround for this issue in the bootloaders of those
>> boards, but this is something that has to be somehow addressed in a
>> generic way.
> 
> It kind of sounds like there is a missing power-domain not being
> described in DT for the IOMMU, but I might have understood the whole
> thing wrong.
> 
> Let's see if we can work something out in the next few days, otherwise
> we need to find another way to let some genpds for these platforms to
> opt out from this new behaviour.

Have you found any resolution for this? I have also noticed a boot 
regression on one of our Tegra210 boards and bisect is pointing to this 
commit. I don't see any particular crash, but a hang on boot.

If there is any debug output we can enable to see which pmdomain is the
problem, let me know.

Thanks!
Jon

-- 
nvpublic
Re: [PATCH v3 21/24] pmdomain: core: Leave powered-on genpds on until late_initcall_sync
Posted by Ulf Hansson 2 months, 3 weeks ago
On Tue, 15 Jul 2025 at 12:28, Jon Hunter <jonathanh@nvidia.com> wrote:
>
> Hi Ulf,
>
> On 10/07/2025 15:54, Ulf Hansson wrote:
> > On Thu, 10 Jul 2025 at 14:26, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> >>
> >> On 01.07.2025 13:47, Ulf Hansson wrote:
> >>> Powering-off a genpd that was on during boot, before all of its consumer
> >>> devices have been probed, is certainly prone to problems.
> >>>
> >>> As a step to improve this situation, let's prevent these genpds from being
> >>> powered-off until genpd_power_off_unused() gets called, which is a
> >>> late_initcall_sync().
> >>>
> >>> Note that, this still doesn't guarantee that all the consumer devices has
> >>> been probed before we allow to power-off the genpds. Yet, this should be a
> >>> step in the right direction.
> >>>
> >>> Suggested-by: Saravana Kannan <saravanak@google.com>
> >>> Tested-by: Hiago De Franco <hiago.franco@toradex.com> # Colibri iMX8X
> >>> Tested-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> # TI AM62A,Xilinx ZynqMP ZCU106
> >>> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> >>
> >> This change has a side effect on some Exynos based boards, which have
> >> display and bootloader is configured to setup a splash screen on it.
> >> Since today's linux-next, those boards fails to boot, because of the
> >> IOMMU page fault.
> >
> > Thanks for reporting, let's try to fix this as soon as possible then.
> >
> >>
> >> This happens because the display controller is enabled and configured to
> >> perform the scanout from the spash-screen buffer until the respective
> >> driver will reset it in driver probe() function. This however doesn't
> >> work with IOMMU, which is being probed earlier than the display
> >> controller driver, what in turn causes IOMMU page fault once the IOMMU
> >> driver gets attached. This worked before applying this patch, because
> >> the power domain of display controller was simply turned off early
> >> effectively reseting the display controller.
> >
> > I can certainly try to help to find a solution, but I believe I need
> > some more details of what is happening.
> >
> > Perhaps you can point me to some relevant DTS file to start with?
> >
> >>
> >> This has been discussed a bit recently:
> >> https://lore.kernel.org/all/544ad69cba52a9b87447e3ac1c7fa8c3@disroot.org/
> >> and I can add a workaround for this issue in the bootloaders of those
> >> boards, but this is something that has to be somehow addressed in a
> >> generic way.
> >
> > It kind of sounds like there is a missing power-domain not being
> > described in DT for the IOMMU, but I might have understood the whole
> > thing wrong.
> >
> > Let's see if we can work something out in the next few days, otherwise
> > we need to find another way to let some genpds for these platforms to
> > opt out from this new behaviour.
>
> Have you found any resolution for this? I have also noticed a boot
> regression on one of our Tegra210 boards and bisect is pointing to this
> commit. I don't see any particular crash, but a hang on boot.

Thanks for reporting!

For Exynos, we opt out of the behaviour by enforcing a sync_state of
all PM domains upfront [1], that is, before any devices get
attached.

Even if that defeats the purpose of the $subject series, this was one
way forward that solved the problem. When the boot-ordering problem
(that's how I understood the issue) for Exynos gets resolved, we
should be able to drop the hack, at least that's the idea.

>
> If there is any debug we can enable to see which pmdomain is the problem
> let me know.

There aren't many debug prints in genpd that I think make much sense
to enable, but you can always give it a try. Since you are hanging,
obviously you can't look at the genpd debugfs data...

Note that the interesting PM domains are those that are powered on
when calling pm_genpd_init(). As a start, I would add some debug
prints in () to see which PM domains are relevant to track.
Potentially you could then try to power them off and register them
accordingly with genpd, one by one, to see which of them is causing
the problem.

Another option could be to add a new genpd config flag
(GENPD_FLAG_DONT_STAY_ON or something along those lines), that informs
genpd to not set the genpd->stay_on in pm_genpd_init(). Then
tegra_powergate_add() would have to set GENPD_FLAG_DONT_STAY_ON for
those genpds that really need it.
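
As a rough illustration of the two ideas above, here is a user-space
sketch in C. Everything in it is an assumption made for illustration:
MODEL_GENPD_FLAG_DONT_STAY_ON borrows the name floated above but its bit
value is made up, struct model_pd is not the kernel's struct
generic_pm_domain, and model_pd_report() only stands in for whatever debug
print one would add at registration time.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical opt-out flag; the bit value is arbitrary. */
#define MODEL_GENPD_FLAG_DONT_STAY_ON (1u << 0)

/* Simplified stand-in for a genpd; not the kernel type. */
struct model_pd {
	unsigned int flags;
	bool on;
	bool stay_on;
};

/*
 * Models pm_genpd_init() with the opt-out: stay_on is set only for
 * domains that are on at init and have not asked to opt out.
 */
void model_pd_init(struct model_pd *pd, unsigned int flags, bool is_off)
{
	pd->flags = flags;
	pd->on = !is_off;
	pd->stay_on = !is_off && !(flags & MODEL_GENPD_FLAG_DONT_STAY_ON);
}

/*
 * Models a registration-time debug print: format the state of one domain
 * into buf so the domains that are on at init can be identified.
 * Returns the snprintf() result.
 */
int model_pd_report(const char *name, const struct model_pd *pd,
		    char *buf, size_t len)
{
	return snprintf(buf, len, "%s: on=%d stay_on=%d",
			name, pd->on, pd->stay_on);
}
```

With this model, a provider would pass the opt-out flag only for the
domains that must remain eligible for early power-off; all other domains
that are on at init keep stay_on set.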

Kind regards
Uffe

[1]
https://lore.kernel.org/all/20250711114719.189441-1-ulf.hansson@linaro.org/
Re: [PATCH v3 21/24] pmdomain: core: Leave powered-on genpds on until late_initcall_sync
Posted by Ulf Hansson 2 months, 3 weeks ago
On Tue, 15 Jul 2025 at 13:32, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Tue, 15 Jul 2025 at 12:28, Jon Hunter <jonathanh@nvidia.com> wrote:
> >
> > Hi Ulf,
> >
> > On 10/07/2025 15:54, Ulf Hansson wrote:
> > > On Thu, 10 Jul 2025 at 14:26, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> > >>
> > >> On 01.07.2025 13:47, Ulf Hansson wrote:
> > >>> Powering-off a genpd that was on during boot, before all of its consumer
> > >>> devices have been probed, is certainly prone to problems.
> > >>>
> > >>> As a step to improve this situation, let's prevent these genpds from being
> > >>> powered-off until genpd_power_off_unused() gets called, which is a
> > >>> late_initcall_sync().
> > >>>
> > >>> Note that, this still doesn't guarantee that all the consumer devices has
> > >>> been probed before we allow to power-off the genpds. Yet, this should be a
> > >>> step in the right direction.
> > >>>
> > >>> Suggested-by: Saravana Kannan <saravanak@google.com>
> > >>> Tested-by: Hiago De Franco <hiago.franco@toradex.com> # Colibri iMX8X
> > >>> Tested-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> # TI AM62A,Xilinx ZynqMP ZCU106
> > >>> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > >>
> > >> This change has a side effect on some Exynos based boards, which have
> > >> display and bootloader is configured to setup a splash screen on it.
> > >> Since today's linux-next, those boards fails to boot, because of the
> > >> IOMMU page fault.
> > >
> > > Thanks for reporting, let's try to fix this as soon as possible then.
> > >
> > >>
> > >> This happens because the display controller is enabled and configured to
> > >> perform the scanout from the spash-screen buffer until the respective
> > >> driver will reset it in driver probe() function. This however doesn't
> > >> work with IOMMU, which is being probed earlier than the display
> > >> controller driver, what in turn causes IOMMU page fault once the IOMMU
> > >> driver gets attached. This worked before applying this patch, because
> > >> the power domain of display controller was simply turned off early
> > >> effectively reseting the display controller.
> > >
> > > I can certainly try to help to find a solution, but I believe I need
> > > some more details of what is happening.
> > >
> > > Perhaps you can point me to some relevant DTS file to start with?
> > >
> > >>
> > >> This has been discussed a bit recently:
> > >> https://lore.kernel.org/all/544ad69cba52a9b87447e3ac1c7fa8c3@disroot.org/
> > >> and I can add a workaround for this issue in the bootloaders of those
> > >> boards, but this is something that has to be somehow addressed in a
> > >> generic way.
> > >
> > > It kind of sounds like there is a missing power-domain not being
> > > described in DT for the IOMMU, but I might have understood the whole
> > > thing wrong.
> > >
> > > Let's see if we can work something out in the next few days, otherwise
> > > we need to find another way to let some genpds for these platforms to
> > > opt out from this new behaviour.
> >
> > Have you found any resolution for this? I have also noticed a boot
> > regression on one of our Tegra210 boards and bisect is pointing to this
> > commit. I don't see any particular crash, but a hang on boot.
>
> Thanks for reporting!
>
> For Exynos we opt-out from the behaviour by enforcing a sync_state of
> all PM domains upfront [1], which means before any devices get
> attached.
>
> Even if that defeats the purpose of the $subject series, this was one
> way forward that solved the problem. When the boot-ordering problem
> (that's how I understood the issue) for Exynos gets resolved, we
> should be able to drop the hack, at least that's the idea.
>
> >
> > If there is any debug we can enable to see which pmdomain is the problem
> > let me know.
>
> There aren't many debug prints in genpd that I think makes much sense
> to enable, but you can always give it a try. Since you are hanging,
> obviously you can't look at the genpd debugfs data...
>
> Note that, the interesting PM domains are those that are powered-on
> when calling pm_genpd_init(). As a start, I would add some debug
> prints in () to see which PM domains that are relevant to track.

/s/()/tegra_powergate_add()

> Potentially you could then try to power them off and register them
> accordingly with genpd. One by one, to see which of them is causing
> the problem.
>
> Another option could be to add a new genpd config flag
> (GENPD_FLAG_DONT_STAY_ON or something along those lines), that informs
> genpd to not set the genpd->stay_on in pm_genpd_init(). Then
> tegra_powergate_add() would have to set GENPD_FLAG_DONT_STAY_ON for
> those genpds that really need it.
>
> Kind regards
> Uffe
>
> [1]
> https://lore.kernel.org/all/20250711114719.189441-1-ulf.hansson@linaro.org/
Re: [PATCH v3 21/24] pmdomain: core: Leave powered-on genpds on until late_initcall_sync
Posted by Jon Hunter 2 months, 1 week ago
On 15/07/2025 12:34, Ulf Hansson wrote:

...

>>> Have you found any resolution for this? I have also noticed a boot
>>> regression on one of our Tegra210 boards and bisect is pointing to this
>>> commit. I don't see any particular crash, but a hang on boot.
>>
>> Thanks for reporting!
>>
>> For Exynos we opt-out from the behaviour by enforcing a sync_state of
>> all PM domains upfront [1], which means before any devices get
>> attached.
>>
>> Even if that defeats the purpose of the $subject series, this was one
>> way forward that solved the problem. When the boot-ordering problem
>> (that's how I understood the issue) for Exynos gets resolved, we
>> should be able to drop the hack, at least that's the idea.
>>
>>>
>>> If there is any debug we can enable to see which pmdomain is the problem
>>> let me know.
>>
>> There aren't many debug prints in genpd that I think makes much sense
>> to enable, but you can always give it a try. Since you are hanging,
>> obviously you can't look at the genpd debugfs data...
>>
>> Note that, the interesting PM domains are those that are powered-on
>> when calling pm_genpd_init(). As a start, I would add some debug
>> prints in () to see which PM domains that are relevant to track.
> 
> /s/()/tegra_powergate_add()


I have been able to track this down to a problem in the Tegra PMC driver 
where we are registering the power-domains and I have sent a fix [0]. 
Looks like we have been getting lucky up until now.

Cheers!
Jon

[0] 
https://lore.kernel.org/linux-tegra/20250731121832.213671-1-jonathanh@nvidia.com/T/#u

-- 
nvpublic