[PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd

Ulf Hansson posted 24 patches 3 months, 1 week ago
drivers/base/core.c                         |   8 +-
drivers/cpuidle/cpuidle-psci-domain.c       |  14 --
drivers/cpuidle/cpuidle-riscv-sbi.c         |  14 --
drivers/firmware/xilinx/zynqmp.c            |  18 +-
drivers/pmdomain/core.c                     | 211 ++++++++++++++++++--
drivers/pmdomain/qcom/rpmhpd.c              |   2 +
drivers/pmdomain/qcom/rpmpd.c               |   2 +
drivers/pmdomain/renesas/rcar-gen4-sysc.c   |   2 +-
drivers/pmdomain/renesas/rcar-sysc.c        |  19 +-
drivers/pmdomain/renesas/rmobile-sysc.c     |   3 +-
drivers/pmdomain/xilinx/zynqmp-pm-domains.c |  16 --
drivers/soc/tegra/pmc.c                     |  26 ++-
include/linux/device.h                      |  13 ++
include/linux/firmware/xlnx-zynqmp.h        |   6 -
include/linux/pm_domain.h                   |  17 ++
15 files changed, 291 insertions(+), 80 deletions(-)
[PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Ulf Hansson 3 months, 1 week ago
Changes in v3:
	- Added a couple of patches to adress problems on some Renesas
	platforms. Thanks Geert and Tomi for helping out!
	- Adressed a few comments from Saravanna and Konrad.
	- Added some tested-by tags.

Changes in v2:
	- Well, quite a lot as I discovered various problems when doing
	additional testing of corner-case. I suggest re-review from scratch,
	even if I decided to keep some reviewed-by tags.
	- Added patches to allow some drivers that needs to align or opt-out
	from the new common behaviour in genpd.

If a PM domain (genpd) is powered-on during boot, there is probably a good
reason for it. Therefore it's known to be a bad idea to allow such genpd to be
powered-off before all of its consumer devices have been probed. This series
intends to fix this problem.

We have been discussing these issues at LKML and at various Linux-conferences
in the past. I have therefore tried to include the people I can recall being
involved, but I may have forgotten some (my apologies), feel free to loop them
in.

I have tested this with QEMU with a bunch of local test-drivers and DT nodes.
Let me know if you want me to share this code too.

Please help review and test!
Finally, a big thanks to Saravana for all the support!

Kind regards
Ulf Hansson


Saravana Kannan (1):
  driver core: Add dev_set_drv_sync_state()

Ulf Hansson (23):
  pmdomain: renesas: rcar-sysc: Add genpd OF provider at
    postcore_initcall
  pmdomain: renesas: rmobile-sysc: Move init to postcore_initcall
  pmdomain: renesas: rcar-gen4-sysc: Move init to postcore_initcall
  pmdomain: core: Prevent registering devices before the bus
  pmdomain: core: Add a bus and a driver for genpd providers
  pmdomain: core: Add the genpd->dev to the genpd provider bus
  pmdomain: core: Export a common ->sync_state() helper for genpd
    providers
  pmdomain: core: Prepare to add the common ->sync_state() support
  soc/tegra: pmc: Opt-out from genpd's common ->sync_state() support
  cpuidle: psci: Opt-out from genpd's common ->sync_state() support
  cpuidle: riscv-sbi: Opt-out from genpd's common ->sync_state() support
  pmdomain: qcom: rpmpd: Use of_genpd_sync_state()
  pmdomain: qcom: rpmhpd: Use of_genpd_sync_state()
  firmware/pmdomain: xilinx: Move ->sync_state() support to firmware
    driver
  firmware: xilinx: Don't share zynqmp_pm_init_finalize()
  firmware: xilinx: Use of_genpd_sync_state()
  driver core: Export get_dev_from_fwnode()
  pmdomain: core: Add common ->sync_state() support for genpd providers
  pmdomain: core: Default to use of_genpd_sync_state() for genpd
    providers
  pmdomain: core: Leave powered-on genpds on until late_initcall_sync
  pmdomain: core: Leave powered-on genpds on until sync_state
  cpuidle: psci: Drop redundant sync_state support
  cpuidle: riscv-sbi: Drop redundant sync_state support

 drivers/base/core.c                         |   8 +-
 drivers/cpuidle/cpuidle-psci-domain.c       |  14 --
 drivers/cpuidle/cpuidle-riscv-sbi.c         |  14 --
 drivers/firmware/xilinx/zynqmp.c            |  18 +-
 drivers/pmdomain/core.c                     | 211 ++++++++++++++++++--
 drivers/pmdomain/qcom/rpmhpd.c              |   2 +
 drivers/pmdomain/qcom/rpmpd.c               |   2 +
 drivers/pmdomain/renesas/rcar-gen4-sysc.c   |   2 +-
 drivers/pmdomain/renesas/rcar-sysc.c        |  19 +-
 drivers/pmdomain/renesas/rmobile-sysc.c     |   3 +-
 drivers/pmdomain/xilinx/zynqmp-pm-domains.c |  16 --
 drivers/soc/tegra/pmc.c                     |  26 ++-
 include/linux/device.h                      |  13 ++
 include/linux/firmware/xlnx-zynqmp.h        |   6 -
 include/linux/pm_domain.h                   |  17 ++
 15 files changed, 291 insertions(+), 80 deletions(-)

-- 
2.43.0
Re: [PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Ulf Hansson 3 months ago
On Tue, 1 Jul 2025 at 13:47, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> Changes in v3:
>         - Added a couple of patches to adress problems on some Renesas
>         platforms. Thanks Geert and Tomi for helping out!
>         - Adressed a few comments from Saravanna and Konrad.
>         - Added some tested-by tags.

I decided it was time to give this a try, so I have queued this up for
v6.17 via the next branch at my pmdomain tree.

If you encounter any issues, please let me know so I can help to fix them.

Kind regards
Uffe


>
> Changes in v2:
>         - Well, quite a lot as I discovered various problems when doing
>         additional testing of corner-case. I suggest re-review from scratch,
>         even if I decided to keep some reviewed-by tags.
>         - Added patches to allow some drivers that needs to align or opt-out
>         from the new common behaviour in genpd.
>
> If a PM domain (genpd) is powered-on during boot, there is probably a good
> reason for it. Therefore it's known to be a bad idea to allow such genpd to be
> powered-off before all of its consumer devices have been probed. This series
> intends to fix this problem.
>
> We have been discussing these issues at LKML and at various Linux-conferences
> in the past. I have therefore tried to include the people I can recall being
> involved, but I may have forgotten some (my apologies), feel free to loop them
> in.
>
> I have tested this with QEMU with a bunch of local test-drivers and DT nodes.
> Let me know if you want me to share this code too.
>
> Please help review and test!
> Finally, a big thanks to Saravana for all the support!
>
> Kind regards
> Ulf Hansson
>
>
> Saravana Kannan (1):
>   driver core: Add dev_set_drv_sync_state()
>
> Ulf Hansson (23):
>   pmdomain: renesas: rcar-sysc: Add genpd OF provider at
>     postcore_initcall
>   pmdomain: renesas: rmobile-sysc: Move init to postcore_initcall
>   pmdomain: renesas: rcar-gen4-sysc: Move init to postcore_initcall
>   pmdomain: core: Prevent registering devices before the bus
>   pmdomain: core: Add a bus and a driver for genpd providers
>   pmdomain: core: Add the genpd->dev to the genpd provider bus
>   pmdomain: core: Export a common ->sync_state() helper for genpd
>     providers
>   pmdomain: core: Prepare to add the common ->sync_state() support
>   soc/tegra: pmc: Opt-out from genpd's common ->sync_state() support
>   cpuidle: psci: Opt-out from genpd's common ->sync_state() support
>   cpuidle: riscv-sbi: Opt-out from genpd's common ->sync_state() support
>   pmdomain: qcom: rpmpd: Use of_genpd_sync_state()
>   pmdomain: qcom: rpmhpd: Use of_genpd_sync_state()
>   firmware/pmdomain: xilinx: Move ->sync_state() support to firmware
>     driver
>   firmware: xilinx: Don't share zynqmp_pm_init_finalize()
>   firmware: xilinx: Use of_genpd_sync_state()
>   driver core: Export get_dev_from_fwnode()
>   pmdomain: core: Add common ->sync_state() support for genpd providers
>   pmdomain: core: Default to use of_genpd_sync_state() for genpd
>     providers
>   pmdomain: core: Leave powered-on genpds on until late_initcall_sync
>   pmdomain: core: Leave powered-on genpds on until sync_state
>   cpuidle: psci: Drop redundant sync_state support
>   cpuidle: riscv-sbi: Drop redundant sync_state support
>
>  drivers/base/core.c                         |   8 +-
>  drivers/cpuidle/cpuidle-psci-domain.c       |  14 --
>  drivers/cpuidle/cpuidle-riscv-sbi.c         |  14 --
>  drivers/firmware/xilinx/zynqmp.c            |  18 +-
>  drivers/pmdomain/core.c                     | 211 ++++++++++++++++++--
>  drivers/pmdomain/qcom/rpmhpd.c              |   2 +
>  drivers/pmdomain/qcom/rpmpd.c               |   2 +
>  drivers/pmdomain/renesas/rcar-gen4-sysc.c   |   2 +-
>  drivers/pmdomain/renesas/rcar-sysc.c        |  19 +-
>  drivers/pmdomain/renesas/rmobile-sysc.c     |   3 +-
>  drivers/pmdomain/xilinx/zynqmp-pm-domains.c |  16 --
>  drivers/soc/tegra/pmc.c                     |  26 ++-
>  include/linux/device.h                      |  13 ++
>  include/linux/firmware/xlnx-zynqmp.h        |   6 -
>  include/linux/pm_domain.h                   |  17 ++
>  15 files changed, 291 insertions(+), 80 deletions(-)
>
> --
> 2.43.0
>
Re: [PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Geert Uytterhoeven 1 month, 3 weeks ago
Hi Ulf,

On Wed, 9 Jul 2025 at 13:31, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On Tue, 1 Jul 2025 at 13:47, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > Changes in v3:
> >         - Added a couple of patches to adress problems on some Renesas
> >         platforms. Thanks Geert and Tomi for helping out!
> >         - Adressed a few comments from Saravanna and Konrad.
> >         - Added some tested-by tags.
>
> I decided it was time to give this a try, so I have queued this up for
> v6.17 via the next branch at my pmdomain tree.
>
> If you encounter any issues, please let me know so I can help to fix them.

FTR, I discovered two more issues with clock drivers registering a
genpd OF provider from CLK_OF_DECLARE(), which is now too early:
  1. drivers/clk/mmp/clk-of-mmp2.c,
  2. drivers/clk/renesas/clk-mstp.c.

I have submitted a fix for the second driver: "[PATCH] clk: renesas:
mstp: Add genpd OF provider at postcore_initcall()"
https://lore.kernel.org/all/81ef5f8d5d31374b7852b05453c52d2f735062a2.1755073087.git.geert+renesas@glider.be

I don't have MMP2 hardware, I guess it needs a similar fix.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Re: [PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Geert Uytterhoeven 2 months, 1 week ago
Hi Ulf,

On Wed, 9 Jul 2025 at 13:31, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On Tue, 1 Jul 2025 at 13:47, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > Changes in v3:
> >         - Added a couple of patches to adress problems on some Renesas
> >         platforms. Thanks Geert and Tomi for helping out!
> >         - Adressed a few comments from Saravanna and Konrad.
> >         - Added some tested-by tags.
>
> I decided it was time to give this a try, so I have queued this up for
> v6.17 via the next branch at my pmdomain tree.
>
> If you encounter any issues, please let me know so I can help to fix them.

Thanks for your series!  Due to holidays, I only managed to test
this very recently.

Unfortunately I have an issue with unused PM Domains no longer being
disabled on R-Car:
  - On R-Car Gen1/2/3, using rcar-sysc.c, unused PM Domains are never
    disabled.
  - On R-Car Gen4, using rcar-gen4-sysc.c, unused PM Domains are
    sometimes not disabled.
    At first, I noticed the IOMMU driver was not enabled in my config,
    and enabling it did fix the issue.  However, after that I still
    encountered the issue in a different config that does have the
    IOMMU driver enabled...

FTR, unused PM Domains are still disabled correctly on R/SH-Mobile
(using rmobile-sysc.c) and on BeagleBone Black. Note that these use
of_genpd_add_provider_simple(), while all R-Car drivers use
of_genpd_add_provider_onecell().  Perhaps there is an issue with
the latter?  If you don't have a clue, I plan to do some more
investigation later...

BTW, the "pending due to"-messages look weird to me.
On R-Car M2-W (r8a7791.dtsi) I see e.g.:

    genpd_provider ca15-cpu0: sync_state() pending due to e6020000.watchdog
    renesas-cpg-mssr e6150000.clock-controller: sync_state() pending
due to e6020000.watchdog

ca15-cpu0 is the PM Domain holding the first CPU core, while
the watchdog resides in the always-on Clock Domain, and uses the
clock-controller for PM_CLK handling.

Thanks!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Re: [PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Ulf Hansson 2 months, 1 week ago
On Wed, 30 Jul 2025 at 11:56, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Ulf,
>
> On Wed, 9 Jul 2025 at 13:31, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > On Tue, 1 Jul 2025 at 13:47, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > Changes in v3:
> > >         - Added a couple of patches to adress problems on some Renesas
> > >         platforms. Thanks Geert and Tomi for helping out!
> > >         - Adressed a few comments from Saravanna and Konrad.
> > >         - Added some tested-by tags.
> >
> > I decided it was time to give this a try, so I have queued this up for
> > v6.17 via the next branch at my pmdomain tree.
> >
> > If you encounter any issues, please let me know so I can help to fix them.
>
> Thanks for your series!  Due to holidays, I only managed to test
> this very recently.
>
> Unfortunately I have an issue with unused PM Domains no longer being
> disabled on R-Car:
>   - On R-Car Gen1/2/3, using rcar-sysc.c, unused PM Domains are never
>     disabled.
>   - On R-Car Gen4, using rcar-gen4-sysc.c, unused PM Domains are
>     sometimes not disabled.
>     At first, I noticed the IOMMU driver was not enabled in my config,
>     and enabling it did fix the issue.  However, after that I still
>     encountered the issue in a different config that does have the
>     IOMMU driver enabled...
>
> FTR, unused PM Domains are still disabled correctly on R/SH-Mobile
> (using rmobile-sysc.c) and on BeagleBone Black. Note that these use
> of_genpd_add_provider_simple(), while all R-Car drivers use
> of_genpd_add_provider_onecell().  Perhaps there is an issue with
> the latter?  If you don't have a clue, I plan to do some more
> investigation later...

Geert, thanks for reporting!

>
> BTW, the "pending due to"-messages look weird to me.
> On R-Car M2-W (r8a7791.dtsi) I see e.g.:
>
>     genpd_provider ca15-cpu0: sync_state() pending due to e6020000.watchdog
>     renesas-cpg-mssr e6150000.clock-controller: sync_state() pending
> due to e6020000.watchdog
>
> ca15-cpu0 is the PM Domain holding the first CPU core, while
> the watchdog resides in the always-on Clock Domain, and uses the
> clock-controller for PM_CLK handling.

I will have a closer look as soon as I can  to see if I can find some
potential problems.

Kind regards
Uffe
Re: [PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Geert Uytterhoeven 2 months ago
Hi Ulf,

On Wed, 30 Jul 2025 at 12:29, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On Wed, 30 Jul 2025 at 11:56, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Wed, 9 Jul 2025 at 13:31, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > On Tue, 1 Jul 2025 at 13:47, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > Changes in v3:
> > > >         - Added a couple of patches to adress problems on some Renesas
> > > >         platforms. Thanks Geert and Tomi for helping out!
> > > >         - Adressed a few comments from Saravanna and Konrad.
> > > >         - Added some tested-by tags.
> > >
> > > I decided it was time to give this a try, so I have queued this up for
> > > v6.17 via the next branch at my pmdomain tree.
> > >
> > > If you encounter any issues, please let me know so I can help to fix them.
> >
> > Thanks for your series!  Due to holidays, I only managed to test
> > this very recently.
> >
> > Unfortunately I have an issue with unused PM Domains no longer being
> > disabled on R-Car:
> >   - On R-Car Gen1/2/3, using rcar-sysc.c, unused PM Domains are never
> >     disabled.
> >   - On R-Car Gen4, using rcar-gen4-sysc.c, unused PM Domains are
> >     sometimes not disabled.
> >     At first, I noticed the IOMMU driver was not enabled in my config,
> >     and enabling it did fix the issue.  However, after that I still
> >     encountered the issue in a different config that does have the
> >     IOMMU driver enabled...
> >
> > FTR, unused PM Domains are still disabled correctly on R/SH-Mobile
> > (using rmobile-sysc.c) and on BeagleBone Black. Note that these use
> > of_genpd_add_provider_simple(), while all R-Car drivers use
> > of_genpd_add_provider_onecell().  Perhaps there is an issue with
> > the latter?  If you don't have a clue, I plan to do some more
> > investigation later...

of_genpd_add_provider_onecell() has:

    if (!dev)
            sync_state = true;
    else
            dev_set_drv_sync_state(dev, genpd_sync_state);

    for (i = 0; i < data->num_domains; i++) {
            ...
            if (sync_state && !genpd_is_no_sync_state(genpd)) {
                    genpd->sync_state = GENPD_SYNC_STATE_ONECELL;
                    device_set_node(&genpd->dev, fwnode);
                    sync_state = false;
                    ^^^^^^^^^^^^^^^^^^^
            }
            ...
    }

As the R-Car SYSC drivers are not platform drivers, dev is NULL, and
genpd->sync_state is set to GENPD_SYNC_STATE_ONECELL for the first PM
Domain only.  All other domains have the default value of sync_state
(0 = GENPD_SYNC_STATE_OFF).  Hence when genpd_provider_sync_state()
is called later, it ignores all but the first domain.
Apparently this is intentional, as of_genpd_sync_state() tries to
power off all domains handled by the same controller anyway (see below)?

> > BTW, the "pending due to"-messages look weird to me.
> > On R-Car M2-W (r8a7791.dtsi) I see e.g.:
> >
> >     genpd_provider ca15-cpu0: sync_state() pending due to e6020000.watchdog
> >     renesas-cpg-mssr e6150000.clock-controller: sync_state() pending
> > due to e6020000.watchdog
> >
> > ca15-cpu0 is the PM Domain holding the first CPU core, while
> > the watchdog resides in the always-on Clock Domain, and uses the
> > clock-controller for PM_CLK handling.

Unfortunately the first PM Domain is "ca15-cpu0", which is blocked on
these bogus pending states, and no PM Domain is powered off.

If I remove the "sync_state = false" above, genpd_provider_sync_state()
considers all domains, and does power down all unused domains (even
multiple times, as expected).

Upon closer look, all "pending due to" messages I see claim that the
first (index 0) PM Domain is pending on some devices, while all of
these devices are part of a different domain (usually the always-on
domain, which is always the last (32 or 64) on R-Car).

So I think there are two issues:
  1. Devices are not attributed to the correct PM Domain using
     fw_devlink sync_state,
  2. One PM Domain of a multi-domain controller being blocked should
     not block all other domains handled by the same controller.

Does that make sense?
Thanks!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Re: [PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Saravana Kannan 1 week, 5 days ago
On Thu, Aug 7, 2025 at 2:43 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Ulf,
>
> On Wed, 30 Jul 2025 at 12:29, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > On Wed, 30 Jul 2025 at 11:56, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > On Wed, 9 Jul 2025 at 13:31, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > On Tue, 1 Jul 2025 at 13:47, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > Changes in v3:
> > > > >         - Added a couple of patches to adress problems on some Renesas
> > > > >         platforms. Thanks Geert and Tomi for helping out!
> > > > >         - Adressed a few comments from Saravanna and Konrad.
> > > > >         - Added some tested-by tags.
> > > >
> > > > I decided it was time to give this a try, so I have queued this up for
> > > > v6.17 via the next branch at my pmdomain tree.
> > > >
> > > > If you encounter any issues, please let me know so I can help to fix them.
> > >
> > > Thanks for your series!  Due to holidays, I only managed to test
> > > this very recently.
> > >
> > > Unfortunately I have an issue with unused PM Domains no longer being
> > > disabled on R-Car:
> > >   - On R-Car Gen1/2/3, using rcar-sysc.c, unused PM Domains are never
> > >     disabled.
> > >   - On R-Car Gen4, using rcar-gen4-sysc.c, unused PM Domains are
> > >     sometimes not disabled.
> > >     At first, I noticed the IOMMU driver was not enabled in my config,
> > >     and enabling it did fix the issue.  However, after that I still
> > >     encountered the issue in a different config that does have the
> > >     IOMMU driver enabled...
> > >
> > > FTR, unused PM Domains are still disabled correctly on R/SH-Mobile
> > > (using rmobile-sysc.c) and on BeagleBone Black. Note that these use
> > > of_genpd_add_provider_simple(), while all R-Car drivers use
> > > of_genpd_add_provider_onecell().  Perhaps there is an issue with
> > > the latter?  If you don't have a clue, I plan to do some more
> > > investigation later...
>
> of_genpd_add_provider_onecell() has:
>
>     if (!dev)
>             sync_state = true;
>     else
>             dev_set_drv_sync_state(dev, genpd_sync_state);
>
>     for (i = 0; i < data->num_domains; i++) {
>             ...
>             if (sync_state && !genpd_is_no_sync_state(genpd)) {
>                     genpd->sync_state = GENPD_SYNC_STATE_ONECELL;
>                     device_set_node(&genpd->dev, fwnode);
>                     sync_state = false;
>                     ^^^^^^^^^^^^^^^^^^^
>             }
>             ...
>     }
>
> As the R-Car SYSC drivers are not platform drivers, dev is NULL, and
> genpd->sync_state is set to GENPD_SYNC_STATE_ONECELL for the first PM
> Domain only.  All other domains have the default value of sync_state
> (0 = GENPD_SYNC_STATE_OFF).  Hence when genpd_provider_sync_state()
> is called later, it ignores all but the first domain.
> Apparently this is intentional, as of_genpd_sync_state() tries to
> power off all domains handled by the same controller anyway (see below)?
>
> > > BTW, the "pending due to"-messages look weird to me.
> > > On R-Car M2-W (r8a7791.dtsi) I see e.g.:
> > >
> > >     genpd_provider ca15-cpu0: sync_state() pending due to e6020000.watchdog
> > >     renesas-cpg-mssr e6150000.clock-controller: sync_state() pending
> > > due to e6020000.watchdog
> > >
> > > ca15-cpu0 is the PM Domain holding the first CPU core, while
> > > the watchdog resides in the always-on Clock Domain, and uses the
> > > clock-controller for PM_CLK handling.
>
> Unfortunately the first PM Domain is "ca15-cpu0", which is blocked on
> these bogus pending states, and no PM Domain is powered off.

Hi Geert,

Thanks for all the reports as ever!

Can you explain why you call these as bogus? Sorry if you already
explained it. But the reason I'm asking is to see if you can set a
flag for the watchdog device so fw_devlink will completely ignore it.

It looks like there's a driver for this watchdog node? Why is it not
probing then?

>
> If I remove the "sync_state = false" above, genpd_provider_sync_state()
> considers all domains, and does power down all unused domains (even
> multiple times, as expected).
>
> Upon closer look, all "pending due to" messages I see claim that the
> first (index 0) PM Domain is pending on some devices, while all of
> these devices are part of a different domain (usually the always-on
> domain, which is always the last (32 or 64) on R-Car).
>
> So I think there are two issues:
>   1. Devices are not attributed to the correct PM Domain using
>      fw_devlink sync_state,

Is it a fw_devlink issue? Or is this a multi-domain controller?

>   2. One PM Domain of a multi-domain controller being blocked should
>      not block all other domains handled by the same controller.

This is going to take a while to sort out. But the current behavior is
the safest. How grumpy will you be if we don't fix this :)

Thanks,
Saravana
Re: [PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Geert Uytterhoeven 1 week, 4 days ago
Hi Saravana,

On Fri, 26 Sept 2025 at 00:41, Saravana Kannan <saravanak@google.com> wrote:
> On Thu, Aug 7, 2025 at 2:43 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Wed, 30 Jul 2025 at 12:29, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > On Wed, 30 Jul 2025 at 11:56, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > BTW, the "pending due to"-messages look weird to me.
> > > > On R-Car M2-W (r8a7791.dtsi) I see e.g.:
> > > >
> > > >     genpd_provider ca15-cpu0: sync_state() pending due to e6020000.watchdog
> > > >     renesas-cpg-mssr e6150000.clock-controller: sync_state() pending
> > > > due to e6020000.watchdog
> > > >
> > > > ca15-cpu0 is the PM Domain holding the first CPU core, while
> > > > the watchdog resides in the always-on Clock Domain, and uses the
> > > > clock-controller for PM_CLK handling.
> >
> > Unfortunately the first PM Domain is "ca15-cpu0", which is blocked on
> > these bogus pending states, and no PM Domain is powered off.
>
> Can you explain why you call these as bogus? Sorry if you already
> explained it. But the reason I'm asking is to see if you can set a
> flag for the watchdog device so fw_devlink will completely ignore it.

"bogus" refers to "1." below.

Furthermore, devices that are located in an alway-on domain should
not block the sync state.

> It looks like there's a driver for this watchdog node? Why is it not
> probing then?

Because this particular revision of the SoC has a hardware bug that
prevents the watchdog timer from rebooting the system, and the driver
detects that.

Anyway, if the driver is not available, unused power domains should
still be powered down, like before.

> > If I remove the "sync_state = false" above, genpd_provider_sync_state()
> > considers all domains, and does power down all unused domains (even
> > multiple times, as expected).
> >
> > Upon closer look, all "pending due to" messages I see claim that the
> > first (index 0) PM Domain is pending on some devices, while all of
> > these devices are part of a different domain (usually the always-on
> > domain, which is always the last (32 or 64) on R-Car).
> >
> > So I think there are two issues:
> >   1. Devices are not attributed to the correct PM Domain using
> >      fw_devlink sync_state,
>
> Is it a fw_devlink issue? Or is this a multi-domain controller?

This is a multi-domain controller.

> >   2. One PM Domain of a multi-domain controller being blocked should
> >      not block all other domains handled by the same controller.
>
> This is going to take a while to sort out. But the current behavior is
> the safest. How grumpy will you be if we don't fix this :)

Depending on your definition of "safe".  Keeping all power domains on
increases power consumption and heat generation, and may cause e.g. CPU
frequency throttling to kick in, slowing down operation of the system.

Thanks!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Re: [PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Ulf Hansson 1 week, 4 days ago
On Fri, 26 Sept 2025 at 08:57, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Fri, 26 Sept 2025 at 00:41, Saravana Kannan <saravanak@google.com> wrote:
> > On Thu, Aug 7, 2025 at 2:43 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > On Wed, 30 Jul 2025 at 12:29, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > On Wed, 30 Jul 2025 at 11:56, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > > BTW, the "pending due to"-messages look weird to me.
> > > > > On R-Car M2-W (r8a7791.dtsi) I see e.g.:
> > > > >
> > > > >     genpd_provider ca15-cpu0: sync_state() pending due to e6020000.watchdog
> > > > >     renesas-cpg-mssr e6150000.clock-controller: sync_state() pending
> > > > > due to e6020000.watchdog
> > > > >
> > > > > ca15-cpu0 is the PM Domain holding the first CPU core, while
> > > > > the watchdog resides in the always-on Clock Domain, and uses the
> > > > > clock-controller for PM_CLK handling.
> > >
> > > Unfortunately the first PM Domain is "ca15-cpu0", which is blocked on
> > > these bogus pending states, and no PM Domain is powered off.
> >
> > Can you explain why you call these as bogus? Sorry if you already
> > explained it. But the reason I'm asking is to see if you can set a
> > flag for the watchdog device so fw_devlink will completely ignore it.
>
> "bogus" refers to "1." below.
>
> Furthermore, devices that are located in an alway-on domain should
> not block the sync state.
>
> > It looks like there's a driver for this watchdog node? Why is it not
> > probing then?
>
> Because this particular revision of the SoC has a hardware bug that
> prevents the watchdog timer from rebooting the system, and the driver
> detects that.
>
> Anyway, if the driver is not available, unused power domains should
> still be powered down, like before.
>
> > > If I remove the "sync_state = false" above, genpd_provider_sync_state()
> > > considers all domains, and does power down all unused domains (even
> > > multiple times, as expected).
> > >
> > > Upon closer look, all "pending due to" messages I see claim that the
> > > first (index 0) PM Domain is pending on some devices, while all of
> > > these devices are part of a different domain (usually the always-on
> > > domain, which is always the last (32 or 64) on R-Car).
> > >
> > > So I think there are two issues:
> > >   1. Devices are not attributed to the correct PM Domain using
> > >      fw_devlink sync_state,
> >
> > Is it a fw_devlink issue? Or is this a multi-domain controller?
>
> This is a multi-domain controller.
>
> > >   2. One PM Domain of a multi-domain controller being blocked should
> > >      not block all other domains handled by the same controller.
> >
> > This is going to take a while to sort out. But the current behavior is
> > the safest. How grumpy will you be if we don't fix this :)
>
> Depending on your definition of "safe".  Keeping all power domains on
> increases power consumption and heat generation, and may cause e.g. CPU
> frequency throttling to kick in, slowing down operation of the system.

FYI, I agree that we need to address these problems, in one way or the
other. I am trying to summarize them and have also proposed a CFP for
LPC (power/thermal MC) to discuss and try to solve them.

Now, as I also proposed in the other thread [1] just now. How about
changing the default behaviour from FW_DEVLINK_SYNC_STATE_STRICT to
FW_DEVLINK_SYNC_STATE_TIMEOUT? I think that would solve a lot of
problems for us, as it would provide a more similar behaviour to what
we had in genpd originally, hence it would be a smoother transition.

I think this would be true when trying to add ->sync_state() support
for other subsystems too, like clocks and regulators.

In the end, when platforms are ready to move to the "strict" mode,
they can do that via the command-line parameter, for example.

Kind regards
Uffe

[1]
https://lore.kernel.org/all/20250925115924.188257-1-ulf.hansson@linaro.org/
Re: [PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Geert Uytterhoeven 1 week, 1 day ago
Hi Ulf,

On Fri, 26 Sept 2025 at 14:23, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On Fri, 26 Sept 2025 at 08:57, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Fri, 26 Sept 2025 at 00:41, Saravana Kannan <saravanak@google.com> wrote:
> > > On Thu, Aug 7, 2025 at 2:43 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > On Wed, 30 Jul 2025 at 12:29, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > On Wed, 30 Jul 2025 at 11:56, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > > > BTW, the "pending due to"-messages look weird to me.
> > > > > > On R-Car M2-W (r8a7791.dtsi) I see e.g.:
> > > > > >
> > > > > >     genpd_provider ca15-cpu0: sync_state() pending due to e6020000.watchdog
> > > > > >     renesas-cpg-mssr e6150000.clock-controller: sync_state() pending
> > > > > > due to e6020000.watchdog
> > > > > >
> > > > > > ca15-cpu0 is the PM Domain holding the first CPU core, while
> > > > > > the watchdog resides in the always-on Clock Domain, and uses the
> > > > > > clock-controller for PM_CLK handling.
> > > >
> > > > Unfortunately the first PM Domain is "ca15-cpu0", which is blocked on
> > > > these bogus pending states, and no PM Domain is powered off.
> > >
> > > Can you explain why you call these as bogus? Sorry if you already
> > > explained it. But the reason I'm asking is to see if you can set a
> > > flag for the watchdog device so fw_devlink will completely ignore it.
> >
> > "bogus" refers to "1." below.
> >
> > Furthermore, devices that are located in an alway-on domain should
> > not block the sync state.
> >
> > > It looks like there's a driver for this watchdog node? Why is it not
> > > probing then?
> >
> > Because this particular revision of the SoC has a hardware bug that
> > prevents the watchdog timer from rebooting the system, and the driver
> > detects that.
> >
> > Anyway, if the driver is not available, unused power domains should
> > still be powered down, like before.
> >
> > > > If I remove the "sync_state = false" above, genpd_provider_sync_state()
> > > > considers all domains, and does power down all unused domains (even
> > > > multiple times, as expected).
> > > >
> > > > Upon closer look, all "pending due to" messages I see claim that the
> > > > first (index 0) PM Domain is pending on some devices, while all of
> > > > these devices are part of a different domain (usually the always-on
> > > > domain, which is always the last (32 or 64) on R-Car).
> > > >
> > > > So I think there are two issues:
> > > >   1. Devices are not attributed to the correct PM Domain using
> > > >      fw_devlink sync_state,
> > >
> > > Is it a fw_devlink issue? Or is this a multi-domain controller?
> >
> > This is a multi-domain controller.
> >
> > > >   2. One PM Domain of a multi-domain controller being blocked should
> > > >      not block all other domains handled by the same controller.
> > >
> > > This is going to take a while to sort out. But the current behavior is
> > > the safest. How grumpy will you be if we don't fix this :)
> >
> > Depending on your definition of "safe".  Keeping all power domains on
> > increases power consumption and heat generation, and may cause e.g. CPU
> > frequency throttling to kick in, slowing down operation of the system.
>
> FYI, I agree that we need to address these problems, in one way or the
> other. I am trying to summarize them and have also proposed a CFP for
> LPC (power/thermal MC) to discuss and try to solve them.

OK, CU there! ;-)

> Now, as I also proposed in the other thread [1] just now. How about
> changing the default behaviour from FW_DEVLINK_SYNC_STATE_STRICT to
> FW_DEVLINK_SYNC_STATE_TIMEOUT? I think that would solve a lot of
> problems for us, as it would provide a more similar behaviour to what
> we had in genpd originally, hence it would be a smoother transition.
>
> I think this would be true when trying to add ->sync_state() support
> for other subsystems too, like clocks and regulators.
>
> In the end, when platforms are ready to move to the "strict" mode,
> they can do that via the command-line parameter, for example.

Thanks, changing the default to timeout sounds good to me!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Re: [PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Ulf Hansson 1 month, 3 weeks ago
On Thu, 7 Aug 2025 at 11:38, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Ulf,
>
> On Wed, 30 Jul 2025 at 12:29, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > On Wed, 30 Jul 2025 at 11:56, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > On Wed, 9 Jul 2025 at 13:31, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > On Tue, 1 Jul 2025 at 13:47, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > Changes in v3:
> > > > >         - Added a couple of patches to adress problems on some Renesas
> > > > >         platforms. Thanks Geert and Tomi for helping out!
> > > > >         - Adressed a few comments from Saravanna and Konrad.
> > > > >         - Added some tested-by tags.
> > > >
> > > > I decided it was time to give this a try, so I have queued this up for
> > > > v6.17 via the next branch at my pmdomain tree.
> > > >
> > > > If you encounter any issues, please let me know so I can help to fix them.
> > >
> > > Thanks for your series!  Due to holidays, I only managed to test
> > > this very recently.
> > >
> > > Unfortunately I have an issue with unused PM Domains no longer being
> > > disabled on R-Car:
> > >   - On R-Car Gen1/2/3, using rcar-sysc.c, unused PM Domains are never
> > >     disabled.
> > >   - On R-Car Gen4, using rcar-gen4-sysc.c, unused PM Domains are
> > >     sometimes not disabled.
> > >     At first, I noticed the IOMMU driver was not enabled in my config,
> > >     and enabling it did fix the issue.  However, after that I still
> > >     encountered the issue in a different config that does have the
> > >     IOMMU driver enabled...
> > >
> > > FTR, unused PM Domains are still disabled correctly on R/SH-Mobile
> > > (using rmobile-sysc.c) and on BeagleBone Black. Note that these use
> > > of_genpd_add_provider_simple(), while all R-Car drivers use
> > > of_genpd_add_provider_onecell().  Perhaps there is an issue with
> > > the latter?  If you don't have a clue, I plan to do some more
> > > investigation later...
>
> of_genpd_add_provider_onecell() has:
>
>     if (!dev)
>             sync_state = true;
>     else
>             dev_set_drv_sync_state(dev, genpd_sync_state);
>
>     for (i = 0; i < data->num_domains; i++) {
>             ...
>             if (sync_state && !genpd_is_no_sync_state(genpd)) {
>                     genpd->sync_state = GENPD_SYNC_STATE_ONECELL;
>                     device_set_node(&genpd->dev, fwnode);
>                     sync_state = false;
>                     ^^^^^^^^^^^^^^^^^^^
>             }
>             ...
>     }
>
> As the R-Car SYSC drivers are not platform drivers, dev is NULL, and
> genpd->sync_state is set to GENPD_SYNC_STATE_ONECELL for the first PM
> Domain only.  All other domains have the default value of sync_state
> (0 = GENPD_SYNC_STATE_OFF).  Hence when genpd_provider_sync_state()
> is called later, it ignores all but the first domain.
> Apparently this is intentional, as of_genpd_sync_state() tries to
> power off all domains handled by the same controller anyway (see below)?

Right, this is intentional and mainly because of how fw_devlink works.

fw_devlink is limited to use only the first device - if multiple
devices share the same fwnode. In principle, we could have picked any
of the devices in the array of genpds here - and reached the same
result.

>
> > > BTW, the "pending due to"-messages look weird to me.
> > > On R-Car M2-W (r8a7791.dtsi) I see e.g.:
> > >
> > >     genpd_provider ca15-cpu0: sync_state() pending due to e6020000.watchdog
> > >     renesas-cpg-mssr e6150000.clock-controller: sync_state() pending
> > > due to e6020000.watchdog
> > >
> > > ca15-cpu0 is the PM Domain holding the first CPU core, while
> > > the watchdog resides in the always-on Clock Domain, and uses the
> > > clock-controller for PM_CLK handling.
>
> Unfortunately the first PM Domain is "ca15-cpu0", which is blocked on
> these bogus pending states, and no PM Domain is powered off.

I see, thanks for the details. I am looking closer at this.

In any case, this is the main issue, as it prevents the ->sync_state()
callback to be called. Hence the "genpd->stay_on" will also *not* be
cleared for any of the genpd's for the genpd-provider.

>
> If I remove the "sync_state = false" above, genpd_provider_sync_state()
> considers all domains, and does power down all unused domains (even
> multiple times, as expected).

I think those are getting called because with the change above, there
is no device_link being tracked. As stated above, fw_devlink is
limited to use only one device - if multiple devices share the same
fwnode.

In other words, the ->sync_state() callbacks are called even if the
corresponding consumer devices have not been probed yet.

>
> Upon closer look, all "pending due to" messages I see claim that the
> first (index 0) PM Domain is pending on some devices, while all of
> these devices are part of a different domain (usually the always-on
> domain, which is always the last (32 or 64) on R-Car).
>
> So I think there are two issues:
>   1. Devices are not attributed to the correct PM Domain using
>      fw_devlink sync_state,
>   2. One PM Domain of a multi-domain controller being blocked should
>      not block all other domains handled by the same controller.

Right, that's a current limitation with fw_devlink. To cope with this,
it's possible to enforce the ->sync_state() callback to be invoked
from user-space (timeout or explicitly) for a device.

Another option would be to allow an opt-out behavior for some genpd's
that are powered-on at initialization. Something along the lines of
the below.

From: Ulf Hansson <ulf.hansson@linaro.org>
Date: Tue, 29 Jul 2025 14:27:22 +0200
Subject: [PATCH] pmdomain: core: Allow powered-on PM domains to be powered-off
 during boot

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 drivers/pmdomain/core.c   | 3 ++-
 include/linux/pm_domain.h | 7 +++++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
index 0006ab3d0789..ef0760824c92 100644
--- a/drivers/pmdomain/core.c
+++ b/drivers/pmdomain/core.c
@@ -187,6 +187,7 @@ static const struct genpd_lock_ops genpd_raw_spin_ops = {
 #define genpd_is_opp_table_fw(genpd)   (genpd->flags & GENPD_FLAG_OPP_TABLE_FW)
 #define genpd_is_dev_name_fw(genpd)    (genpd->flags & GENPD_FLAG_DEV_NAME_FW)
 #define genpd_is_no_sync_state(genpd)  (genpd->flags &
GENPD_FLAG_NO_SYNC_STATE)
+#define genpd_is_no_stay_on(genpd)     (genpd->flags & GENPD_FLAG_NO_STAY_ON)

 static inline bool irq_safe_dev_in_sleep_domain(struct device *dev,
                const struct generic_pm_domain *genpd)
@@ -2392,7 +2393,7 @@ int pm_genpd_init(struct generic_pm_domain *genpd,
        INIT_WORK(&genpd->power_off_work, genpd_power_off_work_fn);
        atomic_set(&genpd->sd_count, 0);
        genpd->status = is_off ? GENPD_STATE_OFF : GENPD_STATE_ON;
-       genpd->stay_on = !is_off;
+       genpd->stay_on = !genpd_is_no_stay_on(genpd) && !is_off;
        genpd->sync_state = GENPD_SYNC_STATE_OFF;
        genpd->device_count = 0;
        genpd->provider = NULL;
diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
index b9d3c7d5c4f8..61b81574efc4 100644
--- a/include/linux/pm_domain.h
+++ b/include/linux/pm_domain.h
@@ -109,6 +109,12 @@ struct dev_pm_domain_list {
  *                             genpd provider specific way, likely through a
  *                             parent device node. This flag makes genpd to
  *                             skip its internal support for this.
+ *
+ * GENPD_FLAG_NO_STAY_ON:      A powered-on PM domain at initialization is
+ *                             prevented by genpd from being powered-off until
+ *                             we receive a ->sync_state() or runs the
+ *                             late_initcall_sync. Use this flag to allow
+ *                             power-off without waiting for these conditions.
  */
 #define GENPD_FLAG_PM_CLK       (1U << 0)
 #define GENPD_FLAG_IRQ_SAFE     (1U << 1)
@@ -120,6 +126,7 @@ struct dev_pm_domain_list {
 #define GENPD_FLAG_OPP_TABLE_FW         (1U << 7)
 #define GENPD_FLAG_DEV_NAME_FW  (1U << 8)
 #define GENPD_FLAG_NO_SYNC_STATE (1U << 9)
+#define GENPD_FLAG_NO_STAY_ON   (1U << 10)

 enum gpd_status {
        GENPD_STATE_ON = 0,     /* PM domain is on */
-- 
2.43.0

Kind regards
Uffe
Re: [PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Geert Uytterhoeven 1 month, 3 weeks ago
Hi Ulf,

On Tue, 12 Aug 2025 at 12:01, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On Thu, 7 Aug 2025 at 11:38, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Wed, 30 Jul 2025 at 12:29, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > On Wed, 30 Jul 2025 at 11:56, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > On Wed, 9 Jul 2025 at 13:31, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > On Tue, 1 Jul 2025 at 13:47, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > > Changes in v3:
> > > > > >         - Added a couple of patches to adress problems on some Renesas
> > > > > >         platforms. Thanks Geert and Tomi for helping out!
> > > > > >         - Adressed a few comments from Saravanna and Konrad.
> > > > > >         - Added some tested-by tags.
> > > > >
> > > > > I decided it was time to give this a try, so I have queued this up for
> > > > > v6.17 via the next branch at my pmdomain tree.
> > > > >
> > > > > If you encounter any issues, please let me know so I can help to fix them.
> > > >
> > > > Thanks for your series!  Due to holidays, I only managed to test
> > > > this very recently.
> > > >
> > > > Unfortunately I have an issue with unused PM Domains no longer being
> > > > disabled on R-Car:
> > > >   - On R-Car Gen1/2/3, using rcar-sysc.c, unused PM Domains are never
> > > >     disabled.
> > > >   - On R-Car Gen4, using rcar-gen4-sysc.c, unused PM Domains are
> > > >     sometimes not disabled.
> > > >     At first, I noticed the IOMMU driver was not enabled in my config,
> > > >     and enabling it did fix the issue.  However, after that I still
> > > >     encountered the issue in a different config that does have the
> > > >     IOMMU driver enabled...
> > > >
> > > > FTR, unused PM Domains are still disabled correctly on R/SH-Mobile
> > > > (using rmobile-sysc.c) and on BeagleBone Black. Note that these use
> > > > of_genpd_add_provider_simple(), while all R-Car drivers use
> > > > of_genpd_add_provider_onecell().  Perhaps there is an issue with
> > > > the latter?  If you don't have a clue, I plan to do some more
> > > > investigation later...
> >
> > of_genpd_add_provider_onecell() has:
> >
> >     if (!dev)
> >             sync_state = true;
> >     else
> >             dev_set_drv_sync_state(dev, genpd_sync_state);
> >
> >     for (i = 0; i < data->num_domains; i++) {
> >             ...
> >             if (sync_state && !genpd_is_no_sync_state(genpd)) {
> >                     genpd->sync_state = GENPD_SYNC_STATE_ONECELL;
> >                     device_set_node(&genpd->dev, fwnode);
> >                     sync_state = false;
> >                     ^^^^^^^^^^^^^^^^^^^
> >             }
> >             ...
> >     }
> >
> > As the R-Car SYSC drivers are not platform drivers, dev is NULL, and
> > genpd->sync_state is set to GENPD_SYNC_STATE_ONECELL for the first PM
> > Domain only.  All other domains have the default value of sync_state
> > (0 = GENPD_SYNC_STATE_OFF).  Hence when genpd_provider_sync_state()
> > is called later, it ignores all but the first domain.
> > Apparently this is intentional, as of_genpd_sync_state() tries to
> > power off all domains handled by the same controller anyway (see below)?
>
> Right, this is intentional and mainly because of how fw_devlink works.
>
> fw_devlink is limited to use only the first device - if multiple
> devices share the same fwnode. In principle, we could have picked any
> of the devices in the array of genpds here - and reached the same
> result.

OK, just like I already assumed...

> > > > BTW, the "pending due to"-messages look weird to me.
> > > > On R-Car M2-W (r8a7791.dtsi) I see e.g.:
> > > >
> > > >     genpd_provider ca15-cpu0: sync_state() pending due to e6020000.watchdog
> > > >     renesas-cpg-mssr e6150000.clock-controller: sync_state() pending
> > > > due to e6020000.watchdog
> > > >
> > > > ca15-cpu0 is the PM Domain holding the first CPU core, while
> > > > the watchdog resides in the always-on Clock Domain, and uses the
> > > > clock-controller for PM_CLK handling.
> >
> > Unfortunately the first PM Domain is "ca15-cpu0", which is blocked on
> > these bogus pending states, and no PM Domain is powered off.
>
> I see, thanks for the details. I am looking closer at this.
>
> In any case, this is the main issue, as it prevents the ->sync_state()
> callback to be called. Hence the "genpd->stay_on" will also *not* be
> cleared for any of the genpd's for the genpd-provider.

I was under the impression there is a time-out, after which the
.sync_state() callback would be called anyway, just like for probe
deferral due to missing optional providers like DMACs and IOMMUs.
Apparently that is not the case?

> > If I remove the "sync_state = false" above, genpd_provider_sync_state()
> > considers all domains, and does power down all unused domains (even
> > multiple times, as expected).
>
> I think those are getting called because with the change above, there
> is no device_link being tracked. As stated above, fw_devlink is
> limited to use only one device - if multiple devices share the same
> fwnode.

Indeed.

> In other words, the ->sync_state() callbacks are called even if the
> corresponding consumer devices have not been probed yet.

Hence shouldn't there be a timeout, as the kernel may not even have
a driver for one or more consumer devices?

> > Upon closer look, all "pending due to" messages I see claim that the
> > first (index 0) PM Domain is pending on some devices, while all of
> > these devices are part of a different domain (usually the always-on
> > domain, which is always the last (32 or 64) on R-Car).
> >
> > So I think there are two issues:
> >   1. Devices are not attributed to the correct PM Domain using
> >      fw_devlink sync_state,
> >   2. One PM Domain of a multi-domain controller being blocked should
> >      not block all other domains handled by the same controller.
>
> Right, that's a current limitation with fw_devlink. To cope with this,
> it's possible to enforce the ->sync_state() callback to be invoked
> from user-space (timeout or explicitly) for a device.
>
> Another option would be to allow an opt-out behavior for some genpd's
> that are powered-on at initialization. Something along the lines of
> the below.
>
> From: Ulf Hansson <ulf.hansson@linaro.org>
> Date: Tue, 29 Jul 2025 14:27:22 +0200
> Subject: [PATCH] pmdomain: core: Allow powered-on PM domains to be powered-off
>  during boot

[...]

I gave this a try (i.e. "| GENPD_FLAG_NO_STAY_ON" in rcar-sysc.c), but
this doesn't make any difference.  I assume this would only work when
actively calling genpd_power_off() (i.e. not from of_genpd_sync_state()
or genpd_provider_sync_state())?

Thanks!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Re: [PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Ulf Hansson 1 month, 3 weeks ago
On Wed, 13 Aug 2025 at 13:58, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Ulf,
>
> On Tue, 12 Aug 2025 at 12:01, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > On Thu, 7 Aug 2025 at 11:38, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > On Wed, 30 Jul 2025 at 12:29, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > On Wed, 30 Jul 2025 at 11:56, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > > On Wed, 9 Jul 2025 at 13:31, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > > On Tue, 1 Jul 2025 at 13:47, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > > > Changes in v3:
> > > > > > >         - Added a couple of patches to adress problems on some Renesas
> > > > > > >         platforms. Thanks Geert and Tomi for helping out!
> > > > > > >         - Adressed a few comments from Saravanna and Konrad.
> > > > > > >         - Added some tested-by tags.
> > > > > >
> > > > > > I decided it was time to give this a try, so I have queued this up for
> > > > > > v6.17 via the next branch at my pmdomain tree.
> > > > > >
> > > > > > If you encounter any issues, please let me know so I can help to fix them.
> > > > >
> > > > > Thanks for your series!  Due to holidays, I only managed to test
> > > > > this very recently.
> > > > >
> > > > > Unfortunately I have an issue with unused PM Domains no longer being
> > > > > disabled on R-Car:
> > > > >   - On R-Car Gen1/2/3, using rcar-sysc.c, unused PM Domains are never
> > > > >     disabled.
> > > > >   - On R-Car Gen4, using rcar-gen4-sysc.c, unused PM Domains are
> > > > >     sometimes not disabled.
> > > > >     At first, I noticed the IOMMU driver was not enabled in my config,
> > > > >     and enabling it did fix the issue.  However, after that I still
> > > > >     encountered the issue in a different config that does have the
> > > > >     IOMMU driver enabled...
> > > > >
> > > > > FTR, unused PM Domains are still disabled correctly on R/SH-Mobile
> > > > > (using rmobile-sysc.c) and on BeagleBone Black. Note that these use
> > > > > of_genpd_add_provider_simple(), while all R-Car drivers use
> > > > > of_genpd_add_provider_onecell().  Perhaps there is an issue with
> > > > > the latter?  If you don't have a clue, I plan to do some more
> > > > > investigation later...
> > >
> > > of_genpd_add_provider_onecell() has:
> > >
> > >     if (!dev)
> > >             sync_state = true;
> > >     else
> > >             dev_set_drv_sync_state(dev, genpd_sync_state);
> > >
> > >     for (i = 0; i < data->num_domains; i++) {
> > >             ...
> > >             if (sync_state && !genpd_is_no_sync_state(genpd)) {
> > >                     genpd->sync_state = GENPD_SYNC_STATE_ONECELL;
> > >                     device_set_node(&genpd->dev, fwnode);
> > >                     sync_state = false;
> > >                     ^^^^^^^^^^^^^^^^^^^
> > >             }
> > >             ...
> > >     }
> > >
> > > As the R-Car SYSC drivers are not platform drivers, dev is NULL, and
> > > genpd->sync_state is set to GENPD_SYNC_STATE_ONECELL for the first PM
> > > Domain only.  All other domains have the default value of sync_state
> > > (0 = GENPD_SYNC_STATE_OFF).  Hence when genpd_provider_sync_state()
> > > is called later, it ignores all but the first domain.
> > > Apparently this is intentional, as of_genpd_sync_state() tries to
> > > power off all domains handled by the same controller anyway (see below)?
> >
> > Right, this is intentional and mainly because of how fw_devlink works.
> >
> > fw_devlink is limited to use only the first device - if multiple
> > devices share the same fwnode. In principle, we could have picked any
> > of the devices in the array of genpds here - and reached the same
> > result.
>
> OK, just like I already assumed...
>
> > > > > BTW, the "pending due to"-messages look weird to me.
> > > > > On R-Car M2-W (r8a7791.dtsi) I see e.g.:
> > > > >
> > > > >     genpd_provider ca15-cpu0: sync_state() pending due to e6020000.watchdog
> > > > >     renesas-cpg-mssr e6150000.clock-controller: sync_state() pending
> > > > > due to e6020000.watchdog
> > > > >
> > > > > ca15-cpu0 is the PM Domain holding the first CPU core, while
> > > > > the watchdog resides in the always-on Clock Domain, and uses the
> > > > > clock-controller for PM_CLK handling.
> > >
> > > Unfortunately the first PM Domain is "ca15-cpu0", which is blocked on
> > > these bogus pending states, and no PM Domain is powered off.
> >
> > I see, thanks for the details. I am looking closer at this.
> >
> > In any case, this is the main issue, as it prevents the ->sync_state()
> > callback to be called. Hence the "genpd->stay_on" will also *not* be
> > cleared for any of the genpd's for the genpd-provider.
>
> I was under the impression there is a time-out, after which the
> .sync_state() callback would be called anyway, just like for probe
> deferral due to missing optional providers like DMACs and IOMMUs.
> Apparently that is not the case?

The behaviour is configurable, so it depends. The current default
behaviour does *not* enforce the ->sync_state() callbacks to be
called, even after a time-out.

You may set CONFIG_FW_DEVLINK_SYNC_STATE_TIMEOUT to achieve the above
behavior or use the fw_devlink command line parameters to change it.
Like setting "fw_devlink.sync_state=timeout".

I guess it can be debated what the default behaviour should be.
Perhaps we should even allow the default behaviour to be dynamically
tweaked on a per provider device/driver basis?

>
> > > If I remove the "sync_state = false" above, genpd_provider_sync_state()
> > > considers all domains, and does power down all unused domains (even
> > > multiple times, as expected).
> >
> > I think those are getting called because with the change above, there
> > is no device_link being tracked. As stated above, fw_devlink is
> > limited to use only one device - if multiple devices share the same
> > fwnode.
>
> Indeed.
>
> > In other words, the ->sync_state() callbacks are called even if the
> > corresponding consumer devices have not been probed yet.
>
> Hence shouldn't there be a timeout, as the kernel may not even have
> a driver for one or more consumer devices?

See above.

>
> > > Upon closer look, all "pending due to" messages I see claim that the
> > > first (index 0) PM Domain is pending on some devices, while all of
> > > these devices are part of a different domain (usually the always-on
> > > domain, which is always the last (32 or 64) on R-Car).
> > >
> > > So I think there are two issues:
> > >   1. Devices are not attributed to the correct PM Domain using
> > >      fw_devlink sync_state,
> > >   2. One PM Domain of a multi-domain controller being blocked should
> > >      not block all other domains handled by the same controller.
> >
> > Right, that's a current limitation with fw_devlink. To cope with this,
> > it's possible to enforce the ->sync_state() callback to be invoked
> > from user-space (timeout or explicitly) for a device.
> >
> > Another option would be to allow an opt-out behavior for some genpd's
> > that are powered-on at initialization. Something along the lines of
> > the below.
> >
> > From: Ulf Hansson <ulf.hansson@linaro.org>
> > Date: Tue, 29 Jul 2025 14:27:22 +0200
> > Subject: [PATCH] pmdomain: core: Allow powered-on PM domains to be powered-off
> >  during boot
>
> [...]
>
> I gave this a try (i.e. "| GENPD_FLAG_NO_STAY_ON" in rcar-sysc.c), but
> this doesn't make any difference.  I assume this would only work when
> actively calling genpd_power_off() (i.e. not from of_genpd_sync_state()
> or genpd_provider_sync_state())?

Right. Thanks for testing!

So, we may need to restore some part of the genpd_power_off_unused()
when CONFIG_PM_GENERIC_DOMAINS_OF is set. Without clearing
"genpd->stay_on".

I can extend the patch, if you think it would make sense for you?

Kind regards
Uffe
Re: [PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Geert Uytterhoeven 1 month ago
Hi Ulf,

On Thu, 14 Aug 2025 at 17:50, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On Wed, 13 Aug 2025 at 13:58, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Tue, 12 Aug 2025 at 12:01, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > On Thu, 7 Aug 2025 at 11:38, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > On Wed, 30 Jul 2025 at 12:29, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > On Wed, 30 Jul 2025 at 11:56, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > > > On Wed, 9 Jul 2025 at 13:31, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > > > On Tue, 1 Jul 2025 at 13:47, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > > > > Changes in v3:
> > > > > > > >         - Added a couple of patches to adress problems on some Renesas
> > > > > > > >         platforms. Thanks Geert and Tomi for helping out!
> > > > > > > >         - Adressed a few comments from Saravanna and Konrad.
> > > > > > > >         - Added some tested-by tags.
> > > > > > >
> > > > > > > I decided it was time to give this a try, so I have queued this up for
> > > > > > > v6.17 via the next branch at my pmdomain tree.
> > > > > > >
> > > > > > > If you encounter any issues, please let me know so I can help to fix them.
> > > > > >
> > > > > > Thanks for your series!  Due to holidays, I only managed to test
> > > > > > this very recently.
> > > > > >
> > > > > > Unfortunately I have an issue with unused PM Domains no longer being
> > > > > > disabled on R-Car:
> > > > > >   - On R-Car Gen1/2/3, using rcar-sysc.c, unused PM Domains are never
> > > > > >     disabled.
> > > > > >   - On R-Car Gen4, using rcar-gen4-sysc.c, unused PM Domains are
> > > > > >     sometimes not disabled.
> > > > > >     At first, I noticed the IOMMU driver was not enabled in my config,
> > > > > >     and enabling it did fix the issue.  However, after that I still
> > > > > >     encountered the issue in a different config that does have the
> > > > > >     IOMMU driver enabled...
> > > > > >
> > > > > > FTR, unused PM Domains are still disabled correctly on R/SH-Mobile
> > > > > > (using rmobile-sysc.c) and on BeagleBone Black. Note that these use
> > > > > > of_genpd_add_provider_simple(), while all R-Car drivers use
> > > > > > of_genpd_add_provider_onecell().  Perhaps there is an issue with
> > > > > > the latter?  If you don't have a clue, I plan to do some more
> > > > > > investigation later...
> > > >
> > > > of_genpd_add_provider_onecell() has:
> > > >
> > > >     if (!dev)
> > > >             sync_state = true;
> > > >     else
> > > >             dev_set_drv_sync_state(dev, genpd_sync_state);
> > > >
> > > >     for (i = 0; i < data->num_domains; i++) {
> > > >             ...
> > > >             if (sync_state && !genpd_is_no_sync_state(genpd)) {
> > > >                     genpd->sync_state = GENPD_SYNC_STATE_ONECELL;
> > > >                     device_set_node(&genpd->dev, fwnode);
> > > >                     sync_state = false;
> > > >                     ^^^^^^^^^^^^^^^^^^^
> > > >             }
> > > >             ...
> > > >     }
> > > >
> > > > As the R-Car SYSC drivers are not platform drivers, dev is NULL, and
> > > > genpd->sync_state is set to GENPD_SYNC_STATE_ONECELL for the first PM
> > > > Domain only.  All other domains have the default value of sync_state
> > > > (0 = GENPD_SYNC_STATE_OFF).  Hence when genpd_provider_sync_state()
> > > > is called later, it ignores all but the first domain.
> > > > Apparently this is intentional, as of_genpd_sync_state() tries to
> > > > power off all domains handled by the same controller anyway (see below)?
> > >
> > > Right, this is intentional and mainly because of how fw_devlink works.
> > >
> > > fw_devlink is limited to use only the first device - if multiple
> > > devices share the same fwnode. In principle, we could have picked any
> > > of the devices in the array of genpds here - and reached the same
> > > result.
> >
> > OK, just like I already assumed...
> >
> > > > > > BTW, the "pending due to"-messages look weird to me.
> > > > > > On R-Car M2-W (r8a7791.dtsi) I see e.g.:
> > > > > >
> > > > > >     genpd_provider ca15-cpu0: sync_state() pending due to e6020000.watchdog
> > > > > >     renesas-cpg-mssr e6150000.clock-controller: sync_state() pending
> > > > > > due to e6020000.watchdog
> > > > > >
> > > > > > ca15-cpu0 is the PM Domain holding the first CPU core, while
> > > > > > the watchdog resides in the always-on Clock Domain, and uses the
> > > > > > clock-controller for PM_CLK handling.
> > > >
> > > > Unfortunately the first PM Domain is "ca15-cpu0", which is blocked on
> > > > these bogus pending states, and no PM Domain is powered off.
> > >
> > > I see, thanks for the details. I am looking closer at this.
> > >
> > > In any case, this is the main issue, as it prevents the ->sync_state()
> > > callback to be called. Hence the "genpd->stay_on" will also *not* be
> > > cleared for any of the genpd's for the genpd-provider.
> >
> > I was under the impression there is a time-out, after which the
> > .sync_state() callback would be called anyway, just like for probe
> > deferral due to missing optional providers like DMACs and IOMMUs.
> > Apparently that is not the case?
>
> The behaviour is configurable, so it depends. The current default
> behaviour does *not* enforce the ->sync_state() callbacks to be
> called, even after a time-out.
>
> You may set CONFIG_FW_DEVLINK_SYNC_STATE_TIMEOUT to achieve the above
> behavior or use the fw_devlink command line parameters to change it.
> Like setting "fw_devlink.sync_state=timeout".
>
> I guess it can be debated what the default behaviour should be.
> Perhaps we should even allow the default behaviour to be dynamically
> tweaked on a per provider device/driver basis?

The domains are indeed powered off like before when passing
"fw_devlink.sync_state=timeout", so that fixes the regression.
But it was not needed before...

I could add "select FW_DEVLINK_SYNC_STATE_TIMEOUT" to the SYSC_RCAR
and SYSC_RCAR_GEN4 Kconfig options, but that would play badly with
multi-platform kernels.  As the fw_devlink_sync_state flag is static,
the R-Car SYSC drivers can't just auto-enable the flag at runtime.

Any other options? Perhaps a device-specific flag to be set by the PM
Domain driver, and to be checked by fw_devlink_dev_sync_state()?

> > > > If I remove the "sync_state = false" above, genpd_provider_sync_state()
> > > > considers all domains, and does power down all unused domains (even
> > > > multiple times, as expected).
> > >
> > > I think those are getting called because with the change above, there
> > > is no device_link being tracked. As stated above, fw_devlink is
> > > limited to use only one device - if multiple devices share the same
> > > fwnode.
> >
> > Indeed.
> >
> > > In other words, the ->sync_state() callbacks are called even if the
> > > corresponding consumer devices have not been probed yet.
> >
> > Hence shouldn't there be a timeout, as the kernel may not even have
> > a driver for one or more consumer devices?
>
> See above.
>
> > > > Upon closer look, all "pending due to" messages I see claim that the
> > > > first (index 0) PM Domain is pending on some devices, while all of
> > > > these devices are part of a different domain (usually the always-on
> > > > domain, which is always the last (32 or 64) on R-Car).
> > > >
> > > > So I think there are two issues:
> > > >   1. Devices are not attributed to the correct PM Domain using
> > > >      fw_devlink sync_state,
> > > >   2. One PM Domain of a multi-domain controller being blocked should
> > > >      not block all other domains handled by the same controller.
> > >
> > > Right, that's a current limitation with fw_devlink. To cope with this,
> > > it's possible to enforce the ->sync_state() callback to be invoked
> > > from user-space (timeout or explicitly) for a device.

That needs explicit handling, which was not needed before.

Perhaps the fw_devlink creation should be removed again from
of_genpd_add_provider_onecell(), as it is not correct, except for
the first domain?

> > > Another option would be to allow an opt-out behavior for some genpd's
> > > that are powered-on at initialization. Something along the lines of
> > > the below.
> > >
> > > From: Ulf Hansson <ulf.hansson@linaro.org>
> > > Date: Tue, 29 Jul 2025 14:27:22 +0200
> > > Subject: [PATCH] pmdomain: core: Allow powered-on PM domains to be powered-off
> > >  during boot
> >
> > [...]
> >
> > I gave this a try (i.e. "| GENPD_FLAG_NO_STAY_ON" in rcar-sysc.c), but
> > this doesn't make any difference.  I assume this would only work when
> > actively calling genpd_power_off() (i.e. not from of_genpd_sync_state()
> > or genpd_provider_sync_state())?
>
> Right. Thanks for testing!
>
> So, we may need to restore some part of the genpd_power_off_unused()
> when CONFIG_PM_GENERIC_DOMAINS_OF is set. Without clearing
> "genpd->stay_on".
>
> I can extend the patch, if you think it would make sense for you?

I would applaud anything that would fix these regressions.
Thanks!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Re: [PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Ulf Hansson 1 month ago
On Thu, 4 Sept 2025 at 14:41, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Ulf,
>
> On Thu, 14 Aug 2025 at 17:50, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > On Wed, 13 Aug 2025 at 13:58, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > On Tue, 12 Aug 2025 at 12:01, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > On Thu, 7 Aug 2025 at 11:38, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > > On Wed, 30 Jul 2025 at 12:29, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > > On Wed, 30 Jul 2025 at 11:56, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > > > > On Wed, 9 Jul 2025 at 13:31, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > > > > On Tue, 1 Jul 2025 at 13:47, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > > > > > Changes in v3:
> > > > > > > > >         - Added a couple of patches to adress problems on some Renesas
> > > > > > > > >         platforms. Thanks Geert and Tomi for helping out!
> > > > > > > > >         - Adressed a few comments from Saravanna and Konrad.
> > > > > > > > >         - Added some tested-by tags.
> > > > > > > >
> > > > > > > > I decided it was time to give this a try, so I have queued this up for
> > > > > > > > v6.17 via the next branch at my pmdomain tree.
> > > > > > > >
> > > > > > > > If you encounter any issues, please let me know so I can help to fix them.
> > > > > > >
> > > > > > > Thanks for your series!  Due to holidays, I only managed to test
> > > > > > > this very recently.
> > > > > > >
> > > > > > > Unfortunately I have an issue with unused PM Domains no longer being
> > > > > > > disabled on R-Car:
> > > > > > >   - On R-Car Gen1/2/3, using rcar-sysc.c, unused PM Domains are never
> > > > > > >     disabled.
> > > > > > >   - On R-Car Gen4, using rcar-gen4-sysc.c, unused PM Domains are
> > > > > > >     sometimes not disabled.
> > > > > > >     At first, I noticed the IOMMU driver was not enabled in my config,
> > > > > > >     and enabling it did fix the issue.  However, after that I still
> > > > > > >     encountered the issue in a different config that does have the
> > > > > > >     IOMMU driver enabled...
> > > > > > >
> > > > > > > FTR, unused PM Domains are still disabled correctly on R/SH-Mobile
> > > > > > > (using rmobile-sysc.c) and on BeagleBone Black. Note that these use
> > > > > > > of_genpd_add_provider_simple(), while all R-Car drivers use
> > > > > > > of_genpd_add_provider_onecell().  Perhaps there is an issue with
> > > > > > > the latter?  If you don't have a clue, I plan to do some more
> > > > > > > investigation later...
> > > > >
> > > > > of_genpd_add_provider_onecell() has:
> > > > >
> > > > >     if (!dev)
> > > > >             sync_state = true;
> > > > >     else
> > > > >             dev_set_drv_sync_state(dev, genpd_sync_state);
> > > > >
> > > > >     for (i = 0; i < data->num_domains; i++) {
> > > > >             ...
> > > > >             if (sync_state && !genpd_is_no_sync_state(genpd)) {
> > > > >                     genpd->sync_state = GENPD_SYNC_STATE_ONECELL;
> > > > >                     device_set_node(&genpd->dev, fwnode);
> > > > >                     sync_state = false;
> > > > >                     ^^^^^^^^^^^^^^^^^^^
> > > > >             }
> > > > >             ...
> > > > >     }
> > > > >
> > > > > As the R-Car SYSC drivers are not platform drivers, dev is NULL, and
> > > > > genpd->sync_state is set to GENPD_SYNC_STATE_ONECELL for the first PM
> > > > > Domain only.  All other domains have the default value of sync_state
> > > > > (0 = GENPD_SYNC_STATE_OFF).  Hence when genpd_provider_sync_state()
> > > > > is called later, it ignores all but the first domain.
> > > > > Apparently this is intentional, as of_genpd_sync_state() tries to
> > > > > power off all domains handled by the same controller anyway (see below)?
> > > >
> > > > Right, this is intentional and mainly because of how fw_devlink works.
> > > >
> > > > fw_devlink is limited to use only the first device - if multiple
> > > > devices share the same fwnode. In principle, we could have picked any
> > > > of the devices in the array of genpds here - and reached the same
> > > > result.
> > >
> > > OK, just like I already assumed...
> > >
> > > > > > > BTW, the "pending due to"-messages look weird to me.
> > > > > > > On R-Car M2-W (r8a7791.dtsi) I see e.g.:
> > > > > > >
> > > > > > >     genpd_provider ca15-cpu0: sync_state() pending due to e6020000.watchdog
> > > > > > >     renesas-cpg-mssr e6150000.clock-controller: sync_state() pending
> > > > > > > due to e6020000.watchdog
> > > > > > >
> > > > > > > ca15-cpu0 is the PM Domain holding the first CPU core, while
> > > > > > > the watchdog resides in the always-on Clock Domain, and uses the
> > > > > > > clock-controller for PM_CLK handling.
> > > > >
> > > > > Unfortunately the first PM Domain is "ca15-cpu0", which is blocked on
> > > > > these bogus pending states, and no PM Domain is powered off.
> > > >
> > > > I see, thanks for the details. I am looking closer at this.
> > > >
> > > > In any case, this is the main issue, as it prevents the ->sync_state()
> > > > callback to be called. Hence the "genpd->stay_on" will also *not* be
> > > > cleared for any of the genpd's for the genpd-provider.
> > >
> > > I was under the impression there is a time-out, after which the
> > > .sync_state() callback would be called anyway, just like for probe
> > > deferral due to missing optional providers like DMACs and IOMMUs.
> > > Apparently that is not the case?
> >
> > The behaviour is configurable, so it depends. The current default
> > behaviour does *not* enforce the ->sync_state() callbacks to be
> > called, even after a time-out.
> >
> > You may set CONFIG_FW_DEVLINK_SYNC_STATE_TIMEOUT to achieve the above
> > behavior or use the fw_devlink command line parameters to change it.
> > Like setting "fw_devlink.sync_state=timeout".
> >
> > I guess it can be debated what the default behaviour should be.
> > Perhaps we should even allow the default behaviour to be dynamically
> > tweaked on a per provider device/driver basis?
>
> The domains are indeed powered off like before when passing
> "fw_devlink.sync_state=timeout", so that fixes the regression.
> But it was not needed before...

Right, the default behavior in genpd has changed. The approach was
taken as we believed that original behavior was not correct.

Although, the current situation isn't good as it causes lot of churns for us.

>
> I could add "select FW_DEVLINK_SYNC_STATE_TIMEOUT" to the SYSC_RCAR
> and SYSC_RCAR_GEN4 Kconfig options, but that would play badly with
> multi-platform kernels.  As the fw_devlink_sync_state flag is static,
> the R-Car SYSC drivers can't just auto-enable the flag at runtime.
>
> Any other options? Perhaps a device-specific flag to be set by the PM
> Domain driver, and to be checked by fw_devlink_dev_sync_state()?

Something along those lines seems reasonable to me too.

However, the remaining question though, is what should be the default
behavior in genpd for this. Due to all the churns, we may want
something that is closer to what we had *before*, which means to
always use the timeout option, unless the genpd provider driver
explicitly requests to not to (an opt-out genpd config flag).

Saravana, it would be nice to get some thoughts from you are around
this? Does it make sense?

>
> > > > > If I remove the "sync_state = false" above, genpd_provider_sync_state()
> > > > > considers all domains, and does power down all unused domains (even
> > > > > multiple times, as expected).
> > > >
> > > > I think those are getting called because with the change above, there
> > > > is no device_link being tracked. As stated above, fw_devlink is
> > > > limited to use only one device - if multiple devices share the same
> > > > fwnode.
> > >
> > > Indeed.
> > >
> > > > In other words, the ->sync_state() callbacks are called even if the
> > > > corresponding consumer devices have not been probed yet.
> > >
> > > Hence shouldn't there be a timeout, as the kernel may not even have
> > > a driver for one or more consumer devices?
> >
> > See above.
> >
> > > > > Upon closer look, all "pending due to" messages I see claim that the
> > > > > first (index 0) PM Domain is pending on some devices, while all of
> > > > > these devices are part of a different domain (usually the always-on
> > > > > domain, which is always the last (32 or 64) on R-Car).
> > > > >
> > > > > So I think there are two issues:
> > > > >   1. Devices are not attributed to the correct PM Domain using
> > > > >      fw_devlink sync_state,
> > > > >   2. One PM Domain of a multi-domain controller being blocked should
> > > > >      not block all other domains handled by the same controller.
> > > >
> > > > Right, that's a current limitation with fw_devlink. To cope with this,
> > > > it's possible to enforce the ->sync_state() callback to be invoked
> > > > from user-space (timeout or explicitly) for a device.
>
> That needs explicit handling, which was not needed before.
>
> Perhaps the fw_devlink creation should be removed again from
> of_genpd_add_provider_onecell(), as it is not correct, except for
> the first domain?

There is nothing wrong with it, the behavior is correct, unless I
failed to understand your problem. To me the problem is that it is
just less fine grained, compared to using
of_genpd_add_provider_simple(). All PM domains that is provided by a
single power-domain controller that uses #power-domain-cells = <1>,
requires all consumers of them to be probed, to allow the sync_state
callback to be invoked.

If we could teach fw_devlink to take into account the
power-domain-*id* that is specified by the consumer node, when the
provider is using #power-domain-cells = <1>, this could potentially
become as fine-grained as of_genpd_add_provider_simple().

Saravana, do think we can extend fw_devlink to take this into account somehow?

>
> > > > Another option would be to allow an opt-out behavior for some genpd's
> > > > that are powered-on at initialization. Something along the lines of
> > > > the below.
> > > >
> > > > From: Ulf Hansson <ulf.hansson@linaro.org>
> > > > Date: Tue, 29 Jul 2025 14:27:22 +0200
> > > > Subject: [PATCH] pmdomain: core: Allow powered-on PM domains to be powered-off
> > > >  during boot
> > >
> > > [...]
> > >
> > > I gave this a try (i.e. "| GENPD_FLAG_NO_STAY_ON" in rcar-sysc.c), but
> > > this doesn't make any difference.  I assume this would only work when
> > > actively calling genpd_power_off() (i.e. not from of_genpd_sync_state()
> > > or genpd_provider_sync_state())?
> >
> > Right. Thanks for testing!
> >
> > So, we may need to restore some part of the genpd_power_off_unused()
> > when CONFIG_PM_GENERIC_DOMAINS_OF is set. Without clearing
> > "genpd->stay_on".
> >
> > I can extend the patch, if you think it would make sense for you?
>
> I would applaud anything that would fix these regressions.
> Thanks!

While being mostly silent from my side for a while, I have been
continuing to evolve my test suite for sync_state tests. Wanted to
make sure I cover additional new corner cases that you have pointed
out for Renesas platforms.

That said, I have not observed any issues on my side, so it looks like
we are simply facing new behaviors that you have pointed out to be a
problem. In this regards, I think it's important to note that we are
currently only seeing problems for those genpds that are powered-on
when the provider driver calls pm_genpd_init(). Another simple option
is to power-off those PM domains that we know is not really needed to
be powered-on. Also not perfect, but an option.

Another option is something along the lines of what I posted, but
perhaps extending it to let the late_initcall_sync to try to power-off
the unused PM domains.

I will work on updating the patch so we can try it out - or perhaps
what you suggested above with a device-specific flag taken into
account by fw_devlink_dev_sync_state() would be better/sufficient you
think?

Kind regards
Uffe
Re: [PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Geert Uytterhoeven 1 month ago
Hi Ulf,

On Thu, 4 Sept 2025 at 17:44, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On Thu, 4 Sept 2025 at 14:41, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Thu, 14 Aug 2025 at 17:50, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > On Wed, 13 Aug 2025 at 13:58, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > On Tue, 12 Aug 2025 at 12:01, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > On Thu, 7 Aug 2025 at 11:38, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > > > On Wed, 30 Jul 2025 at 12:29, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > > > On Wed, 30 Jul 2025 at 11:56, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > > > > > On Wed, 9 Jul 2025 at 13:31, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > > > > > On Tue, 1 Jul 2025 at 13:47, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > > > > > > Changes in v3:
> > > > > > > > > >         - Added a couple of patches to adress problems on some Renesas
> > > > > > > > > >         platforms. Thanks Geert and Tomi for helping out!
> > > > > > > > > >         - Adressed a few comments from Saravanna and Konrad.
> > > > > > > > > >         - Added some tested-by tags.
> > > > > > > > >
> > > > > > > > > I decided it was time to give this a try, so I have queued this up for
> > > > > > > > > v6.17 via the next branch at my pmdomain tree.
> > > > > > > > >
> > > > > > > > > If you encounter any issues, please let me know so I can help to fix them.
> > > > > > > >
> > > > > > > > Thanks for your series!  Due to holidays, I only managed to test
> > > > > > > > this very recently.
> > > > > > > >
> > > > > > > > Unfortunately I have an issue with unused PM Domains no longer being
> > > > > > > > disabled on R-Car:
> > > > > > > >   - On R-Car Gen1/2/3, using rcar-sysc.c, unused PM Domains are never
> > > > > > > >     disabled.
> > > > > > > >   - On R-Car Gen4, using rcar-gen4-sysc.c, unused PM Domains are
> > > > > > > >     sometimes not disabled.
> > > > > > > >     At first, I noticed the IOMMU driver was not enabled in my config,
> > > > > > > >     and enabling it did fix the issue.  However, after that I still
> > > > > > > >     encountered the issue in a different config that does have the
> > > > > > > >     IOMMU driver enabled...
> > > > > > > >
> > > > > > > > FTR, unused PM Domains are still disabled correctly on R/SH-Mobile
> > > > > > > > (using rmobile-sysc.c) and on BeagleBone Black. Note that these use
> > > > > > > > of_genpd_add_provider_simple(), while all R-Car drivers use
> > > > > > > > of_genpd_add_provider_onecell().  Perhaps there is an issue with
> > > > > > > > the latter?  If you don't have a clue, I plan to do some more
> > > > > > > > investigation later...
> > > > > >
> > > > > > of_genpd_add_provider_onecell() has:
> > > > > >
> > > > > >     if (!dev)
> > > > > >             sync_state = true;
> > > > > >     else
> > > > > >             dev_set_drv_sync_state(dev, genpd_sync_state);
> > > > > >
> > > > > >     for (i = 0; i < data->num_domains; i++) {
> > > > > >             ...
> > > > > >             if (sync_state && !genpd_is_no_sync_state(genpd)) {
> > > > > >                     genpd->sync_state = GENPD_SYNC_STATE_ONECELL;
> > > > > >                     device_set_node(&genpd->dev, fwnode);
> > > > > >                     sync_state = false;
> > > > > >                     ^^^^^^^^^^^^^^^^^^^
> > > > > >             }
> > > > > >             ...
> > > > > >     }
> > > > > >
> > > > > > As the R-Car SYSC drivers are not platform drivers, dev is NULL, and
> > > > > > genpd->sync_state is set to GENPD_SYNC_STATE_ONECELL for the first PM
> > > > > > Domain only.  All other domains have the default value of sync_state
> > > > > > (0 = GENPD_SYNC_STATE_OFF).  Hence when genpd_provider_sync_state()
> > > > > > is called later, it ignores all but the first domain.
> > > > > > Apparently this is intentional, as of_genpd_sync_state() tries to
> > > > > > power off all domains handled by the same controller anyway (see below)?
> > > > >
> > > > > Right, this is intentional and mainly because of how fw_devlink works.
> > > > >
> > > > > fw_devlink is limited to use only the first device - if multiple
> > > > > devices share the same fwnode. In principle, we could have picked any
> > > > > of the devices in the array of genpds here - and reached the same
> > > > > result.
> > > >
> > > > OK, just like I already assumed...
> > > >
> > > > > > > > BTW, the "pending due to"-messages look weird to me.
> > > > > > > > On R-Car M2-W (r8a7791.dtsi) I see e.g.:
> > > > > > > >
> > > > > > > >     genpd_provider ca15-cpu0: sync_state() pending due to e6020000.watchdog
> > > > > > > >     renesas-cpg-mssr e6150000.clock-controller: sync_state() pending
> > > > > > > > due to e6020000.watchdog
> > > > > > > >
> > > > > > > > ca15-cpu0 is the PM Domain holding the first CPU core, while
> > > > > > > > the watchdog resides in the always-on Clock Domain, and uses the
> > > > > > > > clock-controller for PM_CLK handling.
> > > > > >
> > > > > > Unfortunately the first PM Domain is "ca15-cpu0", which is blocked on
> > > > > > these bogus pending states, and no PM Domain is powered off.
> > > > >
> > > > > I see, thanks for the details. I am looking closer at this.
> > > > >
> > > > > In any case, this is the main issue, as it prevents the ->sync_state()
> > > > > callback to be called. Hence the "genpd->stay_on" will also *not* be
> > > > > cleared for any of the genpd's for the genpd-provider.
> > > >
> > > > I was under the impression there is a time-out, after which the
> > > > .sync_state() callback would be called anyway, just like for probe
> > > > deferral due to missing optional providers like DMACs and IOMMUs.
> > > > Apparently that is not the case?
> > >
> > > The behaviour is configurable, so it depends. The current default
> > > behaviour does *not* enforce the ->sync_state() callbacks to be
> > > called, even after a time-out.
> > >
> > > You may set CONFIG_FW_DEVLINK_SYNC_STATE_TIMEOUT to achieve the above
> > > behavior or use the fw_devlink command line parameters to change it.
> > > Like setting "fw_devlink.sync_state=timeout".
> > >
> > > I guess it can be debated what the default behaviour should be.
> > > Perhaps we should even allow the default behaviour to be dynamically
> > > tweaked on a per provider device/driver basis?
> >
> > The domains are indeed powered off like before when passing
> > "fw_devlink.sync_state=timeout", so that fixes the regression.
> > But it was not needed before...
>
> Right, the default behavior in genpd has changed. The approach was
> taken as we believed that original behavior was not correct.
>
> Although, the current situation isn't good as it causes lot of churns for us.

I did some more testing, and there seems to be a similar issue with
DMA controllers, which I hadn't really noticed before, as I have
DMAC drivers built-in in all my kernel configs.  If the (optional)
DMAC driver is not available, you need "fw_devlink=permissive" to make
drivers probe devices that have (optional) DMA support.
Interestingly, that is not the case with (optional) IOMMU support:
drivers do probe devices that have (optional) DMA support when the
latter is not available.

> > I could add "select FW_DEVLINK_SYNC_STATE_TIMEOUT" to the SYSC_RCAR
> > and SYSC_RCAR_GEN4 Kconfig options, but that would play badly with
> > multi-platform kernels.  As the fw_devlink_sync_state flag is static,
> > the R-Car SYSC drivers can't just auto-enable the flag at runtime.
> >
> > Any other options? Perhaps a device-specific flag to be set by the PM
> > Domain driver, and to be checked by fw_devlink_dev_sync_state()?
>
> Something along those lines seems reasonable to me too.
>
> However, the remaining question though, is what should be the default
> behavior in genpd for this. Due to all the churns, we may want
> something that is closer to what we had *before*, which means to
> always use the timeout option, unless the genpd provider driver
> explicitly requests to not to (an opt-out genpd config flag).
>
> Saravana, it would be nice to get some thoughts from you are around
> this? Does it make sense?
>
> >
> > > > > > If I remove the "sync_state = false" above, genpd_provider_sync_state()
> > > > > > considers all domains, and does power down all unused domains (even
> > > > > > multiple times, as expected).
> > > > >
> > > > > I think those are getting called because with the change above, there
> > > > > is no device_link being tracked. As stated above, fw_devlink is
> > > > > limited to use only one device - if multiple devices share the same
> > > > > fwnode.
> > > >
> > > > Indeed.
> > > >
> > > > > In other words, the ->sync_state() callbacks are called even if the
> > > > > corresponding consumer devices have not been probed yet.
> > > >
> > > > Hence shouldn't there be a timeout, as the kernel may not even have
> > > > a driver for one or more consumer devices?
> > >
> > > See above.
> > >
> > > > > > Upon closer look, all "pending due to" messages I see claim that the
> > > > > > first (index 0) PM Domain is pending on some devices, while all of
> > > > > > these devices are part of a different domain (usually the always-on
> > > > > > domain, which is always the last (32 or 64) on R-Car).
> > > > > >
> > > > > > So I think there are two issues:
> > > > > >   1. Devices are not attributed to the correct PM Domain using
> > > > > >      fw_devlink sync_state,
> > > > > >   2. One PM Domain of a multi-domain controller being blocked should
> > > > > >      not block all other domains handled by the same controller.
> > > > >
> > > > > Right, that's a current limitation with fw_devlink. To cope with this,
> > > > > it's possible to enforce the ->sync_state() callback to be invoked
> > > > > from user-space (timeout or explicitly) for a device.
> >
> > That needs explicit handling, which was not needed before.
> >
> > Perhaps the fw_devlink creation should be removed again from
> > of_genpd_add_provider_onecell(), as it is not correct, except for
> > the first domain?
>
> There is nothing wrong with it, the behavior is correct, unless I
> failed to understand your problem. To me the problem is that it is

It does print wrong and confusing warnings indicating the problem:

    genpd_provider ca57-cpu0: sync_state() pending due to fe000000.pcie

(no, the PCIe controller is not part of the CPU PM Domain)

> just less fine grained, compared to using
> of_genpd_add_provider_simple(). All PM domains that is provided by a
> single power-domain controller that uses #power-domain-cells = <1>,
> requires all consumers of them to be probed, to allow the sync_state
> callback to be invoked.

if it would be just coarse-grained, the link should be created to a
device representing the controller, not to the device representing
the first PM domain.

> If we could teach fw_devlink to take into account the
> power-domain-*id* that is specified by the consumer node, when the
> provider is using #power-domain-cells = <1>, this could potentially
> become as fine-grained as of_genpd_add_provider_simple().
>
> Saravana, do think we can extend fw_devlink to take this into account somehow?

Doesn't the pmdomain core create a device for each PM domain, so the
index could be used to link to the correct domain?

> > > > > Another option would be to allow an opt-out behavior for some genpd's
> > > > > that are powered-on at initialization. Something along the lines of
> > > > > the below.
> > > > >
> > > > > From: Ulf Hansson <ulf.hansson@linaro.org>
> > > > > Date: Tue, 29 Jul 2025 14:27:22 +0200
> > > > > Subject: [PATCH] pmdomain: core: Allow powered-on PM domains to be powered-off
> > > > >  during boot
> > > >
> > > > [...]
> > > >
> > > > I gave this a try (i.e. "| GENPD_FLAG_NO_STAY_ON" in rcar-sysc.c), but
> > > > this doesn't make any difference.  I assume this would only work when
> > > > actively calling genpd_power_off() (i.e. not from of_genpd_sync_state()
> > > > or genpd_provider_sync_state())?
> > >
> > > Right. Thanks for testing!
> > >
> > > So, we may need to restore some part of the genpd_power_off_unused()
> > > when CONFIG_PM_GENERIC_DOMAINS_OF is set. Without clearing
> > > "genpd->stay_on".
> > >
> > > I can extend the patch, if you think it would make sense for you?
> >
> > I would applaud anything that would fix these regressions.
> > Thanks!
>
> While being mostly silent from my side for a while, I have been
> continuing to evolve my test suite for sync_state tests. Wanted to
> make sure I cover additional new corner cases that you have pointed
> out for Renesas platforms.
>
> That said, I have not observed any issues on my side, so it looks like
> we are simply facing new behaviors that you have pointed out to be a
> problem. In this regards, I think it's important to note that we are
> currently only seeing problems for those genpds that are powered-on
> when the provider driver calls pm_genpd_init(). Another simple option
> is to power-off those PM domains that we know is not really needed to
> be powered-on. Also not perfect, but an option.
>
> Another option is something along the lines of what I posted, but
> perhaps extending it to let the late_initcall_sync to try to power-off
> the unused PM domains.

Yep, like before.

> I will work on updating the patch so we can try it out - or perhaps
> what you suggested above with a device-specific flag taken into
> account by fw_devlink_dev_sync_state() would be better/sufficient you
> think?

I also had a closer look at the few pending sync_state() pending
warnings I do see on SH/R-Mobile SoCs.  Their PM Domain driver uses
of_genpd_add_provider_simple() instead of of_genpd_add_provider_onecell().
E.g. on R-Mobile A1:

    genpd_provider a3sm: sync_state() pending due to f0100000.cache-controller
    genpd_provider a4s: sync_state() pending due to fe400000.memory-controller
    genpd_provider d4: sync_state() pending due to ptm

All three domains were already handled specially in the PM Domain
driver, as they must not be turned off, so these platforms didn't
regress due to this series.

Thanks!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Re: [PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Ulf Hansson 1 month ago
[...]

> > > > > > > > > BTW, the "pending due to"-messages look weird to me.
> > > > > > > > > On R-Car M2-W (r8a7791.dtsi) I see e.g.:
> > > > > > > > >
> > > > > > > > >     genpd_provider ca15-cpu0: sync_state() pending due to e6020000.watchdog
> > > > > > > > >     renesas-cpg-mssr e6150000.clock-controller: sync_state() pending
> > > > > > > > > due to e6020000.watchdog
> > > > > > > > >
> > > > > > > > > ca15-cpu0 is the PM Domain holding the first CPU core, while
> > > > > > > > > the watchdog resides in the always-on Clock Domain, and uses the
> > > > > > > > > clock-controller for PM_CLK handling.
> > > > > > >
> > > > > > > Unfortunately the first PM Domain is "ca15-cpu0", which is blocked on
> > > > > > > these bogus pending states, and no PM Domain is powered off.
> > > > > >
> > > > > > I see, thanks for the details. I am looking closer at this.
> > > > > >
> > > > > > In any case, this is the main issue, as it prevents the ->sync_state()
> > > > > > callback to be called. Hence the "genpd->stay_on" will also *not* be
> > > > > > cleared for any of the genpd's for the genpd-provider.
> > > > >
> > > > > I was under the impression there is a time-out, after which the
> > > > > .sync_state() callback would be called anyway, just like for probe
> > > > > deferral due to missing optional providers like DMACs and IOMMUs.
> > > > > Apparently that is not the case?
> > > >
> > > > The behaviour is configurable, so it depends. The current default
> > > > behaviour does *not* enforce the ->sync_state() callbacks to be
> > > > called, even after a time-out.
> > > >
> > > > You may set CONFIG_FW_DEVLINK_SYNC_STATE_TIMEOUT to achieve the above
> > > > behavior or use the fw_devlink command line parameters to change it.
> > > > Like setting "fw_devlink.sync_state=timeout".
> > > >
> > > > I guess it can be debated what the default behaviour should be.
> > > > Perhaps we should even allow the default behaviour to be dynamically
> > > > tweaked on a per provider device/driver basis?
> > >
> > > The domains are indeed powered off like before when passing
> > > "fw_devlink.sync_state=timeout", so that fixes the regression.
> > > But it was not needed before...
> >
> > Right, the default behavior in genpd has changed. The approach was
> > taken as we believed that original behavior was not correct.
> >
> > Although, the current situation isn't good as it causes lot of churns for us.
>
> I did some more testing, and there seems to be a similar issue with
> DMA controllers, which I hadn't really noticed before, as I have
> DMAC drivers built-in in all my kernel configs.  If the (optional)
> DMAC driver is not available, you need "fw_devlink=permissive" to make
> drivers probe devices that have (optional) DMA support.

Well, after the deferred probe timeout, the missing supplier depency
should be ignored even if you don't have "permissive".

Typically we get a print in the log that could say:
"pm-test-drv pm_test24: deferred probe timeout, ignoring dependency"

... and then the driver-core tries probing the device anyway. It's
then up to the consumer driver to allow probing to succeed, even if
the DMA setup procedure fails.

> Interestingly, that is not the case with (optional) IOMMU support:
> drivers do probe devices that have (optional) DMA support when the
> latter is not available.

Okay, so some special treatments are already in place for IOMMUs, to
allow them to probe sooner, I guess. I don't know in detail how this
works, maybe Saravana can help to fill in?

Although my guess is that, by allowing IOMMUs to probe like this,
means that these drivers need to manage the DMA setup quite
differently.

The DMA driver (supplier) may probe and become available *after* the
IOMMU driver has probed. In other words, the IOMMU drivers then need
to manage the DMA setup even beyond probing.

>
> > > I could add "select FW_DEVLINK_SYNC_STATE_TIMEOUT" to the SYSC_RCAR
> > > and SYSC_RCAR_GEN4 Kconfig options, but that would play badly with
> > > multi-platform kernels.  As the fw_devlink_sync_state flag is static,
> > > the R-Car SYSC drivers can't just auto-enable the flag at runtime.
> > >
> > > Any other options? Perhaps a device-specific flag to be set by the PM
> > > Domain driver, and to be checked by fw_devlink_dev_sync_state()?
> >
> > Something along those lines seems reasonable to me too.
> >
> > However, the remaining question though, is what should be the default
> > behavior in genpd for this. Due to all the churns, we may want
> > something that is closer to what we had *before*, which means to
> > always use the timeout option, unless the genpd provider driver
> > explicitly requests to not to (an opt-out genpd config flag).
> >
> > Saravana, it would be nice to get some thoughts from you are around
> > this? Does it make sense?
> >
> > >
> > > > > > > If I remove the "sync_state = false" above, genpd_provider_sync_state()
> > > > > > > considers all domains, and does power down all unused domains (even
> > > > > > > multiple times, as expected).
> > > > > >
> > > > > > I think those are getting called because with the change above, there
> > > > > > is no device_link being tracked. As stated above, fw_devlink is
> > > > > > limited to use only one device - if multiple devices share the same
> > > > > > fwnode.
> > > > >
> > > > > Indeed.
> > > > >
> > > > > > In other words, the ->sync_state() callbacks are called even if the
> > > > > > corresponding consumer devices have not been probed yet.
> > > > >
> > > > > Hence shouldn't there be a timeout, as the kernel may not even have
> > > > > a driver for one or more consumer devices?
> > > >
> > > > See above.
> > > >
> > > > > > > Upon closer look, all "pending due to" messages I see claim that the
> > > > > > > first (index 0) PM Domain is pending on some devices, while all of
> > > > > > > these devices are part of a different domain (usually the always-on
> > > > > > > domain, which is always the last (32 or 64) on R-Car).
> > > > > > >
> > > > > > > So I think there are two issues:
> > > > > > >   1. Devices are not attributed to the correct PM Domain using
> > > > > > >      fw_devlink sync_state,
> > > > > > >   2. One PM Domain of a multi-domain controller being blocked should
> > > > > > >      not block all other domains handled by the same controller.
> > > > > >
> > > > > > Right, that's a current limitation with fw_devlink. To cope with this,
> > > > > > it's possible to enforce the ->sync_state() callback to be invoked
> > > > > > from user-space (timeout or explicitly) for a device.
> > >
> > > That needs explicit handling, which was not needed before.
> > >
> > > Perhaps the fw_devlink creation should be removed again from
> > > of_genpd_add_provider_onecell(), as it is not correct, except for
> > > the first domain?
> >
> > There is nothing wrong with it, the behavior is correct, unless I
> > failed to understand your problem. To me the problem is that it is
>
> It does print wrong and confusing warnings indicating the problem:
>
>     genpd_provider ca57-cpu0: sync_state() pending due to fe000000.pcie
>
> (no, the PCIe controller is not part of the CPU PM Domain)

That's a good point. We should fix this to avoid confusion.

>
> > just less fine grained, compared to using
> > of_genpd_add_provider_simple(). All PM domains that is provided by a
> > single power-domain controller that uses #power-domain-cells = <1>,
> > requires all consumers of them to be probed, to allow the sync_state
> > callback to be invoked.
>
> if it would be just coarse-grained, the link should be created to a
> device representing the controller, not to the device representing
> the first PM domain.
>
> > If we could teach fw_devlink to take into account the
> > power-domain-*id* that is specified by the consumer node, when the
> > provider is using #power-domain-cells = <1>, this could potentially
> > become as fine-grained as of_genpd_add_provider_simple().
> >
> > Saravana, do think we can extend fw_devlink to take this into account somehow?
>
> Doesn't the pmdomain core create a device for each PM domain, so the
> index could be used to link to the correct domain?

Correct, we have a device associated for each PM domain - and when
of_genpd_add_provider_onecell() is called, we get the index associated
for each of them.

If I understand your proposal, you are suggesting that genpd itself
creates the device_link between its corresponding device (supplier)
and the upcoming consumer device(s), at the point when
of_genpd_add_provider_onecell() is called, right?

It's an interesting idea, but I am not sure it will work. To create
the device_link at this point, assumes that all the consumer OF-nodes
have been populated (to have devices available for them). Maybe there
is an intermediate step we can take to instruct fw_devlink to treat
these "links" differently?

Saravana, any thoughts around this?

>
> > > > > > Another option would be to allow an opt-out behavior for some genpd's
> > > > > > that are powered-on at initialization. Something along the lines of
> > > > > > the below.
> > > > > >
> > > > > > From: Ulf Hansson <ulf.hansson@linaro.org>
> > > > > > Date: Tue, 29 Jul 2025 14:27:22 +0200
> > > > > > Subject: [PATCH] pmdomain: core: Allow powered-on PM domains to be powered-off
> > > > > >  during boot
> > > > >
> > > > > [...]
> > > > >
> > > > > I gave this a try (i.e. "| GENPD_FLAG_NO_STAY_ON" in rcar-sysc.c), but
> > > > > this doesn't make any difference.  I assume this would only work when
> > > > > actively calling genpd_power_off() (i.e. not from of_genpd_sync_state()
> > > > > or genpd_provider_sync_state())?
> > > >
> > > > Right. Thanks for testing!
> > > >
> > > > So, we may need to restore some part of the genpd_power_off_unused()
> > > > when CONFIG_PM_GENERIC_DOMAINS_OF is set. Without clearing
> > > > "genpd->stay_on".
> > > >
> > > > I can extend the patch, if you think it would make sense for you?
> > >
> > > I would applaud anything that would fix these regressions.
> > > Thanks!
> >
> > While being mostly silent from my side for a while, I have been
> > continuing to evolve my test suite for sync_state tests. Wanted to
> > make sure I cover additional new corner cases that you have pointed
> > out for Renesas platforms.
> >
> > That said, I have not observed any issues on my side, so it looks like
> > we are simply facing new behaviors that you have pointed out to be a
> > problem. In this regards, I think it's important to note that we are
> > currently only seeing problems for those genpds that are powered-on
> > when the provider driver calls pm_genpd_init(). Another simple option
> > is to power-off those PM domains that we know is not really needed to
> > be powered-on. Also not perfect, but an option.
> >
> > Another option is something along the lines of what I posted, but
> > perhaps extending it to let the late_initcall_sync to try to power-off
> > the unused PM domains.
>
> Yep, like before.

Okay!

>
> > I will work on updating the patch so we can try it out - or perhaps
> > what you suggested above with a device-specific flag taken into
> > account by fw_devlink_dev_sync_state() would be better/sufficient you
> > think?
>
> I also had a closer look at the few pending sync_state() pending
> warnings I do see on SH/R-Mobile SoCs.  Their PM Domain driver uses
> of_genpd_add_provider_simple() instead of of_genpd_add_provider_onecell().
> E.g. on R-Mobile A1:
>
>     genpd_provider a3sm: sync_state() pending due to f0100000.cache-controller
>     genpd_provider a4s: sync_state() pending due to fe400000.memory-controller
>     genpd_provider d4: sync_state() pending due to ptm
>
> All three domains were already handled specially in the PM Domain
> driver, as they must not be turned off, so these platforms didn't
> regress due to this series.

Okay!

Again the prints are confusing and annoying, but in principle we are
fine. Thanks for clarifying!

This also makes me wonder if we should skip enabling the fw_devlink
support for those genpds with "simple" providers that are special
(GENPD_FLAG_ALWAYS_ON and GENPD_FLAG_RPM_ALWAYS_ON). It doesn't make
sense to track consumers for them, I think.

[...]

Kind regards
Uffe
Re: [PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Geert Uytterhoeven 4 weeks ago
Hi Ulf,

On Fri, 5 Sept 2025 at 13:09, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > > > > > > BTW, the "pending due to"-messages look weird to me.
> > > > > > > > > > On R-Car M2-W (r8a7791.dtsi) I see e.g.:
> > > > > > > > > >
> > > > > > > > > >     genpd_provider ca15-cpu0: sync_state() pending due to e6020000.watchdog
> > > > > > > > > >     renesas-cpg-mssr e6150000.clock-controller: sync_state() pending
> > > > > > > > > > due to e6020000.watchdog
> > > > > > > > > >
> > > > > > > > > > ca15-cpu0 is the PM Domain holding the first CPU core, while
> > > > > > > > > > the watchdog resides in the always-on Clock Domain, and uses the
> > > > > > > > > > clock-controller for PM_CLK handling.
> > > > > > > >
> > > > > > > > Unfortunately the first PM Domain is "ca15-cpu0", which is blocked on
> > > > > > > > these bogus pending states, and no PM Domain is powered off.
> > > > > > >
> > > > > > > I see, thanks for the details. I am looking closer at this.
> > > > > > >
> > > > > > > In any case, this is the main issue, as it prevents the ->sync_state()
> > > > > > > callback to be called. Hence the "genpd->stay_on" will also *not* be
> > > > > > > cleared for any of the genpd's for the genpd-provider.
> > > > > >
> > > > > > I was under the impression there is a time-out, after which the
> > > > > > .sync_state() callback would be called anyway, just like for probe
> > > > > > deferral due to missing optional providers like DMACs and IOMMUs.
> > > > > > Apparently that is not the case?
> > > > >
> > > > > The behaviour is configurable, so it depends. The current default
> > > > > behaviour does *not* enforce the ->sync_state() callbacks to be
> > > > > called, even after a time-out.
> > > > >
> > > > > You may set CONFIG_FW_DEVLINK_SYNC_STATE_TIMEOUT to achieve the above
> > > > > behavior or use the fw_devlink command line parameters to change it.
> > > > > Like setting "fw_devlink.sync_state=timeout".
> > > > >
> > > > > I guess it can be debated what the default behaviour should be.
> > > > > Perhaps we should even allow the default behaviour to be dynamically
> > > > > tweaked on a per provider device/driver basis?
> > > >
> > > > The domains are indeed powered off like before when passing
> > > > "fw_devlink.sync_state=timeout", so that fixes the regression.
> > > > But it was not needed before...
> > >
> > > Right, the default behavior in genpd has changed. The approach was
> > > taken as we believed that original behavior was not correct.
> > >
> > > Although, the current situation isn't good as it causes lot of churns for us.
> >
> > I did some more testing, and there seems to be a similar issue with
> > DMA controllers, which I hadn't really noticed before, as I have
> > DMAC drivers built-in in all my kernel configs.  If the (optional)
> > DMAC driver is not available, you need "fw_devlink=permissive" to make
> > drivers probe devices that have (optional) DMA support.
>
> Well, after the deferred probe timeout, the missing supplier depency
> should be ignored even if you don't have "permissive".
>
> Typically we get a print in the log that could say:
> "pm-test-drv pm_test24: deferred probe timeout, ignoring dependency"
>
> ... and then the driver-core tries probing the device anyway. It's
> then up to the consumer driver to allow probing to succeed, even if
> the DMA setup procedure fails.

Well, that didn't seem to work. Perhaps we still have some drivers that
do not handle -EPROBE_DEFER correctly. Will check...

> > > > > > > > Upon closer look, all "pending due to" messages I see claim that the
> > > > > > > > first (index 0) PM Domain is pending on some devices, while all of
> > > > > > > > these devices are part of a different domain (usually the always-on
> > > > > > > > domain, which is always the last (32 or 64) on R-Car).
> > > > > > > >
> > > > > > > > So I think there are two issues:
> > > > > > > >   1. Devices are not attributed to the correct PM Domain using
> > > > > > > >      fw_devlink sync_state,
> > > > > > > >   2. One PM Domain of a multi-domain controller being blocked should
> > > > > > > >      not block all other domains handled by the same controller.
> > > > > > >
> > > > > > > Right, that's a current limitation with fw_devlink. To cope with this,
> > > > > > > it's possible to enforce the ->sync_state() callback to be invoked
> > > > > > > from user-space (timeout or explicitly) for a device.
> > > >
> > > > That needs explicit handling, which was not needed before.
> > > >
> > > > Perhaps the fw_devlink creation should be removed again from
> > > > of_genpd_add_provider_onecell(), as it is not correct, except for
> > > > the first domain?
> > >
> > > There is nothing wrong with it, the behavior is correct, unless I
> > > failed to understand your problem. To me the problem is that it is
> >
> > It does print wrong and confusing warnings indicating the problem:
> >
> >     genpd_provider ca57-cpu0: sync_state() pending due to fe000000.pcie
> >
> > (no, the PCIe controller is not part of the CPU PM Domain)
>
> That's a good point. We should fix this to avoid confusion.
>
> > > just less fine grained, compared to using
> > > of_genpd_add_provider_simple(). All PM domains that is provided by a
> > > single power-domain controller that uses #power-domain-cells = <1>,
> > > requires all consumers of them to be probed, to allow the sync_state
> > > callback to be invoked.
> >
> > if it would be just coarse-grained, the link should be created to a
> > device representing the controller, not to the device representing
> > the first PM domain.
> >
> > > If we could teach fw_devlink to take into account the
> > > power-domain-*id* that is specified by the consumer node, when the
> > > provider is using #power-domain-cells = <1>, this could potentially
> > > become as fine-grained as of_genpd_add_provider_simple().
> > >
> > > Saravana, do think we can extend fw_devlink to take this into account somehow?
> >
> > Doesn't the pmdomain core create a device for each PM domain, so the
> > index could be used to link to the correct domain?
>
> Correct, we have a device associated for each PM domain - and when
> of_genpd_add_provider_onecell() is called, we get the index associated
> for each of them.
>
> If I understand your proposal, you are suggesting that genpd itself
> creates the device_link between its corresponding device (supplier)
> and the upcoming consumer device(s), at the point when
> of_genpd_add_provider_onecell() is called, right?
>
> It's an interesting idea, but I am not sure it will work. To create
> the device_link at this point, assumes that all the consumer OF-nodes
> have been populated (to have devices available for them). Maybe there
> is an intermediate step we can take to instruct fw_devlink to treat
> these "links" differently?

I am not sure where these links are created. In the example above
"ca57-cpu0" is the device name of the supplier, but that is not the name
of a platform device.  Doesn't the core create devlinks between platform
devices? If yes, who creates this link to a non-platform device,
and can that code be updated easily to take into account the index cell?

> > I also had a closer look at the few pending sync_state() pending
> > warnings I do see on SH/R-Mobile SoCs.  Their PM Domain driver uses
> > of_genpd_add_provider_simple() instead of of_genpd_add_provider_onecell().
> > E.g. on R-Mobile A1:
> >
> >     genpd_provider a3sm: sync_state() pending due to f0100000.cache-controller
> >     genpd_provider a4s: sync_state() pending due to fe400000.memory-controller
> >     genpd_provider d4: sync_state() pending due to ptm
> >
> > All three domains were already handled specially in the PM Domain
> > driver, as they must not be turned off, so these platforms didn't
> > regress due to this series.
>
> Okay!
>
> Again the prints are confusing and annoying, but in principle we are
> fine. Thanks for clarifying!
>
> This also makes me wonder if we should skip enabling the fw_devlink
> support for those genpds with "simple" providers that are special
> (GENPD_FLAG_ALWAYS_ON and GENPD_FLAG_RPM_ALWAYS_ON). It doesn't make
> sense to track consumers for them, I think.

Makes sense.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Re: [PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Danilo Krummrich 2 months, 3 weeks ago
Hi Ulf,

On Wed Jul 9, 2025 at 1:30 PM CEST, Ulf Hansson wrote:
> I decided it was time to give this a try, so I have queued this up for
> v6.17 via the next branch at my pmdomain tree.
>
> If you encounter any issues, please let me know so I can help to fix them.

Can you please address my concern in patch 17 ("driver core: Export
get_dev_from_fwnode()")?

Since this has been applied already, a subsequent patch would be perfectly fine.

Thanks,
Danilo
Re: [PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Ulf Hansson 2 months, 3 weeks ago
On Tue, 15 Jul 2025 at 10:50, Danilo Krummrich <dakr@kernel.org> wrote:
>
> Hi Ulf,
>
> On Wed Jul 9, 2025 at 1:30 PM CEST, Ulf Hansson wrote:
> > I decided it was time to give this a try, so I have queued this up for
> > v6.17 via the next branch at my pmdomain tree.
> >
> > If you encounter any issues, please let me know so I can help to fix them.
>
> Can you please address my concern in patch 17 ("driver core: Export
> get_dev_from_fwnode()")?
>
> Since this has been applied already, a subsequent patch would be perfectly fine.

Hi Danilo,

As Greg and Saravana were happy, I didn't want to hold back the whole
series only because of a minor comment on some missing documentation.

But, yes, let me look into it. It may take a while though, as vacation
is getting closer. If you want to send a patch yourself, please, feel
free to do it.

Note that, while at it, we should probably also add some documentation
of device_set_node() (next to get_dev_from_fwnode()) as it also lacks
it.

Kind regards
Uffe
Re: [PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Danilo Krummrich 2 months, 3 weeks ago
On Wed Jul 16, 2025 at 2:46 PM CEST, Ulf Hansson wrote:
> On Tue, 15 Jul 2025 at 10:50, Danilo Krummrich <dakr@kernel.org> wrote:
>>
>> Hi Ulf,
>>
>> On Wed Jul 9, 2025 at 1:30 PM CEST, Ulf Hansson wrote:
>> > I decided it was time to give this a try, so I have queued this up for
>> > v6.17 via the next branch at my pmdomain tree.
>> >
>> > If you encounter any issues, please let me know so I can help to fix them.
>>
>> Can you please address my concern in patch 17 ("driver core: Export
>> get_dev_from_fwnode()")?
>>
>> Since this has been applied already, a subsequent patch would be perfectly fine.
>
> Hi Danilo,
>
> As Greg and Saravana were happy, I didn't want to hold back the whole
> series only because of a minor comment on some missing documentation.

Fair enough -- yet, a brief reply asking if it can be done as a follow-up would
have been appreciated.

I don't think it's that minor. The race may be unlikely to happen, but if it
happens because someone isn't careful with this (now exported) API - and I'm
absolutely sure it's only gonna be a matter of time until that happens -, it'll
be painful. :(

> But, yes, let me look into it. It may take a while though, as vacation
> is getting closer. If you want to send a patch yourself, please, feel
> free to do it.

Ok, let me see if I can get to it.

> Note that, while at it, we should probably also add some documentation
> of device_set_node() (next to get_dev_from_fwnode()) as it also lacks
> it.

Agreed!

Have a nice vacation!

- Danilo
Re: [PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Sebin Francis 1 month ago
Hi Ulf,

On 01/07/25 17:17, Ulf Hansson wrote:
> Changes in v3:
> 	- Added a couple of patches to adress problems on some Renesas
> 	platforms. Thanks Geert and Tomi for helping out!
> 	- Adressed a few comments from Saravanna and Konrad.
> 	- Added some tested-by tags.
> 
> Changes in v2:
> 	- Well, quite a lot as I discovered various problems when doing
> 	additional testing of corner-case. I suggest re-review from scratch,
> 	even if I decided to keep some reviewed-by tags.
> 	- Added patches to allow some drivers that needs to align or opt-out
> 	from the new common behaviour in genpd.
> 
> If a PM domain (genpd) is powered-on during boot, there is probably a good
> reason for it. Therefore it's known to be a bad idea to allow such genpd to be
> powered-off before all of its consumer devices have been probed. This series
> intends to fix this problem.
> 
> We have been discussing these issues at LKML and at various Linux-conferences
> in the past. I have therefore tried to include the people I can recall being
> involved, but I may have forgotten some (my apologies), feel free to loop them
> in.
> > I have tested this with QEMU with a bunch of local test-drivers and 
DT nodes.
> Let me know if you want me to share this code too.
> 
> Please help review and test!

During testing on a TI platform, I observed new kernel warnings after 
applying this patch series:

ti_sci_pm_domains 44043000.system-controller:power-controller: 
sync_state() pending due to fd00000.gpu

These warnings occur when a device (in this case, the GPU) has no driver 
bound to it. The fw_devlink_dev_sync_state[0] in the core has a check 
before printing this warning. It checks whether the device driver has a 
sync_state handler OR the device bus has a sync_state handler in the 
dev_has_sync_state[1]. If both conditions are false, 
fw_devlink_dev_sync_state[0] performs an early return before printing 
the warning.

Before this patch series, both handlers were absent for device driver 
ti_sci_pm_domains and the device bus, so both conditions failed and no 
warnings were printed.

This patch series adds a sync_state handler for the bus, which now 
satisfies the second condition. So it doesn't do an early return and 
proceeds to print the warning.

[0]: https://elixir.bootlin.com/linux/v6.16/source/drivers/base/core.c#L1777
[1]: 
https://elixir.bootlin.com/linux/v6.16/source/include/linux/device.h#L909

Thanks
Sebin

> Finally, a big thanks to Saravana for all the support!
> 
> Kind regards
> Ulf Hansson
> 
> 
> Saravana Kannan (1):
>    driver core: Add dev_set_drv_sync_state()
> 
> Ulf Hansson (23):
>    pmdomain: renesas: rcar-sysc: Add genpd OF provider at
>      postcore_initcall
>    pmdomain: renesas: rmobile-sysc: Move init to postcore_initcall
>    pmdomain: renesas: rcar-gen4-sysc: Move init to postcore_initcall
>    pmdomain: core: Prevent registering devices before the bus
>    pmdomain: core: Add a bus and a driver for genpd providers
>    pmdomain: core: Add the genpd->dev to the genpd provider bus
>    pmdomain: core: Export a common ->sync_state() helper for genpd
>      providers
>    pmdomain: core: Prepare to add the common ->sync_state() support
>    soc/tegra: pmc: Opt-out from genpd's common ->sync_state() support
>    cpuidle: psci: Opt-out from genpd's common ->sync_state() support
>    cpuidle: riscv-sbi: Opt-out from genpd's common ->sync_state() support
>    pmdomain: qcom: rpmpd: Use of_genpd_sync_state()
>    pmdomain: qcom: rpmhpd: Use of_genpd_sync_state()
>    firmware/pmdomain: xilinx: Move ->sync_state() support to firmware
>      driver
>    firmware: xilinx: Don't share zynqmp_pm_init_finalize()
>    firmware: xilinx: Use of_genpd_sync_state()
>    driver core: Export get_dev_from_fwnode()
>    pmdomain: core: Add common ->sync_state() support for genpd providers
>    pmdomain: core: Default to use of_genpd_sync_state() for genpd
>      providers
>    pmdomain: core: Leave powered-on genpds on until late_initcall_sync
>    pmdomain: core: Leave powered-on genpds on until sync_state
>    cpuidle: psci: Drop redundant sync_state support
>    cpuidle: riscv-sbi: Drop redundant sync_state support
> 
>   drivers/base/core.c                         |   8 +-
>   drivers/cpuidle/cpuidle-psci-domain.c       |  14 --
>   drivers/cpuidle/cpuidle-riscv-sbi.c         |  14 --
>   drivers/firmware/xilinx/zynqmp.c            |  18 +-
>   drivers/pmdomain/core.c                     | 211 ++++++++++++++++++--
>   drivers/pmdomain/qcom/rpmhpd.c              |   2 +
>   drivers/pmdomain/qcom/rpmpd.c               |   2 +
>   drivers/pmdomain/renesas/rcar-gen4-sysc.c   |   2 +-
>   drivers/pmdomain/renesas/rcar-sysc.c        |  19 +-
>   drivers/pmdomain/renesas/rmobile-sysc.c     |   3 +-
>   drivers/pmdomain/xilinx/zynqmp-pm-domains.c |  16 --
>   drivers/soc/tegra/pmc.c                     |  26 ++-
>   include/linux/device.h                      |  13 ++
>   include/linux/firmware/xlnx-zynqmp.h        |   6 -
>   include/linux/pm_domain.h                   |  17 ++
>   15 files changed, 291 insertions(+), 80 deletions(-)
>
Re: [PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Ulf Hansson 1 month ago
On Wed, 3 Sept 2025 at 09:39, Sebin Francis <sebin.francis@ti.com> wrote:
>
> Hi Ulf,
>
> On 01/07/25 17:17, Ulf Hansson wrote:
> > Changes in v3:
> >       - Added a couple of patches to adress problems on some Renesas
> >       platforms. Thanks Geert and Tomi for helping out!
> >       - Adressed a few comments from Saravanna and Konrad.
> >       - Added some tested-by tags.
> >
> > Changes in v2:
> >       - Well, quite a lot as I discovered various problems when doing
> >       additional testing of corner-case. I suggest re-review from scratch,
> >       even if I decided to keep some reviewed-by tags.
> >       - Added patches to allow some drivers that needs to align or opt-out
> >       from the new common behaviour in genpd.
> >
> > If a PM domain (genpd) is powered-on during boot, there is probably a good
> > reason for it. Therefore it's known to be a bad idea to allow such genpd to be
> > powered-off before all of its consumer devices have been probed. This series
> > intends to fix this problem.
> >
> > We have been discussing these issues at LKML and at various Linux-conferences
> > in the past. I have therefore tried to include the people I can recall being
> > involved, but I may have forgotten some (my apologies), feel free to loop them
> > in.
> > > I have tested this with QEMU with a bunch of local test-drivers and
> DT nodes.
> > Let me know if you want me to share this code too.
> >
> > Please help review and test!
>
> During testing on a TI platform, I observed new kernel warnings after
> applying this patch series:
>
> ti_sci_pm_domains 44043000.system-controller:power-controller:
> sync_state() pending due to fd00000.gpu
>
> These warnings occur when a device (in this case, the GPU) has no driver
> bound to it. The fw_devlink_dev_sync_state[0] in the core has a check
> before printing this warning. It checks whether the device driver has a
> sync_state handler OR the device bus has a sync_state handler in the
> dev_has_sync_state[1]. If both conditions are false,
> fw_devlink_dev_sync_state[0] performs an early return before printing
> the warning.
>
> Before this patch series, both handlers were absent for device driver
> ti_sci_pm_domains and the device bus, so both conditions failed and no
> warnings were printed.
>
> This patch series adds a sync_state handler for the bus, which now
> satisfies the second condition. So it doesn't do an early return and
> proceeds to print the warning.

Thanks for the report and testing!

Indeed this is the new and expected behaviour. I agree that it's
certainly questionable if those prints should be at the warning level.

We should probably downgrade those to dev_info(), at least. Let me
send a patch to see what Saravana and others are thinking about it!

>
> [0]: https://elixir.bootlin.com/linux/v6.16/source/drivers/base/core.c#L1777
> [1]:
> https://elixir.bootlin.com/linux/v6.16/source/include/linux/device.h#L909
>
> Thanks
> Sebin

[...]

Kind regards
Uffe
Re: [PATCH v3 00/24] pmdomain: Add generic ->sync_state() support to genpd
Posted by Diederik de Haas 1 month ago
Hi,

On Wed Sep 3, 2025 at 12:33 PM CEST, Ulf Hansson wrote:
> On Wed, 3 Sept 2025 at 09:39, Sebin Francis <sebin.francis@ti.com> wrote:
>> On 01/07/25 17:17, Ulf Hansson wrote:
>> >
>> > If a PM domain (genpd) is powered-on during boot, there is probably a good
>> > reason for it. Therefore it's known to be a bad idea to allow such genpd to be
>> > powered-off before all of its consumer devices have been probed. This series
>> > intends to fix this problem.
>> >
>> > We have been discussing these issues at LKML and at various Linux-conferences
>> > in the past. I have therefore tried to include the people I can recall being
>> > involved, but I may have forgotten some (my apologies), feel free to loop them
>> > in.
>> >
>> > Please help review and test!
>>
>> During testing on a TI platform, I observed new kernel warnings after
>> applying this patch series:
>>
>> ti_sci_pm_domains 44043000.system-controller:power-controller:
>> sync_state() pending due to fd00000.gpu
>>
>> These warnings occur when a device (in this case, the GPU) has no driver
>> bound to it. The fw_devlink_dev_sync_state[0] in the core has a check
>> before printing this warning. It checks whether the device driver has a
>> sync_state handler OR the device bus has a sync_state handler in the
>> dev_has_sync_state[1]. If both conditions are false,
>> fw_devlink_dev_sync_state[0] performs an early return before printing
>> the warning.
>>
>> Before this patch series, both handlers were absent for device driver
>> ti_sci_pm_domains and the device bus, so both conditions failed and no
>> warnings were printed.
>>
>> This patch series adds a sync_state handler for the bus, which now
>> satisfies the second condition. So it doesn't do an early return and
>> proceeds to print the warning.
>
> Thanks for the report and testing!
>
> Indeed this is the new and expected behaviour. I agree that it's
> certainly questionable if those prints should be at the warning level.
>
> We should probably downgrade those to dev_info(), at least. Let me
> send a patch to see what Saravana and others are thinking about it!

I want to report that I see similar warnings on Rock64 (rk3328):

[   16.868033] rockchip-pm-domain ff100000.syscon:power-controller: sync_state() pending due to ff300000.gpu
[   16.873637] rockchip-pm-domain ff100000.syscon:power-controller: sync_state() pending due to ff350000.video-codec
[   16.896495] rockchip-pm-domain ff100000.syscon:power-controller: sync_state() pending due to ff360000.video-codec

This is with a 6.17-rc3 kernel with various rkvdec patches and in dmesg
I later see msgs wrt ff300000.gpu (lima) and ff350000.video-codec
(hantro-vpu), but not ff360000.video-codec (rkvdec). Full dmesg:
https://paste.sr.ht/~diederik/951b54ea8422756e5efaa61d6bcefb575cfe28a4

But there were also USB issues (not sure why), so I rebooted and then I
did see msgs wrt rkvdec. Full dmesg:
https://paste.sr.ht/~diederik/156f65fc6be05d02484568dfd303c46ba76b3a8e

I also have a 6.17-rc4 kernel which is clean upstream, thus without any
media patches. This time no USB issues (also no USB device plugged in)
and I see msgs wrt lima and hantro-vpu, but not rkvdec. Full dmesg:
https://paste.sr.ht/~diederik/4affea034b0c9fb522a8ad5b90e8b59b4bd856ec

What's possibly relevant is that the 6.17-rc3+unreleased kernel also has
this patch added, which adds 'power-domain@RK3328_PD_GPU' to rk3328.dtsi
https://lore.kernel.org/linux-rockchip/20250830115135.3549305-1-christianshewitt@gmail.com/

I actually found this thread because I too couldn't find the commit ID
Nicolas referenced in this post:
https://lore.kernel.org/linux-rockchip/20250902-rk3576-lockup-regression-v1-1-c4a0c9daeb00@collabora.com/

I have no idea whether it's related though (I have no rk3576 device).

I haven't tried (yet) whether the sync_state() msg is also present on
other Rockchip based devices.

Cheers,
  Diederik