[PATCH v4 0/8] PCI/pwrctrl: Major rework to integrate pwrctrl devices with controller drivers
Posted by Manivannan Sadhasivam via B4 Relay 1 month ago
Hi,

This series provides a major rework for the PCI power control (pwrctrl)
framework to enable the pwrctrl devices to be controlled by the PCI controller
drivers.

Problem Statement
=================

Currently, the pwrctrl framework faces two major issues:

1. Missing PERST# integration
2. Inability to properly handle bus extenders such as PCIe switch devices

The first issue arises from the disconnect between the PCI controller drivers
and the pwrctrl framework. At present, the pwrctrl framework operates on its
own with the help of the PCI core. The pwrctrl devices are created by the PCI
core during the initial bus scan, and once the pwrctrl drivers bind, they
simply power on the PCI devices during their probe(). This design conflicts
with the PCI Express Card Electromechanical Specification requirements for
PERST# timing: PERST# is mostly handled by the controller drivers and is often
deasserted even before the pwrctrl drivers probe. According to the spec, PERST#
should be deasserted only after power and reference clock to the device are
stable, within predefined timing parameters.

The second issue stems from the PCI bus scan completing before pwrctrl drivers
probe. This poses a significant problem for PCI bus extenders like switches
because the PCI core allocates upstream bridge resources during the initial
scan. If the upstream bridge is not hotplug capable, resources are allocated
only for the number of downstream buses detected at scan time, which might be
just one if the switch was not powered and enumerated at that time. Later, when
the pwrctrl driver powers on and enumerates the switch, enumeration fails due to
insufficient upstream bridge resources.

Proposal
========

This series addresses both issues by introducing new individual APIs for pwrctrl
device creation, destruction, power on, and power off operations. Controller
drivers are expected to invoke these APIs during their probe(), remove(),
suspend(), and resume() operations. This integration allows better coordination
between controller drivers and the pwrctrl framework, enabling enhanced features
such as D3Cold support.
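
For reference, the new API surface roughly boils down to a handful of helpers
that take the controller device. A sketch of the prototypes, assuming the
helper names used in the example further below (the destroy helper name and
the return types are my best guess; the exact signatures live in
include/linux/pci-pwrctrl.h in this series):

	int pci_pwrctrl_create_devices(struct device *parent);
	int pci_pwrctrl_power_on_devices(struct device *parent);
	int pci_pwrctrl_power_off_devices(struct device *parent);
	void pci_pwrctrl_destroy_devices(struct device *parent);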

The original design aimed to avoid modifying controller drivers for pwrctrl
integration. However, this approach did not scale because different controllers
have varying requirements for when devices should be powered on. For example,
some controller drivers require the devices to be powered on early for
successful PHY initialization.

By using these explicit APIs, controller drivers gain fine-grained control
over their associated pwrctrl devices.

This series modifies the pcie-qcom driver (the only consumer of the pwrctrl
framework) to adopt these APIs and removes the old pwrctrl code from the PCI
core. It can be used as a reference for adding pwrctrl support to other
controller drivers as well.

For example, to control the 3.3V supply to the PCI slot where an NVMe device
is connected, the following modifications are required:

Devicetree
----------

	// In SoC dtsi:

	pci@1bf8000 { // controller node
		...
		pcie1_port0: pcie@0 { // PCI Root Port node
			compatible = "pciclass,0604"; // required for pwrctrl
							 driver bind
			...
		};
	};

	// In board dts:

	&pcie1_port0 {
		reset-gpios = <&tlmm 152 GPIO_ACTIVE_LOW>; // optional
		vpcie3v3-supply = <&vreg_nvme>; // NVMe power supply
	};

Controller driver
-----------------

	// Select PCI_PWRCTRL_SLOT in controller Kconfig

	probe() {
		...
		// Initialize controller resources
		pci_pwrctrl_create_devices(&pdev->dev);
		pci_pwrctrl_power_on_devices(&pdev->dev);
		// Deassert PERST# (optional)
		...
		pci_host_probe(); // Allocate host bridge and start bus scan
	}

	suspend {
		// PME_Turn_Off broadcast
		// Assert PERST# (optional)
		pci_pwrctrl_power_off_devices(&pdev->dev);
		...
	}

	resume {
		...
		pci_pwrctrl_power_on_devices(&pdev->dev);
		// Deassert PERST# (optional)
	}

I will add documentation for the pwrctrl framework in the coming days to make
it easier to use.

Testing
=======

This series is tested on the Lenovo ThinkPad T14s laptop based on the Qcom X1E
chipset and on the RB3Gen2 development board with a TC9563 switch based on the
Qcom QCS6490 chipset.

**NOTE**: With this series, the controller driver may undergo multiple probe
deferrals if the pwrctrl driver is not available at the time of the controller
driver probe. This is pretty much required to avoid the resource allocation
issue. I plan to replace the probe deferral with a blocking wait in the coming
days.
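
For illustration, a controller probe() propagating the deferral could look
roughly like the sketch below ('foo' is a made-up driver; error unwinding of
the pwrctrl devices and controller resources is omitted, and whether the
create/power-on helpers themselves return -EPROBE_DEFER is up to the final
API):

	static int foo_pcie_probe(struct platform_device *pdev)
	{
		struct pci_host_bridge *bridge;
		int ret;

		bridge = devm_pci_alloc_host_bridge(&pdev->dev, 0);
		if (!bridge)
			return -ENOMEM;

		/* ... initialize controller resources (clocks, PHY, etc.) ... */

		ret = pci_pwrctrl_create_devices(&pdev->dev);
		if (ret)
			return ret; /* -EPROBE_DEFER until the pwrctrl driver is available */

		ret = pci_pwrctrl_power_on_devices(&pdev->dev);
		if (ret)
			return ret;

		/* ... deassert PERST# (optional), set up bridge->ops, etc. ... */

		return pci_host_probe(bridge);
	}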

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
---
Changes in v4:
- Used platform_device_put()
- Changed the return value of power_off() callback to 'int'
- Split patch 6 into two and reworded the commit message
- Collected tags
- Link to v3: https://lore.kernel.org/r/20251229-pci-pwrctrl-rework-v3-0-c7d5918cd0db@oss.qualcomm.com

Changes in v3:
- Integrated TC9563 change
- Reworked the power_on API to properly power off the devices in error path
- Fixed the error path in pcie-qcom.c to not destroy pwrctrl devices during
  probe deferral
- Rebased on top of pci/controller/dwc-qcom branch and dropped the PERST# patch
- Added a patch for TC9563 to fix the refcount dropping for i2c adapter
- Added patches to drop the assert_perst callback and rename the PERST# helpers in
  pcie-qcom.c
- Link to v2: https://lore.kernel.org/r/20251216-pci-pwrctrl-rework-v2-0-745a563b9be6@oss.qualcomm.com

Changes in v2:
- Exported of_pci_supply_present() API
- Demoted the -EPROBE_DEFER log to dev_dbg()
- Collected tags and rebased on top of v6.19-rc1
- Link to v1: https://lore.kernel.org/r/20251124-pci-pwrctrl-rework-v1-0-78a72627683d@oss.qualcomm.com

---
Krishna Chaitanya Chundru (1):
      PCI/pwrctrl: Add APIs for explicitly creating and destroying pwrctrl devices

Manivannan Sadhasivam (7):
      PCI/pwrctrl: tc9563: Use put_device() instead of i2c_put_adapter()
      PCI/pwrctrl: Add 'struct pci_pwrctrl::power_{on/off}' callbacks
      PCI/pwrctrl: Add APIs to power on/off the pwrctrl devices
      PCI/pwrctrl: Switch to the new pwrctrl APIs
      PCI: qcom: Drop the assert_perst() callbacks
      PCI: Drop the assert_perst() callback
      PCI: qcom: Rename PERST# assert/deassert helpers for uniformity

 drivers/pci/bus.c                                 |  19 --
 drivers/pci/controller/dwc/pcie-designware-host.c |   9 -
 drivers/pci/controller/dwc/pcie-designware.h      |   9 -
 drivers/pci/controller/dwc/pcie-qcom.c            |  54 +++--
 drivers/pci/of.c                                  |   1 +
 drivers/pci/probe.c                               |  59 -----
 drivers/pci/pwrctrl/core.c                        | 260 ++++++++++++++++++++--
 drivers/pci/pwrctrl/pci-pwrctrl-pwrseq.c          |  30 ++-
 drivers/pci/pwrctrl/pci-pwrctrl-tc9563.c          |  48 ++--
 drivers/pci/pwrctrl/slot.c                        |  48 ++--
 drivers/pci/remove.c                              |  20 --
 include/linux/pci-pwrctrl.h                       |  16 +-
 include/linux/pci.h                               |   1 -
 13 files changed, 367 insertions(+), 207 deletions(-)
---
base-commit: 3e7f562e20ee87a25e104ef4fce557d39d62fa85
change-id: 20251124-pci-pwrctrl-rework-c91a6e16c2f6
prerequisite-message-id: 20251126081718.8239-1-mani@kernel.org
prerequisite-patch-id: db9ff6c713e2303c397e645935280fd0d277793a
prerequisite-patch-id: b5351b0a41f618435f973ea2c3275e51d46f01c5

Best regards,
-- 
Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Re: [PATCH v4 0/8] PCI/pwrctrl: Major rework to integrate pwrctrl devices with controller drivers
Posted by Sean Anderson 3 weeks, 4 days ago
On 1/5/26 08:55, Manivannan Sadhasivam via B4 Relay wrote:
> Hi,

I asked substantially similar questions on v2, but since I never got a
response I want to reiterate them on v4 to make sure they don't get
lost.

> This series provides a major rework for the PCI power control (pwrctrl)
> framework to enable the pwrctrl devices to be controlled by the PCI controller
> drivers.
> 
> Problem Statement
> =================
> 
> Currently, the pwrctrl framework faces two major issues:
> 
> 1. Missing PERST# integration
> 2. Inability to properly handle bus extenders such as PCIe switch devices
> 
> First issue arises from the disconnect between the PCI controller drivers and
> pwrctrl framework. At present, the pwrctrl framework just operates on its own
> with the help of the PCI core. The pwrctrl devices are created by the PCI core
> during initial bus scan and the pwrctrl drivers once bind, just power on the
> PCI devices during their probe(). This design conflicts with the PCI Express
> Card Electromechanical Specification requirements for PERST# timing. The reason
> is, PERST# signals are mostly handled by the controller drivers and often
> deasserted even before the pwrctrl drivers probe. According to the spec, PERST#
> should be deasserted only after power and reference clock to the device are
> stable, within predefined timing parameters.
> 
> The second issue stems from the PCI bus scan completing before pwrctrl drivers
> probe. This poses a significant problem for PCI bus extenders like switches
> because the PCI core allocates upstream bridge resources during the initial
> scan. If the upstream bridge is not hotplug capable, resources are allocated
> only for the number of downstream buses detected at scan time, which might be
> just one if the switch was not powered and enumerated at that time. Later, when
> the pwrctrl driver powers on and enumerates the switch, enumeration fails due to
> insufficient upstream bridge resources.

OK, so to clarify the problem is an architecture like

    RP
    |-- Bridge 1 (automatic)
    |   |-- Device 1
    |   `-- Bridge 2 (needs pwrseq)
    |       `-- Device 2
    `-- Bridge 3 (automatic)
        `-- Device 3

where Bridge 2 has a devicetree node with a pwrseq binding? So we do the
initial scan and allocate resources for bridge/devices 1 and 3 with the
resources for bridge 3 immediately above those for bridge 1. Then when
bridge 2 shows up we can't resize bridge 1's windows since bridge 3's
windows are in the way?

But is it even valid to have a pwrseq node on bridge 2 without one on
bridge 1? If bridge 1 is automatically controlled, then I would expect
bridge 2 to be as well. E.g. I would expect bridge 2's reset sequence to
be controlled by the secondary bus reset bit in bridge 1's bridge
control register.

And a very similar architecture like

    RP
    |-- Bridge 4 (pwrseq)
    |   |-- Device 4
    `-- Bridge 5 (automatic)
        `-- Device 5

has no problems since the resources for bridge 4 can be allocated above
those for bridge 5 whenever it shows up.

These problems seem very similar to what hotplug bridges have to handle
(except that we (usually) only need to do one hotplug per boot). So
maybe we should set is_hotplug_bridge on bridges with a pwrseq node.
That way they'll get resources distributed for when the downstream port
shows up. As an optimization, we could then release those resources once
the downstream port is scanned.

> Proposal
> ========
> 
> This series addresses both issues by introducing new individual APIs for pwrctrl
> device creation, destruction, power on, and power off operations. Controller
> drivers are expected to invoke these APIs during their probe(), remove(),
> suspend(), and resume() operations.

(just for the record)

I think the existing design is quite elegant, since the operations
associated with the bridge correspond directly to device lifecycle
operations. It also avoids problems related to the root port trying to
look up its own child (possibly missing a driver) during probe.

> This integration allows better coordination
> between controller drivers and the pwrctrl framework, enabling enhanced features
> such as D3Cold support.


I think this should be handled by the power sequencing driver,
especially as there are timing requirements for the other resources
referenced to PERST? If we are going to touch each driver, it would
be much better to consolidate things by removing the ad-hoc PERST
support.

Different drivers control PERST in various ways, but I think this can
be abstracted behind a GPIO controller (if necessary for e.g. MMIO-based
control). If there's no reset-gpios property in the pwrseq node then we
could automatically look up the GPIO on the root port.

> The original design aimed to avoid modifying controller drivers for pwrctrl
> integration. However, this approach lacked scalability because different
> controllers have varying requirements for when devices should be powered on. For
> example, controller drivers require devices to be powered on early for
> successful PHY initialization.

Can you elaborate on this? Previously you said

| Some platforms do LTSSM during phy_init(), so they will fail if the
| device is not powered ON at that time.

What do you mean by "do LTSSM during phy_init()"? Do you have a specific
driver in mind?

I would expect that the LTSSM would remain in the Detect state until the
pwrseq driver is being probed.

> By using these explicit APIs, controller drivers gain fine grained control over
> their associated pwrctrl devices.
> 
> This series modified the pcie-qcom driver (only consumer of pwrctrl framework)
> to adopt to these APIs and also removed the old pwrctrl code from PCI core. This
> could be used as a reference to add pwrctrl support for other controller drivers
> also.
> 
> For example, to control the 3.3v supply to the PCI slot where the NVMe device is
> connected, below modifications are required:
> 
> Devicetree
> ----------
> 
> 	// In SoC dtsi:
> 
> 	pci@1bf8000 { // controller node
> 		...
> 		pcie1_port0: pcie@0 { // PCI Root Port node
> 			compatible = "pciclass,0604"; // required for pwrctrl
> 							 driver bind
> 			...
> 		};
> 	};
> 
> 	// In board dts:
> 
> 	&pcie1_port0 {
> 		reset-gpios = <&tlmm 152 GPIO_ACTIVE_LOW>; // optional
> 		vpcie3v3-supply = <&vreg_nvme>; // NVMe power supply
> 	};
> 
> Controller driver
> -----------------
> 
> 	// Select PCI_PWRCTRL_SLOT in controller Kconfig
> 
> 	probe() {
> 		...
> 		// Initialize controller resources
> 		pci_pwrctrl_create_devices(&pdev->dev);
> 		pci_pwrctrl_power_on_devices(&pdev->dev);
> 		// Deassert PERST# (optional)
> 		...
> 		pci_host_probe(); // Allocate host bridge and start bus scan
> 	}
> 
> 	suspend {
> 		// PME_Turn_Off broadcast
> 		// Assert PERST# (optional)
> 		pci_pwrctrl_power_off_devices(&pdev->dev);
> 		...
> 	}
> 
> 	resume {
> 		...
> 		pci_pwrctrl_power_on_devices(&pdev->dev);
> 		// Deassert PERST# (optional)
> 	}
> 
> I will add a documentation for the pwrctrl framework in the coming days to make
> it easier to use.
> 
> Testing
> =======
> 
> This series is tested on the Lenovo Thinkpad T14s laptop based on Qcom X1E
> chipset and RB3Gen2 development board with TC9563 switch based on Qcom QCS6490
> chipset.
> 
> **NOTE**: With this series, the controller driver may undergo multiple probe
> deferral if the pwrctrl driver was not available during the controller driver
> probe. This is pretty much required to avoid the resource allocation issue. I
> plan to replace probe deferral with blocking wait in the coming days.

You can only do a blocking wait after deferring at least once, since the
root port may be probed synchronously during boot. I really think this
is rather messy and something we should avoid architecturally while we
have the chance.

--Sean

> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> ---
> Changes in v4:
> - Used platform_device_put()
> - Changed the return value of power_off() callback to 'int'
> - Splitted patch 6 into two and reworded the commit message
> - Collected tags
> - Link to v3: https://lore.kernel.org/r/20251229-pci-pwrctrl-rework-v3-0-c7d5918cd0db@oss.qualcomm.com
> 
> Changes in v3:
> - Integrated TC9563 change
> - Reworked the power_on API to properly power off the devices in error path
> - Fixed the error path in pcie-qcom.c to not destroy pwrctrl devices during
>   probe deferral
> - Rebased on top of pci/controller/dwc-qcom branch and dropped the PERST# patch
> - Added a patch for TC9563 to fix the refcount dropping for i2c adapter
> - Added patches to drop the assert_perst callback and rename the PERST# helpers in
>   pcie-qcom.c
> - Link to v2: https://lore.kernel.org/r/20251216-pci-pwrctrl-rework-v2-0-745a563b9be6@oss.qualcomm.com
> 
> Changes in v2:
> - Exported of_pci_supply_present() API
> - Demoted the -EPROBE_DEFER log to dev_dbg()
> - Collected tags and rebased on top of v6.19-rc1
> - Link to v1: https://lore.kernel.org/r/20251124-pci-pwrctrl-rework-v1-0-78a72627683d@oss.qualcomm.com
> 
> ---
> Krishna Chaitanya Chundru (1):
>       PCI/pwrctrl: Add APIs for explicitly creating and destroying pwrctrl devices
> 
> Manivannan Sadhasivam (7):
>       PCI/pwrctrl: tc9563: Use put_device() instead of i2c_put_adapter()
>       PCI/pwrctrl: Add 'struct pci_pwrctrl::power_{on/off}' callbacks
>       PCI/pwrctrl: Add APIs to power on/off the pwrctrl devices
>       PCI/pwrctrl: Switch to the new pwrctrl APIs
>       PCI: qcom: Drop the assert_perst() callbacks
>       PCI: Drop the assert_perst() callback
>       PCI: qcom: Rename PERST# assert/deassert helpers for uniformity
> 
>  drivers/pci/bus.c                                 |  19 --
>  drivers/pci/controller/dwc/pcie-designware-host.c |   9 -
>  drivers/pci/controller/dwc/pcie-designware.h      |   9 -
>  drivers/pci/controller/dwc/pcie-qcom.c            |  54 +++--
>  drivers/pci/of.c                                  |   1 +
>  drivers/pci/probe.c                               |  59 -----
>  drivers/pci/pwrctrl/core.c                        | 260 ++++++++++++++++++++--
>  drivers/pci/pwrctrl/pci-pwrctrl-pwrseq.c          |  30 ++-
>  drivers/pci/pwrctrl/pci-pwrctrl-tc9563.c          |  48 ++--
>  drivers/pci/pwrctrl/slot.c                        |  48 ++--
>  drivers/pci/remove.c                              |  20 --
>  include/linux/pci-pwrctrl.h                       |  16 +-
>  include/linux/pci.h                               |   1 -
>  13 files changed, 367 insertions(+), 207 deletions(-)
> ---
> base-commit: 3e7f562e20ee87a25e104ef4fce557d39d62fa85
> change-id: 20251124-pci-pwrctrl-rework-c91a6e16c2f6
> prerequisite-message-id: 20251126081718.8239-1-mani@kernel.org
> prerequisite-patch-id: db9ff6c713e2303c397e645935280fd0d277793a
> prerequisite-patch-id: b5351b0a41f618435f973ea2c3275e51d46f01c5
> 
> Best regards,
Re: [PATCH v4 0/8] PCI/pwrctrl: Major rework to integrate pwrctrl devices with controller drivers
Posted by Manivannan Sadhasivam 3 weeks, 3 days ago
On Tue, Jan 13, 2026 at 12:15:01PM -0500, Sean Anderson wrote:
> On 1/5/26 08:55, Manivannan Sadhasivam via B4 Relay wrote:
> > Hi,
> 
> I asked substantially similar questions on v2, but since I never got a
> response I want to reiterate them on v4 to make sure they don't get
> lost.
> 

I did respond to your queries in v2, but lost your last reply in that thread:
https://lore.kernel.org/linux-pci/8269249f-48a9-4136-a326-23f5076be487@linux.dev/

Apologies!

> > This series provides a major rework for the PCI power control (pwrctrl)
> > framework to enable the pwrctrl devices to be controlled by the PCI controller
> > drivers.
> > 
> > Problem Statement
> > =================
> > 
> > Currently, the pwrctrl framework faces two major issues:
> > 
> > 1. Missing PERST# integration
> > 2. Inability to properly handle bus extenders such as PCIe switch devices
> > 
> > First issue arises from the disconnect between the PCI controller drivers and
> > pwrctrl framework. At present, the pwrctrl framework just operates on its own
> > with the help of the PCI core. The pwrctrl devices are created by the PCI core
> > during initial bus scan and the pwrctrl drivers once bind, just power on the
> > PCI devices during their probe(). This design conflicts with the PCI Express
> > Card Electromechanical Specification requirements for PERST# timing. The reason
> > is, PERST# signals are mostly handled by the controller drivers and often
> > deasserted even before the pwrctrl drivers probe. According to the spec, PERST#
> > should be deasserted only after power and reference clock to the device are
> > stable, within predefined timing parameters.
> > 
> > The second issue stems from the PCI bus scan completing before pwrctrl drivers
> > probe. This poses a significant problem for PCI bus extenders like switches
> > because the PCI core allocates upstream bridge resources during the initial
> > scan. If the upstream bridge is not hotplug capable, resources are allocated
> > only for the number of downstream buses detected at scan time, which might be
> > just one if the switch was not powered and enumerated at that time. Later, when
> > the pwrctrl driver powers on and enumerates the switch, enumeration fails due to
> > insufficient upstream bridge resources.
> 
> OK, so to clarify the problem is an architecture like
> 
>     RP
>     |-- Bridge 1 (automatic)
>     |   |-- Device 1
>     |   `-- Bridge 2 (needs pwrseq)
>     |       `-- Device 2
>     `-- Bridge 3 (automatic)
>         `-- Device 3
> 

This topology is not possible with PCIe. A single Root Port can only connect to
a single bridge. But it does apply to conventional PCI.

> where Bridge 2 has a devicetree node with a pwrseq binding? So we do the
> initial scan and allocate resources for bridge/devices 1 and 3 with the
> resources for bridge 3 immediately above those for bridge 1. Then when
> bridge 2 shows up we can't resize bridge 1's windows since bridge 3's
> windows are in the way?
> 

It is not a problem with resizing, it is a problem of how much you can resize.
And also, if bridge 2 is a switch and it exposes multiple downstream buses,
then the upstream bridge 1 will run out of resources.
If bridge 2 is a hotplug bridge, then there are no issues. But I was only
referring to non-hotplug capable switches.

> But is it even valid to have a pwrseq node on bridge 2 without one on
> bridge 1? If bridge 1 is automatically controlled, then I would expect
> bridge 2 to be as well. E.g. I would expect bridge 2's reset sequence to
> be controlled by the secondary bus reset bit in bridge 1's bridge
> control register.
> 

Technically it is possible for Bridge 2 to have a pwrctrl requirement. What is
limiting it from the spec PoV?

> And a very similar architecture like
> 
>     RP
>     |-- Bridge 4 (pwrseq)
>     |   |-- Device 4
>     `-- Bridge 5 (automatic)
>         `-- Device 5
> 
> has no problems since the resources for bridge 4 can be allocated above
> those for bridge 5 whenever it shows up.
> 

Again, if bridge 4 is not hotplug capable and if it is a switch, the problem is
still applicable.

> These problems seem very similar to what hotplug bridges have to handle
> (except that we (usually) only need to do one hotplug per boot). So
> maybe we should set is_hotplug_bridge on bridges with a pwrseq node.
> That way they'll get resources distributed for when the downstream port
> shows up. As an optimization, we could then release those resources once
> the downstream port is scanned.
> 

That would be incorrect. You cannot set 'is_hotplug_bridge' to 'true' for a
non-hotplug capable bridge. You can call it a hack, but there is no place
for that upstream.

> > Proposal
> > ========
> > 
> > This series addresses both issues by introducing new individual APIs for pwrctrl
> > device creation, destruction, power on, and power off operations. Controller
> > drivers are expected to invoke these APIs during their probe(), remove(),
> > suspend(), and resume() operations.
> 
> (just for the record)
> 
> I think the existing design is quite elegant, since the operations
> associated with the bridge correspond directly to device lifecycle
> operations. It also avoids problems related to the root port trying to
> look up its own child (possibly missing a driver) during probe.
> 

I agree with you that it is elegant, and I was even very reluctant to move
away from it [1]. But lately, I understood that we cannot scale the pwrctrl
framework if we do not give flexibility to the controller drivers [2].

[1] https://lore.kernel.org/linux-pci/eix65qdwtk5ocd7lj6sw2lslidivauzyn6h5cc4mc2nnci52im@qfmbmwy2zjbe/
[2] https://lore.kernel.org/linux-pci/aG3IWdZIhnk01t2A@google.com/

> > This integration allows better coordination
> > between controller drivers and the pwrctrl framework, enabling enhanced features
> > such as D3Cold support.
> 
> 
> I think this should be handled by the power sequencing driver,
> especially as there are timing requirements for the other resources
> referenced to PERST? If we are going to touch each driver, it would
> be much better to consolidate things by removing the ad-hoc PERST
> support.
> 
> Different drivers control PERST in various ways, but I think this can
> be abstracted behind a GPIO controller (if necessary for e.g. MMIO-based
> control). If there's no reset-gpios property in the pwrseq node then we
> could automatically look up the GPIO on the root port.
> 

Not at all. We cannot model PERST# as a GPIO in all the cases. Some drivers
implement PERST# as a set of MMIO operations in the Root Complex MMIO space and
that space belongs to the controller driver.

FYI, I did try something similar before:
https://lore.kernel.org/linux-pci/20250707-pci-pwrctrl-perst-v1-0-c3c7e513e312@kernel.org/

> > The original design aimed to avoid modifying controller drivers for pwrctrl
> > integration. However, this approach lacked scalability because different
> > controllers have varying requirements for when devices should be powered on. For
> > example, controller drivers require devices to be powered on early for
> > successful PHY initialization.
> 
> Can you elaborate on this? Previously you said
> 
> | Some platforms do LTSSM during phy_init(), so they will fail if the
> | device is not powered ON at that time.
> 
> What do you mean by "do LTSSM during phy_init()"? Do you have a specific
> driver in mind?
> 

I believe the Mediatek PCIe controller driver used in Chromebooks exhibits
this behavior. Chen talked about it in his LPC session:

> I would expect that the LTSSM would remain in the Detect state until the
> pwrseq driver is being probed.
> 

True, but if the API (phy_init()) expects the LTSSM to move to L0, then it
will fail, right? It might be what's happening with the above-mentioned
platform.

> > By using these explicit APIs, controller drivers gain fine grained control over
> > their associated pwrctrl devices.
> > 
> > This series modified the pcie-qcom driver (only consumer of pwrctrl framework)
> > to adopt to these APIs and also removed the old pwrctrl code from PCI core. This
> > could be used as a reference to add pwrctrl support for other controller drivers
> > also.
> > 
> > For example, to control the 3.3v supply to the PCI slot where the NVMe device is
> > connected, below modifications are required:
> > 
> > Devicetree
> > ----------
> > 
> > 	// In SoC dtsi:
> > 
> > 	pci@1bf8000 { // controller node
> > 		...
> > 		pcie1_port0: pcie@0 { // PCI Root Port node
> > 			compatible = "pciclass,0604"; // required for pwrctrl
> > 							 driver bind
> > 			...
> > 		};
> > 	};
> > 
> > 	// In board dts:
> > 
> > 	&pcie1_port0 {
> > 		reset-gpios = <&tlmm 152 GPIO_ACTIVE_LOW>; // optional
> > 		vpcie3v3-supply = <&vreg_nvme>; // NVMe power supply
> > 	};
> > 
> > Controller driver
> > -----------------
> > 
> > 	// Select PCI_PWRCTRL_SLOT in controller Kconfig
> > 
> > 	probe() {
> > 		...
> > 		// Initialize controller resources
> > 		pci_pwrctrl_create_devices(&pdev->dev);
> > 		pci_pwrctrl_power_on_devices(&pdev->dev);
> > 		// Deassert PERST# (optional)
> > 		...
> > 		pci_host_probe(); // Allocate host bridge and start bus scan
> > 	}
> > 
> > 	suspend {
> > 		// PME_Turn_Off broadcast
> > 		// Assert PERST# (optional)
> > 		pci_pwrctrl_power_off_devices(&pdev->dev);
> > 		...
> > 	}
> > 
> > 	resume {
> > 		...
> > 		pci_pwrctrl_power_on_devices(&pdev->dev);
> > 		// Deassert PERST# (optional)
> > 	}
> > 
> > I will add a documentation for the pwrctrl framework in the coming days to make
> > it easier to use.
> > 
> > Testing
> > =======
> > 
> > This series is tested on the Lenovo Thinkpad T14s laptop based on Qcom X1E
> > chipset and RB3Gen2 development board with TC9563 switch based on Qcom QCS6490
> > chipset.
> > 
> > **NOTE**: With this series, the controller driver may undergo multiple probe
> > deferral if the pwrctrl driver was not available during the controller driver
> > probe. This is pretty much required to avoid the resource allocation issue. I
> > plan to replace probe deferral with blocking wait in the coming days.
> 
> You can only do a blocking wait after deferring at least once, since the
> root port may be probed synchronously during boot. I really think this
> is rather messy and something we should avoid architecturally while we
> have the chance.
> 

By a blocking wait I meant that the controller probe itself will do a blocking
wait until the pwrctrl drivers get bound. Since this happens well before the
PCI bus scan, there won't be any Root Port probed synchronously.

- Mani

-- 
மணிவண்ணன் சதாசிவம்
Re: [PATCH v4 0/8] PCI/pwrctrl: Major rework to integrate pwrctrl devices with controller drivers
Posted by Sean Anderson 3 weeks, 2 days ago
On 1/14/26 03:48, Manivannan Sadhasivam wrote:
> On Tue, Jan 13, 2026 at 12:15:01PM -0500, Sean Anderson wrote:
>> On 1/5/26 08:55, Manivannan Sadhasivam via B4 Relay wrote:
>> > Hi,
>> 
>> I asked substantially similar questions on v2, but since I never got a
>> response I want to reiterate them on v4 to make sure they don't get
>> lost.
>> 
> 
> I did respond to your queries in v2, but lost your last reply in that thread:
> https://lore.kernel.org/linux-pci/8269249f-48a9-4136-a326-23f5076be487@linux.dev/
> 
> Apologies!
> 
>> > This series provides a major rework for the PCI power control (pwrctrl)
>> > framework to enable the pwrctrl devices to be controlled by the PCI controller
>> > drivers.
>> > 
>> > Problem Statement
>> > =================
>> > 
>> > Currently, the pwrctrl framework faces two major issues:
>> > 
>> > 1. Missing PERST# integration
>> > 2. Inability to properly handle bus extenders such as PCIe switch devices
>> > 
>> > First issue arises from the disconnect between the PCI controller drivers and
>> > pwrctrl framework. At present, the pwrctrl framework just operates on its own
>> > with the help of the PCI core. The pwrctrl devices are created by the PCI core
>> > during initial bus scan and the pwrctrl drivers once bind, just power on the
>> > PCI devices during their probe(). This design conflicts with the PCI Express
>> > Card Electromechanical Specification requirements for PERST# timing. The reason
>> > is, PERST# signals are mostly handled by the controller drivers and often
>> > deasserted even before the pwrctrl drivers probe. According to the spec, PERST#
>> > should be deasserted only after power and reference clock to the device are
>> > stable, within predefined timing parameters.
>> > 
>> > The second issue stems from the PCI bus scan completing before pwrctrl drivers
>> > probe. This poses a significant problem for PCI bus extenders like switches
>> > because the PCI core allocates upstream bridge resources during the initial
>> > scan. If the upstream bridge is not hotplug capable, resources are allocated
>> > only for the number of downstream buses detected at scan time, which might be
>> > just one if the switch was not powered and enumerated at that time. Later, when
>> > the pwrctrl driver powers on and enumerates the switch, enumeration fails due to
>> > insufficient upstream bridge resources.
>> 
>> OK, so to clarify the problem is an architecture like
>> 
>>     RP
>>     |-- Bridge 1 (automatic)
>>     |   |-- Device 1
>>     |   `-- Bridge 2 (needs pwrseq)
>>     |       `-- Device 2
>>     `-- Bridge 3 (automatic)
>>         `-- Device 3
>> 
> 
> This topology is not possible with PCIe. A single Root Port can only connect to
> a single bridge. But applies to PCI.

OK, well imagine it like

     RP
     `-- Host Bridge (automatic)
         |-- Bridge 1 (automatic)
         |   |-- Device 1
         |   `-- Bridge 2 (needs pwrseq)
         |       `-- Device 2
         `-- Bridge 3 (automatic)
             `-- Device 3

You raised the problem, so what I am asking is: is this such a
problematic topology? And if not, please describe one.

>> where Bridge 2 has a devicetree node with a pwrseq binding? So we do the
>> initial scan and allocate resources for bridge/devices 1 and 3 with the
>> resources for bridge 3 immediately above those for bridge 1. Then when
>> bridge 2 shows up we can't resize bridge 1's windows since bridge 3's
>> windows are in the way?
>> 
> 
> It is not a problem with resizing, it is the problem with how much you can
> resize. And also if that bridge 2 is a switch and if it exposes multiple
> downstream busses, then the upstream bridge 1 will run out of resources.

OK, but what I am saying is that I don't believe Bridge 2 can need
pwrseq if Bridge 1 doesn't. So I don't think the topology as-illustrated
can exist.

It's possible that there could be a problem with multiple levels of
bridges all needing pwrseq, but does such a system exist?

> If bridge 2 is a hotplug bridge, then no issues. But I was only referring to
> non-hotplug capable switches.
> 
>> But is it even valid to have a pwrseq node on bridge 2 without one on
>> bridge 1? If bridge 1 is automatically controlled, then I would expect
>> bridge 2 to be as well. E.g. I would expect bridge 2's reset sequence to
>> be controlled by the secondary bus reset bit in bridge 1's bridge
>> control register.
>> 
> 
> Technically it is possible for Bridge 2 to have a pwrctrl requirement. What is
> limiting from spec PoV?

If this is the case then we need to be able to handle the resource
constraint problem. But if it doesn't exist then there is no problem
with the existing architecture. Only this sort of design has resource
problems, while most designs like

     RP
     `-- Bridge 1 (pwrseq)
         |-- Bridge 2 (automatic)
         |   |-- Device 1
         |   |-- Device 2
         `-- Bridge 3 (automatic)
             `-- Device 3

have no resource problems even with the current subsystem.

>> And a very similar architecture like
>> 
>>     RP
>>     |-- Bridge 4 (pwrseq)
>>     |   |-- Device 4
>>     `-- Bridge 5 (automatic)
>>         `-- Device 5
>> 
>> has no problems since the resources for bridge 4 can be allocated above
>> those for bridge 5 whenever it shows up.
>> 
> 
> Again, if bridge 4 is not hotplug capable and if it is a switch, the problem is
> still applicable.

This doesn't apply even if bridge 4 is not hotplug capable. It will show
up after bridge 5 gets probed and just grab the next available
resources.

>> These problems seem very similar to what hotplug bridges have to handle
>> (except that we (usually) only need to do one hotplug per boot). So
>> maybe we should set is_hotplug_bridge on bridges with a pwrseq node.
>> That way they'll get resources distributed for when the downstream port
>> shows up. As an optimization, we could then release those resources once
>> the downstream port is scanned.
>> 
> 
> That would be incorrect. You cannot set 'is_hotplug_bridge' to 'true' for a
> non-hotplug capable bridge. You can call it as a hack, but there is no place
> for that in upstream.

Introduce a new boolean called 'is_pwrseq_bridge' and check for it when
allocating resources.

>> > Proposal
>> > ========
>> > 
>> > This series addresses both issues by introducing new individual APIs for pwrctrl
>> > device creation, destruction, power on, and power off operations. Controller
>> > drivers are expected to invoke these APIs during their probe(), remove(),
>> > suspend(), and resume() operations.
>> 
>> (just for the record)
>> 
>> I think the existing design is quite elegant, since the operations
>> associated with the bridge correspond directly to device lifecycle
>> operations. It also avoids problems related to the root port trying to
>> look up its own child (possibly missing a driver) during probe.
>> 
> 
> I agree with you that it is elegant and I even was very reluctant to move out of
> it [1]. But lately, I understood that we cannot scale the pwrctrl framework if we
> do not give flexibility to the controller drivers [2].
> 
> [1] https://lore.kernel.org/linux-pci/eix65qdwtk5ocd7lj6sw2lslidivauzyn6h5cc4mc2nnci52im@qfmbmwy2zjbe/
> [2] https://lore.kernel.org/linux-pci/aG3IWdZIhnk01t2A@google.com/
> 
>> > This integration allows better coordination
>> > between controller drivers and the pwrctrl framework, enabling enhanced features
>> > such as D3Cold support.
>> 
>> 
>> I think this should be handled by the power sequencing driver,
>> especially as there are timing requirements for the other resources
>> referenced to PERST? If we are going to touch each driver, it would
>> be much better to consolidate things by removing the ad-hoc PERST
>> support.
>> 
>> Different drivers control PERST in various ways, but I think this can
>> be abstracted behind a GPIO controller (if necessary for e.g. MMIO-based
>> control). If there's no reset-gpios property in the pwrseq node then we
>> could automatically look up the GPIO on the root port.
>> 
> 
> Not at all. We cannot model PERST# as a GPIO in all the cases. Some drivers
> implement PERST# as a set of MMIO operations in the Root Complex MMIO space and
> that space belongs to the controller driver.

That's what I mean. Implement a GPIO driver with one GPIO and perform
the MMIO operations as requested.

Or we can invert things and add a reset op to pci_ops. If present then
call it, and if absent use the PERST GPIO on the bridge.
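
To make the idea concrete, a minimal sketch of such a single-line gpio_chip is
below (all names, the register offset, and the bit meaning are made up for
illustration, and the callback signatures vary a bit between kernel versions):

	struct foo_perst_gpio {
		struct gpio_chip chip;
		void __iomem *base;	/* controller MMIO region that owns PERST# */
	};

	static void foo_perst_set(struct gpio_chip *chip, unsigned int offset, int value)
	{
		struct foo_perst_gpio *perst = gpiochip_get_data(chip);

		/* Pretend PERST# is bit 0 of a made-up register at offset 0x100 */
		writel(value ? 1 : 0, perst->base + 0x100);
	}

	static int foo_perst_direction_output(struct gpio_chip *chip, unsigned int offset,
					      int value)
	{
		foo_perst_set(chip, offset, value);
		return 0;
	}

	static int foo_perst_gpio_register(struct device *dev, struct foo_perst_gpio *perst)
	{
		perst->chip.label = "foo-perst";
		perst->chip.parent = dev;
		perst->chip.base = -1;
		perst->chip.ngpio = 1;
		perst->chip.set = foo_perst_set;
		perst->chip.direction_output = foo_perst_direction_output;

		return devm_gpiochip_add_data(dev, &perst->chip, perst);
	}

The pwrseq/pwrctrl-slot driver could then consume it through an ordinary
reset-gpios phandle, same as a real GPIO.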

> FYI, I did try something similar before:
> https://lore.kernel.org/linux-pci/20250707-pci-pwrctrl-perst-v1-0-c3c7e513e312@kernel.org/
>> > The original design aimed to avoid modifying controller drivers for pwrctrl
>> > integration. However, this approach lacked scalability because different
>> > controllers have varying requirements for when devices should be powered on. For
>> > example, controller drivers require devices to be powered on early for
>> > successful PHY initialization.
>> 
>> Can you elaborate on this? Previously you said
>> 
>> | Some platforms do LTSSM during phy_init(), so they will fail if the
>> | device is not powered ON at that time.
>> 
>> What do you mean by "do LTSSM during phy_init()"? Do you have a specific
>> driver in mind?
>> 
> 
> I believe the Mediatek PCIe controller driver used in Chromebooks exhibit this
> behavior. Chen talked about it in his LPC session:
> https://lpc.events/event/19/contributions/2023/

I went through 

mediatek/phy-mtk-pcie.c
mediatek/phy-mtk-tphy.c
mediatek/phy-mtk-xsphy.c
ralink/phy-mt7621-pci.c

and didn't see anything where they wait for the link to come up or check
the link state and fail.

The mtk PCIe drivers may check for this, but I'm saying that we
*shouldn't* do that in probe.

>> I would expect that the LTSSM would remain in the Detect state until the
>> pwrseq driver is being probed.
>> 
> 
> True, but if the API (phy_init()) expects the LTSSM to move to L0, then it will
> fail, right? It might be what's happening with above mentioned platform.

How can the API expect this?

I'm not saying that such a situation cannot exist, but I don't think
it's a common case.

>> > By using these explicit APIs, controller drivers gain fine grained control over
>> > their associated pwrctrl devices.
>> > 
>> > This series modified the pcie-qcom driver (only consumer of pwrctrl framework)
>> > to adopt to these APIs and also removed the old pwrctrl code from PCI core. This
>> > could be used as a reference to add pwrctrl support for other controller drivers
>> > also.
>> > 
>> > For example, to control the 3.3v supply to the PCI slot where the NVMe device is
>> > connected, below modifications are required:
>> > 
>> > Devicetree
>> > ----------
>> > 
>> > 	// In SoC dtsi:
>> > 
>> > 	pci@1bf8000 { // controller node
>> > 		...
>> > 		pcie1_port0: pcie@0 { // PCI Root Port node
>> > 			compatible = "pciclass,0604"; // required for pwrctrl
>> > 							 driver bind
>> > 			...
>> > 		};
>> > 	};
>> > 
>> > 	// In board dts:
>> > 
>> > 	&pcie1_port0 {
>> > 		reset-gpios = <&tlmm 152 GPIO_ACTIVE_LOW>; // optional
>> > 		vpcie3v3-supply = <&vreg_nvme>; // NVMe power supply
>> > 	};
>> > 
>> > Controller driver
>> > -----------------
>> > 
>> > 	// Select PCI_PWRCTRL_SLOT in controller Kconfig
>> > 
>> > 	probe() {
>> > 		...
>> > 		// Initialize controller resources
>> > 		pci_pwrctrl_create_devices(&pdev->dev);
>> > 		pci_pwrctrl_power_on_devices(&pdev->dev);
>> > 		// Deassert PERST# (optional)
>> > 		...
>> > 		pci_host_probe(); // Allocate host bridge and start bus scan
>> > 	}
>> > 
>> > 	suspend {
>> > 		// PME_Turn_Off broadcast
>> > 		// Assert PERST# (optional)
>> > 		pci_pwrctrl_power_off_devices(&pdev->dev);
>> > 		...
>> > 	}
>> > 
>> > 	resume {
>> > 		...
>> > 		pci_pwrctrl_power_on_devices(&pdev->dev);
>> > 		// Deassert PERST# (optional)
>> > 	}
>> > 
>> > I will add a documentation for the pwrctrl framework in the coming days to make
>> > it easier to use.
>> > 
>> > Testing
>> > =======
>> > 
>> > This series is tested on the Lenovo Thinkpad T14s laptop based on Qcom X1E
>> > chipset and RB3Gen2 development board with TC9563 switch based on Qcom QCS6490
>> > chipset.
>> > 
>> > **NOTE**: With this series, the controller driver may undergo multiple probe
>> > deferral if the pwrctrl driver was not available during the controller driver
>> > probe. This is pretty much required to avoid the resource allocation issue. I
>> > plan to replace probe deferral with blocking wait in the coming days.
>> 
>> You can only do a blocking wait after deferring at least once, since the
>> root port may be probed synchronously during boot. I really think this
>> is rather messy and something we should avoid architecturally while we
>> have the chance.
>> 
> 
> By blocking wait I meant that the controller probe itself will do a blocking
> wait until the pwrctrl drivers gets bound. Since this happens way before the PCI
> bus scan, there won't be any Root Port probed synchronously.

You can't do that because the pwrctrl driver may *never* be loaded. And
this may deadlock the boot sequence because the initial probe is
performed synchronously from the initcall. i.e.

do_initcalls
  my_driver_init
    driver_register
      bus_add_driver
        driver_attach
          driver_probe_device

If the PCI controller is probed before the device that has the module
you will deadlock! So you can only sleep indefinitely if you are being
probed asynchronously.

-----

Maybe the best way to address this is to add assert_reset/link_up/
link_down callbacks to pci_ops. Then pwrctrl_slot probe could look like

    bridge = to_pci_host_bridge(dev->parent);
    of_regulator_bulk_get_all();
    regulator_bulk_enable();
    devm_clk_get_optional_enabled();
    devm_gpiod_get_optional(/* "reset" */);
    if (bridge && bridge->ops->assert_reset)
        ret = bridge->ops->assert_reset(bridge, slot)
    else
        ret = assert_reset_gpio(slot);

    if (ret != ALREADY_ASSERTED)
        fsleep(100000);

    /* Deassert PERST and bring the link up */
    if (bridge && bridge->ops->link_up)
        bridge->ops->link_up(bridge, slot);
    else
        slot_deassert_reset(slot);

    devm_add_action_or_reset(link_down);
    pci_pwrctrl_init();
    devm_pci_pwrctrl_device_set_ready();

--Sean
Re: [PATCH v4 0/8] PCI/pwrctrl: Major rework to integrate pwrctrl devices with controller drivers
Posted by Manivannan Sadhasivam 3 weeks, 2 days ago
On Thu, Jan 15, 2026 at 02:26:32PM -0500, Sean Anderson wrote:

[...]

> >> OK, so to clarify the problem is an architecture like
> >> 
> >>     RP
> >>     |-- Bridge 1 (automatic)
> >>     |   |-- Device 1
> >>     |   `-- Bridge 2 (needs pwrseq)
> >>     |       `-- Device 2
> >>     `-- Bridge 3 (automatic)
> >>         `-- Device 3
> >> 
> > 
> > This topology is not possible with PCIe. A single Root Port can only connect to
> > a single bridge. But applies to PCI.
> 
> OK, well imagine it like
> 
>      RP
>      `-- Host Bridge (automatic)
>          |-- Bridge 1 (automatic)
>          |   |-- Device 1
>          |   `-- Bridge 2 (needs pwrseq)
>          |       `-- Device 2
>          `-- Bridge 3 (automatic)
>              `-- Device 3
> 
> You raised the problem, so what I am asking is: is this such a
> problematic topology? And if not, please describe one.
> 

Again, this topology is also incorrect, but my point is that in whatever
topology, if you have a PCIe switch that requires a pwrctrl driver to power it
on, then there will be a resource allocation problem.

> >> where Bridge 2 has a devicetree node with a pwrseq binding? So we do the
> >> initial scan and allocate resources for bridge/devices 1 and 3 with the
> >> resources for bridge 3 immediately above those for bridge 1. Then when
> >> bridge 2 shows up we can't resize bridge 1's windows since bridge 3's
> >> windows are in the way?
> >> 
> > 
> > It is not a problem with resizing, it is the problem with how much you can
> > resize. And also if that bridge 2 is a switch and if it exposes multiple
> > downstream busses, then the upstream bridge 1 will run out of resources.
> 
> OK, but what I am saying is that I don't believe Bridge 2 can need
> pwrseq if Bridge 1 doesn't. So I don't think the topology as-illustrated
> can exist.
> 
> It's possible that there could be a problem with multiple levels of
> bridges all needing pwrseq, but does such a system exist?
> 

Yes, such a system does exist atm. Below is the TC9563 PCIe switch topology on
the Qcom RB3Gen2 board:

	Host bridge
	`--> Root Port (auto)
	     `--> TC9563 (pwrctrl)

https://lore.kernel.org/linux-arm-msm/20260105-tc9563-v1-1-642fd1fe7893@oss.qualcomm.com/

And then there is also a design underway that connects one more TC9563 to the
downstream of the existing one for peripheral expansion. So the topology will
become:

	Host bridge
	`--> Root Port (auto)
	     `--> TC9563 (pwrctrl)
		  `--> TC9563 (pwrctrl)

This is just one example; OEMs may come up with many such designs and we
cannot deny them.

> > If bridge 2 is a hotplug bridge, then no issues. But I was only referring to
> > non-hotplug capable switches.
> > 
> >> But is it even valid to have a pwrseq node on bridge 2 without one on
> >> bridge 1? If bridge 1 is automatically controlled, then I would expect
> >> bridge 2 to be as well. E.g. I would expect bridge 2's reset sequence to
> >> be controlled by the secondary bus reset bit in bridge 1's bridge
> >> control register.
> >> 
> > 
> > Technically it is possible for Bridge 2 to have a pwrctrl requirement. What is
> > limiting from spec PoV?
> 
> If this is the case then we need to be able to handle the resource
> constraint problem. But if it doesn't exist then there is no problem
> with the existing architecture. Only this sort of design has resource
> problems, while most designs like
> 
>      RP
>      `-- Bridge 1 (pwrseq)
>          |-- Bridge 2 (automatic)
>          |   |-- Device 1
>          |   |-- Device 2
>          `-- Bridge 3 (automatic)
>              `-- Device 3
> 
> have no resource problems even with the current subsystem.
> 

Not at all. I think you don't get the issue. The Root Port is just a PCI
bridge. If the downstream device is not found during the initial scan, and if
the RP is a non-hotplug capable device, then the PCI core will allocate
resources for only one downstream bus. And if the PCIe switch shows up on that
downstream bus later, then it will fail to enumerate due to the resource
constraint for the switch's downstream buses.

This is pretty much what happens with the single switch TC9563 design in RB3Gen2
that I referenced above.

> >> And a very similar architecture like
> >> 
> >>     RP
> >>     |-- Bridge 4 (pwrseq)
> >>     |   |-- Device 4
> >>     `-- Bridge 5 (automatic)
> >>         `-- Device 5
> >> 
> >> has no problems since the resources for bridge 4 can be allocated above
> >> those for bridge 5 whenever it shows up.
> >> 
> > 
> > Again, if bridge 4 is not hotplug capable and if it is a switch, the problem is
> > still applicable.
> 
> This doesn't apply even if bridge 4 is not hotplug capable. It will show
> up after bridge 5 gets probed and just grab the next available
> resources.
> 

See above. The next available resources are very limited if the upstream
bridge is not hotplug capable. And we can't blame the PCI core for this,
because we are pretty much emulating hotplug on a non-hotplug capable bridge,
which is not ideal.

> >> These problems seem very similar to what hotplug bridges have to handle
> >> (except that we (usually) only need to do one hotplug per boot). So
> >> maybe we should set is_hotplug_bridge on bridges with a pwrseq node.
> >> That way they'll get resources distributed for when the downstream port
> >> shows up. As an optimization, we could then release those resources once
> >> the downstream port is scanned.
> >> 
> > 
> > That would be incorrect. You cannot set 'is_hotplug_bridge' to 'true' for a
> > non-hotplug capable bridge. You can call it as a hack, but there is no place
> > for that in upstream.
> 
> Introduce a new boolean called 'is_pwrseq_bridge' and check for it when
> allocating resources.
> 

Sorry, I'm not up to introducing such hacks.

> >> > Proposal
> >> > ========
> >> > 
> >> > This series addresses both issues by introducing new individual APIs for pwrctrl
> >> > device creation, destruction, power on, and power off operations. Controller
> >> > drivers are expected to invoke these APIs during their probe(), remove(),
> >> > suspend(), and resume() operations.
> >> 
> >> (just for the record)
> >> 
> >> I think the existing design is quite elegant, since the operations
> >> associated with the bridge correspond directly to device lifecycle
> >> operations. It also avoids problems related to the root port trying to
> >> look up its own child (possibly missing a driver) during probe.
> >> 
> > 
> > I agree with you that it is elegant and I even was very reluctant to move out of
> > it [1]. But lately, I understood that we cannot scale the pwrctrl framework if we
> > do not give flexibility to the controller drivers [2].
> > 
> > [1] https://lore.kernel.org/linux-pci/eix65qdwtk5ocd7lj6sw2lslidivauzyn6h5cc4mc2nnci52im@qfmbmwy2zjbe/
> > [2] https://lore.kernel.org/linux-pci/aG3IWdZIhnk01t2A@google.com/
> > 
> >> > This integration allows better coordination
> >> > between controller drivers and the pwrctrl framework, enabling enhanced features
> >> > such as D3Cold support.
> >> 
> >> 
> >> I think this should be handled by the power sequencing driver,
> >> especially as there are timing requirements for the other resources
> >> referenced to PERST? If we are going to touch each driver, it would
> >> be much better to consolidate things by removing the ad-hoc PERST
> >> support.
> >> 
> >> Different drivers control PERST in various ways, but I think this can
> >> be abstracted behind a GPIO controller (if necessary for e.g. MMIO-based
> >> control). If there's no reset-gpios property in the pwrseq node then we
> >> could automatically look up the GPIO on the root port.
> >> 
> > 
> > Not at all. We cannot model PERST# as a GPIO in all the cases. Some drivers
> > implement PERST# as a set of MMIO operations in the Root Complex MMIO space and
> > that space belongs to the controller driver.
> 
> That's what I mean. Implement a GPIO driver with one GPIO and perform
> the MMIO operations as requested.
> 
> Or we can invert things and add a reset op to pci_ops. If present then
> call it, and if absent use the PERST GPIO on the bridge.
> 

Having a callback for controlling PERST# will work for addressing the PERST#
issue, but it won't solve the PCIe switch issue we were talking about above.
This API design fixes both problems.

But even in this callback design, you need modifications in the existing
controller drivers to integrate pwrctrl. So how is that different from calling
just two APIs (or one unified API for create/power_on)?

> > FYI, I did try something similar before:
> > https://lore.kernel.org/linux-pci/20250707-pci-pwrctrl-perst-v1-0-c3c7e513e312@kernel.org/
> >> > The original design aimed to avoid modifying controller drivers for pwrctrl
> >> > integration. However, this approach lacked scalability because different
> >> > controllers have varying requirements for when devices should be powered on. For
> >> > example, controller drivers require devices to be powered on early for
> >> > successful PHY initialization.
> >> 
> >> Can you elaborate on this? Previously you said
> >> 
> >> | Some platforms do LTSSM during phy_init(), so they will fail if the
> >> | device is not powered ON at that time.
> >> 
> >> What do you mean by "do LTSSM during phy_init()"? Do you have a specific
> >> driver in mind?
> >> 
> > 
> > I believe the Mediatek PCIe controller driver used in Chromebooks exhibit this
> > behavior. Chen talked about it in his LPC session:
> > https://lpc.events/event/19/contributions/2023/
> 
> I went through 
> 
> mediatek/phy-mtk-pcie.c
> mediatek/phy-mtk-tphy.c
> mediatek/phy-mtk-xsphy.c
> ralink/phy-mt7621-pci.c
> 
> and didn't see anything where they wait for the link to come up or check
> the link state and fail.
> 

See tegra_pcie_config_rp().

> The mtk PCIe drivers may check for this, but I'm saying that we
> *shouldn't* do that in probe.
> 

Such drivers already exist. Sure, they can just leave the LTSSM in the Detect
state instead of failing probe. But if their Root Port is not hotplug capable,
why should they expect a device to get attached to the bus after probe?

That being said, we do have DWC drivers ignoring the link up failure, expecting
the link to come up later. But I may just fix that in the coming days once the
pwrctrl APIs are added. There is no reason to wait for hotplug if the RP is not
hotplug capable. It is just asking for trouble.

> >> I would expect that the LTSSM would remain in the Detect state until the
> >> pwrseq driver is being probed.
> >> 
> > 
> > True, but if the API (phy_init()) expects the LTSSM to move to L0, then it will
> > fail, right? It might be what's happening with above mentioned platform.
> 
> How can the API expect this?
> 
> I'm not saying that such a situation cannot exist, but I don't think
> it's a common case.
>

Starting LTSSM in phy_init() is weird, I agree. I know for sure that someone
raised this earlier, but I don't exactly remember which driver is doing it.
 
> >> > By using these explicit APIs, controller drivers gain fine grained control over
> >> > their associated pwrctrl devices.
> >> > 
> >> > This series modified the pcie-qcom driver (only consumer of pwrctrl framework)
> >> > to adopt to these APIs and also removed the old pwrctrl code from PCI core. This
> >> > could be used as a reference to add pwrctrl support for other controller drivers
> >> > also.
> >> > 
> >> > For example, to control the 3.3v supply to the PCI slot where the NVMe device is
> >> > connected, below modifications are required:
> >> > 
> >> > Devicetree
> >> > ----------
> >> > 
> >> > 	// In SoC dtsi:
> >> > 
> >> > 	pci@1bf8000 { // controller node
> >> > 		...
> >> > 		pcie1_port0: pcie@0 { // PCI Root Port node
> >> > 			compatible = "pciclass,0604"; // required for pwrctrl
> >> > 							 driver bind
> >> > 			...
> >> > 		};
> >> > 	};
> >> > 
> >> > 	// In board dts:
> >> > 
> >> > 	&pcie1_port0 {
> >> > 		reset-gpios = <&tlmm 152 GPIO_ACTIVE_LOW>; // optional
> >> > 		vpcie3v3-supply = <&vreg_nvme>; // NVMe power supply
> >> > 	};
> >> > 
> >> > Controller driver
> >> > -----------------
> >> > 
> >> > 	// Select PCI_PWRCTRL_SLOT in controller Kconfig
> >> > 
> >> > 	probe() {
> >> > 		...
> >> > 		// Initialize controller resources
> >> > 		pci_pwrctrl_create_devices(&pdev->dev);
> >> > 		pci_pwrctrl_power_on_devices(&pdev->dev);
> >> > 		// Deassert PERST# (optional)
> >> > 		...
> >> > 		pci_host_probe(); // Allocate host bridge and start bus scan
> >> > 	}
> >> > 
> >> > 	suspend {
> >> > 		// PME_Turn_Off broadcast
> >> > 		// Assert PERST# (optional)
> >> > 		pci_pwrctrl_power_off_devices(&pdev->dev);
> >> > 		...
> >> > 	}
> >> > 
> >> > 	resume {
> >> > 		...
> >> > 		pci_pwrctrl_power_on_devices(&pdev->dev);
> >> > 		// Deassert PERST# (optional)
> >> > 	}
> >> > 
> >> > I will add a documentation for the pwrctrl framework in the coming days to make
> >> > it easier to use.
> >> > 
> >> > Testing
> >> > =======
> >> > 
> >> > This series is tested on the Lenovo Thinkpad T14s laptop based on Qcom X1E
> >> > chipset and RB3Gen2 development board with TC9563 switch based on Qcom QCS6490
> >> > chipset.
> >> > 
> >> > **NOTE**: With this series, the controller driver may undergo multiple probe
> >> > deferral if the pwrctrl driver was not available during the controller driver
> >> > probe. This is pretty much required to avoid the resource allocation issue. I
> >> > plan to replace probe deferral with blocking wait in the coming days.
> >> 
> >> You can only do a blocking wait after deferring at least once, since the
> >> root port may be probed synchronously during boot. I really think this
> >> is rather messy and something we should avoid architecturally while we
> >> have the chance.
> >> 
> > 
> > By blocking wait I meant that the controller probe itself will do a blocking
> > wait until the pwrctrl drivers gets bound. Since this happens way before the PCI
> > bus scan, there won't be any Root Port probed synchronously.
> 
> You can't do that because the pwrctrl driver may *never* be loaded. And
> this may deadlock the boot sequence because the initial probe is
> performed synchronously from the initcall. i.e.
> 
> do_initcalls
>   my_driver_init
>     driver_register
>       bus_add_driver
>         driver_attach
>           driver_probe_device
> 
> If the PCI controller is probed before the device that has the module
> you will deadlock! So you can only sleep indefinitely if you are being
> probed asynchronously.
> 

Yes, I was thinking about controller drivers setting the PROBE_PREFER_ASYNCHRONOUS
flag. We can restrict the blocking wait to such drivers.
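
For reference, opting in only needs a flag on the driver struct. A minimal
sketch (the driver name and probe callback are made up; only the flag itself
is an existing kernel facility):

	static struct platform_driver my_pcie_driver = {
		.probe = my_pcie_probe,
		.driver = {
			.name = "my-pcie",
			/* Probe asynchronously, so a blocking wait cannot stall initcalls */
			.probe_type = PROBE_PREFER_ASYNCHRONOUS,
		},
	};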

> -----
> 
> Maybe the best way to address this is to add assert_reset/link_up/
> link_down callbacks to pci_ops. Then pwrctrl_slot probe could look like
> 
>     bridge = to_pci_host_bridge(dev->parent);
>     of_regulator_bulk_get_all();
>     regulator_bulk_enable();
>     devm_clk_get_optional_enabled();
>     devm_gpiod_get_optional(/* "reset" */);
>     if (bridge && bridge->ops->assert_reset)
>         ret = bridge->ops->assert_reset(bridge, slot)
>     else
>         ret = assert_reset_gpio(slot);
> 
>     if (ret != ALREADY_ASSERTED)
> 	    fdelay(100000);
> 
>     /* Deassert PERST and bring the link up */
>     if (bridge && bridge->ops->link_up)
>         bridge->ops->link_up(bridge, slot);
>     else
>         slot_deassert_reset(slot);
> 
>     devm_add_action_or_reset(link_down);
>     pci_pwrctrl_init();
>     devm_pci_pwrctrl_device_set_ready();
> 

Sorry, I'm not inclined to take this route for the reasons mentioned above.

- Mani

-- 
மணிவண்ணன் சதாசிவம்
Re: [PATCH v4 0/8] PCI/pwrctrl: Major rework to integrate pwrctrl devices with controller drivers
Posted by Niklas Cassel 3 weeks, 1 day ago
On Fri, Jan 16, 2026 at 11:54:26AM +0530, Manivannan Sadhasivam wrote:
> On Thu, Jan 15, 2026 at 02:26:32PM -0500, Sean Anderson wrote:
> > > 
> > > Not at all. We cannot model PERST# as a GPIO in all the cases. Some drivers
> > > implement PERST# as a set of MMIO operations in the Root Complex MMIO space and
> > > that space belongs to the controller driver.
> > 
> > That's what I mean. Implement a GPIO driver with one GPIO and perform
> > the MMIO operations as requested.
> > 
> > Or we can invert things and add a reset op to pci_ops. If present then
> > call it, and if absent use the PERST GPIO on the bridge.
> > 
> 
> Having a callback for controlling the PERST# will work for the addressing the
> PERST# issue, but it won't solve the PCIe switch issue we were talking above.
> And this API design will fix both the problems.
> 
> But even in this callback design, you need to have modifications in the existing
> controller drivers to integrate pwrctrl. So how that is different from calling
> just two (or one unified API for create/power_on)?

FWIW, I do think that it is a good idea to add a reset op to pci_ops
that implements PERST# assertion/deassertion.

Right now it is a mess, with various drivers doing this in various
different places.

With a specific callback, drivers can implement it however they want
(GPIO, MMIO, whatever), and it could even be called by host_init (in the
case of DWC), by pwrctrl, or potentially by the PCI core itself before
enumerating the bus.
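
Purely as an illustration (no such op exists in struct pci_ops today; the
name, signature and my_pcie_* bits below are invented for the sake of
discussion):

	/* Hypothetical addition to struct pci_ops, next to ->map_bus()/->read()/->write():
	 *
	 *	int (*perst)(struct pci_host_bridge *bridge, bool assert);
	 */

	/* A controller driver would back it with GPIO, MMIO, firmware, whatever */
	static int my_pcie_perst(struct pci_host_bridge *bridge, bool assert)
	{
		struct my_pcie *pcie = pci_host_bridge_priv(bridge);

		gpiod_set_value_cansleep(pcie->perst_gpio, assert);
		return 0;
	}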

If we don't do something about it now, the problem will just get worse
with time. Yes, it will take time before all drivers have migrated to and
been tested with a dedicated PERST# reset op, but for long-term
maintainability I think it is something that we should do.

I also know that some drivers have some loops with retry logic, where
they might go down in link speed, but right now I don't see why those
drivers shouldn't be able to keep that retry logic just because we
add a dedicated PERST# callback.


All that said, that would be a separate endeavor and can be implemented
later.


Kind regards,
Niklas
Re: [PATCH v4 0/8] PCI/pwrctrl: Major rework to integrate pwrctrl devices with controller drivers
Posted by Shawn Lin 3 weeks, 1 day ago
在 2026/01/16 星期五 21:40, Niklas Cassel 写道:
> On Fri, Jan 16, 2026 at 11:54:26AM +0530, Manivannan Sadhasivam wrote:
>> On Thu, Jan 15, 2026 at 02:26:32PM -0500, Sean Anderson wrote:
>>>>
>>>> Not at all. We cannot model PERST# as a GPIO in all the cases. Some drivers
>>>> implement PERST# as a set of MMIO operations in the Root Complex MMIO space and
>>>> that space belongs to the controller driver.
>>>
>>> That's what I mean. Implement a GPIO driver with one GPIO and perform
>>> the MMIO operations as requested.
>>>
>>> Or we can invert things and add a reset op to pci_ops. If present then
>>> call it, and if absent use the PERST GPIO on the bridge.
>>>
>>
>> Having a callback for controlling the PERST# will work for the addressing the
>> PERST# issue, but it won't solve the PCIe switch issue we were talking above.
>> And this API design will fix both the problems.
>>
>> But even in this callback design, you need to have modifications in the existing
>> controller drivers to integrate pwrctrl. So how that is different from calling
>> just two (or one unified API for create/power_on)?
> 
> FWIW, I do think that it is a good idea to create a reset op to pci_ops
> that will implement PERST# assertion/deassertion.
> 

That's exactly what I had in mind when looking at different PCIe host
drivers: why should individual drivers implement their own power-up
sequence, given that bringing devices up should always follow the timing
defined in the PCI Express Card Electromechanical Specification R6.0.1,
section 2.2.1 "Initial Power Up (G3 to S0)"?


> Right now, it is a mess, with various drivers doing this at various
> different places.
> 
> Having a specific callback, the driver implement it however they want
> GPIO, MMIO, whatever, and it could even be called by (in case of DWC,
> the host_init, by pwrctrl, or potentially by the PCI core itself before
> enumerating the bus.
> 
> If we don't do something about it now, the problem will just get worse
> with time. Yes, it will take time before all drivers have migrated and
> been tested to have a dedicated PERST# reset op, but for the long term
> maintainability, I think it is something that we should do.

Ack. That will also consolidate more timing-related improvements.
For instance, almost all drivers hold PERST# for the 100ms Tpvperl before
releasing it, but 100ms is the minimum value defined by the spec. I have to
say it has proven insufficient for several EP cards I have debugged in
practice over the years. Gathering all power-up timing into the pwrctrl
design could really be helpful in the future.
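
As a rough sketch of what a common helper could let boards tune (all names
below are made up; only the 100ms figure comes from the CEM spec):

	#define SLOT_T_PVPERL_MIN_MS	100	/* CEM spec minimum, too short for some cards */

	static void my_slot_deassert_perst(struct my_slot *slot)
	{
		/* Delay from power rails stable to PERST# deassert, board-tunable */
		msleep(max_t(unsigned int, slot->pvperl_ms, SLOT_T_PVPERL_MIN_MS));
		gpiod_set_value_cansleep(slot->perst_gpio, 0);
	}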


> 
> I also know that some drivers have some loops with retry logic, where
> they might go down in link speed, but right now I don't see why those
> drivers shouldn't be able to keep that retry logic just because we
> add a dedicated PERST# callback.
> 
> 
> All that said, that would be a separate endeavor and can be implemented
> later.
> 
> 
> Kind regards,
> Niklas
> 
> 

Re: [PATCH v4 0/8] PCI/pwrctrl: Major rework to integrate pwrctrl devices with controller drivers
Posted by Manivannan Sadhasivam 3 weeks, 1 day ago
On Fri, Jan 16, 2026 at 02:40:36PM +0100, Niklas Cassel wrote:
> On Fri, Jan 16, 2026 at 11:54:26AM +0530, Manivannan Sadhasivam wrote:
> > On Thu, Jan 15, 2026 at 02:26:32PM -0500, Sean Anderson wrote:
> > > > 
> > > > Not at all. We cannot model PERST# as a GPIO in all the cases. Some drivers
> > > > implement PERST# as a set of MMIO operations in the Root Complex MMIO space and
> > > > that space belongs to the controller driver.
> > > 
> > > That's what I mean. Implement a GPIO driver with one GPIO and perform
> > > the MMIO operations as requested.
> > > 
> > > Or we can invert things and add a reset op to pci_ops. If present then
> > > call it, and if absent use the PERST GPIO on the bridge.
> > > 
> > 
> > Having a callback for controlling the PERST# will work for the addressing the
> > PERST# issue, but it won't solve the PCIe switch issue we were talking above.
> > And this API design will fix both the problems.
> > 
> > But even in this callback design, you need to have modifications in the existing
> > controller drivers to integrate pwrctrl. So how that is different from calling
> > just two (or one unified API for create/power_on)?
> 
> FWIW, I do think that it is a good idea to create a reset op to pci_ops
> that will implement PERST# assertion/deassertion.
> 
> Right now, it is a mess, with various drivers doing this at various
> different places.
> 
> Having a specific callback, the driver implement it however they want
> GPIO, MMIO, whatever, and it could even be called by (in case of DWC,
> the host_init, by pwrctrl, or potentially by the PCI core itself before
> enumerating the bus.
> 
> If we don't do something about it now, the problem will just get worse
> with time. Yes, it will take time before all drivers have migrated and
> been tested to have a dedicated PERST# reset op, but for the long term
> maintainability, I think it is something that we should do.
> 
> I also know that some drivers have some loops with retry logic, where
> they might go down in link speed, but right now I don't see why those
> drivers shouldn't be able to keep that retry logic just because we
> add a dedicated PERST# callback.
> 
> 
> All that said, that would be a separate endeavor and can be implemented
> later.
> 

[looks like my previous reply was lost, so resending it]

Agree. Having unification always helps!

- Mani

-- 
மணிவண்ணன் சதாசிவம்
Re: [PATCH v4 0/8] PCI/pwrctrl: Major rework to integrate pwrctrl devices with controller drivers
Posted by Manivannan Sadhasivam 3 weeks, 1 day ago
On Fri, Jan 16, 2026 at 02:40:36PM +0100, Niklas Cassel wrote:
> On Fri, Jan 16, 2026 at 11:54:26AM +0530, Manivannan Sadhasivam wrote:
> > On Thu, Jan 15, 2026 at 02:26:32PM -0500, Sean Anderson wrote:
> > > > 
> > > > Not at all. We cannot model PERST# as a GPIO in all the cases. Some drivers
> > > > implement PERST# as a set of MMIO operations in the Root Complex MMIO space and
> > > > that space belongs to the controller driver.
> > > 
> > > That's what I mean. Implement a GPIO driver with one GPIO and perform
> > > the MMIO operations as requested.
> > > 
> > > Or we can invert things and add a reset op to pci_ops. If present then
> > > call it, and if absent use the PERST GPIO on the bridge.
> > > 
> > 
> > Having a callback for controlling the PERST# will work for the addressing the
> > PERST# issue, but it won't solve the PCIe switch issue we were talking above.
> > And this API design will fix both the problems.
> > 
> > But even in this callback design, you need to have modifications in the existing
> > controller drivers to integrate pwrctrl. So how that is different from calling
> > just two (or one unified API for create/power_on)?
> 
> FWIW, I do think that it is a good idea to create a reset op to pci_ops
> that will implement PERST# assertion/deassertion.
> 
> Right now, it is a mess, with various drivers doing this at various
> different places.
> 
> Having a specific callback, the driver implement it however they want
> GPIO, MMIO, whatever, and it could even be called by (in case of DWC,
> the host_init, by pwrctrl, or potentially by the PCI core itself before
> enumerating the bus.
> 
> If we don't do something about it now, the problem will just get worse
> with time. Yes, it will take time before all drivers have migrated and
> been tested to have a dedicated PERST# reset op, but for the long term
> maintainability, I think it is something that we should do.
> 
> I also know that some drivers have some loops with retry logic, where
> they might go down in link speed, but right now I don't see why those
> drivers shouldn't be able to keep that retry logic just because we
> add a dedicated PERST# callback.
> 
> 
> All that said, that would be a separate endeavor and can be implemented
> later.
> 

Agree. Having unification always helps!

- Mani

-- 
மணிவண்ணன் சதாசிவம்
Re: [PATCH v4 0/8] PCI/pwrctrl: Major rework to integrate pwrctrl devices with controller drivers
Posted by Chen-Yu Tsai 3 weeks, 3 days ago
On Wed, Jan 14, 2026 at 4:48 PM Manivannan Sadhasivam <mani@kernel.org> wrote:

[...]

> > > The original design aimed to avoid modifying controller drivers for pwrctrl
> > > integration. However, this approach lacked scalability because different
> > > controllers have varying requirements for when devices should be powered on. For
> > > example, controller drivers require devices to be powered on early for
> > > successful PHY initialization.
> >
> > Can you elaborate on this? Previously you said
> >
> > | Some platforms do LTSSM during phy_init(), so they will fail if the
> > | device is not powered ON at that time.
> >
> > What do you mean by "do LTSSM during phy_init()"? Do you have a specific
> > driver in mind?
> >
>
> I believe the Mediatek PCIe controller driver used in Chromebooks exhibit this
> behavior. Chen talked about it in his LPC session:
> https://lpc.events/event/19/contributions/2023/

I don't remember all the details off the top of my head, but at least the
MediaTek and old (non-DesignWare) Rockchip drivers both did this:

    Wait for link up during the probe function; if it times out then
    nothing is there, and just fail the probe.

And this probably makes sense if the controller does not support hotplug,
and you want to keep unused devices / interfaces disabled to save power.
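
In pseudo-code the pattern looks roughly like this (illustrative only, the
my_pcie_* helper is made up and not taken from either driver):

	/* in probe(), after clocks and PHY are up and PERST# has been released */
	if (!my_pcie_wait_for_link(pcie, 1000 /* ms */)) {
		dev_info(dev, "link did not come up, assuming empty slot\n");
		return -ENODEV;		/* fail probe, leave the port powered down */
	}

	return pci_host_probe(bridge);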

> > I would expect that the LTSSM would remain in the Detect state until the
> > pwrseq driver is being probed.
> >
>
> True, but if the API (phy_init()) expects the LTSSM to move to L0, then it will
> fail, right? It might be what's happening with above mentioned platform.

I can't remember if any drivers expected this. IIRC they waited for link up
in the probe function before registering the PCI host.

[...]


ChenYu
Re: [PATCH v4 0/8] PCI/pwrctrl: Major rework to integrate pwrctrl devices with controller drivers
Posted by Bjorn Helgaas 3 weeks, 6 days ago
On Mon, Jan 05, 2026 at 07:25:40PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> Hi,
> 
> This series provides a major rework for the PCI power control (pwrctrl)
> framework to enable the pwrctrl devices to be controlled by the PCI controller
> drivers.

I pushed a pci/pwrctrl-v5 that incorporates some of the comments I
sent.  If it's useful, you can use it as a basis for a v6; if not, no
worries.

Bjorn
Re: [PATCH v4 0/8] PCI/pwrctrl: Major rework to integrate pwrctrl devices with controller drivers
Posted by Manivannan Sadhasivam 3 weeks, 6 days ago
On Sun, Jan 11, 2026 at 09:31:32PM -0600, Bjorn Helgaas wrote:
> On Mon, Jan 05, 2026 at 07:25:40PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > Hi,
> > 
> > This series provides a major rework for the PCI power control (pwrctrl)
> > framework to enable the pwrctrl devices to be controlled by the PCI controller
> > drivers.
> 
> I pushed a pci/pwrctrl-v5 that incorporates some of the comments I
> sent.  If it's useful, you can use it as a basis for a v6; if not, no
> worries.
> 

Thanks for making the changes, they look good to me. Do you expect me to send v6
or you intend to merge this pwrctrl-v5 branch to pci/next?

- Mani

-- 
மணிவண்ணன் சதாசிவம்
Re: [PATCH v4 0/8] PCI/pwrctrl: Major rework to integrate pwrctrl devices with controller drivers
Posted by Bjorn Helgaas 3 weeks, 5 days ago
On Mon, Jan 12, 2026 at 01:23:02PM +0530, Manivannan Sadhasivam wrote:
> On Sun, Jan 11, 2026 at 09:31:32PM -0600, Bjorn Helgaas wrote:
> > On Mon, Jan 05, 2026 at 07:25:40PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > > Hi,
> > > 
> > > This series provides a major rework for the PCI power control (pwrctrl)
> > > framework to enable the pwrctrl devices to be controlled by the PCI controller
> > > drivers.
> > 
> > I pushed a pci/pwrctrl-v5 that incorporates some of the comments I
> > sent.  If it's useful, you can use it as a basis for a v6; if not,
> > no worries.
> > 
> 
> Thanks for making the changes, they look good to me. Do you expect
> me to send v6 or you intend to merge this pwrctrl-v5 branch to
> pci/next?

I'm not planning to merge pwrctrl-v5 yet.

Still hoping pci_pwrctrl_slot_power_on() could be factored out earlier
to simplify "PCI/pwrctrl: Add 'struct pci_pwrctrl::power_{on/off}'
callbacks".  It wasn't quite obvious to me how to do that.
Re: [PATCH v4 0/8] PCI/pwrctrl: Major rework to integrate pwrctrl devices with controller drivers
Posted by Shawn Lin 3 weeks, 2 days ago
在 2026/01/05 星期一 21:55, Manivannan Sadhasivam via B4 Relay 写道:
> Hi,
> 
> This series provides a major rework for the PCI power control (pwrctrl)
> framework to enable the pwrctrl devices to be controlled by the PCI controller
> drivers.
> 
> Problem Statement
> =================
> 
> Currently, the pwrctrl framework faces two major issues:
> 
> 1. Missing PERST# integration
> 2. Inability to properly handle bus extenders such as PCIe switch devices
> 
> First issue arises from the disconnect between the PCI controller drivers and
> pwrctrl framework. At present, the pwrctrl framework just operates on its own
> with the help of the PCI core. The pwrctrl devices are created by the PCI core
> during initial bus scan and the pwrctrl drivers once bind, just power on the
> PCI devices during their probe(). This design conflicts with the PCI Express
> Card Electromechanical Specification requirements for PERST# timing. The reason
> is, PERST# signals are mostly handled by the controller drivers and often
> deasserted even before the pwrctrl drivers probe. According to the spec, PERST#
> should be deasserted only after power and reference clock to the device are
> stable, within predefined timing parameters.
> 
> The second issue stems from the PCI bus scan completing before pwrctrl drivers
> probe. This poses a significant problem for PCI bus extenders like switches
> because the PCI core allocates upstream bridge resources during the initial
> scan. If the upstream bridge is not hotplug capable, resources are allocated
> only for the number of downstream buses detected at scan time, which might be
> just one if the switch was not powered and enumerated at that time. Later, when
> the pwrctrl driver powers on and enumerates the switch, enumeration fails due to
> insufficient upstream bridge resources.
> 
> Proposal
> ========
> 
> This series addresses both issues by introducing new individual APIs for pwrctrl
> device creation, destruction, power on, and power off operations. Controller
> drivers are expected to invoke these APIs during their probe(), remove(),
> suspend(), and resume() operations. This integration allows better coordination
> between controller drivers and the pwrctrl framework, enabling enhanced features
> such as D3Cold support.
> 
> The original design aimed to avoid modifying controller drivers for pwrctrl
> integration. However, this approach lacked scalability because different
> controllers have varying requirements for when devices should be powered on. For
> example, controller drivers require devices to be powered on early for
> successful PHY initialization.
> 
> By using these explicit APIs, controller drivers gain fine grained control over
> their associated pwrctrl devices.
> 
> This series modified the pcie-qcom driver (only consumer of pwrctrl framework)
> to adopt to these APIs and also removed the old pwrctrl code from PCI core. This
> could be used as a reference to add pwrctrl support for other controller drivers
> also.
> 
> For example, to control the 3.3v supply to the PCI slot where the NVMe device is
> connected, below modifications are required:
> 
> Devicetree
> ----------
> 
> 	// In SoC dtsi:
> 
> 	pci@1bf8000 { // controller node
> 		...
> 		pcie1_port0: pcie@0 { // PCI Root Port node
> 			compatible = "pciclass,0604"; // required for pwrctrl
> 							 driver bind
> 			...
> 		};
> 	};
> 
> 	// In board dts:
> 
> 	&pcie1_port0 {
> 		reset-gpios = <&tlmm 152 GPIO_ACTIVE_LOW>; // optional
> 		vpcie3v3-supply = <&vreg_nvme>; // NVMe power supply
> 	};
> 
> Controller driver
> -----------------
> 
> 	// Select PCI_PWRCTRL_SLOT in controller Kconfig
> 
> 	probe() {
> 		...
> 		// Initialize controller resources
> 		pci_pwrctrl_create_devices(&pdev->dev);
> 		pci_pwrctrl_power_on_devices(&pdev->dev);
> 		// Deassert PERST# (optional)
> 		...
> 		pci_host_probe(); // Allocate host bridge and start bus scan
> 	}
> 
> 	suspend {
> 		// PME_Turn_Off broadcast
> 		// Assert PERST# (optional)
> 		pci_pwrctrl_power_off_devices(&pdev->dev);
> 		...
> 	}
> 
> 	resume {
> 		...
> 		pci_pwrctrl_power_on_devices(&pdev->dev);
> 		// Deassert PERST# (optional)
> 	}
> 
> I will add a documentation for the pwrctrl framework in the coming days to make
> it easier to use.
> 

This series looks great.

In practice, some PCIe devices may need to be powered down dynamically
at runtime. For example, users might want to disable a PCIe Wi-Fi module
when there's no internet usage — typically, commands like ifconfig wlan0
down only bring the interface down but leave the Wi-Fi hardware powered.
Is there a mechanism that would allow the Endpoint driver to leverage
pwrctrl dynamically to support such power management scenarios?


> Testing
> =======
> 
> This series is tested on the Lenovo Thinkpad T14s laptop based on Qcom X1E
> chipset and RB3Gen2 development board with TC9563 switch based on Qcom QCS6490
> chipset.
> 
> **NOTE**: With this series, the controller driver may undergo multiple probe
> deferral if the pwrctrl driver was not available during the controller driver
> probe. This is pretty much required to avoid the resource allocation issue. I
> plan to replace probe deferral with blocking wait in the coming days.
> 
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> ---
> Changes in v4:
> - Used platform_device_put()
> - Changed the return value of power_off() callback to 'int'
> - Splitted patch 6 into two and reworded the commit message
> - Collected tags
> - Link to v3: https://lore.kernel.org/r/20251229-pci-pwrctrl-rework-v3-0-c7d5918cd0db@oss.qualcomm.com
> 
> Changes in v3:
> - Integrated TC9563 change
> - Reworked the power_on API to properly power off the devices in error path
> - Fixed the error path in pcie-qcom.c to not destroy pwrctrl devices during
>    probe deferral
> - Rebased on top of pci/controller/dwc-qcom branch and dropped the PERST# patch
> - Added a patch for TC9563 to fix the refcount dropping for i2c adapter
> - Added patches to drop the assert_perst callback and rename the PERST# helpers in
>    pcie-qcom.c
> - Link to v2: https://lore.kernel.org/r/20251216-pci-pwrctrl-rework-v2-0-745a563b9be6@oss.qualcomm.com
> 
> Changes in v2:
> - Exported of_pci_supply_present() API
> - Demoted the -EPROBE_DEFER log to dev_dbg()
> - Collected tags and rebased on top of v6.19-rc1
> - Link to v1: https://lore.kernel.org/r/20251124-pci-pwrctrl-rework-v1-0-78a72627683d@oss.qualcomm.com
> 
> ---
> Krishna Chaitanya Chundru (1):
>        PCI/pwrctrl: Add APIs for explicitly creating and destroying pwrctrl devices
> 
> Manivannan Sadhasivam (7):
>        PCI/pwrctrl: tc9563: Use put_device() instead of i2c_put_adapter()
>        PCI/pwrctrl: Add 'struct pci_pwrctrl::power_{on/off}' callbacks
>        PCI/pwrctrl: Add APIs to power on/off the pwrctrl devices
>        PCI/pwrctrl: Switch to the new pwrctrl APIs
>        PCI: qcom: Drop the assert_perst() callbacks
>        PCI: Drop the assert_perst() callback
>        PCI: qcom: Rename PERST# assert/deassert helpers for uniformity
> 
>   drivers/pci/bus.c                                 |  19 --
>   drivers/pci/controller/dwc/pcie-designware-host.c |   9 -
>   drivers/pci/controller/dwc/pcie-designware.h      |   9 -
>   drivers/pci/controller/dwc/pcie-qcom.c            |  54 +++--
>   drivers/pci/of.c                                  |   1 +
>   drivers/pci/probe.c                               |  59 -----
>   drivers/pci/pwrctrl/core.c                        | 260 ++++++++++++++++++++--
>   drivers/pci/pwrctrl/pci-pwrctrl-pwrseq.c          |  30 ++-
>   drivers/pci/pwrctrl/pci-pwrctrl-tc9563.c          |  48 ++--
>   drivers/pci/pwrctrl/slot.c                        |  48 ++--
>   drivers/pci/remove.c                              |  20 --
>   include/linux/pci-pwrctrl.h                       |  16 +-
>   include/linux/pci.h                               |   1 -
>   13 files changed, 367 insertions(+), 207 deletions(-)
> ---
> base-commit: 3e7f562e20ee87a25e104ef4fce557d39d62fa85
> change-id: 20251124-pci-pwrctrl-rework-c91a6e16c2f6
> prerequisite-message-id: 20251126081718.8239-1-mani@kernel.org
> prerequisite-patch-id: db9ff6c713e2303c397e645935280fd0d277793a
> prerequisite-patch-id: b5351b0a41f618435f973ea2c3275e51d46f01c5
> 
> Best regards,

Re: [PATCH v4 0/8] PCI/pwrctrl: Major rework to integrate pwrctrl devices with controller drivers
Posted by Manivannan Sadhasivam 3 weeks, 1 day ago
On Fri, Jan 16, 2026 at 04:02:38PM +0800, Shawn Lin wrote:
> 
> 在 2026/01/05 星期一 21:55, Manivannan Sadhasivam via B4 Relay 写道:
> > Hi,
> > 
> > This series provides a major rework for the PCI power control (pwrctrl)
> > framework to enable the pwrctrl devices to be controlled by the PCI controller
> > drivers.
> > 
> > Problem Statement
> > =================
> > 
> > Currently, the pwrctrl framework faces two major issues:
> > 
> > 1. Missing PERST# integration
> > 2. Inability to properly handle bus extenders such as PCIe switch devices
> > 
> > First issue arises from the disconnect between the PCI controller drivers and
> > pwrctrl framework. At present, the pwrctrl framework just operates on its own
> > with the help of the PCI core. The pwrctrl devices are created by the PCI core
> > during initial bus scan and the pwrctrl drivers once bind, just power on the
> > PCI devices during their probe(). This design conflicts with the PCI Express
> > Card Electromechanical Specification requirements for PERST# timing. The reason
> > is, PERST# signals are mostly handled by the controller drivers and often
> > deasserted even before the pwrctrl drivers probe. According to the spec, PERST#
> > should be deasserted only after power and reference clock to the device are
> > stable, within predefined timing parameters.
> > 
> > The second issue stems from the PCI bus scan completing before pwrctrl drivers
> > probe. This poses a significant problem for PCI bus extenders like switches
> > because the PCI core allocates upstream bridge resources during the initial
> > scan. If the upstream bridge is not hotplug capable, resources are allocated
> > only for the number of downstream buses detected at scan time, which might be
> > just one if the switch was not powered and enumerated at that time. Later, when
> > the pwrctrl driver powers on and enumerates the switch, enumeration fails due to
> > insufficient upstream bridge resources.
> > 
> > Proposal
> > ========
> > 
> > This series addresses both issues by introducing new individual APIs for pwrctrl
> > device creation, destruction, power on, and power off operations. Controller
> > drivers are expected to invoke these APIs during their probe(), remove(),
> > suspend(), and resume() operations. This integration allows better coordination
> > between controller drivers and the pwrctrl framework, enabling enhanced features
> > such as D3Cold support.
> > 
> > The original design aimed to avoid modifying controller drivers for pwrctrl
> > integration. However, this approach lacked scalability because different
> > controllers have varying requirements for when devices should be powered on. For
> > example, controller drivers require devices to be powered on early for
> > successful PHY initialization.
> > 
> > By using these explicit APIs, controller drivers gain fine grained control over
> > their associated pwrctrl devices.
> > 
> > This series modified the pcie-qcom driver (only consumer of pwrctrl framework)
> > to adopt to these APIs and also removed the old pwrctrl code from PCI core. This
> > could be used as a reference to add pwrctrl support for other controller drivers
> > also.
> > 
> > For example, to control the 3.3v supply to the PCI slot where the NVMe device is
> > connected, below modifications are required:
> > 
> > Devicetree
> > ----------
> > 
> > 	// In SoC dtsi:
> > 
> > 	pci@1bf8000 { // controller node
> > 		...
> > 		pcie1_port0: pcie@0 { // PCI Root Port node
> > 			compatible = "pciclass,0604"; // required for pwrctrl
> > 							 driver bind
> > 			...
> > 		};
> > 	};
> > 
> > 	// In board dts:
> > 
> > 	&pcie1_port0 {
> > 		reset-gpios = <&tlmm 152 GPIO_ACTIVE_LOW>; // optional
> > 		vpcie3v3-supply = <&vreg_nvme>; // NVMe power supply
> > 	};
> > 
> > Controller driver
> > -----------------
> > 
> > 	// Select PCI_PWRCTRL_SLOT in controller Kconfig
> > 
> > 	probe() {
> > 		...
> > 		// Initialize controller resources
> > 		pci_pwrctrl_create_devices(&pdev->dev);
> > 		pci_pwrctrl_power_on_devices(&pdev->dev);
> > 		// Deassert PERST# (optional)
> > 		...
> > 		pci_host_probe(); // Allocate host bridge and start bus scan
> > 	}
> > 
> > 	suspend {
> > 		// PME_Turn_Off broadcast
> > 		// Assert PERST# (optional)
> > 		pci_pwrctrl_power_off_devices(&pdev->dev);
> > 		...
> > 	}
> > 
> > 	resume {
> > 		...
> > 		pci_pwrctrl_power_on_devices(&pdev->dev);
> > 		// Deassert PERST# (optional)
> > 	}
> > 
> > I will add a documentation for the pwrctrl framework in the coming days to make
> > it easier to use.
> > 
> 
> This series looks great.
> 
> In practice, some PCIe devices may need to be powered down dynamically at
> runtime. For example, users might want to disable a PCIe Wi-Fi module when
> there's no internet usage — typically, commands like ifconfig wlan0 downonly
> bring the interface down but leave the Wi-Fi hardware powered. Is there a
> mechanism that would allow the Endpoint driver to leverage pwrctrl
> dynamically to support such power management scenarios?
> 

Glad that you've brought it up. You are talking about a usecase similar to
Airplane mode in mobiles, and we at Qcom are looking into this usecase
upstream.

The way to handle this would be by using runtime PM ops. Once your WiFi or
NIC driver runtime suspends, it will trigger the controller driver's runtime
suspend callback. At that point, the controller driver can check whether the
device is active (by looking at its D-state) and whether wakeup is requested,
and then initiate the D3Cold sequence using the APIs introduced in this series.

But that comes with a cost: resume latency. It is generally not advised to do
D3Cold during runtime PM due to the latency and also device lifetime issues
(wearout etc...). So technically it is possible, but there are challenges.
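
Very roughly, in the controller driver it could look like the sketch below.
The my_pcie_* helpers and checks are placeholders; only the
pci_pwrctrl_power_{on,off}_devices() calls come from this series, and error
handling is elided:

	static int my_pcie_runtime_suspend(struct device *dev)
	{
		struct my_pcie *pcie = dev_get_drvdata(dev);

		/* Placeholder: check device D-state, wakeup requirements, etc. */
		if (!my_pcie_can_enter_d3cold(pcie))
			return 0;

		my_pcie_assert_perst(pcie);	/* assert PERST# before cutting power */
		pci_pwrctrl_power_off_devices(dev);

		return 0;
	}

	static int my_pcie_runtime_resume(struct device *dev)
	{
		struct my_pcie *pcie = dev_get_drvdata(dev);

		pci_pwrctrl_power_on_devices(dev);
		my_pcie_deassert_perst(pcie);

		return 0;
	}

	static const struct dev_pm_ops my_pcie_pm_ops = {
		RUNTIME_PM_OPS(my_pcie_runtime_suspend, my_pcie_runtime_resume, NULL)
	};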

Krishna is going to post a series that allows the pcie-qcom driver to do D3Cold
during system suspend with these APIs. And we do have plans to extend it to
Airplane mode and similar usecases in the future.

- Mani

-- 
மணிவண்ணன் சதாசிவம்
Re: [PATCH v4 0/8] PCI/pwrctrl: Major rework to integrate pwrctrl devices with controller drivers
Posted by Shawn Lin 3 weeks, 1 day ago
在 2026/01/16 星期五 16:30, Manivannan Sadhasivam 写道:
> On Fri, Jan 16, 2026 at 04:02:38PM +0800, Shawn Lin wrote:
>>
>> 在 2026/01/05 星期一 21:55, Manivannan Sadhasivam via B4 Relay 写道:
>>> Hi,
>>>
>>> This series provides a major rework for the PCI power control (pwrctrl)
>>> framework to enable the pwrctrl devices to be controlled by the PCI controller
>>> drivers.
>>>
>>> Problem Statement
>>> =================
>>>
>>> Currently, the pwrctrl framework faces two major issues:
>>>
>>> 1. Missing PERST# integration
>>> 2. Inability to properly handle bus extenders such as PCIe switch devices
>>>
>>> First issue arises from the disconnect between the PCI controller drivers and
>>> pwrctrl framework. At present, the pwrctrl framework just operates on its own
>>> with the help of the PCI core. The pwrctrl devices are created by the PCI core
>>> during initial bus scan and the pwrctrl drivers once bind, just power on the
>>> PCI devices during their probe(). This design conflicts with the PCI Express
>>> Card Electromechanical Specification requirements for PERST# timing. The reason
>>> is, PERST# signals are mostly handled by the controller drivers and often
>>> deasserted even before the pwrctrl drivers probe. According to the spec, PERST#
>>> should be deasserted only after power and reference clock to the device are
>>> stable, within predefined timing parameters.
>>>
>>> The second issue stems from the PCI bus scan completing before pwrctrl drivers
>>> probe. This poses a significant problem for PCI bus extenders like switches
>>> because the PCI core allocates upstream bridge resources during the initial
>>> scan. If the upstream bridge is not hotplug capable, resources are allocated
>>> only for the number of downstream buses detected at scan time, which might be
>>> just one if the switch was not powered and enumerated at that time. Later, when
>>> the pwrctrl driver powers on and enumerates the switch, enumeration fails due to
>>> insufficient upstream bridge resources.
>>>
>>> Proposal
>>> ========
>>>
>>> This series addresses both issues by introducing new individual APIs for pwrctrl
>>> device creation, destruction, power on, and power off operations. Controller
>>> drivers are expected to invoke these APIs during their probe(), remove(),
>>> suspend(), and resume() operations. This integration allows better coordination
>>> between controller drivers and the pwrctrl framework, enabling enhanced features
>>> such as D3Cold support.
>>>
>>> The original design aimed to avoid modifying controller drivers for pwrctrl
>>> integration. However, this approach lacked scalability because different
>>> controllers have varying requirements for when devices should be powered on. For
>>> example, controller drivers require devices to be powered on early for
>>> successful PHY initialization.
>>>
>>> By using these explicit APIs, controller drivers gain fine grained control over
>>> their associated pwrctrl devices.
>>>
>>> This series modified the pcie-qcom driver (only consumer of pwrctrl framework)
>>> to adopt to these APIs and also removed the old pwrctrl code from PCI core. This
>>> could be used as a reference to add pwrctrl support for other controller drivers
>>> also.
>>>
>>> For example, to control the 3.3v supply to the PCI slot where the NVMe device is
>>> connected, below modifications are required:
>>>
>>> Devicetree
>>> ----------
>>>
>>> 	// In SoC dtsi:
>>>
>>> 	pci@1bf8000 { // controller node
>>> 		...
>>> 		pcie1_port0: pcie@0 { // PCI Root Port node
>>> 			compatible = "pciclass,0604"; // required for pwrctrl
>>> 							 driver bind
>>> 			...
>>> 		};
>>> 	};
>>>
>>> 	// In board dts:
>>>
>>> 	&pcie1_port0 {
>>> 		reset-gpios = <&tlmm 152 GPIO_ACTIVE_LOW>; // optional
>>> 		vpcie3v3-supply = <&vreg_nvme>; // NVMe power supply
>>> 	};
>>>
>>> Controller driver
>>> -----------------
>>>
>>> 	// Select PCI_PWRCTRL_SLOT in controller Kconfig
>>>
>>> 	probe() {
>>> 		...
>>> 		// Initialize controller resources
>>> 		pci_pwrctrl_create_devices(&pdev->dev);
>>> 		pci_pwrctrl_power_on_devices(&pdev->dev);
>>> 		// Deassert PERST# (optional)
>>> 		...
>>> 		pci_host_probe(); // Allocate host bridge and start bus scan
>>> 	}
>>>
>>> 	suspend {
>>> 		// PME_Turn_Off broadcast
>>> 		// Assert PERST# (optional)
>>> 		pci_pwrctrl_power_off_devices(&pdev->dev);
>>> 		...
>>> 	}
>>>
>>> 	resume {
>>> 		...
>>> 		pci_pwrctrl_power_on_devices(&pdev->dev);
>>> 		// Deassert PERST# (optional)
>>> 	}
>>>
>>> I will add a documentation for the pwrctrl framework in the coming days to make
>>> it easier to use.
>>>
>>
>> This series looks great.
>>
>> In practice, some PCIe devices may need to be powered down dynamically at
>> runtime. For example, users might want to disable a PCIe Wi-Fi module when
>> there's no internet usage — typically, commands like ifconfig wlan0 downonly
>> bring the interface down but leave the Wi-Fi hardware powered. Is there a
>> mechanism that would allow the Endpoint driver to leverage pwrctrl
>> dynamically to support such power management scenarios?
>>
> 
> Glad that you've brought it up. You are talking about the usecase similar to
> Airplane mode in mobiles, and we at Qcom are looking into this usecase in
> upstream.
> 
> They way to handle this would be by using runtime PM ops. Once your WiFi or a
> NIC driver runtime suspends, it will trigger the controller driver runtime
> suspend callback. By that time, the controller driver can see if the device is
> active or not (checking D states), whether wakeup is requested or not and then
> initiate the D3Cold sequence using the APIs introduced in this series.
> 
> But that comes with a cost though, which is resume latency. It is generally not
> advised to do D3Cold during runtime PM due to the latency and also device
> lifetime issues (wearout etc...). So technically it is possible, but there are
> challenges.
> 

Indeed, that's a fundamental power-performance trade-off for 
battery-powered devices.

> Krishna is going to post a series that allows the pcie-qcom driver to do D3Cold
> during system suspend with these APIs. And we do have plans to extend it to
> Airplane mode and similar usecases in the future.
> 

Thanks for sharing these details.

> - Mani
>