Hi,
Currently, in the event of AER/DPC, PCI core will try to reset the slot (Root
Port) and its subordinate devices by invoking bridge control reset and FLR. But
in some cases like AER Fatal error, it might be necessary to reset the Root
Ports using the PCI host bridge drivers in a platform specific way (as indicated
by the TODO in the pcie_do_recovery() function in drivers/pci/pcie/err.c).
Otherwise, the PCI link won't be recovered successfully.
So this series adds a new callback 'pci_host_bridge::reset_root_port' for the
host bridge drivers to reset the Root Port when a fatal error happens.
Also, this series allows the host bridge drivers to handle PCI link down event
by resetting the Root Ports and recovering the bus. This is accomplished with the
help of the new 'pci_host_handle_link_down()' API. Host bridge drivers are
expected to call this API (preferably from a threaded IRQ handler) with the
relevant Root Port 'pci_dev' when a link down event is detected for the port.
The API will reuse the pcie_do_recovery() function to recover the link if AER
support is enabled, otherwise it will directly call the reset_root_port()
callback of the host bridge driver (if it exists).
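As a rough illustration of that dispatch, here is a small userspace model. It is
only a sketch: every type and function name in it (model_bridge, model_port,
model_handle_link_down() and so on) is a hypothetical stand-in, not the kernel
API added by this series, and the recovery path is reduced to a counter.

```c
/*
 * Userspace model of the link down dispatch described above.
 * All names are hypothetical stand-ins, not the kernel API.
 */
#include <stdbool.h>
#include <stddef.h>

struct model_port;

struct model_bridge {
	/* Stand-in for pci_host_bridge::reset_root_port; may be NULL. */
	int (*reset_root_port)(struct model_bridge *bridge,
			       struct model_port *port);
	int reset_count;	/* how often the callback was invoked */
};

struct model_port {
	struct model_bridge *bridge;
	bool aer_enabled;	/* models "AER support is enabled" */
	int recovery_runs;	/* how often the recovery path was taken */
};

/* Which branch the model took; purely for illustration. */
enum model_path {
	MODEL_PATH_RECOVERY,
	MODEL_PATH_DIRECT_RESET,
	MODEL_PATH_NONE,
};

/* Example platform callback: only records that it ran. */
int model_reset_root_port(struct model_bridge *bridge, struct model_port *port)
{
	(void)port;
	bridge->reset_count++;
	return 0;
}

/* Stand-in for the AER recovery flow; the real flow is not modelled. */
void model_do_recovery(struct model_port *port)
{
	port->recovery_runs++;
}

/* AER enabled: use the recovery path. Otherwise call the callback if set. */
enum model_path model_handle_link_down(struct model_port *port)
{
	struct model_bridge *bridge = port->bridge;

	if (port->aer_enabled) {
		model_do_recovery(port);
		return MODEL_PATH_RECOVERY;
	}

	if (bridge->reset_root_port) {
		bridge->reset_root_port(bridge, port);
		return MODEL_PATH_DIRECT_RESET;
	}

	return MODEL_PATH_NONE;
}
```

A caller would fill in a model_bridge with the callback, point a model_port at
it, and call model_handle_link_down() once per link down event for that port.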
For reference, I've modified the pcie-qcom driver to call
pci_host_handle_link_down() API with Root Port 'pci_dev' after receiving the
LDn global_irq event and populated 'pci_host_bridge::reset_root_port()'
callback to reset the Root Ports.
Testing
-------
Tested on Qcom Lemans AU Ride platform with Host and EP SoCs connected over PCIe
link. Simulated the LDn by disabling LTSSM_EN on the EP and I could verify that
the link was getting recovered successfully.
Changes in v7:
- Dropped Rockchip Root port reset patch due to reported issues. But the series
works on other platforms as tested by others.
- Added pci_{lock/unlock}_rescan_remove() to guard pci_bus_error_reset() as the
device could be removed in-between due to Native hotplug interrupt.
- Rebased on top of v7.0-rc1
- Link to v6: https://lore.kernel.org/r/20250715-pci-port-reset-v6-0-6f9cce94e7bb@oss.qualcomm.com
Changes in v6:
- Incorporated the patch: https://lore.kernel.org/all/20250524185304.26698-2-manivannan.sadhasivam@linaro.org/
- Link to v5: https://lore.kernel.org/r/20250715-pci-port-reset-v5-0-26a5d278db40@oss.qualcomm.com
Changes in v5:
* Reworked the pci_host_handle_link_down() to accept Root Port instead of
resetting all Root Ports in the event of link down.
* Renamed 'reset_slot' to 'reset_root_port' to avoid confusion as both terms
were used interchangeably and the series is intended to reset Root Port only.
* Added the Rockchip driver change to this series.
* Dropped the applied patches and review/tested tags due to rework.
* Rebased on top of v6.16-rc1.
Changes in v4:
- Handled link down first in the irq handler
- Updated ICC & OPP bandwidth after link up in reset_slot() callback
- Link to v3: https://lore.kernel.org/r/20250417-pcie-reset-slot-v3-0-59a10811c962@linaro.org
Changes in v3:
- Made the pci-host-common driver as a common library for host controller
drivers
- Moved the reset slot code to pci-host-common library
- Link to v2: https://lore.kernel.org/r/20250416-pcie-reset-slot-v2-0-efe76b278c10@linaro.org
Changes in v2:
- Moved calling reset_slot() callback from pcie_do_recovery() to pcibios_reset_secondary_bus()
- Link to v1: https://lore.kernel.org/r/20250404-pcie-reset-slot-v1-0-98952918bf90@linaro.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
---
Manivannan Sadhasivam (4):
PCI/ERR: Add support for resetting the Root Ports in a platform specific way
PCI: host-common: Add link down handling for Root Ports
PCI: qcom: Add support for resetting the Root Port due to link down event
misc: pci_endpoint_test: Add AER error handlers
drivers/misc/pci_endpoint_test.c | 20 +++++
drivers/pci/controller/dwc/pcie-qcom.c | 143 ++++++++++++++++++++++++++++++-
drivers/pci/controller/pci-host-common.c | 35 ++++++++
drivers/pci/controller/pci-host-common.h | 1 +
drivers/pci/pci.c | 21 +++++
drivers/pci/pcie/err.c | 6 +-
include/linux/pci.h | 1 +
7 files changed, 221 insertions(+), 6 deletions(-)
---
base-commit: 6de23f81a5e08be8fbf5e8d7e9febc72a5b5f27f
change-id: 20250715-pci-port-reset-4d9519570123
Best regards,
--
Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
On Tue, Mar 10, 2026 at 07:31:58PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> Changes in v7:
> - Dropped Rockchip Root port reset patch due to reported issues. But the series
> works on other platforms as tested by others.

Are you referring to

## On EP side:
# echo 0 > /sys/kernel/config/pci_ep/controllers/a40000000.pcie-ep/start && \
sleep 0.1 && echo 1 > /sys/kernel/config/pci_ep/controllers/a40000000.pcie-ep/start

Then running pcitest only having 7 / 16 tests passed ?

If so, isn't that a problem also for qcom?

There is no chance that the patch:
"misc: pci_endpoint_test: Add AER error handlers"
improves things in this regard?

Or will it simply avoid the "AER: device recovery failed" print?

Kind regards,
Niklas
On Wed, Mar 11, 2026 at 12:05:15PM +0100, Niklas Cassel wrote:
> On Tue, Mar 10, 2026 at 07:31:58PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > Changes in v7:
> > - Dropped Rockchip Root port reset patch due to reported issues. But the series
> > works on other platforms as tested by others.
>
> Are you referring to
>
> ## On EP side:
> # echo 0 > /sys/kernel/config/pci_ep/controllers/a40000000.pcie-ep/start && \
> sleep 0.1 && echo 1 > /sys/kernel/config/pci_ep/controllers/a40000000.pcie-ep/start
>
> Then running pcitest only having 7 / 16 tests passed ?
>
> If so, isn't that a problem also for qcom?
>

No, tests are passing on my setup after link up.

>
> There is no chance that the patch:
> "misc: pci_endpoint_test: Add AER error handlers"
> improves things in this regard?
>
> Or will it simply avoid the "AER: device recovery failed" print?
>

Yes, as mentioned in the commit message, it just avoids the AER recovery failure
message.

- Mani
--
மணிவண்ணன் சதாசிவம்
On Wed, Mar 11, 2026 at 08:09:53PM +0530, Manivannan Sadhasivam wrote:
> On Wed, Mar 11, 2026 at 12:05:15PM +0100, Niklas Cassel wrote:
> > On Tue, Mar 10, 2026 at 07:31:58PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > > Changes in v7:
> > > - Dropped Rockchip Root port reset patch due to reported issues. But the series
> > > works on other platforms as tested by others.
> >
> > Are you referring to
> >
> > ## On EP side:
> > # echo 0 > /sys/kernel/config/pci_ep/controllers/a40000000.pcie-ep/start && \
> > sleep 0.1 && echo 1 > /sys/kernel/config/pci_ep/controllers/a40000000.pcie-ep/start
> >
> > Then running pcitest only having 7 / 16 tests passed ?
> >
> > If so, isn't that a problem also for qcom?
> >
>
> No, tests are passing on my setup after link up.
>
> >
> > There is no chance that the patch:
> > "misc: pci_endpoint_test: Add AER error handlers"
> > improves things in this regard?
> >
> > Or will it simply avoid the "AER: device recovery failed" print?
> >
>
> Yes, as mentioned in the commit message, it just avoids the AER recovery failure
> message.
>

I also realized that Endpoint state is not saved in all the code paths. So the
pci_endpoint_test driver has to save/restore the state also. But it is still not
clear why that didn't help you.

Can you share the snapshot of the entire config space before and after reset
using 'lspci -xxxx -s "0000:01:00"'?

- Mani
--
மணிவண்ணன் சதாசிவம்
On Wed, Mar 11, 2026 at 08:44:15PM +0530, Manivannan Sadhasivam wrote:
> On Wed, Mar 11, 2026 at 08:09:53PM +0530, Manivannan Sadhasivam wrote:
> > On Wed, Mar 11, 2026 at 12:05:15PM +0100, Niklas Cassel wrote:
> > > On Tue, Mar 10, 2026 at 07:31:58PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > > > Changes in v7:
> > > > - Dropped Rockchip Root port reset patch due to reported issues. But the series
> > > > works on other platforms as tested by others.
> > >
> > > Are you referring to
> > >
> > > ## On EP side:
> > > # echo 0 > /sys/kernel/config/pci_ep/controllers/a40000000.pcie-ep/start && \
> > > sleep 0.1 && echo 1 > /sys/kernel/config/pci_ep/controllers/a40000000.pcie-ep/start
> > >
> > > Then running pcitest only having 7 / 16 tests passed ?
> > >
> > > If so, isn't that a problem also for qcom?
> > >
> >
> > No, tests are passing on my setup after link up.
> >
> > >
> > > There is no chance that the patch:
> > > "misc: pci_endpoint_test: Add AER error handlers"
> > > improves things in this regard?
> > >
> > > Or will it simply avoid the "AER: device recovery failed" print?
> > >
> >
> > Yes, as mentioned in the commit message, it just avoids the AER recovery failure
> > message.
> >
>
> I also realized that Endpoint state is not saved in all the code paths. So the
> pci_endpoint_test driver has to save/restore the state also. But it is still not
> clear why that didn't help you.
>
> Can you share the snapshot of the entire config space before and after reset
> using 'lspci -xxxx -s "0000:01:00"'?
If I don't add something like:
diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
index 1eced7a419eb..9d7ee39164d4 100644
--- a/drivers/misc/pci_endpoint_test.c
+++ b/drivers/misc/pci_endpoint_test.c
@@ -1059,6 +1059,9 @@ static int pci_endpoint_test_set_irq(struct pci_endpoint_test *test,
return ret;
}
+ pr_info("saving PCI state (irq_type: %d)\n", req_irq_type);
+ pci_save_state(pdev);
+
return 0;
}
@@ -1453,6 +1456,7 @@ static pci_ers_result_t pci_endpoint_test_error_detected(struct pci_dev *pdev,
static pci_ers_result_t pci_endpoint_test_slot_reset(struct pci_dev *pdev)
{
+ pci_restore_state(pdev);
return PCI_ERS_RESULT_RECOVERED;
}
on top of your patch, then all the BAR tests, as well as the MSI and MSI-X
tests, fail.
There is a huge difference in lspci -vvv output (as I guess is expected),
including all BARs being marked as disabled.
With the patch above, there is zero difference before/after reset, and all
the BAR tests pass. However, MSI/MSI-X tests still fail with:
# pci_endpoint_test.c:143:MSI_TEST:Expected 0 (0) == ret (-110)
# pci_endpoint_test.c:143:MSI_TEST:Test failed for MSI1
ETIMEDOUT.
This suggests that pci_endpoint_test on the host side did not receive an
interrupt.
I don't know why, but considering that lspci output is now (with the
save+restore) identical, I assume that the problem is not related to
the host. Unless somehow the host will use a new/different MSI address
after the root port has been reset, and we restore the old MSI address,
but looking at the code, dw_pcie_msi_init() is called by
dw_pcie_setup_rc(), so I would expect the MSI address to be the same.
I will be very busy for a few weeks, so I don't have time to debug this.
If anyone wants to debug this on rk3588, I'm attaching the patches for
this new feature for rk3588 that can be applied on top of this series.
Personally, I'm fine with this series getting merged even though this
new feature will only be supported by the QCOM driver.
But, I don't understand how e.g. pci endpoint test can work on QCOM
platforms, after the root port has been reset, without something like
the save/restore diff above.
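To illustrate why a save/restore pair like the one in the diff above matters,
here is a minimal userspace model. It is a sketch under loud assumptions: the
"config space" is just an array, a reset is modelled as zeroing it, and all
names (model_dev, model_save_state() and friends) are hypothetical; none of
this is the kernel's pci_save_state()/pci_restore_state() implementation.

```c
/*
 * Userspace model of "save config state, reset, restore".
 * All names are hypothetical; this is not the kernel implementation.
 */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_CFG_DWORDS 16
#define MODEL_BAR0_INDEX 4	/* dword index of BAR0 in a type 0 header */

struct model_dev {
	uint32_t cfg[MODEL_CFG_DWORDS];		/* live config space */
	uint32_t saved[MODEL_CFG_DWORDS];	/* last snapshot */
	bool state_saved;			/* is the snapshot valid? */
};

/* Take a snapshot of the currently programmed config space. */
void model_save_state(struct model_dev *dev)
{
	memcpy(dev->saved, dev->cfg, sizeof(dev->cfg));
	dev->state_saved = true;
}

/* Assumption: a reset wipes everything that was programmed at runtime. */
void model_reset(struct model_dev *dev)
{
	memset(dev->cfg, 0, sizeof(dev->cfg));
}

/* Write the snapshot back; returns false if nothing was ever saved. */
bool model_restore_state(struct model_dev *dev)
{
	if (!dev->state_saved)
		return false;

	memcpy(dev->cfg, dev->saved, sizeof(dev->cfg));
	return true;
}
```

In this model, a device whose state was never saved comes out of
model_reset() with its BAR cleared and stays that way, which mirrors the
"all BARs being marked as disabled" observation. It says nothing about the
remaining MSI/MSI-X timeouts.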
Kind regards,
Niklas
From c6416291bdbe2a3964b60183492208b41208f5a0 Mon Sep 17 00:00:00 2001
From: Niklas Cassel <cassel@kernel.org>
Date: Tue, 17 Mar 2026 09:59:09 +0100
Subject: [PATCH 1/2] Revert "PCI: dw-rockchip: Simplify regulator setup with
devm_regulator_get_enable_optional()"
This reverts commit c930b10f17c03858cfe19b9873ba5240128b4d1b.
---
drivers/pci/controller/dwc/pcie-dw-rockchip.c | 23 ++++++++++++++-----
1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/drivers/pci/controller/dwc/pcie-dw-rockchip.c b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
index 8db27199cfa6..bec42fe646d8 100644
--- a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
+++ b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
@@ -95,6 +95,7 @@ struct rockchip_pcie {
unsigned int clk_cnt;
struct reset_control *rst;
struct gpio_desc *rst_gpio;
+ struct regulator *vpcie3v3;
struct irq_domain *irq_domain;
const struct rockchip_pcie_of_data *data;
bool supports_clkreq;
@@ -673,15 +674,22 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
return ret;
/* DON'T MOVE ME: must be enable before PHY init */
- ret = devm_regulator_get_enable_optional(dev, "vpcie3v3");
- if (ret < 0 && ret != -ENODEV)
- return dev_err_probe(dev, ret,
- "failed to enable vpcie3v3 regulator\n");
+ rockchip->vpcie3v3 = devm_regulator_get_optional(dev, "vpcie3v3");
+ if (IS_ERR(rockchip->vpcie3v3)) {
+ if (PTR_ERR(rockchip->vpcie3v3) != -ENODEV)
+ return dev_err_probe(dev, PTR_ERR(rockchip->vpcie3v3),
+ "failed to get vpcie3v3 regulator\n");
+ rockchip->vpcie3v3 = NULL;
+ } else {
+ ret = regulator_enable(rockchip->vpcie3v3);
+ if (ret)
+ return dev_err_probe(dev, ret,
+ "failed to enable vpcie3v3 regulator\n");
+ }
ret = rockchip_pcie_phy_init(rockchip);
if (ret)
- return dev_err_probe(dev, ret,
- "failed to initialize the phy\n");
+ goto disable_regulator;
ret = reset_control_deassert(rockchip->rst);
if (ret)
@@ -714,6 +722,9 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
clk_bulk_disable_unprepare(rockchip->clk_cnt, rockchip->clks);
deinit_phy:
rockchip_pcie_phy_deinit(rockchip);
+disable_regulator:
+ if (rockchip->vpcie3v3)
+ regulator_disable(rockchip->vpcie3v3);
return ret;
}
--
2.53.0
From 47b29e709bb209b2877a072cd3a9c3e5f1b66399 Mon Sep 17 00:00:00 2001
From: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Date: Tue, 17 Mar 2026 09:30:57 +0100
Subject: [PATCH 2/2] PCI: dw-rockchip: Add support to reset Root Port upon
link down event
The PCIe link may go down in cases like firmware crashes or unstable
connections. When this occurs, the PCIe Root Port must be reset to restore
the functionality. However, the current driver lacks link down handling,
forcing users to reboot the system to recover.
This patch implements the `reset_root_port` callback for link down handling
for the Rockchip DWC PCIe host controller. In the callback, the RC is reset and
reconfigured, and link training is initiated to recover from the link down
event.
This also, by extension, fixes issues with sysfs initiated bus resets.
Currently, when a sysfs initiated bus reset is issued, the endpoint
device is non-functional afterwards (it may link up with downgraded link status).
With the link down handling support, a sysfs initiated bus reset works as
intended. Testing conducted on a ROCK5B board with an M.2 NVMe drive.
Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
---
drivers/pci/controller/dwc/Kconfig | 1 +
drivers/pci/controller/dwc/pcie-dw-rockchip.c | 137 +++++++++++++++++-
2 files changed, 135 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/controller/dwc/Kconfig b/drivers/pci/controller/dwc/Kconfig
index d0aa031397fa..ecaf79da843b 100644
--- a/drivers/pci/controller/dwc/Kconfig
+++ b/drivers/pci/controller/dwc/Kconfig
@@ -361,6 +361,7 @@ config PCIE_ROCKCHIP_DW_HOST
depends on OF
select PCIE_DW_HOST
select PCIE_ROCKCHIP_DW
+ select PCI_HOST_COMMON
help
Enables support for the DesignWare PCIe controller in the
Rockchip SoC (except RK3399) to work in host mode.
diff --git a/drivers/pci/controller/dwc/pcie-dw-rockchip.c b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
index bec42fe646d8..75928057acee 100644
--- a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
+++ b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
@@ -24,6 +24,7 @@
#include <linux/reset.h>
#include "../../pci.h"
+#include "../pci-host-common.h"
#include "pcie-designware.h"
/*
@@ -106,6 +107,9 @@ struct rockchip_pcie_of_data {
const struct pci_epc_features *epc_features;
};
+static int rockchip_pcie_rc_reset_root_port(struct pci_host_bridge *bridge,
+ struct pci_dev *pdev);
+
static int rockchip_pcie_readl_apb(struct rockchip_pcie *rockchip, u32 reg)
{
return readl_relaxed(rockchip->apb_base + reg);
@@ -326,6 +330,7 @@ static int rockchip_pcie_host_init(struct dw_pcie_rp *pp)
rockchip_pcie_configure_l1ss(pci);
rockchip_pcie_enable_l0s(pci);
+ pp->bridge->reset_root_port = rockchip_pcie_rc_reset_root_port;
/* Disable Root Ports BAR0 and BAR1 as they report bogus size */
dw_pcie_writel_dbi2(pci, PCI_BASE_ADDRESS_0, 0x0);
@@ -524,6 +529,32 @@ static const struct dw_pcie_ops dw_pcie_ops = {
.get_ltssm = rockchip_pcie_get_ltssm,
};
+static irqreturn_t rockchip_pcie_rc_sys_irq_thread(int irq, void *arg)
+{
+ struct rockchip_pcie *rockchip = arg;
+ struct dw_pcie *pci = &rockchip->pci;
+ struct dw_pcie_rp *pp = &pci->pp;
+ struct device *dev = pci->dev;
+ struct pci_dev *port;
+ u32 reg;
+
+ reg = rockchip_pcie_readl_apb(rockchip, PCIE_CLIENT_INTR_STATUS_MISC);
+ rockchip_pcie_writel_apb(rockchip, reg, PCIE_CLIENT_INTR_STATUS_MISC);
+
+ dev_dbg(dev, "PCIE_CLIENT_INTR_STATUS_MISC: %#x\n", reg);
+ dev_dbg(dev, "LTSSM_STATUS: %#x\n", rockchip_pcie_get_ltssm_reg(rockchip));
+
+ if (reg & PCIE_LINK_REQ_RST_NOT_INT) {
+ dev_dbg(dev, "hot reset or link-down reset\n");
+ for_each_pci_bridge(port, pp->bridge->bus) {
+ if (pci_pcie_type(port) == PCI_EXP_TYPE_ROOT_PORT)
+ pci_host_handle_link_down(port);
+ }
+ }
+
+ return IRQ_HANDLED;
+}
+
static irqreturn_t rockchip_pcie_ep_sys_irq_thread(int irq, void *arg)
{
struct rockchip_pcie *rockchip = arg;
@@ -556,14 +587,29 @@ static irqreturn_t rockchip_pcie_ep_sys_irq_thread(int irq, void *arg)
return IRQ_HANDLED;
}
-static int rockchip_pcie_configure_rc(struct rockchip_pcie *rockchip)
+static int rockchip_pcie_configure_rc(struct platform_device *pdev,
+ struct rockchip_pcie *rockchip)
{
+ struct device *dev = &pdev->dev;
struct dw_pcie_rp *pp;
+ int irq, ret;
u32 val;
if (!IS_ENABLED(CONFIG_PCIE_ROCKCHIP_DW_HOST))
return -ENODEV;
+ irq = platform_get_irq_byname(pdev, "sys");
+ if (irq < 0)
+ return irq;
+
+ ret = devm_request_threaded_irq(dev, irq, NULL,
+ rockchip_pcie_rc_sys_irq_thread,
+ IRQF_ONESHOT, "pcie-sys-rc", rockchip);
+ if (ret) {
+ dev_err(dev, "failed to request PCIe sys IRQ\n");
+ return ret;
+ }
+
/* LTSSM enable control mode */
val = FIELD_PREP_WM16(PCIE_LTSSM_ENABLE_ENHANCE, 1);
rockchip_pcie_writel_apb(rockchip, val, PCIE_CLIENT_HOT_RESET_CTRL);
@@ -575,7 +621,17 @@ static int rockchip_pcie_configure_rc(struct rockchip_pcie *rockchip)
pp = &rockchip->pci.pp;
pp->ops = &rockchip_pcie_host_ops;
- return dw_pcie_host_init(pp);
+ ret = dw_pcie_host_init(pp);
+ if (ret) {
+ dev_err(dev, "failed to initialize host\n");
+ return ret;
+ }
+
+ /* unmask hot reset/link-down reset */
+ val = FIELD_PREP_WM16(PCIE_LINK_REQ_RST_NOT_INT, 0);
+ rockchip_pcie_writel_apb(rockchip, val, PCIE_CLIENT_INTR_MASK_MISC);
+
+ return ret;
}
static int rockchip_pcie_configure_ep(struct platform_device *pdev,
@@ -701,7 +757,7 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
switch (data->mode) {
case DW_PCIE_RC_TYPE:
- ret = rockchip_pcie_configure_rc(rockchip);
+ ret = rockchip_pcie_configure_rc(pdev, rockchip);
if (ret)
goto deinit_clk;
break;
@@ -729,6 +785,81 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
return ret;
}
+static int rockchip_pcie_rc_reset_root_port(struct pci_host_bridge *bridge,
+ struct pci_dev *pdev)
+{
+ struct pci_bus *bus = bridge->bus;
+ struct dw_pcie_rp *pp = bus->sysdata;
+ struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+ struct rockchip_pcie *rockchip = to_rockchip_pcie(pci);
+ struct device *dev = rockchip->pci.dev;
+ u32 val;
+ int ret;
+
+ dw_pcie_stop_link(pci);
+ clk_bulk_disable_unprepare(rockchip->clk_cnt, rockchip->clks);
+ rockchip_pcie_phy_deinit(rockchip);
+
+ ret = reset_control_assert(rockchip->rst);
+ if (ret)
+ return ret;
+
+ ret = rockchip_pcie_phy_init(rockchip);
+ if (ret)
+ goto disable_regulator;
+
+ ret = reset_control_deassert(rockchip->rst);
+ if (ret)
+ goto deinit_phy;
+
+ ret = rockchip_pcie_clk_init(rockchip);
+ if (ret)
+ goto deinit_phy;
+
+ ret = pp->ops->init(pp);
+ if (ret) {
+ dev_err(dev, "Host init failed: %d\n", ret);
+ goto deinit_clk;
+ }
+
+ /* LTSSM enable control mode */
+ val = FIELD_PREP_WM16(PCIE_LTSSM_ENABLE_ENHANCE, 1);
+ rockchip_pcie_writel_apb(rockchip, val, PCIE_CLIENT_HOT_RESET_CTRL);
+
+ rockchip_pcie_writel_apb(rockchip,
+ PCIE_CLIENT_SET_MODE(PCIE_CLIENT_MODE_RC),
+ PCIE_CLIENT_GENERAL_CON);
+
+ ret = dw_pcie_setup_rc(pp);
+ if (ret) {
+ dev_err(dev, "Failed to setup RC: %d\n", ret);
+ goto deinit_clk;
+ }
+
+ /* unmask hot reset/link-down reset */
+ val = FIELD_PREP_WM16(PCIE_LINK_REQ_RST_NOT_INT, 0);
+ rockchip_pcie_writel_apb(rockchip, val, PCIE_CLIENT_INTR_MASK_MISC);
+
+ ret = dw_pcie_start_link(pci);
+ if (ret)
+ goto deinit_clk;
+
+ /* Ignore errors, the link may come up later */
+ dw_pcie_wait_for_link(pci);
+ dev_dbg(dev, "Root Port reset completed\n");
+ return ret;
+
+deinit_clk:
+ clk_bulk_disable_unprepare(rockchip->clk_cnt, rockchip->clks);
+deinit_phy:
+ rockchip_pcie_phy_deinit(rockchip);
+disable_regulator:
+ if (rockchip->vpcie3v3)
+ regulator_disable(rockchip->vpcie3v3);
+
+ return ret;
+}
+
static const struct rockchip_pcie_of_data rockchip_pcie_rc_of_data_rk3568 = {
.mode = DW_PCIE_RC_TYPE,
};
--
2.53.0
On Tue, Mar 17, 2026 at 12:17:19PM +0100, Niklas Cassel wrote:
>
> I will be very busy for a few weeks, so I don't have time to debug this.
> If anyone wants to debug this on rk3588, I'm attaching the patches for
> this new feature for rk3588 that can be applied on top of this series.
For what it is worth, attaching an improved patch for rk3588 that does not
require any revert.
Kind regards,
Niklas
From a69679430750d7371e65e1b209059803cea2f5de Mon Sep 17 00:00:00 2001
From: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Date: Tue, 17 Mar 2026 09:30:57 +0100
Subject: [PATCH] PCI: dw-rockchip: Add support to reset Root Port upon link
down event
The PCIe link may go down in cases like firmware crashes or unstable
connections. When this occurs, the PCIe Root Port must be reset to restore
the functionality. However, the current driver lacks link down handling,
forcing users to reboot the system to recover.
This patch implements the `reset_root_port` callback for link down handling
for the Rockchip DWC PCIe host controller. In the callback, the RC is reset and
reconfigured, and link training is initiated to recover from the link down
event.
This also, by extension, fixes issues with sysfs initiated bus resets.
Currently, when a sysfs initiated bus reset is issued, the endpoint
device is non-functional afterwards (it may link up with downgraded link status).
With the link down handling support, a sysfs initiated bus reset works as
intended. Testing conducted on a ROCK5B board with an M.2 NVMe drive.
Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
---
drivers/pci/controller/dwc/Kconfig | 1 +
drivers/pci/controller/dwc/pcie-dw-rockchip.c | 134 +++++++++++++++++-
2 files changed, 132 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/controller/dwc/Kconfig b/drivers/pci/controller/dwc/Kconfig
index d0aa031397fa..ecaf79da843b 100644
--- a/drivers/pci/controller/dwc/Kconfig
+++ b/drivers/pci/controller/dwc/Kconfig
@@ -361,6 +361,7 @@ config PCIE_ROCKCHIP_DW_HOST
depends on OF
select PCIE_DW_HOST
select PCIE_ROCKCHIP_DW
+ select PCI_HOST_COMMON
help
Enables support for the DesignWare PCIe controller in the
Rockchip SoC (except RK3399) to work in host mode.
diff --git a/drivers/pci/controller/dwc/pcie-dw-rockchip.c b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
index 8db27199cfa6..988d98effcd7 100644
--- a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
+++ b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
@@ -24,6 +24,7 @@
#include <linux/reset.h>
#include "../../pci.h"
+#include "../pci-host-common.h"
#include "pcie-designware.h"
/*
@@ -105,6 +106,9 @@ struct rockchip_pcie_of_data {
const struct pci_epc_features *epc_features;
};
+static int rockchip_pcie_rc_reset_root_port(struct pci_host_bridge *bridge,
+ struct pci_dev *pdev);
+
static int rockchip_pcie_readl_apb(struct rockchip_pcie *rockchip, u32 reg)
{
return readl_relaxed(rockchip->apb_base + reg);
@@ -325,6 +329,7 @@ static int rockchip_pcie_host_init(struct dw_pcie_rp *pp)
rockchip_pcie_configure_l1ss(pci);
rockchip_pcie_enable_l0s(pci);
+ pp->bridge->reset_root_port = rockchip_pcie_rc_reset_root_port;
/* Disable Root Ports BAR0 and BAR1 as they report bogus size */
dw_pcie_writel_dbi2(pci, PCI_BASE_ADDRESS_0, 0x0);
@@ -523,6 +528,32 @@ static const struct dw_pcie_ops dw_pcie_ops = {
.get_ltssm = rockchip_pcie_get_ltssm,
};
+static irqreturn_t rockchip_pcie_rc_sys_irq_thread(int irq, void *arg)
+{
+ struct rockchip_pcie *rockchip = arg;
+ struct dw_pcie *pci = &rockchip->pci;
+ struct dw_pcie_rp *pp = &pci->pp;
+ struct device *dev = pci->dev;
+ struct pci_dev *port;
+ u32 reg;
+
+ reg = rockchip_pcie_readl_apb(rockchip, PCIE_CLIENT_INTR_STATUS_MISC);
+ rockchip_pcie_writel_apb(rockchip, reg, PCIE_CLIENT_INTR_STATUS_MISC);
+
+ dev_dbg(dev, "PCIE_CLIENT_INTR_STATUS_MISC: %#x\n", reg);
+ dev_dbg(dev, "LTSSM_STATUS: %#x\n", rockchip_pcie_get_ltssm_reg(rockchip));
+
+ if (reg & PCIE_LINK_REQ_RST_NOT_INT) {
+ dev_dbg(dev, "hot reset or link-down reset\n");
+ for_each_pci_bridge(port, pp->bridge->bus) {
+ if (pci_pcie_type(port) == PCI_EXP_TYPE_ROOT_PORT)
+ pci_host_handle_link_down(port);
+ }
+ }
+
+ return IRQ_HANDLED;
+}
+
static irqreturn_t rockchip_pcie_ep_sys_irq_thread(int irq, void *arg)
{
struct rockchip_pcie *rockchip = arg;
@@ -555,14 +586,29 @@ static irqreturn_t rockchip_pcie_ep_sys_irq_thread(int irq, void *arg)
return IRQ_HANDLED;
}
-static int rockchip_pcie_configure_rc(struct rockchip_pcie *rockchip)
+static int rockchip_pcie_configure_rc(struct platform_device *pdev,
+ struct rockchip_pcie *rockchip)
{
+ struct device *dev = &pdev->dev;
struct dw_pcie_rp *pp;
+ int irq, ret;
u32 val;
if (!IS_ENABLED(CONFIG_PCIE_ROCKCHIP_DW_HOST))
return -ENODEV;
+ irq = platform_get_irq_byname(pdev, "sys");
+ if (irq < 0)
+ return irq;
+
+ ret = devm_request_threaded_irq(dev, irq, NULL,
+ rockchip_pcie_rc_sys_irq_thread,
+ IRQF_ONESHOT, "pcie-sys-rc", rockchip);
+ if (ret) {
+ dev_err(dev, "failed to request PCIe sys IRQ\n");
+ return ret;
+ }
+
/* LTSSM enable control mode */
val = FIELD_PREP_WM16(PCIE_LTSSM_ENABLE_ENHANCE, 1);
rockchip_pcie_writel_apb(rockchip, val, PCIE_CLIENT_HOT_RESET_CTRL);
@@ -574,7 +620,17 @@ static int rockchip_pcie_configure_rc(struct rockchip_pcie *rockchip)
pp = &rockchip->pci.pp;
pp->ops = &rockchip_pcie_host_ops;
- return dw_pcie_host_init(pp);
+ ret = dw_pcie_host_init(pp);
+ if (ret) {
+ dev_err(dev, "failed to initialize host\n");
+ return ret;
+ }
+
+ /* unmask hot reset/link-down reset */
+ val = FIELD_PREP_WM16(PCIE_LINK_REQ_RST_NOT_INT, 0);
+ rockchip_pcie_writel_apb(rockchip, val, PCIE_CLIENT_INTR_MASK_MISC);
+
+ return ret;
}
static int rockchip_pcie_configure_ep(struct platform_device *pdev,
@@ -693,7 +749,7 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
switch (data->mode) {
case DW_PCIE_RC_TYPE:
- ret = rockchip_pcie_configure_rc(rockchip);
+ ret = rockchip_pcie_configure_rc(pdev, rockchip);
if (ret)
goto deinit_clk;
break;
@@ -718,6 +774,78 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
return ret;
}
+static int rockchip_pcie_rc_reset_root_port(struct pci_host_bridge *bridge,
+ struct pci_dev *pdev)
+{
+ struct pci_bus *bus = bridge->bus;
+ struct dw_pcie_rp *pp = bus->sysdata;
+ struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+ struct rockchip_pcie *rockchip = to_rockchip_pcie(pci);
+ struct device *dev = rockchip->pci.dev;
+ u32 val;
+ int ret;
+
+ dw_pcie_stop_link(pci);
+ clk_bulk_disable_unprepare(rockchip->clk_cnt, rockchip->clks);
+ rockchip_pcie_phy_deinit(rockchip);
+
+ ret = reset_control_assert(rockchip->rst);
+ if (ret)
+ return ret;
+
+ ret = rockchip_pcie_phy_init(rockchip);
+ if (ret)
+ return ret;
+
+ ret = reset_control_deassert(rockchip->rst);
+ if (ret)
+ goto deinit_phy;
+
+ ret = rockchip_pcie_clk_init(rockchip);
+ if (ret)
+ goto deinit_phy;
+
+ ret = pp->ops->init(pp);
+ if (ret) {
+ dev_err(dev, "Host init failed: %d\n", ret);
+ goto deinit_clk;
+ }
+
+ /* LTSSM enable control mode */
+ val = FIELD_PREP_WM16(PCIE_LTSSM_ENABLE_ENHANCE, 1);
+ rockchip_pcie_writel_apb(rockchip, val, PCIE_CLIENT_HOT_RESET_CTRL);
+
+ rockchip_pcie_writel_apb(rockchip,
+ PCIE_CLIENT_SET_MODE(PCIE_CLIENT_MODE_RC),
+ PCIE_CLIENT_GENERAL_CON);
+
+ ret = dw_pcie_setup_rc(pp);
+ if (ret) {
+ dev_err(dev, "Failed to setup RC: %d\n", ret);
+ goto deinit_clk;
+ }
+
+ /* unmask hot reset/link-down reset */
+ val = FIELD_PREP_WM16(PCIE_LINK_REQ_RST_NOT_INT, 0);
+ rockchip_pcie_writel_apb(rockchip, val, PCIE_CLIENT_INTR_MASK_MISC);
+
+ ret = dw_pcie_start_link(pci);
+ if (ret)
+ goto deinit_clk;
+
+ /* Ignore errors, the link may come up later */
+ dw_pcie_wait_for_link(pci);
+ dev_dbg(dev, "Root Port reset completed\n");
+ return ret;
+
+deinit_clk:
+ clk_bulk_disable_unprepare(rockchip->clk_cnt, rockchip->clks);
+deinit_phy:
+ rockchip_pcie_phy_deinit(rockchip);
+
+ return ret;
+}
+
static const struct rockchip_pcie_of_data rockchip_pcie_rc_of_data_rk3568 = {
.mode = DW_PCIE_RC_TYPE,
};
--
2.53.0
On 3/10/2026 7:31 PM, Manivannan Sadhasivam via B4 Relay wrote:
> Hi,
>
> Currently, in the event of AER/DPC, PCI core will try to reset the slot (Root
> Port) and its subordinate devices by invoking bridge control reset and FLR. But
> in some cases like AER Fatal error, it might be necessary to reset the Root
> Ports using the PCI host bridge drivers in a platform specific way (as indicated
> by the TODO in the pcie_do_recovery() function in drivers/pci/pcie/err.c).
> Otherwise, the PCI link won't be recovered successfully.
>
> So this series adds a new callback 'pci_host_bridge::reset_root_port' for the
> host bridge drivers to reset the Root Port when a fatal error happens.
>
> Also, this series allows the host bridge drivers to handle PCI link down event
> by resetting the Root Ports and recovering the bus. This is accomplished with the
> help of the new 'pci_host_handle_link_down()' API. Host bridge drivers are
> expected to call this API (preferably from a threaded IRQ handler) with the
> relevant Root Port 'pci_dev' when a link down event is detected for the port.
> The API will reuse the pcie_do_recovery() function to recover the link if AER
> support is enabled, otherwise it will directly call the reset_root_port()
> callback of the host bridge driver (if it exists).
>
> For reference, I've modified the pcie-qcom driver to call
> pci_host_handle_link_down() API with Root Port 'pci_dev' after receiving the
> LDn global_irq event and populated 'pci_host_bridge::reset_root_port()'
> callback to reset the Root Ports.
>
> Testing
> -------
>
> Tested on Qcom Lemans AU Ride platform with Host and EP SoCs connected over PCIe
> link. Simulated the LDn by disabling LTSSM_EN on the EP and I could verify that
> the link was getting recovered successfully.
>
> Changes in v7:
> - Dropped Rockchip Root port reset patch due to reported issues. But the series
> works on other platforms as tested by others.
> - Added pci_{lock/unlock}_rescan_remove() to guard pci_bus_error_reset() as the
> device could be removed in-between due to Native hotplug interrupt.
> - Rebased on top of v7.0-rc1
> - Link to v6: https://lore.kernel.org/r/20250715-pci-port-reset-v6-0-6f9cce94e7bb@oss.qualcomm.com
>
> Changes in v6:
> - Incorporated the patch: https://lore.kernel.org/all/20250524185304.26698-2-manivannan.sadhasivam@linaro.org/
> - Link to v5: https://lore.kernel.org/r/20250715-pci-port-reset-v5-0-26a5d278db40@oss.qualcomm.com
>
> Changes in v5:
> * Reworked the pci_host_handle_link_down() to accept Root Port instead of
> resetting all Root Ports in the event of link down.
> * Renamed 'reset_slot' to 'reset_root_port' to avoid confusion as both terms
> were used interchangeably and the series is intended to reset Root Port only.
> * Added the Rockchip driver change to this series.
> * Dropped the applied patches and review/tested tags due to rework.
> * Rebased on top of v6.16-rc1.
>
> Changes in v4:
> - Handled link down first in the irq handler
> - Updated ICC & OPP bandwidth after link up in reset_slot() callback
> - Link to v3: https://lore.kernel.org/r/20250417-pcie-reset-slot-v3-0-59a10811c962@linaro.org
>
> Changes in v3:
> - Made the pci-host-common driver as a common library for host controller
> drivers
> - Moved the reset slot code to pci-host-common library
> - Link to v2: https://lore.kernel.org/r/20250416-pcie-reset-slot-v2-0-efe76b278c10@linaro.org
>
> Changes in v2:
> - Moved calling reset_slot() callback from pcie_do_recovery() to pcibios_reset_secondary_bus()
> - Link to v1: https://lore.kernel.org/r/20250404-pcie-reset-slot-v1-0-98952918bf90@linaro.org
>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
For entire series,
Reviewed-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
- Krishna Chaitanya.
> ---
> Manivannan Sadhasivam (4):
> PCI/ERR: Add support for resetting the Root Ports in a platform specific way
> PCI: host-common: Add link down handling for Root Ports
> PCI: qcom: Add support for resetting the Root Port due to link down event
> misc: pci_endpoint_test: Add AER error handlers
>
> drivers/misc/pci_endpoint_test.c | 20 +++++
> drivers/pci/controller/dwc/pcie-qcom.c | 143 ++++++++++++++++++++++++++++++-
> drivers/pci/controller/pci-host-common.c | 35 ++++++++
> drivers/pci/controller/pci-host-common.h | 1 +
> drivers/pci/pci.c | 21 +++++
> drivers/pci/pcie/err.c | 6 +-
> include/linux/pci.h | 1 +
> 7 files changed, 221 insertions(+), 6 deletions(-)
> ---
> base-commit: 6de23f81a5e08be8fbf5e8d7e9febc72a5b5f27f
> change-id: 20250715-pci-port-reset-4d9519570123
>
> Best regards,