After almost 30 days of battling the RK3399's buggy PCIe on my Rock Pi
N10 through trial-and-error debugging, I finally got positive results
with enumeration on the PCI bus for both a Realtek 8111E NIC and a
Samsung PM981a SSD.

The NIC was connected to an M.2->PCIe x4 riser card, and it would get
stuck in Polling.Compliance without breaking electrical idle on the
host RX side. The Samsung PM981a SSD is connected directly to the M.2
connector; that SSD is known to be quirky (OEM part, no vendor support)
and non-functional on the RK3399 platform.

The Samsung SSD was even worse than the NIC: it would get stuck in
Detect.Active like a bricked card, even though it was fully functional
via a USB adapter.

It seems both devices benefit from retrying link training if (big if
here) PERST# is not toggled during the retry.

For the retry to work, the flow must be exactly as handled by this
patch: cut power, disable the clocks, then re-enable both the clocks
and the power regulators and go through initialization again without
touching PERST#. Quirky devices are then able to successfully
enumerate.

No functional change intended for already working devices.

Signed-off-by: Geraldo Nascimento <geraldogabriel@gmail.com>
---
drivers/pci/controller/pcie-rockchip-host.c | 47 ++++++++++++++++++---
1 file changed, 40 insertions(+), 7 deletions(-)
diff --git a/drivers/pci/controller/pcie-rockchip-host.c b/drivers/pci/controller/pcie-rockchip-host.c
index 2a1071cd3241..67b3b379d277 100644
--- a/drivers/pci/controller/pcie-rockchip-host.c
+++ b/drivers/pci/controller/pcie-rockchip-host.c
@@ -338,11 +338,14 @@ static int rockchip_pcie_set_vpcie(struct rockchip_pcie *rockchip)
 static int rockchip_pcie_host_init_port(struct rockchip_pcie *rockchip)
 {
 	struct device *dev = rockchip->dev;
-	int err, i = MAX_LANE_NUM;
+	int err, i = MAX_LANE_NUM, is_reinit = 0;
 	u32 status;
 
-	gpiod_set_value_cansleep(rockchip->perst_gpio, 0);
+	if (!is_reinit) {
+		gpiod_set_value_cansleep(rockchip->perst_gpio, 0);
+	}
 
+reinit:
 	err = rockchip_pcie_init_port(rockchip);
 	if (err)
 		return err;
@@ -369,16 +372,46 @@ static int rockchip_pcie_host_init_port(struct rockchip_pcie *rockchip)
 	rockchip_pcie_write(rockchip, PCIE_CLIENT_LINK_TRAIN_ENABLE,
 			    PCIE_CLIENT_CONFIG);
 
-	msleep(PCIE_T_PVPERL_MS);
-	gpiod_set_value_cansleep(rockchip->perst_gpio, 1);
-
-	msleep(PCIE_T_RRS_READY_MS);
+	if (!is_reinit) {
+		msleep(PCIE_T_PVPERL_MS);
+		gpiod_set_value_cansleep(rockchip->perst_gpio, 1);
+		msleep(PCIE_T_RRS_READY_MS);
+	}
 
 	/* 500ms timeout value should be enough for Gen1/2 training */
 	err = readl_poll_timeout(rockchip->apb_base + PCIE_CLIENT_BASIC_STATUS1,
 				 status, PCIE_LINK_UP(status), 20,
 				 500 * USEC_PER_MSEC);
-	if (err) {
+
+	if (err && !is_reinit) {
+		while (i--)
+			phy_power_off(rockchip->phys[i]);
+		i = MAX_LANE_NUM;
+		while (i--)
+			phy_exit(rockchip->phys[i]);
+		i = MAX_LANE_NUM;
+		is_reinit = 1;
+		dev_dbg(dev, "Will reinit PCIe without toggling PERST#");
+		if (!IS_ERR(rockchip->vpcie12v))
+			regulator_disable(rockchip->vpcie12v);
+		if (!IS_ERR(rockchip->vpcie3v3))
+			regulator_disable(rockchip->vpcie3v3);
+		regulator_disable(rockchip->vpcie1v8);
+		regulator_disable(rockchip->vpcie0v9);
+		rockchip_pcie_disable_clocks(rockchip);
+		err = rockchip_pcie_enable_clocks(rockchip);
+		if (err)
+			return err;
+		err = rockchip_pcie_set_vpcie(rockchip);
+		if (err) {
+			dev_err(dev, "failed to set vpcie regulator\n");
+			rockchip_pcie_disable_clocks(rockchip);
+			return err;
+		}
+		goto reinit;
+	}
+
+	else if (err) {
 		dev_err(dev, "PCIe link training gen1 timeout!\n");
 		goto err_power_off_phy;
 	}
--
2.49.0
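The retry flow described in the commit message can be modeled in userspace to make its ordering explicit. This is a minimal sketch under stated assumptions, not the driver: nothing touches hardware, every step is only appended to a trace, and the enum names are hypothetical labels (the comments name the driver calls they stand in for). As in the diff, it assumes that writing 0 to perst_gpio holds the endpoint in reset and writing 1 releases it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step labels; comments name the driver calls they model. */
enum step {
	PERST_ASSERT,	/* gpiod_set_value_cansleep(perst_gpio, 0) */
	PERST_DEASSERT,	/* gpiod_set_value_cansleep(perst_gpio, 1) */
	INIT_PORT,	/* rockchip_pcie_init_port() */
	POLL_LINK,	/* readl_poll_timeout() on PCIE_CLIENT_BASIC_STATUS1 */
	PHY_POWER_OFF,
	PHY_EXIT,
	REGULATORS_OFF,
	CLOCKS_OFF,
	CLOCKS_ON,
	REGULATORS_ON,	/* rockchip_pcie_set_vpcie() */
};

#define MAX_TRACE 32

struct model {
	enum step trace[MAX_TRACE];
	size_t n;
	int link_up_attempt;	/* 1-based attempt on which training succeeds; 0 = never */
	int attempt;
};

static void record(struct model *m, enum step s)
{
	if (m->n < MAX_TRACE)
		m->trace[m->n++] = s;
}

static bool poll_link(struct model *m)
{
	record(m, POLL_LINK);
	return m->link_up_attempt != 0 && m->attempt == m->link_up_attempt;
}

/* Mirrors the patch's control flow: at most one retry, PERST# untouched on it. */
static int host_init_port_model(struct model *m)
{
	bool is_reinit = false;

	record(m, PERST_ASSERT);
reinit:
	m->attempt++;
	record(m, INIT_PORT);
	if (!is_reinit)
		record(m, PERST_DEASSERT);	/* after the T_PVPERL delay */

	if (poll_link(m))
		return 0;

	if (!is_reinit) {
		/* Power-cycle the port without touching PERST#. */
		record(m, PHY_POWER_OFF);
		record(m, PHY_EXIT);
		record(m, REGULATORS_OFF);
		record(m, CLOCKS_OFF);
		record(m, CLOCKS_ON);
		record(m, REGULATORS_ON);
		is_reinit = true;
		goto reinit;
	}
	return -1;	/* the driver logs the gen1 timeout and unwinds */
}

/* Counts how often a given step appears in the trace. */
static size_t count_step(const struct model *m, enum step s)
{
	size_t i, c = 0;

	for (i = 0; i < m->n; i++)
		if (m->trace[i] == s)
			c++;
	return c;
}
```

With a link that only trains on the second attempt, the trace shows two INIT_PORT steps but exactly one PERST# assert and one deassert. Because the `reinit:` label sits after the first PERST# write, the `if (!is_reinit)` guard around that write in the patch is always true; the model simply records the write unconditionally.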
Hi Geraldo,

On 2025/06/11 (Wed) 3:05, Geraldo Nascimento wrote:
> After almost 30 days of battling with RK3399 buggy PCIe on my Rock Pi
> N10 through trial-and-error debugging, I finally got positive results
> with enumeration on the PCI bus for both a Realtek 8111E NIC and a
> Samsung PM981a SSD.
>
> [...]
>
> It seems both devices benefit from retrying Link Training if - big if
> here - PERST# is not toggled during retry.
>

I haven't seen this error before, especially given that the RTL8111 NIC
is widely used by customers.

Could you help try this?
[1] Apply your patch 3 first.
[2] Apply the changes below:

--- a/drivers/pci/controller/pcie-rockchip-host.c
+++ b/drivers/pci/controller/pcie-rockchip-host.c
@@ -314,7 +314,7 @@ static int rockchip_pcie_host_init_port(struct rockchip_pcie *rockchip)
 	rockchip_pcie_write(rockchip, PCIE_CLIENT_LINK_TRAIN_ENABLE,
 			    PCIE_CLIENT_CONFIG);
 
-	msleep(PCIE_T_PVPERL_MS);
+	msleep(500);
 	gpiod_set_value_cansleep(rockchip->perst_gpio, 1);
 
 	msleep(PCIE_RESET_CONFIG_WAIT_MS);
@@ -322,7 +322,7 @@ static int rockchip_pcie_host_init_port(struct rockchip_pcie *rockchip)
 	/* 500ms timeout value should be enough for Gen1/2 training */
 	err = readl_poll_timeout(rockchip->apb_base + PCIE_CLIENT_BASIC_STATUS1,
 				 status, PCIE_LINK_UP(status), 20,
-				 500 * USEC_PER_MSEC);
+				 5000 * USEC_PER_MSEC);
 	if (err) {
 		dev_err(dev, "PCIe link training gen1 timeout!\n");
 		goto err_power_off_phy;
@@ -951,6 +951,8 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
 	if (err)
 		return err;
 
+	gpiod_set_value_cansleep(rockchip->perst_gpio, 0);
+
 	err = rockchip_pcie_set_vpcie(rockchip);
 	if (err) {
 		dev_err(dev, "failed to set vpcie regulator\n");

> For retry to work, flow must be exactly as handled by present patch,
> that is, we must cut power, disable the clocks, then re-enable
> both clocks and power regulators and go through initialization
> without touching PERST#. Then quirky devices are able to sucessfully
> enumerate.
>
> No functional change intended for already working devices.
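Shawn's suggestion raises the link-up poll budget from 500 ms to 5000 ms. The sketch below is a userspace model of the poll-until-condition-or-timeout pattern that `readl_poll_timeout()` provides, with a simulated clock. It only mirrors the shape of the kernel macro (which sleeps with `usleep_range()` and checks the deadline against `ktime_get()`), and `link_up_at_us` is a hypothetical moment at which training completes.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated time: "sleeping" only advances now_us. */
struct sim {
	uint64_t now_us;
	uint64_t link_up_at_us;	/* hypothetical time at which the link trains */
};

static bool link_up(const struct sim *s)
{
	return s->now_us >= s->link_up_at_us;
}

/* Polls every sleep_us until the link is up or timeout_us has elapsed. */
static int poll_link_timeout(struct sim *s, uint64_t sleep_us,
			     uint64_t timeout_us)
{
	uint64_t deadline = s->now_us + timeout_us;
	bool up;

	for (;;) {
		up = link_up(s);
		if (up)
			break;
		if (s->now_us > deadline) {
			up = link_up(s);	/* one final read after the deadline */
			break;
		}
		s->now_us += sleep_us;
	}
	return up ? 0 : -ETIMEDOUT;
}
```

A longer timeout only helps if the link would eventually train on its own: with a link that comes up at 800 ms, the 500 ms budget returns -ETIMEDOUT while the 5000 ms budget succeeds. A link that never trains (as Geraldo later reports for the PM981a) times out either way.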
On Fri, Jul 18, 2025 at 09:55:42AM +0800, Shawn Lin wrote:
> Hi Geraldo,
>
> On 2025/06/11 (Wed) 3:05, Geraldo Nascimento wrote:
> > [...]
> > It seems both devices benefit from retrying Link Training if - big if
> > here - PERST# is not toggled during retry.
> >
>
> I didn't see this error before especially given RTL8111 NIC is widelly
> used by customers.

Hi Shawn, great to hear from you!

Note that my board exposes PCIe only via an M.2 NVMe connector, not via
a proper PCIe slot, so I have to use an inexpensive riser card that
provides a proper PCIe connector.

I mention this because, while I don't doubt that the RTL8111 NIC works
out of the box on boards that expose a PCIe slot directly, the
combination of riser card plus NIC has a similar (though not identical,
as described above) effect to connecting known-good SSDs that simply
refuse to work with the Rockchip-IP PCIe controller.

I admit that patch 1 looks a little crazy, but it does enable the use of
currently non-working devices, or combinations of devices, on this IP,
at least on the board I have access to.

>
> Could you help tried this?
> [1] apply your patch 3 first

Sure, I'm always open to testing, but could you clarify the patch 3
part? AFAIK this series of mine only has 2 patches, so I'm a little
confused about exactly which patch to apply as a preliminary step.

Also, since you're asking me to test some code, I think it is only fair
that I ask you to test my code too. It shouldn't be too hard for you to
find an otherwise working NVMe SSD that refuses to complete link
training with the current code. Please connect such an SSD to an RK3399
board and let us know whether my proposed change does anything to
ameliorate the long-standing issue of SSDs that refuse to cooperate.

Thank you,
Geraldo Nascimento
On 2025/07/18 (Fri) 11:33, Geraldo Nascimento wrote:
> On Fri, Jul 18, 2025 at 09:55:42AM +0800, Shawn Lin wrote:
>> Hi Geraldo,
>>
>> [...]
>>
>> Could you help tried this?
>> [1] apply your patch 3 first
>
> Sure, I'm always open for testing, but could you clarify the patch 3
> part? AFAIK this series of mine only has 2 patches, so I'm a little
> confused about exactly which patch to apply as a preliminary step.

Patch 3 refers to "arm64: dts: rockchip: drop PCIe 3v3 always-on and
boot-on", which lets the kernel fully control the power in case the
firmware has already enabled it.

>
> Also, since you're asking me to test some code, I think it is only fair
> if I ask you to test my code, too. It shouldn't be too hard for you to
> find a otherwise working NVMe SSD that refuses to complete link training
> with current code. Connect this SSD please to a RK3399 board and let us
> know if my proposed code change does anything to ameliorate the
> long-standing issue of SSD that refuses to cooperate.

Sure. I don't have a Samsung PM981a SSD right now, but I can test all
my SSDs to see whether I can find one that won't work.
On Fri, Jul 18, 2025 at 11:46:33AM +0800, Shawn Lin wrote:
> On 2025/07/18 (Fri) 11:33, Geraldo Nascimento wrote:
> >
> > Also, since you're asking me to test some code, I think it is only fair
> > if I ask you to test my code, too. It shouldn't be too hard for you to
> > find a otherwise working NVMe SSD that refuses to complete link training
> > with current code. Connect this SSD please to a RK3399 board and let us
> > know if my proposed code change does anything to ameliorate the
> > long-standing issue of SSD that refuses to cooperate.
>
> Sure, I don't have Samsung PM981a SSD now, but I could try to test all
> my SSDs to find if I could pick up one that won't work.
>

Hi Shawn,

I haven't heard back from you, so should I assume that you tested with
an SSD that should work but doesn't, and that the test failed?

Thanks,
Geraldo Nascimento
On Fri, Jul 18, 2025 at 11:46:33AM +0800, Shawn Lin wrote:
> On 2025/07/18 (Fri) 11:33, Geraldo Nascimento wrote:
> > On Fri, Jul 18, 2025 at 09:55:42AM +0800, Shawn Lin wrote:
> >> Could you help tried this?
> >> [1] apply your patch 3 first
> >
> > Sure, I'm always open for testing, but could you clarify the patch 3
> > part? AFAIK this series of mine only has 2 patches, so I'm a little
> > confused about exactly which patch to apply as a preliminary step.
>
> Patch 3 refers to "arm64: dts: rockchip: drop PCIe 3v3 always-on and
> boot-on" which let kernel fully controller the power in case firmware
> did it in advanced.

Hi Shawn,

I tested your patch, but unfortunately it does not work: the PM981a SSD
"plays dead" and 2.5 GT/s training never completes, even with the
bigger timeout.

I hope you get the chance to test my patch soon, because once you share
your results there are two possible scenarios:

1) The patch does not alleviate the problem for you: in this case there
is little more I can do and this becomes a wild goose chase, so there is
no chance of upstreaming anything. I'll just move on to more useful work
and leave everybody else to do their useful work too.

2) The patch works and a previously non-working SSD now works: in this
case something serious is going on, and it is our mission to find a way
to correctly upstream a fix.

Thanks,
Geraldo Nascimento
On Tue, Jun 10, 2025 at 04:05:40PM -0300, Geraldo Nascimento wrote:
> [...]
>
> For retry to work, flow must be exactly as handled by present patch,
> that is, we must cut power, disable the clocks, then re-enable
> both clocks and power regulators and go through initialization
> without touching PERST#. Then quirky devices are able to sucessfully
> enumerate.
>

This sounds weird. PERST# is just an indication to the device that the
power and refclk are applied or are going to be removed. The device uses
PERST# assertion to prepare for power removal, and starts functioning
after deassertion.

It looks like the PERST# polarity is inverted in your case. Could you
please change the 'ep-gpios' polarity to GPIO_ACTIVE_LOW and see if
that fixes the issue without this patch?

If that doesn't work, could you please drop the 'ep-gpios' property and
check?

> No functional change intended for already working devices.
>
> [...]
>  static int rockchip_pcie_host_init_port(struct rockchip_pcie *rockchip)
>  {
>  	struct device *dev = rockchip->dev;
> -	int err, i = MAX_LANE_NUM;
> +	int err, i = MAX_LANE_NUM, is_reinit = 0;
>  	u32 status;
>  
> -	gpiod_set_value_cansleep(rockchip->perst_gpio, 0);
> +	if (!is_reinit) {
> +		gpiod_set_value_cansleep(rockchip->perst_gpio, 0);
> +	}
>  
> +reinit:

So this reinit part only skips the PERST# assert, but it calls
rockchip_pcie_init_port(), which resets the Root Port including the
PHY. I don't think it is safe to do that if PERST# is wired.

- Mani

--
மணிவண்ணன் சதாசிவம்
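The polarity question comes down to how gpiolib maps the logical values written by the driver (0 before initialization, 1 after the T_PVPERL delay in the diff above) onto line levels. A minimal sketch of that mapping, assuming the board currently describes 'ep-gpios' as GPIO_ACTIVE_HIGH; `physical_level()` and `perst_asserted()` are hypothetical helpers, not gpiolib functions.

```c
#include <stdbool.h>

/*
 * gpiod_set_value_cansleep() takes a logical value; gpiolib inverts it
 * when the line is flagged GPIO_ACTIVE_LOW. Model of that mapping.
 */
static int physical_level(int logical, bool active_low)
{
	return (logical != 0) ^ active_low;
}

/* PERST# is active low: the device is held in reset while the line is low. */
static bool perst_asserted(int level)
{
	return level == 0;
}
```

Under this model, flipping the flag on a board whose line is wired normally would leave the endpoint out of reset during initialization and then hold it in reset from the point where the driver expects training to start, so the experiment quickly confirms or rules out an inverted polarity.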
On Mon, Jun 23, 2025 at 05:29:46AM -0600, Manivannan Sadhasivam wrote:
> On Tue, Jun 10, 2025 at 04:05:40PM -0300, Geraldo Nascimento wrote:
> > [...]
>
> This sounds weird. PERST# is just an indication to the device that the power and
> refclk are applied or going to be removed. The devices uses PERST# to prepare
> for the power removal during assert and start functioning after deassert.

Hi Mani! Thank you for looking into this.

Yeah, tell me about it, it is beyond weird. I posted this RFC patch in
the hope that someone with access to a PCIe analyzer could take a
deeper look at what the heck is going on here, because it does work,
but I don't claim to understand how.

>
> It looks like the PERST# polarity is inverted in your case. Could you please
> change the 'ep-gpios' polarity to GPIO_ACTIVE_LOW and see if it fixes the issue
> without this patch?
>
> If that didn't work, could you please drop the 'ep-gpios' property and check?

Sorry to decline your request, but I assure you I tried many other
combinations, including your suggestion, before arriving at this patch.
It does nothing: it won't make SSDs that refuse to work with the RK3399
work. Note that this isn't specific to my board; the RK3399 is infamous
for being picky about devices.

> [...]
> > +reinit:
>
> So this reinit part only skips the PERST# assert, but calls
> rockchip_pcie_init_port() which resets the Root Port including PHY. I don't
> think it is safe to do it if PERST# is wired.

I don't understand; could you elaborate on why you think this is
dangerous?

Thanks,
Geraldo Nascimento
On Mon, Jun 23, 2025 at 08:44:49AM GMT, Geraldo Nascimento wrote:
> On Mon, Jun 23, 2025 at 05:29:46AM -0600, Manivannan Sadhasivam wrote:
> > [...]
> > This sounds weird. PERST# is just an indication to the device that the power and
> > refclk are applied or going to be removed. The devices uses PERST# to prepare
> > for the power removal during assert and start functioning after deassert.
>
> Hi Mani! Thank you for looking into this.
>
> Yeah, tell me about it, it is beyond weird. I posted RFC Patch in the
> hopes someone with access to PCIe Analyzer could have deeper look
> at what the heck is going on here - because it does work, but I don't
> claim to understand how.
>

I was hoping that the Rockchip folks would chime in, but there has been
no reply from them so far.

@Shawn: Could you please shed some light here?

> [...]
> > So this reinit part only skips the PERST# assert, but calls
> > rockchip_pcie_init_port() which resets the Root Port including PHY. I don't
> > think it is safe to do it if PERST# is wired.
>
> I don't understand, could you be a bit more verbose on why do you
> think this is dangerous?
>

When the Root Port and PHY get reset, there is a good chance that the
refclk is also cut off. If that happens without a PERST# assert, the
device has no chance to clean up its state machine. If the device gets
its own refclk, that is a different story, but we should not make
assumptions.

- Mani

--
மணிவண்ணன் சதாசிவம்
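Mani's concern can be phrased as an ordering rule on the events the endpoint sees: power and refclk should only be removed while PERST# is asserted. The sketch below checks an event sequence against that rule as stated in this thread; it is not a transcription of the PCIe CEM timing requirements. The event names and `first_unannounced_removal()` are hypothetical, and whether disabling the controller clocks actually gates the refclk is board-dependent, as Mani notes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for what the endpoint observes. */
enum ev {
	EV_PERST_ASSERT,
	EV_PERST_DEASSERT,
	EV_POWER_OFF,
	EV_POWER_ON,
	EV_REFCLK_OFF,
	EV_REFCLK_ON,
};

/*
 * Returns the index of the first event that removes power or refclk
 * while PERST# is deasserted, or -1 if the sequence never does so.
 */
static int first_unannounced_removal(const enum ev *evs, size_t n,
				     bool perst_asserted)
{
	size_t i;

	for (i = 0; i < n; i++) {
		switch (evs[i]) {
		case EV_PERST_ASSERT:
			perst_asserted = true;
			break;
		case EV_PERST_DEASSERT:
			perst_asserted = false;
			break;
		case EV_POWER_OFF:
		case EV_REFCLK_OFF:
			if (!perst_asserted)
				return (int)i;
			break;
		default:
			break;
		}
	}
	return -1;
}
```

Fed the patch's retry path, which disables the regulators and then the clocks while PERST# is still deasserted from the first attempt, the check flags the very first step. Geraldo's argument that this is harmless because the endpoint loses power altogether is a claim about device behavior that this model cannot decide.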
On Thu, Jul 17, 2025 at 05:59:32PM +0530, Manivannan Sadhasivam wrote:
> On Mon, Jun 23, 2025 at 08:44:49AM GMT, Geraldo Nascimento wrote:
> > On Mon, Jun 23, 2025 at 05:29:46AM -0600, Manivannan Sadhasivam wrote:
> > > On Tue, Jun 10, 2025 at 04:05:40PM -0300, Geraldo Nascimento wrote:
> > > > +reinit:
> > >
> > > So this reinit part only skips the PERST# assert, but calls
> > > rockchip_pcie_init_port() which resets the Root Port including PHY. I don't
> > > think it is safe to do it if PERST# is wired.
> >
> > I don't understand, could you be a bit more verbose on why do you
> > think this is dangerous?
> >
>
> When the Root Port and PHY gets reset, there is a good chance that the refclk
> would also be cutoff. So if that happens without PERST# assert, then the device
> has no chance to clean its state machine. If the device gets its own refclk,
> then it is a different story, but we should not make assumptions.

Hi Mani, thank you for the time spent looking into this!

I'm not sure whether the following information helps, but patch 2 of
this series disables the PCIe 3.3V always-on/boot-on properties through
DT. That was not incidental; it is in fact required for patch 1 to work.

If you follow the proposed code change, you will see that power is
effectively cut by disabling the power regulators, even before the
clocks are disabled. So there is effectively zero chance of corrupting
the endpoint's state machine, since the device is power-cycled.

While I understand that we should not make assumptions in kernel work,
and that the patch is unmergeable in its current form (it's a goddamn
hack), it does empirically alleviate a very real problem: known-good
devices refusing to cooperate with the Rockchip-IP PCIe controller.

I agree we should wait for Shawn Lin's feedback.

Thank you,
Geraldo Nascimento