The mvebu-comphy driver does not currently know how to pass correct
lane-count to ATF while configuring the serdes lanes.
This causes the system to hard reset during reconfiguration, if a pci
card is present and has established a link during bootloader.
Remove the comphy handles from the respective pci nodes to avoid runtime
reconfiguration, relying solely on bootloader configuration - while
avoiding the hard reset.
When bootloader has configured the lanes correctly, the pci ports are
functional under Linux.
This issue may be addressed in the comphy driver at a future point.
Fixes: e9ff907f4076 ("arm64: dts: add description for solidrun cn9132 cex7 module and clearfog board")
Cc: <stable@vger.kernel.org>
Signed-off-by: Josua Mayer <josua@solid-run.com>
---
arch/arm64/boot/dts/marvell/cn9132-clearfog.dts | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/marvell/cn9132-clearfog.dts b/arch/arm64/boot/dts/marvell/cn9132-clearfog.dts
index 115c55d73786e2b9265e1caa4c62ee26f498fb41..6f237d3542b9102695f8a48457f43340da994a2c 100644
--- a/arch/arm64/boot/dts/marvell/cn9132-clearfog.dts
+++ b/arch/arm64/boot/dts/marvell/cn9132-clearfog.dts
@@ -413,7 +413,13 @@ fixed-link {
/* SRDS #0,#1,#2,#3 - PCIe */
&cp0_pcie0 {
num-lanes = <4>;
- phys = <&cp0_comphy0 0>, <&cp0_comphy1 0>, <&cp0_comphy2 0>, <&cp0_comphy3 0>;
+ /*
+ * The mvebu-comphy driver does not currently know how to pass correct
+ * lane-count to ATF while configuring the serdes lanes.
+ * Rely on bootloader configuration only.
+ *
+ * phys = <&cp0_comphy0 0>, <&cp0_comphy1 0>, <&cp0_comphy2 0>, <&cp0_comphy3 0>;
+ */
status = "okay";
};
@@ -475,7 +481,13 @@ &cp1_eth0 {
/* SRDS #0,#1 - PCIe */
&cp1_pcie0 {
num-lanes = <2>;
- phys = <&cp1_comphy0 0>, <&cp1_comphy1 0>;
+ /*
+ * The mvebu-comphy driver does not currently know how to pass correct
+ * lane-count to ATF while configuring the serdes lanes.
+ * Rely on bootloader configuration only.
+ *
+ * phys = <&cp1_comphy0 0>, <&cp1_comphy1 0>;
+ */
status = "okay";
};
--
2.51.0
Josua Mayer <josua@solid-run.com> writes:
> The mvebu-comphy driver does not currently know how to pass correct
> lane-count to ATF while configuring the serdes lanes.
>
> This causes the system to hard reset during reconfiguration, if a pci
> card is present and has established a link during bootloader.
>
> Remove the comphy handles from the respective pci nodes to avoid runtime
> reconfiguration, relying solely on bootloader configuration - while
> avoiding the hard reset.
>
> When bootloader has configured the lanes correctly, the pci ports are
> functional under Linux.
>
> This issue may be addressed in the comphy driver at a future point.
>
> Fixes: e9ff907f4076 ("arm64: dts: add description for solidrun cn9132 cex7 module and clearfog board")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Josua Mayer <josua@solid-run.com>
Applied on mvebu/fixes
Thanks,
Gregory
> ---
> arch/arm64/boot/dts/marvell/cn9132-clearfog.dts | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/boot/dts/marvell/cn9132-clearfog.dts b/arch/arm64/boot/dts/marvell/cn9132-clearfog.dts
> index 115c55d73786e2b9265e1caa4c62ee26f498fb41..6f237d3542b9102695f8a48457f43340da994a2c 100644
> --- a/arch/arm64/boot/dts/marvell/cn9132-clearfog.dts
> +++ b/arch/arm64/boot/dts/marvell/cn9132-clearfog.dts
> @@ -413,7 +413,13 @@ fixed-link {
> /* SRDS #0,#1,#2,#3 - PCIe */
> &cp0_pcie0 {
> num-lanes = <4>;
> - phys = <&cp0_comphy0 0>, <&cp0_comphy1 0>, <&cp0_comphy2 0>, <&cp0_comphy3 0>;
> + /*
> + * The mvebu-comphy driver does not currently know how to pass correct
> + * lane-count to ATF while configuring the serdes lanes.
> + * Rely on bootloader configuration only.
> + *
> + * phys = <&cp0_comphy0 0>, <&cp0_comphy1 0>, <&cp0_comphy2 0>, <&cp0_comphy3 0>;
> + */
> status = "okay";
> };
>
> @@ -475,7 +481,13 @@ &cp1_eth0 {
> /* SRDS #0,#1 - PCIe */
> &cp1_pcie0 {
> num-lanes = <2>;
> - phys = <&cp1_comphy0 0>, <&cp1_comphy1 0>;
> + /*
> + * The mvebu-comphy driver does not currently know how to pass correct
> + * lane-count to ATF while configuring the serdes lanes.
> + * Rely on bootloader configuration only.
> + *
> + * phys = <&cp1_comphy0 0>, <&cp1_comphy1 0>;
> + */
> status = "okay";
> };
>
>
> --
> 2.51.0
>
>
--
Grégory CLEMENT, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
On Thu, Sep 11, 2025 at 08:28:06PM +0200, Josua Mayer wrote: > The mvebu-comphy driver does not currently know how to pass correct > lane-count to ATF while configuring the serdes lanes. > > This causes the system to hard reset during reconfiguration, if a pci > card is present and has established a link during bootloader. > > Remove the comphy handles from the respective pci nodes to avoid runtime > reconfiguration, relying solely on bootloader configuration - while > avoiding the hard reset. > > When bootloader has configured the lanes correctly, the pci ports are > functional under Linux. Does this require a specific bootloader? Can i use mainline grub or bareboot? Andrew
Am 12.09.25 um 00:12 schrieb Andrew Lunn: > On Thu, Sep 11, 2025 at 08:28:06PM +0200, Josua Mayer wrote: >> The mvebu-comphy driver does not currently know how to pass correct >> lane-count to ATF while configuring the serdes lanes. >> >> This causes the system to hard reset during reconfiguration, if a pci >> card is present and has established a link during bootloader. >> >> Remove the comphy handles from the respective pci nodes to avoid runtime >> reconfiguration, relying solely on bootloader configuration - while >> avoiding the hard reset. >> >> When bootloader has configured the lanes correctly, the pci ports are >> functional under Linux. > Does this require a specific bootloader? Can i use mainline grub or > bareboot? In this case it means U-Boot, i.e. before one would start grub. I am never quite sure if in this situation I should say "firmware" instead ...
On Thu, Sep 18, 2025 at 10:46:03AM +0000, Josua Mayer wrote: > Am 12.09.25 um 00:12 schrieb Andrew Lunn: > > On Thu, Sep 11, 2025 at 08:28:06PM +0200, Josua Mayer wrote: > >> The mvebu-comphy driver does not currently know how to pass correct > >> lane-count to ATF while configuring the serdes lanes. > >> > >> This causes the system to hard reset during reconfiguration, if a pci > >> card is present and has established a link during bootloader. > >> > >> Remove the comphy handles from the respective pci nodes to avoid runtime > >> reconfiguration, relying solely on bootloader configuration - while > >> avoiding the hard reset. > >> > >> When bootloader has configured the lanes correctly, the pci ports are > >> functional under Linux. > > Does this require a specific bootloader? Can i use mainline grub or > > bareboot? > > In this case it means U-Boot, i.e. before one would start grub. > > I am never quite sure if in this situation I should say "firmware" instead ... What you failed to answer is my question about 'mainline'? Do i need a specific vendor u-boot, or can i just use mainline u-boot, or mainline bareboot. I personally like to replace the bootloader, because the one shipped with the board often has useful features disabled, or is old. If i do that, will the board work? I would much prefer the kernel makes no assumptions about the bootloader. You said: > The mvebu-comphy driver does not currently know how to pass correct > lane-count to ATF while configuring the serdes lanes. Why not just teach mvebu-comphy to pass the correct line-count? That sounds like the proper fix, and that makes the kernel independent of the bootloader. Andrew
Am 18.09.25 um 16:09 schrieb Andrew Lunn: > On Thu, Sep 18, 2025 at 10:46:03AM +0000, Josua Mayer wrote: >> Am 12.09.25 um 00:12 schrieb Andrew Lunn: >>> On Thu, Sep 11, 2025 at 08:28:06PM +0200, Josua Mayer wrote: >>>> The mvebu-comphy driver does not currently know how to pass correct >>>> lane-count to ATF while configuring the serdes lanes. >>>> >>>> This causes the system to hard reset during reconfiguration, if a pci >>>> card is present and has established a link during bootloader. >>>> >>>> Remove the comphy handles from the respective pci nodes to avoid runtime >>>> reconfiguration, relying solely on bootloader configuration - while >>>> avoiding the hard reset. >>>> >>>> When bootloader has configured the lanes correctly, the pci ports are >>>> functional under Linux. >>> Does this require a specific bootloader? Can i use mainline grub or >>> bareboot? >> In this case it means U-Boot, i.e. before one would start grub. >> >> I am never quite sure if in this situation I should say "firmware" instead ... > What you failed to answer is my question about 'mainline'? Do i need a > specific vendor u-boot, or can i just use mainline u-boot, or mainline > bareboot. Ah. There is no mainline u-boot for these boards (yet). I submitted v1 on u-boot ml a while back but didn't have time to rework it. U-Boot has a different comphy driver that appears to configure the lanes correctly. > > I personally like to replace the bootloader, because the one shipped > with the board often has useful features disabled, or is old. If i do > that, will the board work? Conversationally if the bootloader configured the board correctly, then it also correctly configured all pci lanes considering u-boot handles link-up itself on mvebu. > I would much prefer the kernel makes no > assumptions about the bootloader. Same. > You said: > >> The mvebu-comphy driver does not currently know how to pass correct >> lane-count to ATF while configuring the serdes lanes. > Why not just teach mvebu-comphy to pass the correct line-count? That > sounds like the proper fix, and that makes the kernel independent of > the bootloader. That would be a feature on the comphy driver, not a bug-fix backported to stable. The core goal was to fix bugs found in Debian 13. The Linux comphy driver should be updated going forward. I did not due it (yet) because tracing the bitmask of atf smc call is very tedious due to layers of #define referencing each other.
> >> The mvebu-comphy driver does not currently know how to pass correct
> >> lane-count to ATF while configuring the serdes lanes.
> > Why not just teach mvebu-comphy to pass the correct line-count? That
> > sounds like the proper fix, and that makes the kernel independent of
> > the bootloader.
> That would be a feature on the comphy driver, not a bug-fix backported
> to stable. The core goal was to fix bugs found in Debian 13.
It is not so simple.
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
It must either fix a real bug that bothers people or just add a device ID
Crashing at boot would be a real bug that bothers people, not just a
new feature.
Lets see how big the patch is. If its 1000 lines of hard to understand
code, it will probably be rejected for stable. If its 100 lines or
less, it will likely be accepted.
It is also hard to argue the DT is wrong. It just describes the
hardware. I assume the description is actually correct? The issue is
the driver, not the description. Also, i assume this affects all
boards using this SoC? Removing the nodes in one board 'fixes' one
board. Fixing the driver fixes all boards...
Andrew
Am 18.09.25 um 17:41 schrieb Andrew Lunn: >>>> The mvebu-comphy driver does not currently know how to pass correct >>>> lane-count to ATF while configuring the serdes lanes. >>> Why not just teach mvebu-comphy to pass the correct line-count? That >>> sounds like the proper fix, and that makes the kernel independent of >>> the bootloader. >> That would be a feature on the comphy driver, not a bug-fix backported >> to stable. The core goal was to fix bugs found in Debian 13. > It is not so simple. > > https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html > > It must either fix a real bug that bothers people or just add a device ID > > Crashing at boot would be a real bug that bothers people, not just a > new feature. > > Lets see how big the patch is. If its 1000 lines of hard to understand > code, it will probably be rejected for stable. If its 100 lines or > less, it will likely be accepted. I see. > It is also hard to argue the DT is wrong. It just describes the > hardware. I assume the description is actually correct? The x4 port linked comphy as below: phys = <&cp0_comphy0 0>, <&cp0_comphy1 0>, <&cp0_comphy2 0>, <&cp0_comphy3 0>; At the time of submitting my patch I was not convinced the above was right, or wrong. I labeled it wrong for causing a fault which I should have noticed much earlier. The numeric argument after the comphy-lane handle is the port number, for those functions that can have multiple ports (e.g. ethernet #2). This means above dts linked pci port 0 on lanes 0-4, which appears correct. Further lanes 1-3 have no other pci ports, there is no other configuration to confuse it with. > The issue is > the driver, not the description. Also, i assume this affects all > boards using this SoC? Removing the nodes in one board 'fixes' one > board. Fixing the driver fixes all boards... I missed to check whether other boards share similar description. Today I found two other dts that reference multiple lanes: arch/arm64/boot/dts/marvell/armada-8040-puzzle-m801.dts arch/arm64/boot/dts/marvell/armada-8040-mcbin.dtsi Both cases the function is PCI - first one x2, secondx4. I will try to look into a more correct solution soon.
Hi Andrew, Am 18.09.25 um 19:40 schrieb Josua Mayer: > Am 18.09.25 um 17:41 schrieb Andrew Lunn: > >>>>> The mvebu-comphy driver does not currently know how to pass correct >>>>> lane-count to ATF while configuring the serdes lanes. >>>> Why not just teach mvebu-comphy to pass the correct line-count? That >>>> sounds like the proper fix, and that makes the kernel independent of >>>> the bootloader. >>> That would be a feature on the comphy driver, not a bug-fix backported >>> to stable. The core goal was to fix bugs found in Debian 13. >> It is not so simple. >> >> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html >> >> It must either fix a real bug that bothers people or just add a device ID >> >> Crashing at boot would be a real bug that bothers people, not just a >> new feature. >> >> Lets see how big the patch is. If its 1000 lines of hard to understand >> code, it will probably be rejected for stable. If its 100 lines or >> less, it will likely be accepted. > I see. >> It is also hard to argue the DT is wrong. It just describes the >> hardware. I assume the description is actually correct? > The x4 port linked comphy as below: > > phys = <&cp0_comphy0 0>, <&cp0_comphy1 0>, <&cp0_comphy2 0>, <&cp0_comphy3 0>; > > At the time of submitting my patch I was not convinced the above was right, or wrong. > I labeled it wrong for causing a fault which I should have noticed much earlier. > > The numeric argument after the comphy-lane handle is the port number, > for those functions that can have multiple ports (e.g. ethernet #2). > > This means above dts linked pci port 0 on lanes 0-4, which appears correct. > Further lanes 1-3 have no other pci ports, there is no other configuration to confuse it with. > >> The issue is >> the driver, not the description. Also, i assume this affects all >> boards using this SoC? Removing the nodes in one board 'fixes' one >> board. Fixing the driver fixes all boards... > I missed to check whether other boards share similar description. > > Today I found two other dts that reference multiple lanes: > > arch/arm64/boot/dts/marvell/armada-8040-puzzle-m801.dts > arch/arm64/boot/dts/marvell/armada-8040-mcbin.dtsi > > Both cases the function is PCI - first one x2, secondx4. > > I will try to look into a more correct solution soon. I did some extensive testing outside of Debian world, with v6.12.0, v6.12.48, Marvell v6.1 ... and made a range of interesting / confusing observations: 1. I was able to produce the problem in a self-compiled kernel outside Debian. This is with arm64 defconfig and some small adaptations. The system only got stuck during boot when the comphy driver was a module. In this case there are two suspicious messages in boot log: [ 2.742966] armada8k-pcie f4600000.pcie: No available PHY [ 3.732084] armada8k-pcie f4600000.pcie: Phy link never came up The link timeout comes to mind first, which is unexpected as in my testing there was always a card connected. This card was detected fine with comphy driver builtin. The "No available PHY" likely leads to some bad error handling in pci probe, and should be investigated further. 2. When the system got stuck during boot, it was never in the middle of an smc to atf. I confirmed this by adding locking to the smc function handler in atf, and logging activity to serial console. 3. I was wrong in that the linux driver does not know how to configure the lane count, the comphy driver does indeed pass the port width (as indicated by num-lanes dt prop) in the format that ATF expects. However ATF does nothing with kernel driver pci lane configuration. Instead any power-on or power-off call from kernel driver via smcc is tested for originating within linux vs. uboot. Only when source is uboot, it performs any configuration ... : https://github.com/ARM-software/arm-trusted-firmware/blob/master/drivers/marvell/comphy/phy-comphy-cp110.c#L1257 4. The "mode" argument (x2) to the smc function for comphy lane power-on/setup differs between Marvell U-Boot and Linux. I found this by dumping them from ATF itself. In particular the bits indicating the port number were invalid due to an overflow error in solidrun u-boot (based on Marvell u-boot). The mode specified by kernel driver however seemed correct in that regard. Further the bits indicating the serdes lane speed are 0 in linux driver, and all one in vendor u-boot. As atf ignores pci lane configuration when originating from kernel, this had no impact so far. 5. The port number passed from u-boot to atf appears to have no effect. Fixing it in vendor u-boot had so far no apparent impact. To conclude, I think my device-tree patch is not correct and should be replaced once a better workaround or solution is discovered.
© 2016 - 2026 Red Hat, Inc.