Add the netlink YAML spec and auto-generated UAPI header for a unified
loopback interface covering MAC, PCS, PHY, and pluggable module
components.
Each loopback point is described by a nested entry attribute
containing:
- component: where in the path (MAC, PCS, PHY, MODULE)
- name: subsystem label, e.g. "cmis-host" or "cmis-media"
- id: optional instance selector (e.g. PHY id, port id)
- supported: bitmask of supported directions
- direction: NEAR_END, FAR_END, or 0 (disabled)
Signed-off-by: Björn Töpel <bjorn@kernel.org>
---
Documentation/netlink/specs/ethtool.yaml | 123 ++++++++++++++++++
.../uapi/linux/ethtool_netlink_generated.h | 59 +++++++++
2 files changed, 182 insertions(+)
diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml
index 4707063af3b4..8bd14a3c946a 100644
--- a/Documentation/netlink/specs/ethtool.yaml
+++ b/Documentation/netlink/specs/ethtool.yaml
@@ -211,6 +211,49 @@ definitions:
name: discard
value: 31
+ -
+ name: loopback-component
+ type: enum
+ doc: |
+ Loopback component. Identifies where in the network path the
+ loopback is applied.
+ entries:
+ -
+ name: mac
+ doc: MAC loopback. Loops traffic at the MAC block.
+ -
+ name: pcs
+ doc: |
+ PCS loopback. Loops traffic at the PCS sublayer between the
+ MAC and the PHY.
+ -
+ name: phy
+ doc: |
+ Ethernet PHY loopback. This refers to the Ethernet PHY managed
+ by phylib, not generic PHY drivers. A Base-T SFP module
+ containing an Ethernet PHY driven by Linux should report
+ loopback under this component, not module.
+ -
+ name: module
+ doc: |
+ Pluggable module (e.g. CMIS (Q)SFP) loopback. Covers loopback
+ modes controlled via module firmware or EEPROM registers. When
+ Linux drives an Ethernet PHY inside the module via phylib, use
+ the phy component instead.
+ -
+ name: loopback-direction
+ type: flags
+ doc: |
+ Loopback direction flags. Used as a bitmask in supported, and as
+ a single value in direction.
+ entries:
+ -
+ name: near-end
+ doc: Near-end loopback; host-loop-host
+ -
+ name: far-end
+ doc: Far-end loopback; line-loop-line
+
attribute-sets:
-
name: header
@@ -1903,6 +1946,58 @@ attribute-sets:
name: link
type: nest
nested-attributes: mse-snapshot
+ -
+ name: loopback-entry
+ doc: Per-component loopback configuration entry.
+ attr-cnt-name: __ethtool-a-loopback-entry-cnt
+ attributes:
+ -
+ name: unspec
+ type: unused
+ value: 0
+ -
+ name: component
+ type: u32
+ enum: loopback-component
+ doc: Loopback component
+ -
+ name: id
+ type: u32
+ doc: Optional component instance identifier.
+ -
+ name: name
+ type: string
+ doc: |
+ Subsystem-specific name for the loopback point within the
+ component.
+ -
+ name: supported
+ type: u8
+ enum: loopback-direction
+ enum-as-flags: true
+ doc: Bitmask of supported loopback directions
+ -
+ name: direction
+ type: u8
+ enum: loopback-direction
+ doc: Current loopback direction, 0 means disabled
+ -
+ name: loopback
+ attr-cnt-name: __ethtool-a-loopback-cnt
+ attributes:
+ -
+ name: unspec
+ type: unused
+ value: 0
+ -
+ name: header
+ type: nest
+ nested-attributes: header
+ -
+ name: entry
+ type: nest
+ multi-attr: true
+ nested-attributes: loopback-entry
operations:
enum-model: directional
@@ -2855,6 +2950,34 @@ operations:
- worst-channel
- link
dump: *mse-get-op
+ -
+ name: loopback-get
+ doc: Get loopback configuration and capabilities.
+
+ attribute-set: loopback
+
+ do: &loopback-get-op
+ request:
+ attributes:
+ - header
+ reply:
+ attributes: &loopback
+ - header
+ - entry
+ dump: *loopback-get-op
+ -
+ name: loopback-set
+ doc: Set loopback configuration.
+
+ attribute-set: loopback
+
+ do:
+ request:
+ attributes: *loopback
+ -
+ name: loopback-ntf
+ doc: Notification for change in loopback configuration.
+ notify: loopback-get
mcast-groups:
list:
diff --git a/include/uapi/linux/ethtool_netlink_generated.h b/include/uapi/linux/ethtool_netlink_generated.h
index 114b83017297..d8bff056a4b1 100644
--- a/include/uapi/linux/ethtool_netlink_generated.h
+++ b/include/uapi/linux/ethtool_netlink_generated.h
@@ -78,6 +78,40 @@ enum ethtool_pse_event {
ETHTOOL_PSE_EVENT_SW_PW_CONTROL_ERROR = 64,
};
+/**
+ * enum ethtool_loopback_component - Loopback component. Identifies where in
+ * the network path the loopback is applied.
+ * @ETHTOOL_LOOPBACK_COMPONENT_MAC: MAC loopback. Loops traffic at the MAC
+ * block.
+ * @ETHTOOL_LOOPBACK_COMPONENT_PCS: PCS loopback. Loops traffic at the PCS
+ * sublayer between the MAC and the PHY.
+ * @ETHTOOL_LOOPBACK_COMPONENT_PHY: Ethernet PHY loopback. This refers to the
+ * Ethernet PHY managed by phylib, not generic PHY drivers. A Base-T SFP
+ * module containing an Ethernet PHY driven by Linux should report loopback
+ * under this component, not module.
+ * @ETHTOOL_LOOPBACK_COMPONENT_MODULE: Pluggable module (e.g. CMIS (Q)SFP)
+ * loopback. Covers loopback modes controlled via module firmware or EEPROM
+ * registers. When Linux drives an Ethernet PHY inside the module via phylib,
+ * use the phy component instead.
+ */
+enum ethtool_loopback_component {
+ ETHTOOL_LOOPBACK_COMPONENT_MAC,
+ ETHTOOL_LOOPBACK_COMPONENT_PCS,
+ ETHTOOL_LOOPBACK_COMPONENT_PHY,
+ ETHTOOL_LOOPBACK_COMPONENT_MODULE,
+};
+
+/**
+ * enum ethtool_loopback_direction - Loopback direction flags. Used as a
+ * bitmask in supported, and as a single value in direction.
+ * @ETHTOOL_LOOPBACK_DIRECTION_NEAR_END: Near-end loopback; host-loop-host
+ * @ETHTOOL_LOOPBACK_DIRECTION_FAR_END: Far-end loopback; line-loop-line
+ */
+enum ethtool_loopback_direction {
+ ETHTOOL_LOOPBACK_DIRECTION_NEAR_END = 1,
+ ETHTOOL_LOOPBACK_DIRECTION_FAR_END = 2,
+};
+
enum {
ETHTOOL_A_HEADER_UNSPEC,
ETHTOOL_A_HEADER_DEV_INDEX,
@@ -838,6 +872,27 @@ enum {
ETHTOOL_A_MSE_MAX = (__ETHTOOL_A_MSE_CNT - 1)
};
+enum {
+ ETHTOOL_A_LOOPBACK_ENTRY_UNSPEC,
+ ETHTOOL_A_LOOPBACK_ENTRY_COMPONENT,
+ ETHTOOL_A_LOOPBACK_ENTRY_ID,
+ ETHTOOL_A_LOOPBACK_ENTRY_NAME,
+ ETHTOOL_A_LOOPBACK_ENTRY_SUPPORTED,
+ ETHTOOL_A_LOOPBACK_ENTRY_DIRECTION,
+
+ __ETHTOOL_A_LOOPBACK_ENTRY_CNT,
+ ETHTOOL_A_LOOPBACK_ENTRY_MAX = (__ETHTOOL_A_LOOPBACK_ENTRY_CNT - 1)
+};
+
+enum {
+ ETHTOOL_A_LOOPBACK_UNSPEC,
+ ETHTOOL_A_LOOPBACK_HEADER,
+ ETHTOOL_A_LOOPBACK_ENTRY,
+
+ __ETHTOOL_A_LOOPBACK_CNT,
+ ETHTOOL_A_LOOPBACK_MAX = (__ETHTOOL_A_LOOPBACK_CNT - 1)
+};
+
enum {
ETHTOOL_MSG_USER_NONE = 0,
ETHTOOL_MSG_STRSET_GET = 1,
@@ -891,6 +946,8 @@ enum {
ETHTOOL_MSG_RSS_CREATE_ACT,
ETHTOOL_MSG_RSS_DELETE_ACT,
ETHTOOL_MSG_MSE_GET,
+ ETHTOOL_MSG_LOOPBACK_GET,
+ ETHTOOL_MSG_LOOPBACK_SET,
__ETHTOOL_MSG_USER_CNT,
ETHTOOL_MSG_USER_MAX = (__ETHTOOL_MSG_USER_CNT - 1)
@@ -952,6 +1009,8 @@ enum {
ETHTOOL_MSG_RSS_CREATE_NTF,
ETHTOOL_MSG_RSS_DELETE_NTF,
ETHTOOL_MSG_MSE_GET_REPLY,
+ ETHTOOL_MSG_LOOPBACK_GET_REPLY,
+ ETHTOOL_MSG_LOOPBACK_NTF,
__ETHTOOL_MSG_KERNEL_CNT,
ETHTOOL_MSG_KERNEL_MAX = (__ETHTOOL_MSG_KERNEL_CNT - 1)
--
2.53.0
Hi again Björn,
First, thanks for iterating so quickly!
On 10/03/2026 11:47, Björn Töpel wrote:
> Add the netlink YAML spec and auto-generated UAPI header for a unified
> loopback interface covering MAC, PCS, PHY, and pluggable module
> components.
>
> Each loopback point is described by a nested entry attribute
> containing:
>
> - component where in the path (MAC, PCS, PHY, MODULE)
> - name subsystem label, e.g. "cmis-host" or "cmis-media"
> - id optional instance selector (e.g. PHY id, port id)
> - supported bitmask of supported directions
> - direction NEAR_END, FAR_END, or 0 (disabled)
>
> Signed-off-by: Björn Töpel <bjorn@kernel.org>
> ---
> Documentation/netlink/specs/ethtool.yaml | 123 ++++++++++++++++++
> .../uapi/linux/ethtool_netlink_generated.h | 59 +++++++++
> 2 files changed, 182 insertions(+)
>
> diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml
> index 4707063af3b4..8bd14a3c946a 100644
> --- a/Documentation/netlink/specs/ethtool.yaml
> +++ b/Documentation/netlink/specs/ethtool.yaml
> @@ -211,6 +211,49 @@ definitions:
> name: discard
> value: 31
>
> + -
> + name: loopback-component
> + type: enum
> + doc: |
> + Loopback component. Identifies where in the network path the
> + loopback is applied.
> + entries:
> + -
> + name: mac
> + doc: MAC loopback. Loops traffic at the MAC block.
> + -
> + name: pcs
> + doc: |
> + PCS loopback. Loops traffic at the PCS sublayer between the
> + MAC and the PHY.
> + -
> + name: phy
> + doc: |
> + Ethernet PHY loopback. This refers to the Ethernet PHY managed
> + by phylib, not generic PHY drivers. A Base-T SFP module
> + containing an Ethernet PHY driven by Linux should report
> + loopback under this component, not module.
> + -
> + name: module
> + doc: |
> + Pluggable module (e.g. CMIS (Q)SFP) loopback. Covers loopback
> + modes controlled via module firmware or EEPROM registers. When
> + Linux drives an Ethernet PHY inside the module via phylib, use
> + the phy component instead.
So to get back on Andrew's remarks, let's see if we can get something
closer to 802.3.
Here, we have loopbacks at various locations, which all depend on the
Ethernet standard in use.
It's usually in the PCS, PMA or PMD components. Thing is, we may have
these in multiple places in our link.
If we take an example with a 10G PHY, we may have :
+----SoC-----+
| |
| MAC |- drivers/net/ethernet
| | |
| Base-R PCS |- could be in drivers/net/pcs, or directly
| | | in the MAC driver
| | |
| SerDes |- May be in drivers/phy, maybe handled by firmware,
| | | maybe by the MAC driver, maybe by the PCS driver ?
+---|--------+
|
| 10GBase-R
|
+---|-PHY+
| | |
| SerDes | \
| | | |
| PCS | |
| | | > All of that handled by the drivers/net/phy PHY driver
| PMA | |
| | | |
| PMD | /
+---|----+
|
v 10GBaseT
So even the "PCS" loopback component is a bit ambiguous: are we talking
about the PHY PCS or the MAC PCS ?
Another thing to consider is that there may be multiple PCSs in the SoC
(e.g. a BaseX and a BaseR PCS like we have in mvpp2), the one in use
depends on the current interface between the MAC and the PHY.
Another open question is, do we deal with loopbacks that may affect
multi-netdev links ? Like the multi-lane modes we discussed with fbnic,
or even for embedded, interfaces such as QSGMII ?
As for the SerDes on the MAC side (say, the comphy on Marvell devices),
can we say it's a PMA for 10GBase-KR ? Or is it something that's simply
out of spec ?
So I'd say, maybe we should not have a PCS loopback component at all,
but instead loopback at the well-defined components on our link, that is:
- MAC => MAC loopback, PCS on the MAC side, SerDes on the SoC, etc.
- PHY => Loopbacks on the PCS/PMA/PMD within the PHY device
- Module => For non-PHY (Q)SFPs
The important part would therefore be to get the "name" part right, making
sure we don't fall into driver-specific names.
We can name that 'pcs', 'pma', 'pmd', or maybe even 'mii' ? Let's see :
+----SoC-----+
| |
| MAC |- component = MAC, name = 'mac'
| | |
| Base-R PCS |- component = MAC, name = 'pcs'
| | |
| | |
| SerDes |- component = MAC, name = 'mii' ?
| | |
+---|--------+
|
| 10GBase-R
|
+---|-PHY+
| | |
| SerDes | - component = PHY, name = 'mii' ?
| | |
| PCS | - component = PHY, name = 'pcs'
| | |
| PMA | - component = PHY, name = 'pma'
| | |
| PMD |- component = PHY, name = 'pmd' or 'mdi' ?
+---|----+
|
v 10GBaseT
Sorry, that's a lot of questions, and I don't expect you to have all the
answers, but as what you've come up with is taking good shape, it's
important to decide on the overall design and draw some lines about
what we support, and how :(
> + -
> + name: loopback-direction
> + type: flags
> + doc: |
> + Loopback direction flags. Used as a bitmask in supported, and as
> + a single value in direction.
> + entries:
> + -
> + name: near-end
> + doc: Near-end loopback; host-loop-host
> + -
> + name: far-end
> + doc: Far-end loopback; line-loop-line
I was browsing 802.3; it uses the terminology "local loopback" vs
"remote loopback". I suggest we use those.
Maxime
> If we take an example with a 10G PHY, we may have :
>
> +----SoC-----+
> | |
> | MAC |- drivers/net/ethernet
> | | |
> | Base-R PCS |- could be in drivers/net/pcs, or directly
> | | | in the MAC driver
> | | |
> | SerDes |- May be in drivers/phy, maybe handled by firmware,
> | | | maybe by the MAC driver, maybe by the PCS driver ?
> +---|--------+
> |
> | 10GBase-R
> |
> +---|-PHY+
> | | |
> | SerDes | \
> | | | |
> | PCS | |
> | | | > All of that handled by the drivers/net/phy PHY driver
> | PMA | |
> | | | |
> | PMD | /
> +---|----+
> |
> v 10GBaseT
We should also keep in mind this is a "simple" PHY. If you have a PHY
which does rate adaptation it looks more like:
+---|-PHY+
| | |
| SerDes |
| | |
| PCS |
| | |
| MAC |
| | |
| packet |
| buffer |
| | |
| MAC |
| | |
| PCS |
| | |
| PMA |
| | |
| PMD |
+---|----+
|
v 10GBaseT
So there are potentially 5 more loopback points?
Jakub proposal had the concept of 'depth'. Maybe we need that to
handle having the same block repeated a few times as you go towards
the media?
We should also think about when we have a PHY acting as a MII
converter. You see the Marvell PHY placed between the MAC and the SFP
cage. That has a collection of blocks which can do loopback. And then
we could have either a Base-T module/PHY in the cage, with more of the
same blocks, or a fibre module with loopback.
Andrew
On 11/03/2026 16:22, Andrew Lunn wrote:
>> If we take an example with a 10G PHY, we may have :
>>
>> +----SoC-----+
>> | |
>> | MAC |- drivers/net/ethernet
>> | | |
>> | Base-R PCS |- could be in drivers/net/pcs, or directly
>> | | | in the MAC driver
>> | | |
>> | SerDes |- May be in drivers/phy, maybe handled by firmware,
>> | | | maybe by the MAC driver, maybe by the PCS driver ?
>> +---|--------+
>> |
>> | 10GBase-R
>> |
>> +---|-PHY+
>> | | |
>> | SerDes | \
>> | | | |
>> | PCS | |
>> | | | > All of that handled by the drivers/net/phy PHY driver
>> | PMA | |
>> | | | |
>> | PMD | /
>> +---|----+
>> |
>> v 10GBaseT
>
> We should also keep in mind this is a "simple" PHY. If you have a PHY
> which does rate adaptation it looks more like:
>
> +---|-PHY+
> | | |
> | SerDes |
> | | |
> | PCS |
> | | |
> | MAC |
> | | |
> | packet |
> | buffer |
> | | |
> | MAC |
> | | |
> | PCS |
> | | |
> | PMA |
> | | |
> | PMD |
> +---|----+
> |
> v 10GBaseT
>
> So there is potentially 5 more loopback points?
Good point indeed
>
> Jakub proposal had the concept of 'depth'. Maybe we need that to
> handle having the same block repeated a few times as you go towards
> the media?
So, the same name + depth ?
+---|-PHY+
| | |
| SerDes |
| | |
| PCS | component = PHY, name = "pcs", depth = 0
| | |
| MAC |
| | |
| packet |
| buffer |
| | |
| MAC |
| | |
| PCS | component = PHY, name = "pcs", depth = 1
| | |
| PMA |
| | |
| PMD |
+---|----+
|
v 10GBaseT
I think I like this idea of depth + name, as we can consider omitting
the depth information when it's not needed (e.g. simple PHY with 1 PCS),
to keep the API simple.
To continue with your example, with combo-port PHYs we may get multiple
PMA/PMD instances, one per port, that's even more loopback points.
We could potentially associate these with phy_port though ?
> We should also think about when we have a PHY acting as a MII
> converter. You see the Marvell PHY placed between the MAC and the SFP
> cage. That has a collection of blocks which can do loopback. And then
> we could have either a Base-T module/PHY in the cage, with more of the
> same blocks, or a fibre modules with loopback.
For that we have what we need with phy_link_topology, as each PHY has
its index, we should be good to go in that regard hopefully :)
Maxime
> So, the same name + depth ?
>
> +---|-PHY+
> | | |
> | SerDes |
> | | |
> | PCS | component = PHY, name = "pcs", depth = 0
> | | |
> | MAC |
> | | |
> | packet |
> | buffer |
> | | |
> | MAC |
> | | |
> | PCS | component = PHY, name = "pcs", depth = 1
> | | |
> | PMA |
> | | |
> | PMD |
> +---|----+
> |
> v 10GBaseT
>
> For that we have what we need with phy_link_topology, as each PHY has
> its index, we should be good to go in that regard hopefully :)
So depth would be local to a component? We could have two PHY
components, each with a different index, and depth = 0?
I _think_ Jakub's depth was more at a global level? But then it would
need to be passed down as we do the enumeration.
Andrew
On Wed, 11 Mar 2026 20:26:39 +0100 Andrew Lunn wrote:
> > For that we have what we need with phy_link_topology, as each PHY has
> > its index, we should be good to go in that regard hopefully :)
>
> So depth would be local to a component? We could have two PHY
> components, each with a different index, and depth = 0?
>
> I _think_ Jakub's depth was more at a global level? But then it would
> need to be passed down as we do the enumeration.

Oh, sorry, I responded without reading the whole discussion :)
No, I imagined the depth would be within a single component,
so under control of a single driver (instance). The ordering
between components should be defined by PHY topology etc so
it's outside of the loopback config.
On Wed, Mar 11, 2026 at 07:50:52PM -0700, Jakub Kicinski wrote:
> On Wed, 11 Mar 2026 20:26:39 +0100 Andrew Lunn wrote:
> > > For that we have what we need with phy_link_topology, as each PHY has
> > > its index, we should be good to go in that regard hopefully :)
> >
> > So depth would be local to a component? We could have two PHY
> > components, each with a different index, and depth = 0?
> >
> > I _think_ Jakub's depth was more at a global level? But then it would
> > need to be passed down as we do the enumeration.
>
> Oh, sorry, I responded without reading the whole discussion :)
> No, I imagined the depth would be within a single component,
> so under control of a single driver (instance). The ordering
> between components should be defined by PHY topology etc so
> it's outside of the loopback config.
For me, the problem is helping the user understand the datapath
depth on a switch. For example:
CPU -- xMII --- MAC1 [loop] --- fabric --- MAC2 [loop] --- xMII -- PHY
\----- MACx [loop] ---
... each port has two xMII loop configurations: towards the xMII or towards
the fabric. From a driver perspective, a loop towards the xMII is
"remote." However, from a system perspective, a "remote" loop on MAC1 is
a local loop at depth=0, whereas a "local" loop on MAC2 is a local loop
at depth=1.
Another example would be a chain of components attached to the system
in an unexpected direction, where the MDI interface points towards the
main CPU, so remote loopbacks become local loops.
One more issue is the test data generator location. The data generator
is not always the CPU. We have HW generators located in components like
PHYs, or we may use an external source (remote loopback).
This is why I would prefer a description of the topology that does not
hard-code the point of view.
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
> As for me, it is problematic to help the user to understand the datapath
> depth on a switch. For example:

Do you mean Ethernet switch? Or MII switch?

> CPU -- xMII --- MAC1 [loop] --- fabric --- MAC2 [loop] --- xMII -- PHY
>                          \----- MACx [loop] ---

In DSA, MAC1 is the CPU port of the switch. It is not represented by a
netif. Since there is no netif, you cannot use ethtool on it. So it is
impossible to apply loopback here.

This is one of the oddities of DSA. The CPU port and the conduit
interface on the host are just plumbing to make the setup work. In
terms of networking, they are not important. But sometimes you need to
get into the plumbing to find out why it is blocked up; statistics are
useful, and maybe loopback as well. We have discussed a few times
whether MAC1 should have a netif, but the conclusion is that developers
have a hard enough time with the conduit interface; adding yet another
oddball interface with no real purpose other than diagnostics is going
to make the confusion even worse.

So I don't think depth is relevant here.

> ... each port has two xMII loop configurations: towards the xMII or towards
> the fabric. From a driver perspective, a loop towards the xMII is
> "remote." However, from a system perspective, a "remote" loop on MAC1 is
> a local loop at depth=0, whereas a "local" loop on MAC2 is a local loop
> at depth=1.

If you think about DSA and the Linux representation, the switch fabric
is not seen at all. All you have are user ports, those going to the
outside world. They act the same as interfaces directly connected to
the SoC. So "remote" and "local" must have the same meaning as for an
interface directly connected to the host. And this is true for
switchdev in general; DSA is not special in any way.

> One more issue is the test data generator location. The data generator
> is not always the CPU. We have HW generators located in components like
> PHYs or we may use external source (remote loopback).

At the moment, we don't have a Linux model for such generators. There
is interest in them, but nobody has actually stepped up and proposed
anything. I do see there is an intersect: we need to be able to
represent them in the topology, and know which way they are pointing,
but I don't think they have a direct influence on loopback.

Andrew
Hi Andrew,

>> One more issue is the test data generator location. The data generator
>> is not always the CPU. We have HW generators located in components like
>> PHYs or we may use external source (remote loopback).
>
> At the moment, we don't have a Linux model for such generators. There
> is interest in them, but nobody has actually stepped up and proposed
> anything. I do see there is an intersect, we need to be able to
> represent them in the topology, and know which way they are pointing,
> but i don't think they have a direct influence on loopback.

If I'm following Oleksij, the idea would be to have on one side the
ability to "dump" the link topology with a finer granularity, so that we
can see all the different blocks (pcs, pma, pmd, etc.), how they are
chained together and who's driving them (MAC, PHY (+ phy_index), module,
etc.), and on the other side commands to configure loopback on them,
with the ability to also configure traffic generators in the future,
gather stats, etc.

Another can of worms for sure, and probably too much for what Björn is
trying to achieve. It's hard to say if this is overkill or not; there's
interest in that for sure, but also quite a lot of work to do...

Maxime
Folks, thanks for the elaborate discussion (accidental complexity vs
essential complexity comes to mind...)!
Maxime Chevallier <maxime.chevallier@bootlin.com> writes:
> Hi Andrew,
>
>>> One more issue is the test data generator location. The data generator
>>> is not always the CPU. We have HW generators located in components like
>>> PHYs or we may use external source (remote loopback).
>>
>> At the moment, we don't have a Linux model for such generators. There
>> is interest in them, but nobody has actually stepped up and proposed
>> anything. I do see there is an intersect, we need to be able to
>> represent them in the topology, and know which way they are pointing,
>> but i don't think they have a direct influence on loopback.
>
> If I'm following Oleksij, the idea would be to have on one side the
> ability to "dump" the link topology with a finer granularity so that we
> can see all the different blocks (pcs, pma, pmd, etc.), how they are
> chained together and who's driving them (MAC, PHY (+ phy_index), module,
> etc.), and on another side commands to configure loopback on them, with
> the ability to also configure traffic generators in the future, gather
> stats, etc.
>
> Another can of worms for sure, and probably too much for what Björn is
> trying to achieve. It's hard to say if this is overkill or not, there's
> interest in that for sure, but also quite a lot of work to do...
It's great to have these discussions as input to the first (minimal!)
series, so we can extend/build on it later.
If I try to make sense of the above discussions...
Rough agreement on:
- Depth/ordering should be local to a component, not global across the
whole path.
- Cross-component ordering comes from existing infrastructure (PHY link
topology, phy_index).
- The current component set (MAC/PHY/MODULE) is reasonable for a first
pass.
- HW traffic generators and full topology dumps are interesting but out
of scope for now (Please? ;-)).
So, maybe the next steps are:
1. Keep the current component model (MAC/PHY/MODULE) and the
NEAR_END/FAR_END direction (naming needs to change, as Maxime said).
2. Add a depth (or order?) field to ETHTOOL_A_LOOPBACK_ENTRY as Jakub
suggested, local to each component instance. This addresses the
"multiple loopback points within one MAC" case without requiring a
global ordering. I hope it addresses what Oleksij's switch example
needs (multiple local loops at different depths within one
component) *insert that screaming emoji*.
3. Document the viewpoint convention clearly.
4. Punt on the grand topology dump. Too much to chew.
5. Don't worry about DSA CPU ports - they don't have a netif, so
loopback doesn't apply there today. If someone adds netifs for CPU
ports later, depth handles it.
TL;DR: Add depth, document the viewpoint convention, and ship
it^W^Winterate.
Did I get that right?
Enjoy the w/e!
Björn
> So, maybe the next steps are:
>
> 1. Keep the current component model (MAC/PHY/MODULE) and the
> NEAR_END/FAR_END direction (naming need to change as Maxime said).
>
> 2. Add a depth (or order?) field to ETHTOOL_A_LOOPBACK_ENTRY as Jakub
> suggested, local to each component instance. This addresses the
> "multiple loopback points within one MAC" case without requiring a
> global ordering. I hope it addresses what Oleksij's switch example
> needs (multiple local loops at different depths within one
> component) *insert that screaming emoji*.
>
> 3. Document the viewpoint convention clearly.
>
> 4. Punt on the grand topology dump. Too much to chew.
>
> 5. Don't worry about DSA CPU ports - they don't have a netif, so
> loopback doesn't apply there today. If someone adds netifs for CPU
> ports later, depth handles it.
>
> TL;DR: Add depth, document the viewpoint convention, and ship
> it^W^Winterate.
>
> Did I get that right?
Sounds reasonable. The first version can be KISS; we just need to keep
in mind that reality is more complex, and try to avoid adding any
roadblocks to later making it more complex to reflect that reality.
Andrew
Hi Björn,

On Fri, Mar 13, 2026 at 08:11:11PM +0100, Björn Töpel wrote:
> Folks, thanks for the elaborate discussion (accidental complexity vs
> essential complexity comes to mind...)!

Sorry for overthinking :)

[...]

> If I try to make sense of the above discussions...
>
> Rough agreement on:
>
> - Depth/ordering should be local to a component, not global across the
>   whole path.

ack

> - Cross-component ordering comes from existing infrastructure (PHY link
>   topology, phy_index).

ack

> - The current component set (MAC/PHY/MODULE) is reasonable for a first
>   pass.

I do not have a strong opinion here.

> - HW traffic generators and full topology dumps are interesting but out
>   of scope for now (Please? ;-)).

I didn't try to push it here. My point is: imagine that I, or maybe
you, will need to implement it in the next step. These components will
need to cooperate, and the user will need to understand their relation
and/or topology. Diagnostics is all about topology.

> So, maybe the next steps are:
>
> 1. Keep the current component model (MAC/PHY/MODULE) and the
>    NEAR_END/FAR_END direction (naming need to change as Maxime said).

Probably good to document that NEAR_END/FAR_END, or local/remote, is
relative to the viewpoint convention. Otherwise it will get confusing
with components mounted in an unusual direction (the embedded world is
full of them :))

> 2. Add a depth (or order?) field to ETHTOOL_A_LOOPBACK_ENTRY as Jakub
>    suggested, local to each component instance. This addresses the
>    "multiple loopback points within one MAC" case without requiring a
>    global ordering. I hope it addresses what Oleksij's switch example
>    needs (multiple local loops at different depths within one
>    component) *insert that screaming emoji*.

ack. I guess "depth" fits the "viewpoint" terminology.

> 3. Document the viewpoint convention clearly.

ack

> 4. Punt on the grand topology dump. Too much to chew.

ack

> 5. Don't worry about DSA CPU ports - they don't have a netif, so
>    loopback doesn't apply there today. If someone adds netifs for CPU
>    ports later, depth handles it.

ack

> TL;DR: Add depth, document the viewpoint convention, and ship
> it^W^Winterate.
>
> Did I get that right?

I'm ok with it, but the maintainers will have the last word.

Best Regards,
Oleksij
On Sun, 15 Mar 2026 at 16:10, Oleksij Rempel <o.rempel@pengutronix.de> wrote: > > Hi Björn, > > On Fri, Mar 13, 2026 at 08:11:11PM +0100, Björn Töpel wrote: > > Folks, thanks for the elaborate discussion (accidental complexity vs > > essential complexity comes to mind...)! > > Sorry for overthinking :) Haha, don't be! I think it's great that we have these discussions upfront! If this is overthinking, please continue to do that! :-) > > Maxime Chevallier <maxime.chevallier@bootlin.com> writes: > > > > > Hi Andrew, > > > > > >>> One more issue is the test data generator location. The data generator > > >>> is not always the CPU. We have HW generators located in components like > > >>> PHYs or we may use external source (remote loopback). > > >> > > >> At the moment, we don't have a Linux model for such generators. There > > >> is interest in them, but nobody has actually stepped up and proposed > > >> anything. I do see there is an intersect, we need to be able to > > >> represent them in the topology, and know which way they are pointing, > > >> but i don't think they have a direct influence on loopback. > > > > > > If I'm following Oleksij, the idea would be to have on one side the > > > ability to "dump" the link topology with a finer granularity so that we > > > can see all the different blocks (pcs, pma, pmd, etc.), how they are > > > chained together and who's driving them (MAC, PHY (+ phy_index), module, > > > etc.), and on another side commands to configure loopback on them, with > > > the ability to also configure traffic generators in the future, gather > > > stats, etc. > > > > > > Another can of worms for sure, and probably too much for what Björn is > > > trying to achieve. It's hard to say if this is overkill or not, there's > > > interest in that for sure, but also quite a lot of work to do... > > > > It's great to have these discussion as input to the first (minimal!) > > series, so we can extend/build on it later. 
> > > > If I try to make sense of the above discussions... > > > > Rough agreement on: > > > > - Depth/ordering should be local to a component, not global across the > > whole path. > > ack > > > - Cross-component ordering comes from existing infrastructure (PHY link > > topology, phy_index). > > ack > > > - The current component set (MAC/PHY/MODULE) is reasonable for a first > > pass. > > I do not have strong opinion here. > > > - HW traffic generators and full topology dumps are interesting but out > > of scope for now (Please? ;-)). > > It didn't tried to push it here. My point is - image me or may be you, > will need to implement it in the next step. This components will need to > cooperate and user will need to understand the relation and/or topology. > > The diagnostic is all about topology. I hear you, and I hope this didn't come across as negative. I definitely think we need something that we all can continue to build on. ...and if my summary/view isn't right, please holler! > > So, maybe the next steps are: > > > > 1. Keep the current component model (MAC/PHY/MODULE) and the > > NEAR_END/FAR_END direction (naming need to change as Maxime said). > > Probably good to document that NEAR_END/FAR_END or local/remote is > related to the viewpoint convention. Otherwise it will get confusing > with components which mount in a unusual direction (embedded worlds is > full of it :)) ACK! > > 2. Add a depth (or order?) field to ETHTOOL_A_LOOPBACK_ENTRY as Jakub > > suggested, local to each component instance. This addresses the > > "multiple loopback points within one MAC" case without requiring a > > global ordering. I hope it addresses what Oleksij's switch example > > needs (multiple local loops at different depths within one > > component) *insert that screaming emoji*. > > ack. I guess "depth" fits to the "viewpoint" terminology. > > > 3. Document the viewpoint convention clearly. > > ack > > > 4. Punt on the grand topology dump. Too much to chew. > > ack > > > 5.
Don't worry about DSA CPU ports - they don't have a netif, so > > loopback doesn't apply there today. If someone adds netifs for CPU > > ports later, depth handles it. > > ack > > > TL;DR: Add depth, document the viewpoint convention, and ship > > it^W^Winterate. > > > > Did I get that right? > > I'm ok with it, but maintainers will have the last word. Agreed! Björn
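The depth attribute agreed on in step 2 above could slot into the nested loopback entry roughly as follows. This is a sketch only: the attribute name, type, and doc wording are assumptions, not taken from any posted patch, and it would sit alongside the existing component/name/id/supported/direction attributes.

```yaml
# Sketch only: possible shape of the per-component "depth" selector.
-
  name: depth
  type: u32
  doc: |
    Position of this loopback point within its component, counted from
    the host side (0 is closest to the host). Depth is local to a
    single component instance; cross-component ordering comes from the
    existing PHY link topology (phy_index), not from this attribute.
```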
On Thu, Mar 12, 2026 at 02:34:40PM +0100, Andrew Lunn wrote: > > As for me, it is problematic to help the user to understand the datapath > > depth on a switch. For example: > > Do you mean Ethernet switch? Or MII switch. > > > > > CPU -- xMII --- MAC1 [loop] --- fabric --- MAC2 [loop] --- xMII -- PHY > > \----- MACx [loop] --- > > In DSA, MAC1 is the CPU port of the switch. It is not represented by a > netif. Since there is no netif, you cannot use ethtool on it. So it is > impossible to apply loopback here. > > This is one of the oddities of DSA. The CPU port and the conduit > interface on the host are just plumbing to make the setup work. In > terms of networking, they are not important. But sometimes you need to > get into the plumbing to find out why it is blocked up, statistics are > useful, and maybe loopback as well. We have discussed it a few times > that MAC1 should have a netif, but the conclusion is that developers > have a hard enough time with the conduit interface, adding yet another > oddball interface with no real purpose other than diagnostics is gone > to make the confusion even worse. > > So i don't think depth is relevant here. I have some projects where we need to configure the egress queue from the switch to the CPU MAC. Currently my idea is to represent them as optional HW offloading helpers for the host MAC. For example, this pipe is already represented on the Linux side as one interface: eth0 - [ SoC MAC0 --- xMII --- Switch MAC1 ] MAC0 and MAC1 counters are merged into one output. So, the egress queue configuration of the Switch MAC1 would be an ingress queue configuration of the MAC0 interface. The same goes for remote loopbacks on MAC1: they are additional local loopbacks of MAC0. If we do it with counters, why not with everything else?
On 12/03/2026 06:04, Oleksij Rempel wrote: > On Wed, Mar 11, 2026 at 07:50:52PM -0700, Jakub Kicinski wrote: >> On Wed, 11 Mar 2026 20:26:39 +0100 Andrew Lunn wrote: >>>> For that we have what we need with phy_link_topology, as each PHY has >>>> its index, we should be good to go in that regard hopefully :) >>> >>> So depth would be local to a component? We could have two PHY >>> components, each with a different index, and depth = 0? >>> >>> I _think_ Jakub's depth was more at a global level? But then it would >>> need to be passed down as we do the enumeration. >> >> Oh, sorry, I responded without reading the whole discussion :) >> No, I imagined the depth would be within a single component, >> so under control of a single driver (instance). The ordering >> between components should be defined by PHY topology etc so >> it's outside of the loopback config. > > As for me, it is problematic to help the user to understand the datapath > depth on a switch. For example: > > CPU -- xMII --- MAC1 [loop] --- fabric --- MAC2 [loop] --- xMII -- PHY > > \----- MACx [loop] --- > > ... each port has two xMII loop configurations: towards the xMII or towards > the fabric. From a driver perspective, a loop towards the xMII is > "remote." However, from a system perspective, a "remote" loop on MAC1 is > a local loop at depth=0, whereas a "local" loop on MAC2 is a local loop > at depth=1. What's important is to specify clearly in the documentation from which end we start when representing the topology. From your scenario here, each block is already well represented and exposed, and if we use local depth definitions we should be fine ? > > Other example would be where we have a chain of components which are > attached on the system in a unexpected direction, where the MDI > interface is pointing towards the main CPU, so the remote loopbacks > became to local loop. I have a few of these types of setup on my desk, where 3 PHY devices are daisy-chained; we don't support that for now.
If we one day add support for standalone PHYs acting as media converters, I expect we'll be able to tell which end is pointing where, and leave it up to the user to figure out what "remote" and "local" means in that case. > > One more issue is the test data generator location. The data generator > is not always the CPU. We have HW generators located in components like > PHYs or we may use external source (remote loopback). There were discussions about PRBS, I think the same idea of "pinpointing which block we want to use" can be applied for both loopback and generation ? Maxime
On Thu, Mar 12, 2026 at 08:49:39AM +0100, Maxime Chevallier wrote: > > > On 12/03/2026 06:04, Oleksij Rempel wrote: > > On Wed, Mar 11, 2026 at 07:50:52PM -0700, Jakub Kicinski wrote: > >> On Wed, 11 Mar 2026 20:26:39 +0100 Andrew Lunn wrote: > >>>> For that we have what we need with phy_link_topology, as each PHY has > >>>> its index, we should be good to go in that regard hopefully :) > >>> > >>> So depth would be local to a component? We could have two PHY > >>> components, each with a different index, and depth = 0? > >>> > >>> I _think_ Jakub's depth was more at a global level? But then it would > >>> need to be passed down as we do the enumeration. > >> > >> Oh, sorry, I responded without reading the whole discussion :) > >> No, I imagined the depth would be within a single component, > >> so under control of a single driver (instance). The ordering > >> between components should be defined by PHY topology etc so > >> it's outside of the loopback config. > > > > As for me, it is problematic to help the user to understand the datapath > > depth on a switch. For example: > > > > CPU -- xMII --- MAC1 [loop] --- fabric --- MAC2 [loop] --- xMII -- PHY > > \----- MACx [loop] --- > > > > ... each port has two xMII loop configurations: towards the xMII or towards > > the fabric. From a driver perspective, a loop towards the xMII is > > "remote." However, from a system perspective, a "remote" loop on MAC1 is > > a local loop at depth=0, whereas a "local" loop on MAC2 is a local loop > > at depth=1. > > What's important is to specify clearly in the documentation from which > end do we start, where representing the topology. From your scenario > here, each block is already well represented and exposed, and if we use > local depth definitions we should be fine ? I guess my main problem is to imagine depth representation in two separate directions for the user. So, the kernel documentation should describe what is the starting point of view depending on the device type. 
For example: - A PHY typically has xMII and MDI end points, so the loop towards the xMII is the local loop and the loop towards the MDI is the remote loop. - A switch/bridge has multiple, application-specific end points. So, we have a starting point of view from the fabric. Every loop pointing from the fabric towards the outside world of the switch is the remote loop, independent of connection type (xMII or MDI). Correct? > > Other example would be where we have a chain of components which are > > attached on the system in a unexpected direction, where the MDI > > interface is pointing towards the main CPU, so the remote loopbacks > > became to local loop. > > I have a few of these types of setup on my desk, where 3 PHY devices are > daisy-chained, we don't support that for now. If we one day add support > for standalone PHYs acting as media converters, I expect we'll be able > to tell which end is pointing where, and let it up to the user to figure > out what "remote" and "local" means in that case. > > > > > One more issue is the test data generator location. The data generator > > is not always the CPU. We have HW generators located in components like > > PHYs or we may use external source (remote loopback). > > There were discussions about PRBS, I think the same idea of "pinpointing > which block we want to use" can be applied for both loopback and > generation ? Yes, the same applies to the counters. If we represent the data path as a pipe with different components like loopbacks, PRBS, etc. at different stages of the pipe, the same holds for counters. For example, industrial or automotive PHYs have separate counters for xMII and MDI. A low-depth loopback would not trigger some of the counters. Since I do not want to push all of this right now, I suggest using a more abstract topology representation to make it easily extendable.
Hi all, On Wed, Mar 11, 2026 at 08:33:26AM +0100, Maxime Chevallier wrote: > Hi again Björn, > > First, thanks for iterating so quick ! > > On 10/03/2026 11:47, Björn Töpel wrote: > > Add the netlink YAML spec and auto-generated UAPI header for a unified > > loopback interface covering MAC, PCS, PHY, and pluggable module > > components. > > > > Each loopback point is described by a nested entry attribute > > containing: > > > > - component where in the path (MAC, PCS, PHY, MODULE) > > - name subsystem label, e.g. "cmis-host" or "cmis-media" > > - id optional instance selector (e.g. PHY id, port id) > > - supported bitmask of supported directions > > - direction NEAR_END, FAR_END, or 0 (disabled) > > > > Signed-off-by: Björn Töpel <bjorn@kernel.org> > > --- > > Documentation/netlink/specs/ethtool.yaml | 123 ++++++++++++++++++ > > .../uapi/linux/ethtool_netlink_generated.h | 59 +++++++++ > > 2 files changed, 182 insertions(+) > > > > diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml > > index 4707063af3b4..8bd14a3c946a 100644 > > --- a/Documentation/netlink/specs/ethtool.yaml > > +++ b/Documentation/netlink/specs/ethtool.yaml > > @@ -211,6 +211,49 @@ definitions: > > name: discard > > value: 31 > > > > + - > > + name: loopback-component > > + type: enum > > + doc: | > > + Loopback component. Identifies where in the network path the > > + loopback is applied. > > + entries: > > + - > > + name: mac > > + doc: MAC loopback. Loops traffic at the MAC block. > > + - > > + name: pcs > > + doc: | > > + PCS loopback. Loops traffic at the PCS sublayer between the > > + MAC and the PHY. > > + - > > + name: phy > > + doc: | > > + Ethernet PHY loopback. This refers to the Ethernet PHY managed > > + by phylib, not generic PHY drivers. A Base-T SFP module > > + containing an Ethernet PHY driven by Linux should report > > + loopback under this component, not module. > > + - > > + name: module > > + doc: | > > + Pluggable module (e.g. 
CMIS (Q)SFP) loopback. Covers loopback > > + modes controlled via module firmware or EEPROM registers. When > > + Linux drives an Ethernet PHY inside the module via phylib, use > > + the phy component instead. > > So to get back on Andrew's remarks, let's see if we can get something > closer to 802.3. > > Here, we have loopback at various locations, which all depends on the > Ethernet standard you use. > > It's usually in the PCS, PMA or PMD components. Thing is, we may have > these in multiple places in our link. > > If we take an example with a 10G PHY, we may have : > > +----SoC-----+ > | | > | MAC |- drivers/net/ethernet > | | | > | Base-R PCS |- could be in drivers/net/pcs, or directly > | | | in the MAC driver > | | | > | SerDes |- May be in drivers/phy, maybe handled by firmware, > | | | maybe by the MAC driver, maybe by the PCS driver ? > +---|--------+ > | > | 10GBase-R > | > +---|-PHY+ > | | | > | SerDes | \ > | | | | > | PCS | | > | | | > All of that handled by the drivers/net/phy PHY driver > | PMA | | > | | | | > | PMD | / > +---|----+ > | > v 10GBaseT > > So even the "PCS" loopback component is a bit ambiguous, are we talking > about the PHY PCS or the MAC PCS ? > > Another thing to consider is that there may be multiple PCSs in the SoC > (e.g. a BaseX and a BaseR PCS like we have in mvpp2), the one in use > depends on the current interface between the MAC and the PHY. > > Another open question is, do we deal with loopbacks that may affect > multi-netdev links ? Like the multi-lane modes we discussed with fbnic, > or even for embedded, interfaces such as QSGMII ? > > As for the SerDes on the MAC side (say, the comphy on Marvell devices), > can we say it's a PMA for 10GBase-KR ? Or is it something that's simply > out of spec ? > > So I'd say, maybe we should not have a PCS loopback component at all, > but instead loopback at the well-defined components on our link, that is: > > - MAC => MAC loopack, PCS on the MAC side, SerDes on the SoC, etc. 
> - PHY => Loopbacks on the PCS/PHY/PMA withing the PHY device > - Module => For non-PHY (Q)SFPs > > The important part would therefore to get the "name" part right, making > sure we don't fall into driver specific names. > > We can name that 'pcs', 'pma', 'pmd', or maybe even 'mii' ? Let's see : > > +----SoC-----+ > | | > | MAC |- component = MAC, name = 'mac' > | | | > | Base-R PCS |- component = MAC, name = 'pcs' > | | | > | | | > | SerDes |- component = MAC, name = 'mii' ? > | | | > +---|--------+ > | > | 10GBase-R > | > +---|-PHY+ > | | | > | SerDes | - component = PHY, name = 'mii' ? > | | | > | PCS | - component = PHY, name = 'pcs' > | | | > | PMA | - component = PHY, name = 'pma' > | | | > | PMD |- component = PHY, name = 'pmd' or 'mdi' ? > +---|----+ > | > v 10GBaseT > > Sorry that's a lot of questions and I don't expect you to have the > answer, but as what you've come-up with is taking a good shape, it's > important to decide on the overall design and draw some lines about > what do we support, and how :( > > > + - > > + name: loopback-direction > > + type: flags > > + doc: | > > + Loopback direction flags. Used as a bitmask in supported, and as > > + a single value in direction. > > + entries: > > + - > > + name: near-end > > + doc: Near-end loopback; host-loop-host > > + - > > + name: far-end > > + doc: Far-end loopback; line-loop-line > > I was browsing 802.3, it uses the terminlogy of "local loopback" vs > "remote loopback", I suggest we use those. I do not want to overload this initial series with complex topology problems, but we must ensure the proposed UAPI does not block future extensions. I am currently investigating automated datapath diagnostic, and a flat component + name model will eventually fail us. 
Looking at the current patch: - component (MAC, PCS, PHY, MODULE) - name (subsystem label) - id (local instance selector) - direction (near-end / far-end): These terms become highly ambiguous in branching topologies (like the CPU port on DSA switches). With mixed loopbacks across complex interconnects, userspace will eventually need a Directed Acyclic Graph (DAG) model. By adopting a DAG topology now, we can reduce the load on the initial implementation and bypass much of the ongoing naming discussions, as components are identified by their topological relations rather than arbitrary string labels. Can we design the netlink attributes now to ensure we are not blocked from adding the following fields later: - node_id: Global system ID. This also allows us to attach more diagnostic points (e.g., hardware counters) to exact subcomponents. - parent_node_id: Upstream pointer for tree reconstruction. - action: Bitmask of hardware modes (e.g., LOOPBACK, GENERATE) to allow simultaneous operations on a single node. See 6.3.1.3.1 Loopback Modes in: https://www.ti.com/lit/ds/symlink/dp83tg720s-q1.pdf?ts=1773225830126 - supported_actions: Bitmask of capabilities (e.g., can this node do LOOPBACK and GENERATE simultaneously?). - direction: Towards parent / from parent. - operational_constraints: MTU limits (e.g., FEC corrupts loopbacks >1522 bytes), clock injection requirements (e.g., stmmac requires external Rx clocks), and required interface modes (e.g. loopback on FEC works only in MII mode) If we hardcode a flat list assumption into the framework now, it will break when we try to automate tests across datapath forks (e.g., SoC -> DSA Switch -> PHYs) or handle complex industrial PHYs... :) Best Regards, Oleksij
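For illustration, the DAG-style fields proposed above could be rendered in the spec's attribute style roughly like this. Everything here is an assumption for discussion purposes: the attribute names, types, and the referenced loopback-action flags set do not exist in any posted patch.

```yaml
# Illustrative sketch only: DAG-style attributes for a datapath node.
-
  name: node-id
  type: u32
  doc: System-global ID of this datapath node.
-
  name: parent-node-id
  type: u32
  doc: node-id of the upstream (towards-host) node, for tree reconstruction.
-
  name: supported-actions
  type: u32
  enum: loopback-action
  doc: Bitmask of actions (e.g. LOOPBACK, GENERATE) this node supports.
-
  name: action
  type: u32
  enum: loopback-action
  doc: Bitmask of actions currently active on this node.
```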
On Wed, 11 Mar 2026 11:50:47 +0100 Oleksij Rempel wrote: > Looking at the current patch: > - component (MAC, PCS, PHY, MODULE) > - name (subsystem label) > - id (local instance selector) > - direction (near-end / far-end): These terms become highly ambiguous in > branching topologies (like CPU port on DSA switches). > > mixed loopbacks across complex interconnects, userspace will eventually need a > Directed Acyclic Graph (DAG) model. > > By adopting a DAG topology now, we can reduce the load on the initial > implementation and bypass much of the ongoing naming discussions, as components > are identified by their topological relations rather than arbitrary string > labels. Not sure we need a parentage chain or just a "stage id" within each component, but FWIW if I interpret what you wrote right - I think I agree :) What matters is the topology, not the naming of things.
Maxime Chevallier <maxime.chevallier@bootlin.com> writes: > Hi again Björn, > > First, thanks for iterating so quick ! Thank *you* for helping me navigate the lower levels of the stack! I'm trying to be like the Iron Maiden tune: Be quick or be... ? :-P >> diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml >> index 4707063af3b4..8bd14a3c946a 100644 >> --- a/Documentation/netlink/specs/ethtool.yaml >> +++ b/Documentation/netlink/specs/ethtool.yaml >> @@ -211,6 +211,49 @@ definitions: >> name: discard >> value: 31 >> >> + - >> + name: loopback-component >> + type: enum >> + doc: | >> + Loopback component. Identifies where in the network path the >> + loopback is applied. >> + entries: >> + - >> + name: mac >> + doc: MAC loopback. Loops traffic at the MAC block. >> + - >> + name: pcs >> + doc: | >> + PCS loopback. Loops traffic at the PCS sublayer between the >> + MAC and the PHY. >> + - >> + name: phy >> + doc: | >> + Ethernet PHY loopback. This refers to the Ethernet PHY managed >> + by phylib, not generic PHY drivers. A Base-T SFP module >> + containing an Ethernet PHY driven by Linux should report >> + loopback under this component, not module. >> + - >> + name: module >> + doc: | >> + Pluggable module (e.g. CMIS (Q)SFP) loopback. Covers loopback >> + modes controlled via module firmware or EEPROM registers. When >> + Linux drives an Ethernet PHY inside the module via phylib, use >> + the phy component instead. > > So to get back on Andrew's remarks, let's see if we can get something > closer to 802.3. > > Here, we have loopback at various locations, which all depends on the > Ethernet standard you use. > > It's usually in the PCS, PMA or PMD components. Thing is, we may have > these in multiple places in our link.
> > If we take an example with a 10G PHY, we may have : > > +----SoC-----+ > | | > | MAC |- drivers/net/ethernet > | | | > | Base-R PCS |- could be in drivers/net/pcs, or directly > | | | in the MAC driver > | | | > | SerDes |- May be in drivers/phy, maybe handled by firmware, > | | | maybe by the MAC driver, maybe by the PCS driver ? > +---|--------+ > | > | 10GBase-R > | > +---|-PHY+ > | | | > | SerDes | \ > | | | | > | PCS | | > | | | > All of that handled by the drivers/net/phy PHY driver > | PMA | | > | | | | > | PMD | / > +---|----+ > | > v 10GBaseT > > So even the "PCS" loopback component is a bit ambiguous, are we talking > about the PHY PCS or the MAC PCS ? > > Another thing to consider is that there may be multiple PCSs in the SoC > (e.g. a BaseX and a BaseR PCS like we have in mvpp2), the one in use > depends on the current interface between the MAC and the PHY. > > Another open question is, do we deal with loopbacks that may affect > multi-netdev links ? Like the multi-lane modes we discussed with fbnic, > or even for embedded, interfaces such as QSGMII ? Hmm, TBH punt on it for now. The current design is per-netdev, and drivers should only expose loopbacks they can scope to a single netdev. Multi-netdev loopbacks can be addressed later if a real use case arises. That keeps the series focused and avoids designing for hypotheticals. > As for the SerDes on the MAC side (say, the comphy on Marvell devices), > can we say it's a PMA for 10GBase-KR ? Or is it something that's simply > out of spec ? > > So I'd say, maybe we should not have a PCS loopback component at all, > but instead loopback at the well-defined components on our link, that is: > > - MAC => MAC loopack, PCS on the MAC side, SerDes on the SoC, etc. > - PHY => Loopbacks on the PCS/PHY/PMA withing the PHY device > - Module => For non-PHY (Q)SFPs Less is more! I like that! 
So, the component maps to the Linux driver boundary (who owns the loopback), and the name is the 802.3 sublayer within that device? > The important part would therefore to get the "name" part right, making > sure we don't fall into driver specific names. > > We can name that 'pcs', 'pma', 'pmd', or maybe even 'mii' ? Let's see : > > +----SoC-----+ > | | > | MAC |- component = MAC, name = 'mac' > | | | > | Base-R PCS |- component = MAC, name = 'pcs' > | | | > | | | > | SerDes |- component = MAC, name = 'mii' ? > | | | > +---|--------+ > | > | 10GBase-R > | > +---|-PHY+ > | | | > | SerDes | - component = PHY, name = 'mii' ? > | | | > | PCS | - component = PHY, name = 'pcs' > | | | > | PMA | - component = PHY, name = 'pma' > | | | > | PMD |- component = PHY, name = 'pmd' or 'mdi' ? > +---|----+ > | > v 10GBaseT > > Sorry that's a lot of questions and I don't expect you to have the > answer, but as what you've come-up with is taking a good shape, it's > important to decide on the overall design and draw some lines about > what do we support, and how :( (Again, this is why input from folks like you/Andrew/Naveen is excellent! (Hey, I just wanted the CMIS loopback to start with! ;-))) I like this. The nice thing is that since "name" is a string, we're not locked into an enum -- drivers report what they have using 802.3 vocabulary, and we document the recommended names (pcs, pma, pmd, mii) with references? That way it's unambiguous, but not too constrained. For the next spin, I'll drop the pcs component entirely and keep only mac, phy, and module. I'll also expand the component docs to explain that the sublayer granularity lives in the name attribute using 802.3 terminology. How does that sound? >> + - >> + name: loopback-direction >> + type: flags >> + doc: | >> + Loopback direction flags. Used as a bitmask in supported, and as >> + a single value in direction. 
>> + entries: >> + - >> + name: near-end >> + doc: Near-end loopback; host-loop-host >> + - >> + name: far-end >> + doc: Far-end loopback; line-loop-line > > I was browsing 802.3, it uses the terminlogy of "local loopback" vs > "remote loopback", I suggest we use those. Sounds good! Thanks for taking the time to think through the layering -- this is much cleaner. Björn
> I like this. The nice thing is that since "name" is a string, we're not > locked into an enum -- drivers report what they have using 802.3 > vocabulary, and we document the recommended names (pcs, pma, pmd, mii) > with references? That way it's unambiguous, but not too constrained. It is both good and bad. I expect some vendors will just ignore the text and use what their data sheet says, because they don't know better. An enum forces more consistency. https://gist.github.com/mjball/9cd028ac793ae8b351df1379f1e721f9 enum gets you around level 9. string around level 3. Andrew
On Wed, 11 Mar 2026 16:30:03 +0100 Andrew Lunn wrote: > > I like this. The nice thing is that since "name" is a string, we're not > > locked into an enum -- drivers report what they have using 802.3 > > vocabulary, and we document the recommended names (pcs, pma, pmd, mii) > > with references? That way it's unambiguous, but not too constrained. > > It is both good and bad. I expect some vendors will just ignore the > text and use what their data sheet says, because they don't know > better. An enum forces more consistency. enum only enforces using a value from a fixed set. The uAPI itself cannot enforce functional equivalence. Worst case this may result in different vendors interpreting the enum differently. With a string there's a better chance that even if not matching between the vendors, at least a user with a databook in hand will know exactly what the Linux API will do. Could someone please explain to me what the use case for standardization of the enum values would be? I push hard for standardization of stats, because with standard stats it's easy to build high-level standard tools. I don't understand what value standardizing across vendors what the last step of the MAC is called brings us. Please explain, and if we have a real use case in mind it should be possible to write a selftest that verifies that use case is met by any given device. Which reminds me -- I was suggesting that we add an order / id to the stages, not just name. Because AFAIU being able to request the loopback "very last loopback point of MAC" or "first loopback point of PHY" is something that should be doable without the user having to parse the names / enums and understanding them.
> Which reminds me -- I was suggesting that we add an order / id to the > stages, not just name. Because AFAIU being able to request the loopback > "very last loopback point of MAC" or "first loopback point of PHY" How do you define where the MAC ends? I suspect some vendors will include the PCS and the PMA, because the MAC ends at the MII pins on their SoC. Other vendors are going to say the MAC ends at the interface to the PCS, especially those who have licensed the PCS, and are using the shared Linux driver for the PCS. And the PMA might again be a shared implementation, since it is also used for USB, PCIe and SATA. If Linux is driving the hardware, using phylink, phylib, PCS drivers and generic PHY, we are very likely to have a uniform definition of all these parts. Are we happy firmware devices will have a much fuzzier, different interpretation, conglomerating it all together? Andrew
On Thu, 12 Mar 2026 14:46:40 +0100 Andrew Lunn wrote: > > Which reminds me -- I was suggesting that we add an order / id to the > > stages, not just name. Because AFAIU being able to request the loopback > > "very last loopback point of MAC" or "first loopback point of PHY" > > How do you define where the MAC ends? MAC may not be the greatest of names because I'd include in it everything past the PHY, up to the DMA blocks. > I suspect some vendors will include the PCS and the PMA, because the > MAC ends at the MII pins on their SoC. Other vendors are going to say > the MAC ends at the interface to the PCS, especially those who have > licensed the PCS, and are using the shared Linux driver for the > PCS. And the PMA might again be a shared implementation, since it is > also used for USB, PCIe and SATA. > > If Linux is driving the hardware, using phylink, phylib, PCS drivers > and generic PHY, we are very likely to have a uniform definition of > all these parts. Are we happy firmware devices will have a much > fuzzier, different interpretation, conglomerating it all together? As long as the kernel API lets "integrated" devices expose both a MAC and a PHY node I don't think we should let anyone conglomerate. I see your point that the enum would work nicely for PHY stages. But it will be limiting for MAC stages. These conflicting preferences make having all of loopback config in one API tricky. I guess we could have a half-measure to add in the kernel the "well known" PHY stage name, and WARN_ON_ONCE() if some driver exposes PHY stage in something that's not a PHY. Or uses an unknown name for a PHY stage?
On 11/03/2026 16:30, Andrew Lunn wrote: >> I like this. The nice thing is that since "name" is a string, we're not >> locked into an enum -- drivers report what they have using 802.3 >> vocabulary, and we document the recommended names (pcs, pma, pmd, mii) >> with references? That way it's unambiguous, but not too constrained. > > It is both good and bad. I expect some vendors will just ignore the > text and use what their data sheet says, because they don't know > better. An enum forces more consistency. > > https://gist.github.com/mjball/9cd028ac793ae8b351df1379f1e721f9 > > enum gets you around level 9. string around level 3. > > Andrew Oh, I didn't know that manifesto. Given the current discussions I'm indeed starting to think an enum value will be enough to cover a good portion of the use cases, so I'm OK with it :) Maxime
> > https://gist.github.com/mjball/9cd028ac793ae8b351df1379f1e721f9
> >
> > enum gets you around level 9. string around level 3.
> >
> > Andrew
>
> Oh didn't know that manifesto.
Rusty has not been involved in kernel work for around 10 years. But it
is much older than that, a keynote from Ottawa Linux Symposium 2003.
https://ozlabs.org/~rusty/ols-2003-keynote/img0.html
Andrew