[PATCH net-next v2] net: dsa: microchip: Add SGMII port support to KSZ9477 switch

Tristram.Ha@microchip.com posted 1 patch 7 months, 1 week ago
There is a newer version of this series
[PATCH net-next v2] net: dsa: microchip: Add SGMII port support to KSZ9477 switch
Posted by Tristram.Ha@microchip.com 7 months, 1 week ago
From: Tristram Ha <tristram.ha@microchip.com>

The KSZ9477 switch driver uses the XPCS driver to operate its SGMII
port.  However, there are some hardware bugs in the KSZ9477 SGMII
module so workarounds are needed.  There was a proposal to update the
XPCS driver to accommodate KSZ9477, but the new code is not generic
enough to be used by other vendors.  It is better to do all these
workarounds inside the KSZ9477 driver instead of modifying the XPCS
driver.

There are 3 hardware issues.  The first is that the MII_ADVERTISE
register needs to be written once after reset for the correct code word
to be sent.  The XPCS driver disables auto-negotiation before
configuring the SGMII/1000BASE-X mode and then re-enables it.  The
KSZ9477 driver therefore writes the MII_ADVERTISE register just before
auto-negotiation is enabled.  In 1000BASE-X mode the XPCS driver sets
the MII_ADVERTISE register itself, so the KSZ9477 driver does not need
to write it.
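
The write-once rule for MII_ADVERTISE can be modelled outside the kernel
as a small state machine.  This is only an illustrative sketch, not the
driver code: struct pcs_model, pcs_model_write() and the adv_fixups
counter are hypothetical names, and the register file is a plain array
instead of real hardware (the self-clearing behaviour of BMCR_RESET is
not emulated).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register numbers and bits as defined in include/uapi/linux/mii.h. */
#define MII_BMCR                0x00
#define MII_ADVERTISE           0x04
#define BMCR_RESET              0x8000
#define BMCR_ANENABLE           0x1000
#define ADVERTISE_1000XPAUSE    0x0080
#define ADVERTISE_1000XPSE_ASYM 0x0100

/* Hypothetical stand-in for the PCS: a register file plus bookkeeping. */
struct pcs_model {
	uint16_t regs[32];
	bool adv_written;        /* MII_ADVERTISE written since last reset? */
	unsigned int adv_fixups; /* how many times the workaround fired */
};

/* Filter a register write: when auto-negotiation is about to be enabled
 * and MII_ADVERTISE has not been written since the last reset, rewrite
 * it first, advertising symmetric pause only.
 */
static void pcs_model_write(struct pcs_model *p, unsigned int reg,
			    uint16_t val)
{
	assert(reg < 32);

	if (reg == MII_BMCR) {
		if ((val & BMCR_ANENABLE) && !p->adv_written) {
			uint16_t adv = p->regs[MII_ADVERTISE];

			adv |= ADVERTISE_1000XPAUSE;
			adv &= (uint16_t)~ADVERTISE_1000XPSE_ASYM;
			p->regs[MII_ADVERTISE] = adv;
			p->adv_written = true;
			p->adv_fixups++;
		} else if (val & BMCR_RESET) {
			/* A reset means the next AN enable needs the fix-up. */
			p->adv_written = false;
		}
	} else if (reg == MII_ADVERTISE) {
		/* The caller wrote it, so no fix-up is needed later. */
		p->adv_written = true;
	}
	p->regs[reg] = val;
}
```

In this model the fix-up fires at most once per reset, and not at all
when the caller has already written MII_ADVERTISE, which is the
1000BASE-X case described above.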

The second issue is that the MII_BMCR register must be set to the exact
speed and duplex mode when running in SGMII mode.  During link polling
the KSZ9477 driver checks whether the speed and duplex mode differ from
the previous ones and updates the MII_BMCR register accordingly.
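
The speed/duplex tracking can be sketched as below.  This is a
standalone model under stated assumptions, not the driver code:
struct link_model and link_model_poll() are hypothetical, and
bmcr_encode_fixed() is a local re-implementation of the mapping the
kernel's mii_bmcr_encode_fixed() performs for 10/100/1000 Mbps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in include/uapi/linux/mii.h and ethtool.h. */
#define BMCR_SPEED1000 0x0040
#define BMCR_FULLDPLX  0x0100
#define BMCR_ANENABLE  0x1000
#define BMCR_SPEED100  0x2000
#define SPEED_10       10
#define SPEED_100      100
#define SPEED_1000     1000
#define DUPLEX_HALF    0
#define DUPLEX_FULL    1

/* Local copy of the speed/duplex to BMCR mapping (BMCR_SPEED10 is 0). */
static uint16_t bmcr_encode_fixed(int speed, int duplex)
{
	uint16_t bmcr = 0;

	assert(duplex == DUPLEX_HALF || duplex == DUPLEX_FULL);

	if (speed == SPEED_1000)
		bmcr = BMCR_SPEED1000;
	else if (speed == SPEED_100)
		bmcr = BMCR_SPEED100;
	if (duplex == DUPLEX_FULL)
		bmcr |= BMCR_FULLDPLX;
	return bmcr;
}

/* Hypothetical cache of the last state seen during link polling. */
struct link_model {
	bool link;
	int speed;
	int duplex;
	uint16_t bmcr; /* stands in for the hardware MII_BMCR register */
};

/* Returns true when MII_BMCR had to be rewritten. */
static bool link_model_poll(struct link_model *m, bool up, int speed,
			    int duplex)
{
	if (!up) {
		m->link = false;
		return false;
	}
	if (m->link && m->speed == speed && m->duplex == duplex)
		return false;

	m->link = true;
	m->speed = speed;
	m->duplex = duplex;
	/* Keep only the AN enable bit, then force speed and duplex. */
	m->bmcr = (uint16_t)((m->bmcr & BMCR_ANENABLE) |
			     bmcr_encode_fixed(speed, duplex));
	return true;
}
```

The register is only touched when the link comes up or the resolved
speed or duplex changes, so repeated polls of a stable link cause no
extra writes.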

The last issue is that 1000BASE-X mode does not work with
auto-negotiation on.  The cause is that the local port hardware does
not know the link is up, so network traffic is not forwarded.  The
workaround is to set 2 additional bits (SR_MII_SGMII_LINK_UP and
SR_MII_TX_CFG_PHY_MASTER) when 1000BASE-X mode is configured.

Note that the SGMII interrupt in the port cannot be masked.  As that
interrupt is not handled in the KSZ9477 driver, the SGMII interrupt
enable bit is cleared on write even when the XPCS driver tries to set
it.
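
The 1000BASE-X workaround and the interrupt handling both amount to
filtering the value written to the auto-negotiation control register.
A minimal sketch follows; the AN_CTRL_* names and bit positions are
assumptions made for illustration and should be checked against the
SR_MII_* definitions in ksz9477_reg.h, and an_ctrl_filter() is a
hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout of the auto-negotiation control register. */
#define AN_CTRL_INTR_EN        (1u << 0) /* AN-complete interrupt enable */
#define AN_CTRL_PCS_MODE_SGMII (2u << 1) /* PCS mode field set to SGMII */
#define AN_CTRL_TX_CFG_PHY     (1u << 3) /* transmit config as PHY side */
#define AN_CTRL_SGMII_LINK_UP  (1u << 4) /* tell the port the link is up */

/* Adjust a value before it is written to the control register. */
static uint16_t an_ctrl_filter(uint16_t val)
{
	/* Not SGMII mode: treated as 1000BASE-X, which needs both bits
	 * for traffic to be forwarded with auto-negotiation on.
	 */
	if (!(val & AN_CTRL_PCS_MODE_SGMII))
		val |= AN_CTRL_SGMII_LINK_UP | AN_CTRL_TX_CFG_PHY;

	/* The interrupt is never handled, so never enable it. */
	val &= (uint16_t)~AN_CTRL_INTR_EN;
	return val;
}

/* Small usage example covering both modes. */
static void an_ctrl_filter_example(void)
{
	/* 1000BASE-X (mode field 0): both workaround bits get added. */
	assert(an_ctrl_filter(0) ==
	       (AN_CTRL_SGMII_LINK_UP | AN_CTRL_TX_CFG_PHY));
	/* SGMII with the interrupt requested: only the enable bit drops. */
	assert(an_ctrl_filter(AN_CTRL_PCS_MODE_SGMII | AN_CTRL_INTR_EN) ==
	       AN_CTRL_PCS_MODE_SGMII);
}
```

Doing this at write time keeps the XPCS driver unaware of the quirk,
which is the trade-off the patch makes by handling the workarounds in
the KSZ9477 driver.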

Signed-off-by: Tristram Ha <tristram.ha@microchip.com>
---
v2
 - add Kconfig for required XPCS driver build

 drivers/net/dsa/microchip/Kconfig      |   1 +
 drivers/net/dsa/microchip/ksz9477.c    | 191 ++++++++++++++++++++++++-
 drivers/net/dsa/microchip/ksz9477.h    |   4 +-
 drivers/net/dsa/microchip/ksz_common.c |  36 ++++-
 drivers/net/dsa/microchip/ksz_common.h |  23 ++-
 5 files changed, 248 insertions(+), 7 deletions(-)

diff --git a/drivers/net/dsa/microchip/Kconfig b/drivers/net/dsa/microchip/Kconfig
index 12a86585a77f..c71d3fd5dfeb 100644
--- a/drivers/net/dsa/microchip/Kconfig
+++ b/drivers/net/dsa/microchip/Kconfig
@@ -6,6 +6,7 @@ menuconfig NET_DSA_MICROCHIP_KSZ_COMMON
 	select NET_DSA_TAG_NONE
 	select NET_IEEE8021Q_HELPERS
 	select DCB
+	select PCS_XPCS
 	help
 	  This driver adds support for Microchip KSZ8, KSZ9 and
 	  LAN937X series switch chips, being KSZ8863/8873,
diff --git a/drivers/net/dsa/microchip/ksz9477.c b/drivers/net/dsa/microchip/ksz9477.c
index 29fe79ea74cd..825aa570eed9 100644
--- a/drivers/net/dsa/microchip/ksz9477.c
+++ b/drivers/net/dsa/microchip/ksz9477.c
@@ -2,7 +2,7 @@
 /*
  * Microchip KSZ9477 switch driver main logic
  *
- * Copyright (C) 2017-2024 Microchip Technology Inc.
+ * Copyright (C) 2017-2025 Microchip Technology Inc.
  */
 
 #include <linux/kernel.h>
@@ -161,6 +161,187 @@ static int ksz9477_wait_alu_sta_ready(struct ksz_device *dev)
 					10, 1000);
 }
 
+static void port_sgmii_s(struct ksz_device *dev, uint port, u16 devid, u16 reg)
+{
+	u32 data;
+
+	data = (devid & MII_MMD_CTRL_DEVAD_MASK) << 16;
+	data |= reg;
+	ksz_pwrite32(dev, port, REG_PORT_SGMII_ADDR__4, data);
+}
+
+static void port_sgmii_r(struct ksz_device *dev, uint port, u16 devid, u16 reg,
+			 u16 *buf)
+{
+	port_sgmii_s(dev, port, devid, reg);
+	ksz_pread16(dev, port, REG_PORT_SGMII_DATA__4 + 2, buf);
+}
+
+static void port_sgmii_w(struct ksz_device *dev, uint port, u16 devid, u16 reg,
+			 u16 buf)
+{
+	port_sgmii_s(dev, port, devid, reg);
+	ksz_pwrite32(dev, port, REG_PORT_SGMII_DATA__4, buf);
+}
+
+static int ksz9477_pcs_read(struct mii_bus *bus, int phy, int mmd, int reg)
+{
+	struct ksz_device *dev = bus->priv;
+	int port = ksz_get_sgmii_port(dev);
+	u16 val;
+
+	port_sgmii_r(dev, port, mmd, reg, &val);
+
+	/* Simulate a value to activate special code in the XPCS driver if
+	 * supported.
+	 */
+	if (mmd == MDIO_MMD_PMAPMD) {
+		if (reg == MDIO_DEVID1)
+			val = 0x9477;
+		else if (reg == MDIO_DEVID2)
+			val = 0x22 << 10;
+	} else if (mmd == MDIO_MMD_VEND2) {
+		struct ksz_port *p = &dev->ports[port];
+
+		/* Need to update MII_BMCR register with the exact speed and
+		 * duplex mode when running in SGMII mode and this register is
+		 * used to detect connected speed in that mode.
+		 */
+		if (reg == MMD_SR_MII_AUTO_NEG_STATUS) {
+			int duplex, speed;
+
+			if (val & SR_MII_STAT_LINK_UP) {
+				speed = (val >> SR_MII_STAT_S) & SR_MII_STAT_M;
+				if (speed == SR_MII_STAT_1000_MBPS)
+					speed = SPEED_1000;
+				else if (speed == SR_MII_STAT_100_MBPS)
+					speed = SPEED_100;
+				else
+					speed = SPEED_10;
+
+				if (val & SR_MII_STAT_FULL_DUPLEX)
+					duplex = DUPLEX_FULL;
+				else
+					duplex = DUPLEX_HALF;
+
+				if (!p->phydev.link ||
+				    p->phydev.speed != speed ||
+				    p->phydev.duplex != duplex) {
+					u16 ctrl;
+
+					p->phydev.link = 1;
+					p->phydev.speed = speed;
+					p->phydev.duplex = duplex;
+					port_sgmii_r(dev, port, mmd, MII_BMCR,
+						     &ctrl);
+					ctrl &= BMCR_ANENABLE;
+					ctrl |= mii_bmcr_encode_fixed(speed,
+								      duplex);
+					port_sgmii_w(dev, port, mmd, MII_BMCR,
+						     ctrl);
+				}
+			} else {
+				p->phydev.link = 0;
+			}
+		} else if (reg == MII_BMSR) {
+			p->phydev.link = (val & BMSR_LSTATUS);
+		}
+	}
+	return val;
+}
+
+static int ksz9477_pcs_write(struct mii_bus *bus, int phy, int mmd, int reg,
+			     u16 val)
+{
+	struct ksz_device *dev = bus->priv;
+	int port = ksz_get_sgmii_port(dev);
+
+	if (mmd == MDIO_MMD_VEND2) {
+		struct ksz_port *p = &dev->ports[port];
+
+		if (reg == MMD_SR_MII_AUTO_NEG_CTRL) {
+			u16 sgmii_mode = SR_MII_PCS_SGMII << SR_MII_PCS_MODE_S;
+
+			/* Need these bits for 1000BASE-X mode to work with
+			 * AN on.
+			 */
+			if (!(val & sgmii_mode))
+				val |= SR_MII_SGMII_LINK_UP |
+				       SR_MII_TX_CFG_PHY_MASTER;
+
+			/* SGMII interrupt in the port cannot be masked, so
+			 * make sure interrupt is not enabled as it is not
+			 * handled.
+			 */
+			val &= ~SR_MII_AUTO_NEG_COMPLETE_INTR;
+		} else if (reg == MII_BMCR) {
+			/* The MII_ADVERTISE register must be written once
+			 * before doing auto-negotiation for the correct
+			 * config_word to be sent out after reset.
+			 */
+			if ((val & BMCR_ANENABLE) && !p->sgmii_adv_write) {
+				u16 adv;
+
+				/* The SGMII port cannot disable flow control
+				 * so it is better to just advertise symmetric
+				 * pause.
+				 */
+				port_sgmii_r(dev, port, mmd, MII_ADVERTISE,
+					     &adv);
+				adv |= ADVERTISE_1000XPAUSE;
+				adv &= ~ADVERTISE_1000XPSE_ASYM;
+				port_sgmii_w(dev, port, mmd, MII_ADVERTISE,
+					     adv);
+				p->sgmii_adv_write = 1;
+			} else if (val & BMCR_RESET) {
+				p->sgmii_adv_write = 0;
+			}
+		} else if (reg == MII_ADVERTISE) {
+			/* XPCS driver writes to this register so there is no
+			 * need to update it for the errata.
+			 */
+			p->sgmii_adv_write = 1;
+		}
+	}
+	port_sgmii_w(dev, port, mmd, reg, val);
+	return 0;
+}
+
+int ksz9477_pcs_create(struct ksz_device *dev)
+{
+	/* This chip has a SGMII port. */
+	if (ksz_has_sgmii_port(dev)) {
+		int port = ksz_get_sgmii_port(dev);
+		struct ksz_port *p = &dev->ports[port];
+		struct phylink_pcs *pcs;
+		struct mii_bus *bus;
+		int ret;
+
+		bus = devm_mdiobus_alloc(dev->dev);
+		if (!bus)
+			return -ENOMEM;
+
+		bus->name = "ksz_pcs_mdio_bus";
+		snprintf(bus->id, MII_BUS_ID_SIZE, "%s-pcs",
+			 dev_name(dev->dev));
+		bus->read_c45 = &ksz9477_pcs_read;
+		bus->write_c45 = &ksz9477_pcs_write;
+		bus->parent = dev->dev;
+		bus->phy_mask = ~0;
+		bus->priv = dev;
+
+		ret = devm_mdiobus_register(dev->dev, bus);
+		if (ret)
+			return ret;
+
+		pcs = xpcs_create_pcs_mdiodev(bus, 0);
+		if (IS_ERR(pcs))
+			return PTR_ERR(pcs);
+		p->pcs = pcs;
+	}
+	return 0;
+}
+
 int ksz9477_reset_switch(struct ksz_device *dev)
 {
 	u8 data8;
@@ -978,6 +1159,14 @@ void ksz9477_get_caps(struct ksz_device *dev, int port,
 
 	if (dev->info->gbit_capable[port])
 		config->mac_capabilities |= MAC_1000FD;
+
+	if (ksz_is_sgmii_port(dev, port)) {
+		struct ksz_port *p = &dev->ports[port];
+
+		phy_interface_or(config->supported_interfaces,
+				 config->supported_interfaces,
+				 p->pcs->supported_interfaces);
+	}
 }
 
 int ksz9477_set_ageing_time(struct ksz_device *dev, unsigned int msecs)
diff --git a/drivers/net/dsa/microchip/ksz9477.h b/drivers/net/dsa/microchip/ksz9477.h
index d2166b0d881e..0d1a6dfda23e 100644
--- a/drivers/net/dsa/microchip/ksz9477.h
+++ b/drivers/net/dsa/microchip/ksz9477.h
@@ -2,7 +2,7 @@
 /*
  * Microchip KSZ9477 series Header file
  *
- * Copyright (C) 2017-2022 Microchip Technology Inc.
+ * Copyright (C) 2017-2025 Microchip Technology Inc.
  */
 
 #ifndef __KSZ9477_H
@@ -97,4 +97,6 @@ void ksz9477_acl_match_process_l2(struct ksz_device *dev, int port,
 				  u16 ethtype, u8 *src_mac, u8 *dst_mac,
 				  unsigned long cookie, u32 prio);
 
+int ksz9477_pcs_create(struct ksz_device *dev);
+
 #endif
diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c
index b45052497f8a..c93a567a4c3b 100644
--- a/drivers/net/dsa/microchip/ksz_common.c
+++ b/drivers/net/dsa/microchip/ksz_common.c
@@ -2,7 +2,7 @@
 /*
  * Microchip switch driver main logic
  *
- * Copyright (C) 2017-2024 Microchip Technology Inc.
+ * Copyright (C) 2017-2025 Microchip Technology Inc.
  */
 
 #include <linux/delay.h>
@@ -354,10 +354,26 @@ static void ksz9477_phylink_mac_link_up(struct phylink_config *config,
 					int speed, int duplex, bool tx_pause,
 					bool rx_pause);
 
+static struct phylink_pcs *
+ksz_phylink_mac_select_pcs(struct phylink_config *config,
+			   phy_interface_t interface)
+{
+	struct dsa_port *dp = dsa_phylink_to_port(config);
+	struct ksz_device *dev = dp->ds->priv;
+	struct ksz_port *p = &dev->ports[dp->index];
+
+	if (ksz_is_sgmii_port(dev, dp->index) &&
+	    (interface == PHY_INTERFACE_MODE_SGMII ||
+	    interface == PHY_INTERFACE_MODE_1000BASEX))
+		return p->pcs;
+	return NULL;
+}
+
 static const struct phylink_mac_ops ksz9477_phylink_mac_ops = {
 	.mac_config	= ksz_phylink_mac_config,
 	.mac_link_down	= ksz_phylink_mac_link_down,
 	.mac_link_up	= ksz9477_phylink_mac_link_up,
+	.mac_select_pcs	= ksz_phylink_mac_select_pcs,
 };
 
 static const struct ksz_dev_ops ksz9477_dev_ops = {
@@ -395,6 +411,7 @@ static const struct ksz_dev_ops ksz9477_dev_ops = {
 	.reset = ksz9477_reset_switch,
 	.init = ksz9477_switch_init,
 	.exit = ksz9477_switch_exit,
+	.pcs_create = ksz9477_pcs_create,
 };
 
 static const struct phylink_mac_ops lan937x_phylink_mac_ops = {
@@ -1035,8 +1052,7 @@ static const struct regmap_range ksz9477_valid_regs[] = {
 	regmap_reg_range(0x701b, 0x701b),
 	regmap_reg_range(0x701f, 0x7020),
 	regmap_reg_range(0x7030, 0x7030),
-	regmap_reg_range(0x7200, 0x7203),
-	regmap_reg_range(0x7206, 0x7207),
+	regmap_reg_range(0x7200, 0x7207),
 	regmap_reg_range(0x7300, 0x7301),
 	regmap_reg_range(0x7400, 0x7401),
 	regmap_reg_range(0x7403, 0x7403),
@@ -1552,6 +1568,7 @@ const struct ksz_chip_data ksz_switch_chips[] = {
 				   true, false, false},
 		.gbit_capable	= {true, true, true, true, true, true, true},
 		.ptp_capable = true,
+		.sgmii_port = 7,
 		.wr_table = &ksz9477_register_set,
 		.rd_table = &ksz9477_register_set,
 	},
@@ -1944,6 +1961,7 @@ const struct ksz_chip_data ksz_switch_chips[] = {
 		.internal_phy	= {true, true, true, true,
 				   true, false, false},
 		.gbit_capable	= {true, true, true, true, true, true, true},
+		.sgmii_port = 7,
 		.wr_table = &ksz9477_register_set,
 		.rd_table = &ksz9477_register_set,
 	},
@@ -2067,7 +2085,7 @@ void ksz_r_mib_stats64(struct ksz_device *dev, int port)
 
 	spin_unlock(&mib->stats64_lock);
 
-	if (dev->info->phy_errata_9477) {
+	if (dev->info->phy_errata_9477 && !ksz_is_sgmii_port(dev, port)) {
 		ret = ksz9477_errata_monitor(dev, port, raw->tx_late_col);
 		if (ret)
 			dev_err(dev->dev, "Failed to monitor transmission halt\n");
@@ -2775,6 +2793,12 @@ static int ksz_setup(struct dsa_switch *ds)
 	if (ret)
 		return ret;
 
+	if (ksz_has_sgmii_port(dev) && dev->dev_ops->pcs_create) {
+		ret = dev->dev_ops->pcs_create(dev);
+		if (ret)
+			return ret;
+	}
+
 	/* set broadcast storm protection 10% rate */
 	regmap_update_bits(ksz_regmap_16(dev), regs[S_BROADCAST_CTRL],
 			   BROADCAST_STORM_RATE,
@@ -3613,6 +3637,10 @@ static void ksz_phylink_mac_config(struct phylink_config *config,
 	if (dev->info->internal_phy[port])
 		return;
 
+	/* No need to configure XMII control register when using SGMII. */
+	if (ksz_is_sgmii_port(dev, port))
+		return;
+
 	if (phylink_autoneg_inband(mode)) {
 		dev_err(dev->dev, "In-band AN not supported!\n");
 		return;
diff --git a/drivers/net/dsa/microchip/ksz_common.h b/drivers/net/dsa/microchip/ksz_common.h
index dd5429ff16ee..84e9e423980d 100644
--- a/drivers/net/dsa/microchip/ksz_common.h
+++ b/drivers/net/dsa/microchip/ksz_common.h
@@ -1,7 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /* Microchip switch driver common header
  *
- * Copyright (C) 2017-2024 Microchip Technology Inc.
+ * Copyright (C) 2017-2025 Microchip Technology Inc.
  */
 
 #ifndef __KSZ_COMMON_H
@@ -10,6 +10,7 @@
 #include <linux/etherdevice.h>
 #include <linux/kernel.h>
 #include <linux/mutex.h>
+#include <linux/pcs/pcs-xpcs.h>
 #include <linux/phy.h>
 #include <linux/regmap.h>
 #include <net/dsa.h>
@@ -93,6 +94,7 @@ struct ksz_chip_data {
 	bool internal_phy[KSZ_MAX_NUM_PORTS];
 	bool gbit_capable[KSZ_MAX_NUM_PORTS];
 	bool ptp_capable;
+	u8 sgmii_port;
 	const struct regmap_access_table *wr_table;
 	const struct regmap_access_table *rd_table;
 };
@@ -132,6 +134,7 @@ struct ksz_port {
 	u32 force:1;
 	u32 read:1;			/* read MIB counters in background */
 	u32 freeze:1;			/* MIB counter freeze is enabled */
+	u32 sgmii_adv_write:1;
 
 	struct ksz_port_mib mib;
 	phy_interface_t interface;
@@ -141,6 +144,7 @@ struct ksz_port {
 	void *acl_priv;
 	struct ksz_irq pirq;
 	u8 num;
+	struct phylink_pcs *pcs;
 #if IS_ENABLED(CONFIG_NET_DSA_MICROCHIP_KSZ_PTP)
 	struct hwtstamp_config tstamp_config;
 	bool hwts_tx_en;
@@ -440,6 +444,8 @@ struct ksz_dev_ops {
 	int (*reset)(struct ksz_device *dev);
 	int (*init)(struct ksz_device *dev);
 	void (*exit)(struct ksz_device *dev);
+
+	int (*pcs_create)(struct ksz_device *dev);
 };
 
 struct ksz_device *ksz_switch_alloc(struct device *base, void *priv);
@@ -731,6 +737,21 @@ static inline bool is_lan937x_tx_phy(struct ksz_device *dev, int port)
 		dev->chip_id == LAN9372_CHIP_ID) && port == KSZ_PORT_4;
 }
 
+static inline int ksz_get_sgmii_port(struct ksz_device *dev)
+{
+	return dev->info->sgmii_port - 1;
+}
+
+static inline bool ksz_has_sgmii_port(struct ksz_device *dev)
+{
+	return dev->info->sgmii_port > 0;
+}
+
+static inline bool ksz_is_sgmii_port(struct ksz_device *dev, int port)
+{
+	return dev->info->sgmii_port == port + 1;
+}
+
 /* STP State Defines */
 #define PORT_TX_ENABLE			BIT(2)
 #define PORT_RX_ENABLE			BIT(1)
-- 
2.34.1
Re: [PATCH net-next v2] net: dsa: microchip: Add SGMII port support to KSZ9477 switch
Posted by Jakub Kicinski 7 months, 1 week ago
On Tue, 6 May 2025 17:09:11 -0700 Tristram.Ha@microchip.com wrote:
>  drivers/net/dsa/microchip/Kconfig      |   1 +
>  drivers/net/dsa/microchip/ksz9477.c    | 191 ++++++++++++++++++++++++-
>  drivers/net/dsa/microchip/ksz9477.h    |   4 +-
>  drivers/net/dsa/microchip/ksz_common.c |  36 ++++-
>  drivers/net/dsa/microchip/ksz_common.h |  23 ++-

No longer applies cleanly, please rebase:

Applying: net: dsa: microchip: Add SGMII port support to KSZ9477 switch
Using index info to reconstruct a base tree...
M	drivers/net/dsa/microchip/ksz_common.h
Falling back to patching base and 3-way merge...
-- 
pw-bot: cr
Re: [PATCH net-next v2] net: dsa: microchip: Add SGMII port support to KSZ9477 switch
Posted by Maxime Chevallier 7 months, 1 week ago
Hi Tristram,

On Tue, 6 May 2025 17:09:11 -0700
<Tristram.Ha@microchip.com> wrote:

> From: Tristram Ha <tristram.ha@microchip.com>
> 
> The KSZ9477 switch driver uses the XPCS driver to operate its SGMII
> port.  However there are some hardware bugs in the KSZ9477 SGMII
> module so workarounds are needed.  There was a proposal to update the
> XPCS driver to accommodate KSZ9477, but the new code is not generic
> enough to be used by other vendors.  It is better to do all these
> workarounds inside the KSZ9477 driver instead of modifying the XPCS
> driver.
> 
> There are 3 hardware issues.  The first is the MII_ADVERTISE register
> needs to be written once after reset for the correct code word to be
> sent.  The XPCS driver disables auto-negotiation first before
> configuring the SGMII/1000BASE-X mode and then enables it back.  The
> KSZ9477 driver then writes the MII_ADVERTISE register before enabling
> auto-negotiation.  In 1000BASE-X mode the MII_ADVERTISE register will
> be set, so KSZ9477 driver does not need to write it.
> 
> The second issue is the MII_BMCR register needs to set the exact speed
> and duplex mode when running in SGMII mode.  During link polling the
> KSZ9477 will check the speed and duplex mode are different from
> previous ones and update the MII_BMCR register accordingly.
> 
> The last issue is 1000BASE-X mode does not work with auto-negotiation
> on.  The cause is the local port hardware does not know the link is up
> and so network traffic is not forwarded.  The workaround is to write 2
> additional bits when 1000BASE-X mode is configured.
> 
> Note the SGMII interrupt in the port cannot be masked.  As that
> interrupt is not handled in the KSZ9477 driver the SGMII interrupt bit
> will not be set even when the XPCS driver sets it.
>
> Signed-off-by: Tristram Ha <tristram.ha@microchip.com>

[...]

> +
> +static int ksz9477_pcs_read(struct mii_bus *bus, int phy, int mmd, int reg)
> +{
> +	struct ksz_device *dev = bus->priv;
> +	int port = ksz_get_sgmii_port(dev);
> +	u16 val;
> +
> +	port_sgmii_r(dev, port, mmd, reg, &val);
> +
> +	/* Simulate a value to activate special code in the XPCS driver if
> +	 * supported.
> +	 */
> +	if (mmd == MDIO_MMD_PMAPMD) {
> +		if (reg == MDIO_DEVID1)
> +			val = 0x9477;
> +		else if (reg == MDIO_DEVID2)
> +			val = 0x22 << 10;
> +	} else if (mmd == MDIO_MMD_VEND2) {
> +		struct ksz_port *p = &dev->ports[port];
> +
> +		/* Need to update MII_BMCR register with the exact speed and
> +		 * duplex mode when running in SGMII mode and this register is
> +		 * used to detect connected speed in that mode.
> +		 */
> +		if (reg == MMD_SR_MII_AUTO_NEG_STATUS) {
> +			int duplex, speed;
> +
> +			if (val & SR_MII_STAT_LINK_UP) {
> +				speed = (val >> SR_MII_STAT_S) & SR_MII_STAT_M;
> +				if (speed == SR_MII_STAT_1000_MBPS)
> +					speed = SPEED_1000;
> +				else if (speed == SR_MII_STAT_100_MBPS)
> +					speed = SPEED_100;
> +				else
> +					speed = SPEED_10;
> +
> +				if (val & SR_MII_STAT_FULL_DUPLEX)
> +					duplex = DUPLEX_FULL;
> +				else
> +					duplex = DUPLEX_HALF;
> +
> +				if (!p->phydev.link ||
> +				    p->phydev.speed != speed ||
> +				    p->phydev.duplex != duplex) {
> +					u16 ctrl;
> +
> +					p->phydev.link = 1;
> +					p->phydev.speed = speed;
> +					p->phydev.duplex = duplex;
> +					port_sgmii_r(dev, port, mmd, MII_BMCR,
> +						     &ctrl);
> +					ctrl &= BMCR_ANENABLE;
> +					ctrl |= mii_bmcr_encode_fixed(speed,
> +								      duplex);
> +					port_sgmii_w(dev, port, mmd, MII_BMCR,
> +						     ctrl);
> +				}
> +			} else {
> +				p->phydev.link = 0;
> +			}
> +		} else if (reg == MII_BMSR) {
> +			p->phydev.link = (val & BMSR_LSTATUS);
> +		}
> +	}
> +	return val;
> +}
> +
> +static int ksz9477_pcs_write(struct mii_bus *bus, int phy, int mmd, int reg,
> +			     u16 val)
> +{
> +	struct ksz_device *dev = bus->priv;
> +	int port = ksz_get_sgmii_port(dev);
> +
> +	if (mmd == MDIO_MMD_VEND2) {
> +		struct ksz_port *p = &dev->ports[port];
> +
> +		if (reg == MMD_SR_MII_AUTO_NEG_CTRL) {
> +			u16 sgmii_mode = SR_MII_PCS_SGMII << SR_MII_PCS_MODE_S;
> +
> +			/* Need these bits for 1000BASE-X mode to work with
> +			 * AN on.
> +			 */
> +			if (!(val & sgmii_mode))
> +				val |= SR_MII_SGMII_LINK_UP |
> +				       SR_MII_TX_CFG_PHY_MASTER;
> +
> +			/* SGMII interrupt in the port cannot be masked, so
> +			 * make sure interrupt is not enabled as it is not
> +			 * handled.
> +			 */
> +			val &= ~SR_MII_AUTO_NEG_COMPLETE_INTR;
> +		} else if (reg == MII_BMCR) {
> +			/* The MII_ADVERTISE register must be written once
> +			 * before doing auto-negotiation for the correct
> +			 * config_word to be sent out after reset.
> +			 */
> +			if ((val & BMCR_ANENABLE) && !p->sgmii_adv_write) {
> +				u16 adv;
> +
> +				/* The SGMII port cannot disable flow control
> +				 * so it is better to just advertise symmetric
> +				 * pause.
> +				 */
> +				port_sgmii_r(dev, port, mmd, MII_ADVERTISE,
> +					     &adv);
> +				adv |= ADVERTISE_1000XPAUSE;
> +				adv &= ~ADVERTISE_1000XPSE_ASYM;
> +				port_sgmii_w(dev, port, mmd, MII_ADVERTISE,
> +					     adv);
> +				p->sgmii_adv_write = 1;
> +			} else if (val & BMCR_RESET) {
> +				p->sgmii_adv_write = 0;
> +			}
> +		} else if (reg == MII_ADVERTISE) {
> +			/* XPCS driver writes to this register so there is no
> +			 * need to update it for the errata.
> +			 */
> +			p->sgmii_adv_write = 1;
> +		}
> +	}
> +	port_sgmii_w(dev, port, mmd, reg, val);
> +	return 0;
> +}

I'm a bit confused here, are you intercepting r/w ops that are supposed
to be handled by xpcs ?

Russell has sent a series [1] (not merged yet, I think we were waiting
on some feedback from Synopsys folks ?) to properly support the XPCS
version that's in KSZ9477, and you also had a patchset that didn't
require all this sgmii_r/w snooping [2].

I've been running your previous patchset on top of Russell's for a few
months, it works fine with SGMII as well as 1000BaseX :)

Can we maybe focus on getting pcs-xpcs to properly support this version
of the IP instead of these 2 R/W functions ? Or did I miss something in
the previous discussions ?

Maxime

[1] : https://lore.kernel.org/netdev/Z6NnPm13D1n5-Qlw@shell.armlinux.org.uk/ 
[2] : https://lore.kernel.org/netdev/20250208002417.58634-1-Tristram.Ha@microchip.com/
Re: [PATCH net-next v2] net: dsa: microchip: Add SGMII port support to KSZ9477 switch
Posted by Russell King (Oracle) 7 months, 1 week ago
On Wed, May 07, 2025 at 09:44:49AM +0200, Maxime Chevallier wrote:
> Hi Tristram,
> 
> On Tue, 6 May 2025 17:09:11 -0700
> <Tristram.Ha@microchip.com> wrote:
> 
> > From: Tristram Ha <tristram.ha@microchip.com>
> > 
> > The KSZ9477 switch driver uses the XPCS driver to operate its SGMII
> > port.  However there are some hardware bugs in the KSZ9477 SGMII
> > module so workarounds are needed.  There was a proposal to update the
> > XPCS driver to accommodate KSZ9477, but the new code is not generic
> > enough to be used by other vendors.  It is better to do all these
> > workarounds inside the KSZ9477 driver instead of modifying the XPCS
> > driver.
> > 
> > There are 3 hardware issues.  The first is the MII_ADVERTISE register
> > needs to be written once after reset for the correct code word to be
> > sent.  The XPCS driver disables auto-negotiation first before
> > configuring the SGMII/1000BASE-X mode and then enables it back.  The
> > KSZ9477 driver then writes the MII_ADVERTISE register before enabling
> > auto-negotiation.  In 1000BASE-X mode the MII_ADVERTISE register will
> > be set, so KSZ9477 driver does not need to write it.
> > 
> > The second issue is the MII_BMCR register needs to set the exact speed
> > and duplex mode when running in SGMII mode.  During link polling the
> > KSZ9477 will check the speed and duplex mode are different from
> > previous ones and update the MII_BMCR register accordingly.
> > 
> > The last issue is 1000BASE-X mode does not work with auto-negotiation
> > on.  The cause is the local port hardware does not know the link is up
> > and so network traffic is not forwarded.  The workaround is to write 2
> > additional bits when 1000BASE-X mode is configured.
> > 
> > Note the SGMII interrupt in the port cannot be masked.  As that
> > interrupt is not handled in the KSZ9477 driver the SGMII interrupt bit
> > will not be set even when the XPCS driver sets it.
> >
> > Signed-off-by: Tristram Ha <tristram.ha@microchip.com>
> 
> [...]
> 
> > +
> > +static int ksz9477_pcs_read(struct mii_bus *bus, int phy, int mmd, int reg)
> > +{
> > +	struct ksz_device *dev = bus->priv;
> > +	int port = ksz_get_sgmii_port(dev);
> > +	u16 val;
> > +
> > +	port_sgmii_r(dev, port, mmd, reg, &val);
> > +
> > +	/* Simulate a value to activate special code in the XPCS driver if
> > +	 * supported.
> > +	 */
> > +	if (mmd == MDIO_MMD_PMAPMD) {
> > +		if (reg == MDIO_DEVID1)
> > +			val = 0x9477;
> > +		else if (reg == MDIO_DEVID2)
> > +			val = 0x22 << 10;
> > +	} else if (mmd == MDIO_MMD_VEND2) {
> > +		struct ksz_port *p = &dev->ports[port];
> > +
> > +		/* Need to update MII_BMCR register with the exact speed and
> > +		 * duplex mode when running in SGMII mode and this register is
> > +		 * used to detect connected speed in that mode.
> > +		 */
> > +		if (reg == MMD_SR_MII_AUTO_NEG_STATUS) {
> > +			int duplex, speed;
> > +
> > +			if (val & SR_MII_STAT_LINK_UP) {
> > +				speed = (val >> SR_MII_STAT_S) & SR_MII_STAT_M;
> > +				if (speed == SR_MII_STAT_1000_MBPS)
> > +					speed = SPEED_1000;
> > +				else if (speed == SR_MII_STAT_100_MBPS)
> > +					speed = SPEED_100;
> > +				else
> > +					speed = SPEED_10;
> > +
> > +				if (val & SR_MII_STAT_FULL_DUPLEX)
> > +					duplex = DUPLEX_FULL;
> > +				else
> > +					duplex = DUPLEX_HALF;
> > +
> > +				if (!p->phydev.link ||
> > +				    p->phydev.speed != speed ||
> > +				    p->phydev.duplex != duplex) {
> > +					u16 ctrl;
> > +
> > +					p->phydev.link = 1;
> > +					p->phydev.speed = speed;
> > +					p->phydev.duplex = duplex;
> > +					port_sgmii_r(dev, port, mmd, MII_BMCR,
> > +						     &ctrl);
> > +					ctrl &= BMCR_ANENABLE;
> > +					ctrl |= mii_bmcr_encode_fixed(speed,
> > +								      duplex);
> > +					port_sgmii_w(dev, port, mmd, MII_BMCR,
> > +						     ctrl);
> > +				}
> > +			} else {
> > +				p->phydev.link = 0;
> > +			}
> > +		} else if (reg == MII_BMSR) {
> > +			p->phydev.link = (val & BMSR_LSTATUS);
> > +		}
> > +	}
> > +	return val;
> > +}
> > +
> > +static int ksz9477_pcs_write(struct mii_bus *bus, int phy, int mmd, int reg,
> > +			     u16 val)
> > +{
> > +	struct ksz_device *dev = bus->priv;
> > +	int port = ksz_get_sgmii_port(dev);
> > +
> > +	if (mmd == MDIO_MMD_VEND2) {
> > +		struct ksz_port *p = &dev->ports[port];
> > +
> > +		if (reg == MMD_SR_MII_AUTO_NEG_CTRL) {
> > +			u16 sgmii_mode = SR_MII_PCS_SGMII << SR_MII_PCS_MODE_S;
> > +
> > +			/* Need these bits for 1000BASE-X mode to work with
> > +			 * AN on.
> > +			 */
> > +			if (!(val & sgmii_mode))
> > +				val |= SR_MII_SGMII_LINK_UP |
> > +				       SR_MII_TX_CFG_PHY_MASTER;
> > +
> > +			/* SGMII interrupt in the port cannot be masked, so
> > +			 * make sure interrupt is not enabled as it is not
> > +			 * handled.
> > +			 */
> > +			val &= ~SR_MII_AUTO_NEG_COMPLETE_INTR;
> > +		} else if (reg == MII_BMCR) {
> > +			/* The MII_ADVERTISE register must be written once
> > +			 * before doing auto-negotiation for the correct
> > +			 * config_word to be sent out after reset.
> > +			 */
> > +			if ((val & BMCR_ANENABLE) && !p->sgmii_adv_write) {
> > +				u16 adv;
> > +
> > +				/* The SGMII port cannot disable flow control
> > +				 * so it is better to just advertise symmetric
> > +				 * pause.
> > +				 */
> > +				port_sgmii_r(dev, port, mmd, MII_ADVERTISE,
> > +					     &adv);
> > +				adv |= ADVERTISE_1000XPAUSE;
> > +				adv &= ~ADVERTISE_1000XPSE_ASYM;
> > +				port_sgmii_w(dev, port, mmd, MII_ADVERTISE,
> > +					     adv);
> > +				p->sgmii_adv_write = 1;
> > +			} else if (val & BMCR_RESET) {
> > +				p->sgmii_adv_write = 0;
> > +			}
> > +		} else if (reg == MII_ADVERTISE) {
> > +			/* XPCS driver writes to this register so there is no
> > +			 * need to update it for the errata.
> > +			 */
> > +			p->sgmii_adv_write = 1;
> > +		}
> > +	}
> > +	port_sgmii_w(dev, port, mmd, reg, val);
> > +	return 0;
> > +}
> 
> I'm a bit confused here, are you intercepting r/w ops that are supposed
> to be handled by xpcs ?
> 
> Russell has sent a series [1] (not merged yet, I think we were waiting
> on some feedback from Synopsys folks ?) to properly support the XPCS
> version that's in KSZ9477, and you also had a patchset that didn't
> require all this sgmii_r/w snooping [2].
> 
> I've been running your previous patchset on top of Russell's for a few
> months, it works fine with SGMII as well as 1000BaseX :)
> 
> Can we maybe focus on getting pcs-xpcs to properly support this version
> of the IP instead of these 2 R/W functions ? Or did I miss something in
> the previous discussions ?

Honestly, I don't think Tristram is doing anything unreasonable here,
given what Vladimir has been saying. Essentially, we've been blocking
a way forward on the pcs-xpcs driver. We've had statements from the
hardware designers from Microchip. We've had statements from Synopsys.
The two don't quite agree, but that's not atypical. Yet we're still
demanding to know why the Microchip version of XPCS is different.

So what's left for Tristram to do other than to hack around the blockage
we're causing by intercepting the read/write ops and bodging them?

As I understand the situation, this is Jose's response having asked
internally at my request:

https://lore.kernel.org/netdev/DM4PR12MB5088BA650B164D5CEC33CA08D3E82@DM4PR12MB5088.namprd12.prod.outlook.com/

To put it another way, as far as Synopsys can tell us, they are unaware
of the Microchip behaviour, but customers can modify the Synopsys IP.

Maybe Microchip's version is based on an old Synopsys version, but
which was modified by Microchip a long time ago and those engineers
have moved on, and no one really knows anymore. I doubt that we are
ever going to get to the bottom of the different behaviour.

So, what do we do now? Do we continue playing hardball and basically
saying "no" to changing the XPCS driver, demanding information that
doesn't seem to exist anymore? Or do we try to come up with an
approach that works?

I draw attention to the last sentence in Jose's quote in his reply.
As far as the Synopsys folk are concerned, setting these bits to 1
should have no effect provided there aren't customer modifications to
the IP that depend on these being set to zero.

That last bit is where I think the sticking point between Vladimir and
myself is - I'm in favour of keeping things simple and just setting
the bits. Vladimir feels it would be safer to make it conditional,
which leads to more complicated code.

I didn't progress my series because I decided it was a waste of time
to try and progress this any further - I'd dug up the SJA1105 docs to
see what they said, I'd reached out to Synopsys and got a statement
back, and still Vladimir wasn't happy.

With Vladimir continuing to demand information from Tristram that just
didn't exist, I saw that the

[rest of the email got deleted because Linux / X11 / KDE got confused
about the state of the backspace key and decided it was going to be
continuously pressed, and nothing except shutting the laptop down
would stop it.]

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
Re: [PATCH net-next v2] net: dsa: microchip: Add SGMII port support to KSZ9477 switch
Posted by Maxime Chevallier 7 months, 1 week ago
On Wed, 7 May 2025 09:31:48 +0100
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Wed, May 07, 2025 at 09:44:49AM +0200, Maxime Chevallier wrote:
> > Hi Tristram,
> > 
> > On Tue, 6 May 2025 17:09:11 -0700
> > <Tristram.Ha@microchip.com> wrote:
> >   
> > > From: Tristram Ha <tristram.ha@microchip.com>
> > > 
> > > The KSZ9477 switch driver uses the XPCS driver to operate its SGMII
> > > port.  However there are some hardware bugs in the KSZ9477 SGMII
> > > module so workarounds are needed.  There was a proposal to update the
> > > XPCS driver to accommodate KSZ9477, but the new code is not generic
> > > enough to be used by other vendors.  It is better to do all these
> > > workarounds inside the KSZ9477 driver instead of modifying the XPCS
> > > driver.
> > > 
> > > There are 3 hardware issues.  The first is the MII_ADVERTISE register
> > > needs to be write once after reset for the correct code word to be
> > > sent.  The XPCS driver disables auto-negotiation first before
> > > configuring the SGMII/1000BASE-X mode and then enables it back.  The
> > > KSZ9477 driver then writes the MII_ADVERTISE register before enabling
> > > auto-negotiation.  In 1000BASE-X mode the MII_ADVERTISE register will
> > > be set, so KSZ9477 driver does not need to write it.
> > > 
> > > The second issue is the MII_BMCR register needs to set the exact speed
> > > and duplex mode when running in SGMII mode.  During link polling the
> > > KSZ9477 will check the speed and duplex mode are different from
> > > previous ones and update the MII_BMCR register accordingly.
> > > 
> > > The last issue is 1000BASE-X mode does not work with auto-negotiation
> > > on.  The cause is the local port hardware does not know the link is up
> > > and so network traffic is not forwarded.  The workaround is to write 2
> > > additional bits when 1000BASE-X mode is configured.
> > > 
> > > Note the SGMII interrupt in the port cannot be masked.  As that
> > > interrupt is not handled in the KSZ9477 driver the SGMII interrupt bit
> > > will not be set even when the XPCS driver sets it.
> > >
> > > Signed-off-by: Tristram Ha <tristram.ha@microchip.com>  
> > 
> > [...]
> >   
> > > +
> > > +static int ksz9477_pcs_read(struct mii_bus *bus, int phy, int mmd, int reg)
> > > +{
> > > +	struct ksz_device *dev = bus->priv;
> > > +	int port = ksz_get_sgmii_port(dev);
> > > +	u16 val;
> > > +
> > > +	port_sgmii_r(dev, port, mmd, reg, &val);
> > > +
> > > +	/* Simulate a value to activate special code in the XPCS driver if
> > > +	 * supported.
> > > +	 */
> > > +	if (mmd == MDIO_MMD_PMAPMD) {
> > > +		if (reg == MDIO_DEVID1)
> > > +			val = 0x9477;
> > > +		else if (reg == MDIO_DEVID2)
> > > +			val = 0x22 << 10;
> > > +	} else if (mmd == MDIO_MMD_VEND2) {
> > > +		struct ksz_port *p = &dev->ports[port];
> > > +
> > > +		/* Need to update MII_BMCR register with the exact speed and
> > > +		 * duplex mode when running in SGMII mode and this register is
> > > +		 * used to detect connected speed in that mode.
> > > +		 */
> > > +		if (reg == MMD_SR_MII_AUTO_NEG_STATUS) {
> > > +			int duplex, speed;
> > > +
> > > +			if (val & SR_MII_STAT_LINK_UP) {
> > > +				speed = (val >> SR_MII_STAT_S) & SR_MII_STAT_M;
> > > +				if (speed == SR_MII_STAT_1000_MBPS)
> > > +					speed = SPEED_1000;
> > > +				else if (speed == SR_MII_STAT_100_MBPS)
> > > +					speed = SPEED_100;
> > > +				else
> > > +					speed = SPEED_10;
> > > +
> > > +				if (val & SR_MII_STAT_FULL_DUPLEX)
> > > +					duplex = DUPLEX_FULL;
> > > +				else
> > > +					duplex = DUPLEX_HALF;
> > > +
> > > +				if (!p->phydev.link ||
> > > +				    p->phydev.speed != speed ||
> > > +				    p->phydev.duplex != duplex) {
> > > +					u16 ctrl;
> > > +
> > > +					p->phydev.link = 1;
> > > +					p->phydev.speed = speed;
> > > +					p->phydev.duplex = duplex;
> > > +					port_sgmii_r(dev, port, mmd, MII_BMCR,
> > > +						     &ctrl);
> > > +					ctrl &= BMCR_ANENABLE;
> > > +					ctrl |= mii_bmcr_encode_fixed(speed,
> > > +								      duplex);
> > > +					port_sgmii_w(dev, port, mmd, MII_BMCR,
> > > +						     ctrl);
> > > +				}
> > > +			} else {
> > > +				p->phydev.link = 0;
> > > +			}
> > > +		} else if (reg == MII_BMSR) {
> > > +			p->phydev.link = (val & BMSR_LSTATUS);
> > > +		}
> > > +	}
> > > +	return val;
> > > +}
> > > +
> > > +static int ksz9477_pcs_write(struct mii_bus *bus, int phy, int mmd, int reg,
> > > +			     u16 val)
> > > +{
> > > +	struct ksz_device *dev = bus->priv;
> > > +	int port = ksz_get_sgmii_port(dev);
> > > +
> > > +	if (mmd == MDIO_MMD_VEND2) {
> > > +		struct ksz_port *p = &dev->ports[port];
> > > +
> > > +		if (reg == MMD_SR_MII_AUTO_NEG_CTRL) {
> > > +			u16 sgmii_mode = SR_MII_PCS_SGMII << SR_MII_PCS_MODE_S;
> > > +
> > > +			/* Need these bits for 1000BASE-X mode to work with
> > > +			 * AN on.
> > > +			 */
> > > +			if (!(val & sgmii_mode))
> > > +				val |= SR_MII_SGMII_LINK_UP |
> > > +				       SR_MII_TX_CFG_PHY_MASTER;
> > > +
> > > +			/* SGMII interrupt in the port cannot be masked, so
> > > +			 * make sure interrupt is not enabled as it is not
> > > +			 * handled.
> > > +			 */
> > > +			val &= ~SR_MII_AUTO_NEG_COMPLETE_INTR;
> > > +		} else if (reg == MII_BMCR) {
> > > +			/* The MII_ADVERTISE register needs to write once
> > > +			 * before doing auto-negotiation for the correct
> > > +			 * config_word to be sent out after reset.
> > > +			 */
> > > +			if ((val & BMCR_ANENABLE) && !p->sgmii_adv_write) {
> > > +				u16 adv;
> > > +
> > > +				/* The SGMII port cannot disable flow contrl
> > > +				 * so it is better to just advertise symmetric
> > > +				 * pause.
> > > +				 */
> > > +				port_sgmii_r(dev, port, mmd, MII_ADVERTISE,
> > > +					     &adv);
> > > +				adv |= ADVERTISE_1000XPAUSE;
> > > +				adv &= ~ADVERTISE_1000XPSE_ASYM;
> > > +				port_sgmii_w(dev, port, mmd, MII_ADVERTISE,
> > > +					     adv);
> > > +				p->sgmii_adv_write = 1;
> > > +			} else if (val & BMCR_RESET) {
> > > +				p->sgmii_adv_write = 0;
> > > +			}
> > > +		} else if (reg == MII_ADVERTISE) {
> > > +			/* XPCS driver writes to this register so there is no
> > > +			 * need to update it for the errata.
> > > +			 */
> > > +			p->sgmii_adv_write = 1;
> > > +		}
> > > +	}
> > > +	port_sgmii_w(dev, port, mmd, reg, val);
> > > +	return 0;
> > > +}  
> > 
> > I'm a bit confused here, are you intercepting r/w ops that are supposed
> > to be handled by xpcs ?
> > 
> > Russell has sent a series [1] (not merged yet, I think we were waiting
> > on some feedback from Synopsys folks ?) to properly support the XPCS
> > version that's in KSZ9477, and you also had a patchset that didn't
> > require all this sgmii_r/w snooping [2].
> > 
> > I've been running your previous patchset on top of Russell's for a few
> > months, it works fine with SGMII as well as 1000BaseX :)
> > 
> > Can we maybe focus on getting pcs-xpcs to properly support this version
> > of the IP instead of these 2 R/W functions ? Or did I miss something in
> > the previous discussions ?  
> 
> Honestly, I don't think Tristram is doing anything unreasonable here,
> given what Vladimir has been saying. Essentially, we've been blocking
> a way forward on the pcs-xpcs driver. We've had statements from the
> hardware designers from Microchip. We've had statements from Synopsys.
> The two don't quite agree, but that's not atypical. Yet, we're still
> demanding why the Microchip version of XPCS is different.
> 
> So what's left for Tristram to do other than to hack around the blockage
> we're causing by intercepting the read/write ops and bodging them?
> 
> As I understand the situation, this is Jose's response having asked
> internally at my request:
> 
> https://lore.kernel.org/netdev/DM4PR12MB5088BA650B164D5CEC33CA08D3E82@DM4PR12MB5088.namprd12.prod.outlook.com/
> 
> To put it another way, as far as Synopsys can tell us, they are unaware
> of the Microchip behaviour, but customers can modify the Synopsys IP.
> 
> Maybe Microchip's version is based on an old Synopsys version, but
> which was modified by Microchip a long time ago and those engineers
> have moved on, and no one really knows anymore. I doubt that we are
> ever going to get to the bottom of the different behaviour.
> 
> So, what do we do now? Do we continue playing hardball and basically
> saying "no" to changing the XPCS driver, demanding information that
> doesn't seem to exist anymore? Or do we try to come up with an
> approach that works?

Fair enough, it wasn't clear to me that this was the path forward, but
that does make sense to avoid cluttering xpcs with things that, in that
case, are really KSZ9477 specific.
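
For anyone following the quoted patch, the MII_BMCR fix-up done in
ksz9477_pcs_read() can be modelled in userspace roughly as follows. This
is only a sketch under assumptions: the BMCR_* values are the standard
ones from include/uapi/linux/mii.h, but the SR_MII_STAT_* bit positions
are assumed here (the quoted hunk does not show them), and the helper
name sgmii_status_to_bmcr() is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Standard MII control register bits (include/uapi/linux/mii.h). */
#define BMCR_SPEED1000		0x0040
#define BMCR_FULLDPLX		0x0100
#define BMCR_ANENABLE		0x1000
#define BMCR_SPEED100		0x2000

/* ASSUMED layout of the SGMII auto-negotiation status register. */
#define SR_MII_STAT_FULL_DUPLEX	0x0002
#define SR_MII_STAT_S		2
#define SR_MII_STAT_M		0x3
#define SR_MII_STAT_100_MBPS	1
#define SR_MII_STAT_1000_MBPS	2
#define SR_MII_STAT_LINK_UP	0x0010

/* Hypothetical helper: derive the BMCR value to program from the AN
 * status. Returns old_bmcr unchanged when the link is down.
 */
static uint16_t sgmii_status_to_bmcr(uint16_t an_status, uint16_t old_bmcr)
{
	uint16_t speed, bmcr;

	if (!(an_status & SR_MII_STAT_LINK_UP))
		return old_bmcr;

	/* Keep only the AN enable bit, like "ctrl &= BMCR_ANENABLE". */
	bmcr = old_bmcr & BMCR_ANENABLE;

	speed = (an_status >> SR_MII_STAT_S) & SR_MII_STAT_M;
	if (speed == SR_MII_STAT_1000_MBPS)
		bmcr |= BMCR_SPEED1000;
	else if (speed == SR_MII_STAT_100_MBPS)
		bmcr |= BMCR_SPEED100;
	/* 10 Mbps is encoded as neither speed bit being set. */

	if (an_status & SR_MII_STAT_FULL_DUPLEX)
		bmcr |= BMCR_FULLDPLX;

	return bmcr;
}
```

The real driver additionally caches link, speed and duplex in
p->phydev and only rewrites MII_BMCR when one of them changes; that
caching is left out here.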

I'll try to give this patch a try on my side soon-ish, but I'm working
with limited access to HW for the next few days.

> I draw attention to the last sentence in Jose's quote in his reply.
> As far as the Synopsys folk are concerned, setting these bits to 1
> should have no effect provided there aren't customer modifications to
> the IP that depend on these being set to zero.
> 
> That last bit is where I think the sticking point between Vladimir and
> myself is - I'm in favour of keeping things simple and just setting
> the bits. Vladimir feels it would be safer to make it conditional,
> which leads to more complicated code.
> 
> I didn't progress my series because I decided it was a waste of time
> to try and progress this any further - I'd dug up the SJA1105 docs to
> see what they said, I'd reached out to Synopsys and got a statement
> back, and still Vladimir wasn't happy.
> 
> With Vladimir continuing to demand information from Tristram that just
> didn't exist, I saw that the
> 
> [rest of the email got deleted because Linux / X11 / KDE got confused
> about the state of the backspace key and decided it was going to be
> continuously pressed and doing nothing except shutting the laptop
> down would stop it.]

Funny how I have the same exact issue on my laptop as well... 

Thanks for the quick reply, and Tristram sorry for the noise then :)

Maxime
[BUG] Stuck key syndrome (was: Re: [PATCH net-next v2] net: dsa: microchip: Add SGMII port support to KSZ9477 switch)
Posted by Russell King (Oracle) 7 months, 1 week ago
[Sorry for going off topic here - changed the Cc list, added Linus,
changed the subject.]

On Wed, May 07, 2025 at 10:54:57AM +0200, Maxime Chevallier wrote:
> On Wed, 7 May 2025 09:31:48 +0100
> "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> > [rest of the email got deleted because Linux / X11 / KDE got confused
> > about the state of the backspace key and decided it was going to be
> > continuously pressed and doing nothing except shutting the laptop
> > down would stop it.]
> 
> Funny how I have the same exact issue on my laptop as well... 

I've had the "stuck key" behaviour with the HP Pavilion 15-au185sa
laptop I had previously (normally with ctrl-F keys). However, hitting
ctrl/shift/alt would stop it.

This is the first time I've seen the behaviour with the Carbon X1
laptop, but this was way more severe. No key would stop it. Trying to
move the focus using the trackpad/nipple had no effect. Meanwhile
the email was being deleted one character at a time. So I shut the
laptop lid causing it to suspend, and wondered what to do... on
re-opening the laptop, it didn't restart and is back to normal.

This suggests that the entire input subsystem in the software stack
collapsed just after the backspace key was pressed, and Xorg never
saw the key-release event. So Xorg dutifully did its key-repeat
processing, causing the email to be deleted one character at a time.
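
To illustrate why a single lost key-release is so destructive, here is
a toy model of software key-repeat. It is not Xorg's implementation;
the event structure, the function names and the delay/interval values
used with it are all assumptions made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

enum key_event_type { KEY_PRESS, KEY_RELEASE };

struct key_event {
	long time_ms;
	enum key_event_type type;
};

/* Repeats generated while a key is held from start_ms to stop_ms. */
static long repeats_between(long start_ms, long stop_ms,
			    long delay_ms, long interval_ms)
{
	long held = stop_ms - start_ms;

	if (held < delay_ms)
		return 0;
	return (held - delay_ms) / interval_ms + 1;
}

/* Count the keystrokes delivered for one key up to end_ms. */
static long count_keystrokes(const struct key_event *ev, size_t n,
			     long end_ms, long delay_ms, long interval_ms)
{
	long count = 0, press_ms = 0;
	int held = 0;
	size_t i;

	assert(interval_ms > 0);

	for (i = 0; i < n; i++) {
		if (ev[i].type == KEY_PRESS && !held) {
			held = 1;
			press_ms = ev[i].time_ms;
			count++;
		} else if (ev[i].type == KEY_RELEASE && held) {
			held = 0;
			count += repeats_between(press_ms, ev[i].time_ms,
						 delay_ms, interval_ms);
		}
	}

	/* Release never arrived: repeats continue until end_ms. */
	if (held)
		count += repeats_between(press_ms, end_ms,
					 delay_ms, interval_ms);
	return count;
}
```

With an illustrative 500 ms delay and 40 ms interval, a 100 ms tap
yields one keystroke, while the same press with its release lost
yields 239 keystrokes over ten seconds.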

The problem is, not only did this destroy the email reply, but it
also destroyed my train of thought for the reply as well through
the panic of trying to stop the entire email being deleted.

I don't think this is a hardware issue - I think there's a problem
in the input handling somewhere in the stack of kernel, Xorg,
whatever multiple input libraries make up modern systems, and KDE.

I did check the logs. Nothing in the kernel messages that suggests
a problem. Nothing in Xorg's logs (which are difficult to tie up
because it doesn't use real timestamps that one can relate to real
time.) There's no longer any ~/.xsession-errors logfile for logging
the stuff below Xorg.

I'm running Debian Stable here - kernel 6.1.0-34-amd64, X.Org X Server
1.21.1.7, KDE Plasma (5.27.5, frameworks 5.103.0, QT 5.15.8).

Anyone else seeing this kind of behaviour - if so, what are you
using?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
Re: [BUG] Stuck key syndrome (was: Re: [PATCH net-next v2] net: dsa: microchip: Add SGMII port support to KSZ9477 switch)
Posted by Russell King (Oracle) 7 months, 1 week ago
On Wed, May 07, 2025 at 10:23:17AM +0100, Russell King (Oracle) wrote:
> [Sorry for going off topic here - changed the Cc list, added Linus,
> changed the subject.]
> 
> On Wed, May 07, 2025 at 10:54:57AM +0200, Maxime Chevallier wrote:
> > On Wed, 7 May 2025 09:31:48 +0100
> > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> > > [rest of the email got deleted because Linux / X11 / KDE got confused
> > > about the state of the backspace key and decided it was going to be
> > > continuously pressed and doing nothing except shutting the laptop
> > > down would stop it.]
> > 
> > Funny how I have the same exact issue on my laptop as well... 
> 
> I've had the "stuck key" behaviour with the HP Pavilion 15-au185sa
> laptop I had previously (normally with ctrl-F keys). However, hitting
> ctrl/shift/alt would stop it.
> 
> This is the first time I've seen the behaviour with the Carbon X1
> laptop, but this was way more severe. No key would stop it. Trying to
> move the focus using the trackpad/nipple had no effect. Meanwhile
> the email was being deleted one character at a time. So I shut the
> laptop lid causing it to suspend, and wondered what to do... on
> re-opening the laptop, it didn't restart and is back to normal.
> 
> This suggests that the entire input subsystem in the software stack
> collapsed just after the backspace key was pressed, and Xorg never
> saw the key-release event. So Xorg dutifully did its key-repeat
> processing, causing the email to be deleted one character at a time.
> 
> The problem is, not only did this destroy the email reply, but it
> also destroyed my train of thought for the reply as well through
> the panic of trying to stop the entire email being deleted.
> 
> I don't think this is a hardware issue - I think there's a problem
> in the input handling somewhere in the stack of kernel, Xorg,
> whatever multiple input libraries make up modern systems, and KDE.
> 
> I did check the logs. Nothing in the kernel messages that suggests
> a problem. Nothing in Xorg's logs (which are difficult to tie up
> because it doesn't use real timestamps that one can relate to real
> time.) There's no longer any ~/.xsession-errors logfile for logging
> the stuff below Xorg.
> 
> I'm running Debian Stable here - kernel 6.1.0-34-amd64, X.Org X Server
> 1.21.1.7, KDE Plasma (5.27.5, frameworks 5.103.0, QT 5.15.8).

I'll also add that the Carbon X1, being a laptop, has a built-in
keyboard that uses the i8042:

[    1.698156] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[    1.698543] i8042: Warning: Keylock active
[    1.700170] serio: i8042 KBD port at 0x60,0x64 irq 1
[    1.700174] serio: i8042 AUX port at 0x60,0x64 irq 12
[    1.700271] mousedev: PS/2 mouse device common for all mice
[    1.702951] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0

I don't have the HP laptop with me to check what that was using.

The mysterious thing is "Keylock active" - clearly it isn't because I
can write this email typing on that very keyboard. However, I wonder
if it needs i8042_unlock=1 to set I8042_CTR_IGNKEYLOCK.

Unfortunately, it's probably going to take a year on the Carbon X1
to work out if this makes any difference.
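
For reference, the keylock handling in question can be modelled in
userspace as below. This is a sketch of what i8042_controller_init()
appears to do, not a copy of it: the two constants are given as in
drivers/input/serio/i8042.h and, like the polarity of the status bit,
should be checked against the tree in use. The helper name
apply_keylock_policy() is hypothetical. Note also that
Documentation/admin-guide/kernel-parameters.txt spells the boot option
i8042.unlock; i8042_unlock is the variable name inside the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* As in drivers/input/serio/i8042.h; verify against your tree. */
#define I8042_STR_KEYLOCK	0x10	/* status: bit CLEAR = keylock active */
#define I8042_CTR_IGNKEYLOCK	0x08	/* control: ignore the keylock */

/* Hypothetical helper: returns the control register value to program.
 * *warn is set when the driver would only print
 * "Warning: Keylock active" and leave the register alone.
 */
static uint8_t apply_keylock_policy(uint8_t status, uint8_t ctr,
				    bool unlock, bool *warn)
{
	*warn = false;

	if (!(status & I8042_STR_KEYLOCK)) {
		if (unlock)
			ctr |= I8042_CTR_IGNKEYLOCK;
		else
			*warn = true;
	}
	return ctr;
}
```

In this model the warning and the IGNKEYLOCK bit are mutually
exclusive: with the unlock option set, the bit is set and nothing is
printed.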

> Anyone else seeing this kind of behaviour - if so, what are you
> using?
> 
> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
Re: [BUG] Stuck key syndrome (was: Re: [PATCH net-next v2] net: dsa: microchip: Add SGMII port support to KSZ9477 switch)
Posted by Maxime Chevallier 7 months, 1 week ago
Hi Russell,

On Wed, 7 May 2025 10:59:21 +0100
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Wed, May 07, 2025 at 10:23:17AM +0100, Russell King (Oracle) wrote:
> > [Sorry for going off topic here - changed the Cc list, added Linus,
> > changed the subject.]
> > 
> > On Wed, May 07, 2025 at 10:54:57AM +0200, Maxime Chevallier wrote:  
> > > On Wed, 7 May 2025 09:31:48 +0100
> > > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:  
> > > > [rest of the email got deleted because Linux / X11 / KDE got confused
> > > > about the state of the backspace key and decided it was going to be
> > > > continuously pressed and doing nothing except shutting the laptop
> > > > down would stop it.]  
> > > 
> > > Funny how I have the same exact issue on my laptop as well...   
> > 
> > I've had the "stuck key" behaviour with the HP Pavilion 15-au185sa
> > laptop I had previously (normally with ctrl-F keys). However, hitting
> > ctrl/shift/alt would stop it.
> > 
> > This is the first time I've seen the behaviour with the Carbon X1
> > laptop, but this was way more severe. No key would stop it. Trying to
> > move the focus using the trackpad/nipple had no effect. Meanwhile
> > the email was being deleted one character at a time. So I shut the
> > laptop lid causing it to suspend, and wondered what to do... on
> > re-opening the laptop, it didn't restart and is back to normal.
> > 
> > This suggests that the entire input subsystem in the software stack
> > collapsed just after the backspace key was pressed, and Xorg never
> > saw the key-release event. So Xorg dutifully did its key-repeat
> > processing, causing the email to be deleted one character at a time.
> > 
> > The problem is, not only did this destroy the email reply, but it
> > also destroyed my train of thought for the reply as well through
> > the panic of trying to stop the entire email being deleted.
> > 
> > I don't think this is a hardware issue - I think there's a problem
> > in the input handling somewhere in the stack of kernel, Xorg,
> > whatever multiple input libraries make up modern systems, and KDE.
> > 
> > I did check the logs. Nothing in the kernel messages that suggests
> > a problem. Nothing in Xorg's logs (which are difficult to tie up
> > because it doesn't use real timestamps that one can relate to real
> > time.) There's no longer any ~/.xsession-errors logfile for logging
> > the stuff below Xorg.
> > 
> > I'm running Debian Stable here - kernel 6.1.0-34-amd64, X.Org X Server
> > 1.21.1.7, KDE Plasma (5.27.5, frameworks 5.103.0, QT 5.15.8).  
> 
> I'll also add that the Carbon X1, being a laptop, has a built-in
> keyboard that uses the i8042:
> 
> [    1.698156] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
> [    1.698543] i8042: Warning: Keylock active
> [    1.700170] serio: i8042 KBD port at 0x60,0x64 irq 1
> [    1.700174] serio: i8042 AUX port at 0x60,0x64 irq 12
> [    1.700271] mousedev: PS/2 mouse device common for all mice
> [    1.702951] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
> 
> I don't have the HP laptop with me to check what that was using.
> 
> The mysterious thing is "Keylock active" - clearly it isn't because I
> can write this email typing on that very keyboard. However, I wonder
> if it needs i8042_unlock=1 to set I8042_CTR_IGNKEYLOCK.
> 
> Unfortunately, it's probably going to take a year on the Carbon X1
> to work out if this makes any difference.
>
> > Anyone else seeing this kind of behaviour - if so, what are you
> > using?

It just happened to me as I was typing this very email (key 'd' got
stuck, nothing could un-stick it, couldn't move the mouse cursor but
mouse-click events did work, had to suspend/resume the laptop to fix
that)

Got the same "Keylock active" warning at boot :

[    0.916750] i8042: PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
[    0.917210] i8042: Warning: Keylock active
[    0.920087] serio: i8042 KBD port at 0x60,0x64 irq 1
[    0.920090] serio: i8042 AUX port at 0x60,0x64 irq 12

Nothing in the kernel logs when the key got stuck.

Laptop is a Dell XPS 15 9510, Running Fedora 41 but I saw this issue
before, kernel 6.14.4-200.fc41.x86_64, Wayland-based, Gnome 47.

Hopefully this helps a bit narrowing this down, I have a fairly
different userspace stack and kernel version, but we do have the same
driver involved and same keylock warning...

Maxime
Re: [BUG] Stuck key syndrome (was: Re: [PATCH net-next v2] net: dsa: microchip: Add SGMII port support to KSZ9477 switch)
Posted by Russell King (Oracle) 7 months, 1 week ago
Hi Maxime,

On Wed, May 07, 2025 at 03:32:36PM +0200, Maxime Chevallier wrote:
> Hi Russell,
> 
> On Wed, 7 May 2025 10:59:21 +0100
> "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> 
> > On Wed, May 07, 2025 at 10:23:17AM +0100, Russell King (Oracle) wrote:
> > > [Sorry for going off topic here - changed the Cc list, added Linus,
> > > changed the subject.]
> > > 
> > > On Wed, May 07, 2025 at 10:54:57AM +0200, Maxime Chevallier wrote:  
> > > > On Wed, 7 May 2025 09:31:48 +0100
> > > > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:  
> > > > > [rest of the email got deleted because Linux / X11 / KDE got confused
> > > > > about the state of the backspace key and decided it was going to be
> > > > > continuously pressed and doing nothing except shutting the laptop
> > > > > down would stop it.]  
> > > > 
> > > > Funny how I have the same exact issue on my laptop as well...   
> > > 
> > > I've had the "stuck key" behaviour with the HP Pavilion 15-au185sa
> > > laptop I had previously (normally with ctrl-F keys). However, hitting
> > > ctrl/shift/alt would stop it.
> > > 
> > > This is the first time I've seen the behaviour with the Carbon X1
> > > laptop, but this was way more severe. No key would stop it. Trying to
> > > move the focus using the trackpad/nipple had no effect. Meanwhile
> > > the email was being deleted one character at a time. So I shut the
> > > laptop lid causing it to suspend, and wondered what to do... on
> > > re-opening the laptop, it didn't restart and is back to normal.
> > > 
> > > This suggests that the entire input subsystem in the software stack
> > > collapsed just after the backspace key was pressed, and Xorg never
> > > saw the key-release event. So Xorg dutifully did its key-repeat
> > > processing, causing the email to be deleted one character at a time.
> > > 
> > > The problem is, not only did this destroy the email reply, but it
> > > also destroyed my train of thought for the reply as well through
> > > the panic of trying to stop the entire email being deleted.
> > > 
> > > I don't think this is a hardware issue - I think there's a problem
> > > in the input handling somewhere in the stack of kernel, Xorg,
> > > whatever multiple input libraries make up modern systems, and KDE.
> > > 
> > > I did check the logs. Nothing in the kernel messages that suggests
> > > a problem. Nothing in Xorg's logs (which are difficult to tie up
> > > because it doesn't use real timestamps that one can relate to real
> > > time.) There's no longer any ~/.xsession-errors logfile for logging
> > > the stuff below Xorg.
> > > 
> > > I'm running Debian Stable here - kernel 6.1.0-34-amd64, X.Org X Server
> > > 1.21.1.7, KDE Plasma (5.27.5, frameworks 5.103.0, QT 5.15.8).  
> > 
> > I'll also add that the Carbon X1, being a laptop, has a built-in
> > keyboard that uses the i8042:
> > 
> > [    1.698156] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
> > [    1.698543] i8042: Warning: Keylock active
> > [    1.700170] serio: i8042 KBD port at 0x60,0x64 irq 1
> > [    1.700174] serio: i8042 AUX port at 0x60,0x64 irq 12
> > [    1.700271] mousedev: PS/2 mouse device common for all mice
> > [    1.702951] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
> > 
> > I don't have the HP laptop with me to check what that was using.
> > 
> > The mysterious thing is "Keylock active" - clearly it isn't because I
> > can write this email typing on that very keyboard. However, I wonder
> > if it needs i8042_unlock=1 to set I8042_CTR_IGNKEYLOCK.
> > 
> > Unfortunately, it's probably going to take a year on the Carbon X1
> > to work out if this makes any difference.
> >
> > > Anyone else seeing this kind of behaviour - if so, what are you
> > > using?
> 
> It just happened to me as I was typing this very email (key 'd' got
> stuck, nothing could un-stick it, couldn't move the mouse cursor but
> mouse-click events did work, had to suspend/resume the laptop to fix
> that)

'd' can be quite disastrous if you're using vi and you're in command
mode!

> Got the same "Keylock active" warning at boot :
> 
> [    0.916750] i8042: PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
> [    0.917210] i8042: Warning: Keylock active
> [    0.920087] serio: i8042 KBD port at 0x60,0x64 irq 1
> [    0.920090] serio: i8042 AUX port at 0x60,0x64 irq 12
> 
> Nothing in the kernel logs when the key got stuck.
> 
> Laptop is a Dell XPS 15 9510, Running Fedora 41 but I saw this issue
> before, kernel 6.14.4-200.fc41.x86_64, Wayland-based, Gnome 47.
> 
> Hopefully this helps a bit narrowing this down, I have a fairly
> different userspace stack and kernel version, but we do have the same
> driver involved and same keylock warning...

Could you try booting with i8042_unlock=1 and see whether that makes any
difference please?

I've added that to my grub config in preparation for rebooting, but even
if I booted now, I suspect it'll be some time before I have any useful
result.

How often do you see the problem?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
Re: [BUG] Stuck key syndrome
Posted by Holger Hoffstätte 7 months, 1 week ago
On 2025-05-07 13:44, Russell King (Oracle) wrote:
> Could you try booting with i8042_unlock=1 and see whether that makes any
> difference please?

It did not help - just had another runaway event with that setting,
on my ca. 2021 Thinkpad L14. Had the symptom for as long as I have
had this machine.

We've been tracking this problem in Gentoo since late 2022, see
https://bugs.gentoo.org/873163 and none of the suggested options
for i8042 really make a difference. In my case I almost always get
the stuck key events when using the cursor keys for scrolling in a
web browser. Sometimes once a month, sometimes twice a day.

Fwiw it's not necessary to reboot; suspend/resume fixes it,
as in close/reopen the lid if you have that configured.

thanks
Holger
Re: [BUG] Stuck key syndrome
Posted by Russell King (Oracle) 7 months, 1 week ago
On Wed, May 07, 2025 at 10:46:35PM +0200, Holger Hoffstätte wrote:
> On 2025-05-07 13:44, Russell King (Oracle) wrote:
> > Could you try booting with i8042_unlock=1 and see whether that makes any
> > difference please?
> 
> It did not help - just had another runaway event with that setting,
> on my ca. 2021 Thinkpad L14. Had the symptom for as long as I have
> had this machine.
> 
> We've been tracking this problem in Gentoo since late 2022, see
> https://bugs.gentoo.org/873163 and none of the suggested options
> for i8042 really make a difference. In my case I almost always get
> the stuck key events when using the cursor keys for scrolling in a
> web browser. Sometimes once a month, sometimes twice a day.
> 
> Fwiw it's not necessary to reboot; suspend/resume fixes it,
> as in close/reopen the lid if you have that configured.

Thanks - it's good to know that I'm not alone with this problem!
I wonder how common it is, I think we're now up to four people.

So it's interesting that Finn's system is AMD vs mine which have
both been Intel based systems, and we seem to have exactly the same
problem. Is it possible that both are using the same firmware for
emulating an i8042?

Also what seems to be interesting is that it afflicts specific keys.
On my old HP Pavilion, it was always Ctrl-F3 which would get stuck
down (which I use to switch to virtual desktop 3 which has my Konsoles
on.) In this case, pressing all of ctrl-shift-alt would clear it.

I've only had it once on the Lenovo Carbon X1 so far, so can't comment
if it's going to be the backspace key every time - there is an Intel ME
firmware update pending (it didn't get installed at the last reboot -
I'd accidentally left the laptop with sleep disabled but no external
power, so it drained the battery and shut itself down which probably
prevented the firmware update being installed.) I believe the Intel ME
deals with the keyboard.

Thanks for adding comment 24... I assume from what you've said above
that comment 23 has proven not to solve it completely, but has it
reduced the frequency at all for you?

What also crosses my mind is that if the i8042 is now emulated by
firmware, is there a replacement interface that the kernel should
instead be using? Surely if the i8042 emulation is as bad as this,
then e.g. under Windows these laptops would be getting very poor
write-ups due to keyboard problems afflicting Windows as well.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
Re: [BUG] Stuck key syndrome
Posted by Linus Torvalds 7 months, 1 week ago
On Wed, 7 May 2025 at 15:07, Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> So it's interesting that Finn's system is AMD vs mine which have
> both been Intel based systems, and we seem to have exactly the same
> problem. Is it possible that both are using the same firmware for
> emulating an i8042?

Might be a BIOS vendor issue.

> Also what seems to be interesting is that it afflicts specific keys.
> On my old HP Pavilion, it was always Ctrl-F3 which would get stuck
> down (which I use to switch to virtual desktop 3 which has my Konsoles
> on.) In this case, pressing all of ctrl-shift-alt would clear it.

So multiple keys being pressed at once can result in confusion
depending on how the keyboard matrix is set up, and pressing multiple
keys then causes ghost reports.

Usually it requires three keys to be pressed simultaneously - and some
really cheap underlying hardware without some basic N-key rollover
protection.
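
The three-key ghosting case can be shown with a small model of a
diode-less key matrix. This is a generic sketch, not a description of
any particular laptop's keyboard controller; the 4x4 size and the
function names are arbitrary choices for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define ROWS 4
#define COLS 4

/* Merge electrical net "from" into net "to" on all rows and columns. */
static void merge_nets(int row_net[ROWS], int col_net[COLS],
		       int from, int to)
{
	int i;

	for (i = 0; i < ROWS; i++)
		if (row_net[i] == from)
			row_net[i] = to;
	for (i = 0; i < COLS; i++)
		if (col_net[i] == from)
			col_net[i] = to;
}

/* Scan a matrix without diodes: a key reads as pressed whenever its
 * row and column end up on the same net through any chain of closed
 * switches, whether or not that key itself is down.
 */
static void scan_matrix(bool pressed[ROWS][COLS], bool seen[ROWS][COLS])
{
	int row_net[ROWS], col_net[COLS];
	int r, c;

	for (r = 0; r < ROWS; r++)
		row_net[r] = r;
	for (c = 0; c < COLS; c++)
		col_net[c] = ROWS + c;

	/* Each closed switch shorts its row to its column. */
	for (r = 0; r < ROWS; r++)
		for (c = 0; c < COLS; c++)
			if (pressed[r][c] && row_net[r] != col_net[c])
				merge_nets(row_net, col_net,
					   col_net[c], row_net[r]);

	for (r = 0; r < ROWS; r++)
		for (c = 0; c < COLS; c++)
			seen[r][c] = (row_net[r] == col_net[c]);
}
```

Pressing (0,0), (0,1) and (1,0) makes the scan also report (1,1),
the fourth corner of the rectangle, which is the ghost key.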

But who knows what can confuse the firmware.

And honestly, it might also be a timing issue. So when you switch
virtual desktops, you end up doing more CPU work, changing timings,
and messing something up in the firmware in the process.

For example, I wouldn't expect firmware to be great about SMP. So
while the i8042 driver serializes everything with i8042_lock, who
knows *where* the firmware runs.

Maybe we could do something like tie irq1 (keyboard) and irq12 ("aux",
aka mouse) to the boot CPU in the hopes that there's less chance of
confusing some firmware that way.
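
The affinity idea can be tried from userspace before touching the
kernel, through the existing /proc/irq/<n>/smp_affinity interface.
The sketch below assumes the boot CPU is CPU 0 and that nothing such
as irqbalance rewrites the mask afterwards; the function names are
made up, and it needs root to have any effect.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<proc_root>/irq/<irq>/smp_affinity"; returns 0 on success. */
static int irq_affinity_path(char *buf, size_t len,
			     const char *proc_root, int irq)
{
	int n = snprintf(buf, len, "%s/irq/%d/smp_affinity",
			 proc_root, irq);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Pin one IRQ to CPU 0 by writing the hex CPU mask "1". */
static int pin_irq_to_cpu0(const char *proc_root, int irq)
{
	char path[256];
	FILE *f;
	int ret = 0;

	if (irq_affinity_path(path, sizeof(path), proc_root, irq))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs("1\n", f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}
```

Calling pin_irq_to_cpu0("/proc", 1) and pin_irq_to_cpu0("/proc", 12)
would cover the keyboard and aux interrupts. Whether this changes
anything about the stuck keys is exactly the open question.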

I have no idea what Windows does, and - as usual - that's the case
that gets almost all the testing from vendors.

> What also crosses my mind is that if the i8042 is now emulated by
> firmware, is there a replacement interface that the kernel should
> instead be using?

I don't think there is any documentation - or necessarily commonality
- for the low-level hardware. I would guess that it's probably some
small i2c controller that ends up doing some keypad matrix thing. That
i2c thing *may* do native HID, but it might easily also be just some
custom GPIO expander thing.

I think the touchpad is usually some i2c device, and it is sometimes
accessible both ways. And there's a long history of keyboard problems
when the touchpad is looked at just the wrong way, so those things
most definitely can interact (because the firmware emulation emulates
both).

It's very hard to find hardware information at that level these days.
It's been decades since things like the keyboard matrix were documented
as such....

                  Linus
Re: [BUG] Stuck key syndrome
Posted by Dmitry Torokhov 7 months, 1 week ago
On Wed, May 07, 2025 at 10:46:35PM +0200, Holger Hoffstätte wrote:
> On 2025-05-07 13:44, Russell King (Oracle) wrote:
> > Could you try booting with i8042_unlock=1 and see whether that makes any
> > difference please?
> 
> It did not help - just had another runaway event with that setting,
> on my ca. 2021 Thinkpad L14. Had the symptom for as long as I have
> had this machine.
> 
> We've been tracking this problem in Gentoo since late 2022, see
> https://bugs.gentoo.org/873163 and none of the suggested options
> for i8042 really make a difference. In my case I almost always get
> the stuck key events when using the cursor keys for scrolling in a
> web browser. Sometimes once a month, sometimes twice a day.
> 
> Fwiw it's not necessary to reboot; suspend/resume fixes it,
> as in close/reopen the lid if you have that configured.

So looking at your logs in gentoo bugzilla we see:

>>> It is around 1 second later that I realise the J key has died.
>>> Now I sit and watch for a few seconds before closing the lid.
Event: time 1664975487.559043, -------------- SYN_REPORT ------------
Event: time 1664975487.591980, type 4 (EV_MSC), code 4 (MSC_SCAN), value 24
Event: time 1664975487.591980, type 1 (EV_KEY), code 36 (KEY_J), value 2
Event: time 1664975487.591980, -------------- SYN_REPORT ------------
Event: time 1664975487.624955, type 4 (EV_MSC), code 4 (MSC_SCAN), value 24
Event: time 1664975487.624955, type 1 (EV_KEY), code 36 (KEY_J), value 2
Event: time 1664975487.624955, -------------- SYN_REPORT ------------
Event: time 1664975487.657800, type 4 (EV_MSC), code 4 (MSC_SCAN), value 24
Event: time 1664975487.657800, type 1 (EV_KEY), code 36 (KEY_J), value 2
Event: time 1664975487.657800, -------------- SYN_REPORT ------------

Because I see the MSC_SCAN events, this means you are not using
atkbd.softrepeat for software-emulated autorepeat and are using the
hardware autorepeat function (which is the default). In this mode the
keyboard controller repeatedly sends the scancode for the pressed key
and the kernel reports it. We know that interrupts are working because we do
get scancodes from i8042, and from the kernel POV the key is still
pressed because [piece of software that emulates] i8042 tells it so. :(
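
To spot this pattern in a long capture without reading it by eye, a
small script along these lines could help. It is just a sketch: it
assumes the evtest text format shown in the log above, and the function
name is made up. It tracks the last EV_KEY value per key (1 = press,
2 = repeat, 0 = release) and reports keys that never saw a release.

```python
import re

# Matches evtest EV_KEY lines such as:
# Event: time 1664975487.591980, type 1 (EV_KEY), code 36 (KEY_J), value 2
EVENT_RE = re.compile(
    r"Event: time (?P<time>\d+\.\d+), type \d+ \(EV_KEY\), "
    r"code \d+ \((?P<key>\w+)\), value (?P<value>\d+)"
)


def keys_still_down(lines):
    """Return {key: repeat_count} for keys whose last EV_KEY value was not 0."""
    state = {}
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue  # MSC_SCAN and SYN_REPORT lines are skipped
        key, value = m["key"], int(m["value"])
        if value == 0:
            state.pop(key, None)  # release: key is no longer down
        elif value == 1:
            state[key] = 0  # fresh press, no repeats seen yet
        else:
            state[key] = state.get(key, 0) + 1  # autorepeat
    return state
```

Run over the excerpt above, this would report KEY_J as still down with
a growing repeat count, which is what the kernel is being told.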

...
Event: time 1664975493.812157, -------------- SYN_REPORT ------------
Event: time 1664975495.120717, type 1 (EV_KEY), code 36 (KEY_J), value 0
>>> It is around here that the computer resumes from suspend.

This is the software-emulated key release that we do as part of the
suspend process. Afterwards the firmware gets jolted back to its senses
by the suspend and starts working...

Thanks.

-- 
Dmitry
Re: [BUG] Stuck key syndrome (was: Re: [PATCH net-next v2] net: dsa: microchip: Add SGMII port support to KSZ9477 switch)
Posted by Maxime Chevallier 7 months, 1 week ago
On Wed, 7 May 2025 12:44:24 +0100
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> Hi Maxime,
> 
> On Wed, May 07, 2025 at 03:32:36PM +0200, Maxime Chevallier wrote:
> > Hi Russell,
> > 
> > On Wed, 7 May 2025 10:59:21 +0100
> > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> >   
> > > On Wed, May 07, 2025 at 10:23:17AM +0100, Russell King (Oracle) wrote:  
> > > > [Sorry for going off topic here - changed the Cc list, added Linus,
> > > > changed the subject.]
> > > > 
> > > > On Wed, May 07, 2025 at 10:54:57AM +0200, Maxime Chevallier wrote:    
> > > > > On Wed, 7 May 2025 09:31:48 +0100
> > > > > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:    
> > > > > > [rest of the email got deleted because Linux / X11 / KDE got confused
> > > > > > about the state the backspace key and decided it was going to be
> > > > > > continuously pressed and doing nothing except shutting the laptop
> > > > > > down would stop it.]    
> > > > > 
> > > > > Funny how I have the same exact issue on my laptop as well...     
> > > > 
> > > > I've had the "stuck key" behaviour with the HP Pavilion 15-au185sa
> > > > laptop I had previously (normally with ctrl-F keys). However, hitting
> > > > ctrl/shift/alt would stop it.
> > > > 
> > > > This is the first time I've seen the behaviour with the Carbon X1
> > > > laptop, but this was way more severe. No key would stop it. Trying to
> > > > move the focus using the trackpad/nipple had any effect. Meanwhile
> > > > the email was being deleted one character at a time. So I shut the
> > > > laptop lid causing it to suspend, and wondered what to do... on
> > > > re-opening the laptop, it didn't restart and is back to normal.
> > > > 
> > > > This suggests that the entire input subsystem in the software stack
> > > > collapsed just after the backspace key was pressed, and Xorg never
> > > > saw the key-release event. So Xorg duitifully did its key-repeat
> > > > processing, causing the email to be deleted one character at a time.
> > > > 
> > > > The problem is, not only did this destroy the email reply, but it
> > > > also destroyed my train of thought for the reply as well through
> > > > the panic of trying to stop the entire email being deleted.
> > > > 
> > > > I don't think this is a hardware issue - I think there's a problem
> > > > in the input handling somewhere in the stack of kernel, Xorg,
> > > > whatever multiple input libraries make up modern systems, and KDE.
> > > > 
> > > > I did check the logs. Nothing in the kernel messages that suggests
> > > > a problem. Nothing in Xorg's logs (which are difficult to tie up
> > > > because it doesn't use real timestamps that one can relate to real
> > > > time.) There's no longer any ~/.xsession-errors logfile for logging
> > > > the stuff below Xorg.
> > > > 
> > > > I'm running Debian Stable here - kernel 6.1.0-34-amd64, X.Org X Server
> > > > 1.21.1.7, KDE Plasma (5.27.5, frameworks 5.103.0, QT 5.15.8).    
> > > 
> > > I'll also add that The Carbon X1, being a laptop, its built-in keyboard
> > > uses the i8042:
> > > 
> > > [    1.698156] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
> > > [    1.698543] i8042: Warning: Keylock active
> > > [    1.700170] serio: i8042 KBD port at 0x60,0x64 irq 1
> > > [    1.700174] serio: i8042 AUX port at 0x60,0x64 irq 12
> > > [    1.700271] mousedev: PS/2 mouse device common for all mice
> > > [    1.702951] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
> > > 
> > > I don't have the HP laptop with me to check what that was using.
> > > 
> > > The mysterious thing is "Keylock active" - clearly it isn't because I
> > > can write this email typing on that very keyboard. However, I wonder
> > > if it needs i8042_unlock=1 to set I8042_CTR_IGNKEYLOCK.
> > > 
> > > Unfortunately, it's probably going to take a year on the Carbon X1
> > > to work out if this makes any difference.
> > >  
> > > > Anyone else seeing this kind of behaviour - if so, what are you
> > > > using?  
> > 
> > It just happened to me as I was typing this very email (key 'd' got
> > stuck, nothing could un-stick it, couldn't move the mouse cursor but
> > mouse-click events did work, had to suspend/resume the laptop to fix
> > that)  
> 
> 'd' can be quite disastrous if you're using vi and you're in command
> mode!
> 
> > Got the same "Keylock active" warning at boot :
> > 
> > [    0.916750] i8042: PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
> > [    0.917210] i8042: Warning: Keylock active
> > [    0.920087] serio: i8042 KBD port at 0x60,0x64 irq 1
> > [    0.920090] serio: i8042 AUX port at 0x60,0x64 irq 12
> > 
> > Nothing in the kernel logs when the key got stuck.
> > 
> > Laptop is a Dell XPS 15 9510, Running Fedora 41 but I saw this issue
> > before, kernel 6.14.4-200.fc41.x86_64, Wayland-based, Gnome 47.
> > 
> > Hopefully this helps a bit narrowing this down, I have a fairly
> > different userspace stack and kernel version, but we do have the same
> > driver involved and same keylock warning...  
> 
> Could you try booting with i8042_unlock=1 and see whether that makes any
> difference please?

I'll try this out indeed :)

> I've added that to my grub config in preparation for rebooting, but even
> if I booted now, I suspect it'll be some time before I have any useful
> result.
> 
> How often do you see the problem?

It's very sporadic, sometimes I'll see this behaviour multiple times a
day, and then nothing for weeks... This laptop has been very quirky
ever since I got it, so I categorized that as a "yet another XPS 9510
broken behaviour", but this has been a recurrent thing for multiple
years...

So, same as you, it'll take a long time for me to say with some amount
of certainty that 'i8042_unlock=1' has a beneficial effect, of
course unless I see the problem happen again in the meantime.

Maxime
Re: [BUG] Stuck key syndrome (was: Re: [PATCH net-next v2] net: dsa: microchip: Add SGMII port support to KSZ9477 switch)
Posted by Dmitry Torokhov 7 months, 1 week ago
On Wed, May 07, 2025 at 01:51:26PM +0200, Maxime Chevallier wrote:
> On Wed, 7 May 2025 12:44:24 +0100
> "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> 
> > Hi Maxime,
> > 
> > On Wed, May 07, 2025 at 03:32:36PM +0200, Maxime Chevallier wrote:
> > > Hi Russell,
> > > 
> > > On Wed, 7 May 2025 10:59:21 +0100
> > > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> > >   
> > > > On Wed, May 07, 2025 at 10:23:17AM +0100, Russell King (Oracle) wrote:  
> > > > > [Sorry for going off topic here - changed the Cc list, added Linus,
> > > > > changed the subject.]
> > > > > 
> > > > > On Wed, May 07, 2025 at 10:54:57AM +0200, Maxime Chevallier wrote:    
> > > > > > On Wed, 7 May 2025 09:31:48 +0100
> > > > > > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:    
> > > > > > > [rest of the email got deleted because Linux / X11 / KDE got confused
> > > > > > > about the state the backspace key and decided it was going to be
> > > > > > > continuously pressed and doing nothing except shutting the laptop
> > > > > > > down would stop it.]    
> > > > > > 
> > > > > > Funny how I have the same exact issue on my laptop as well...     
> > > > > 
> > > > > I've had the "stuck key" behaviour with the HP Pavilion 15-au185sa
> > > > > laptop I had previously (normally with ctrl-F keys). However, hitting
> > > > > ctrl/shift/alt would stop it.
> > > > > 
> > > > > This is the first time I've seen the behaviour with the Carbon X1
> > > > > laptop, but this was way more severe. No key would stop it. Trying to
> > > > > move the focus using the trackpad/nipple had any effect. Meanwhile
> > > > > the email was being deleted one character at a time. So I shut the

If we indeed lost a key release event somewhere, the way to "restore" it
is to hit the stuck key again. Then we should get press/release, with the
press most likely being ignored and the release achieving the desired
result. Of course that will not help if the embedded controller is confused.

> > > > > laptop lid causing it to suspend, and wondered what to do... on
> > > > > re-opening the laptop, it didn't restart and is back to normal.
> > > > > 
> > > > > This suggests that the entire input subsystem in the software stack
> > > > > collapsed just after the backspace key was pressed, and Xorg never
> > > > > saw the key-release event. So Xorg duitifully did its key-repeat
> > > > > processing, causing the email to be deleted one character at a time.
> > > > > 
> > > > > The problem is, not only did this destroy the email reply, but it
> > > > > also destroyed my train of thought for the reply as well through
> > > > > the panic of trying to stop the entire email being deleted.
> > > > > 
> > > > > I don't think this is a hardware issue - I think there's a problem
> > > > > in the input handling somewhere in the stack of kernel, Xorg,
> > > > > whatever multiple input libraries make up modern systems, and KDE.
> > > > > 
> > > > > I did check the logs. Nothing in the kernel messages that suggests
> > > > > a problem. Nothing in Xorg's logs (which are difficult to tie up
> > > > > because it doesn't use real timestamps that one can relate to real
> > > > > time.) There's no longer any ~/.xsession-errors logfile for logging
> > > > > the stuff below Xorg.
> > > > > 
> > > > > I'm running Debian Stable here - kernel 6.1.0-34-amd64, X.Org X Server
> > > > > 1.21.1.7, KDE Plasma (5.27.5, frameworks 5.103.0, QT 5.15.8).    
> > > > 
> > > > I'll also add that The Carbon X1, being a laptop, its built-in keyboard
> > > > uses the i8042:
> > > > 
> > > > [    1.698156] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
> > > > [    1.698543] i8042: Warning: Keylock active
> > > > [    1.700170] serio: i8042 KBD port at 0x60,0x64 irq 1
> > > > [    1.700174] serio: i8042 AUX port at 0x60,0x64 irq 12
> > > > [    1.700271] mousedev: PS/2 mouse device common for all mice
> > > > [    1.702951] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
> > > > 
> > > > I don't have the HP laptop with me to check what that was using.
> > > > 
> > > > The mysterious thing is "Keylock active" - clearly it isn't because I
> > > > can write this email typing on that very keyboard. However, I wonder
> > > > if it needs i8042_unlock=1 to set I8042_CTR_IGNKEYLOCK.

Just ignore this message, it is harmless and trying to flip the bit
might confuse the emulation even more. Maybe we should lower the
severity of it to debug.

That said I do not see it on my Carbon (neither v5 nor v12, can't check
v9 because it is at home)... What version of Carbon do you have? Do you
have up-to-date BIOS/EC?

> > > > 
> > > > Unfortunately, it's probably going to take a year on the Carbon X1
> > > > to work out if this makes any difference.
> > > >  
> > > > > Anyone else seeing this kind of behaviour - if so, what are you
> > > > > using?  
> > > 
> > > It just happened to me as I was typing this very email (key 'd' got
> > > stuck, nothing could un-stick it, couldn't move the mouse cursor but
> > > mouse-click events did work, had to suspend/resume the laptop to fix
> > > that)

This is weird and suggests that the breakage happens up the stack from
the kernel (or down in the firmware). Mouse clicks and mouse movement are
delivered as part of a mouse packet, so if there are button clicks there
will also be movement; they are not separate. If the cursor is not
reacting, that means the desktop environment is not handling input properly.
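
To illustrate why buttons and movement cannot arrive separately, here
is a rough decoder for the plain 3-byte PS/2 mouse packet. This is a
sketch of the basic protocol only; touchpads and trackpoints usually
speak extended protocols, and this is not the kernel's psmouse code.
The function name is made up.

```python
def decode_ps2_packet(packet):
    """Decode one standard 3-byte PS/2 mouse packet into buttons and deltas."""
    b0, b1, b2 = packet
    if not b0 & 0x08:
        # Bit 3 of the first byte is always set in the standard protocol.
        raise ValueError("not the first byte of a PS/2 packet")
    # Bits 4 and 5 of the first byte are the sign bits for X and Y.
    dx = b1 - 256 if b0 & 0x10 else b1
    dy = b2 - 256 if b0 & 0x20 else b2
    return {
        "left": bool(b0 & 0x01),
        "right": bool(b0 & 0x02),
        "middle": bool(b0 & 0x04),
        "dx": dx,
        "dy": dy,
    }
```

A click with no motion is simply a packet with dx = dy = 0, so whoever
consumes the click has also been handed the movement fields.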

> > 
> > 'd' can be quite disastrous if you're using vi and you're in command
> > mode!
> > 
> > > Got the same "Keylock active" warning at boot :
> > > 
> > > [    0.916750] i8042: PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
> > > [    0.917210] i8042: Warning: Keylock active
> > > [    0.920087] serio: i8042 KBD port at 0x60,0x64 irq 1
> > > [    0.920090] serio: i8042 AUX port at 0x60,0x64 irq 12
> > > 
> > > Nothing in the kernel logs when the key got stuck.
> > > 
> > > Laptop is a Dell XPS 15 9510, Running Fedora 41 but I saw this issue
> > > before, kernel 6.14.4-200.fc41.x86_64, Wayland-based, Gnome 47.
> > > 
> > > Hopefully this helps a bit narrowing this down, I have a fairly
> > > different userspace stack and kernel version, but we do have the same
> > > driver involved and same keylock warning...  
> > 
> > Could you try booting with i8042_unlock=1 and see whether that makes any
> > difference please?
> 
> I'll try this out indeed :)
> 
> > I've added that to my grub config in preparation for rebooting, but even
> > if I booted now, I suspect it'll be some time before I have any useful
> > result.
> > 
> > How often do you see the problem?
> 
> It's very sporadic, sometimes I'll see this behaviour multiple times a
> day, and then nothing for weeks... This laptop has been very quirky
> ever since I got it, so I categorized that as a "yet another XPS 9510
> broken behaviour", but this has been a recurrent thing for multiple
> years...
> 
> So, same as you, it'll take a long time for me to say with some amount
> of certainty that 'i8042_unlock=1' has a beneficial effect, of
> course unless I see the problem happen again in the meantime.

The kernel does drop input events if userspace is unable to read buffers
quickly enough. It notifies userspace by queuing a special
EV_SYN/SYN_DROPPED event, and userspace is supposed to query the full
device state upon receiving it to figure out what to do. I doubt we are
running into this with keyboards, but maybe we should add some logging
there to rule this out.
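
For reference, the expected userspace handling looks roughly like the
sketch below. It is a simulation over plain tuples, not real evdev
code: events are (type, code, value) tuples and query_state stands in
for asking the device for its current key state (on a real device that
would be the EVIOCGKEY ioctl). Names are made up for illustration.

```python
def track_keys(events, query_state):
    """Track the set of pressed keys, resyncing after SYN_DROPPED.

    events: iterable of (type, code, value) tuples.
    query_state: callable returning the keys the device says are down.
    """
    down = set()
    dropped = False
    for etype, code, value in events:
        if etype == "EV_SYN" and code == "SYN_DROPPED":
            # Everything up to the next SYN_REPORT is unreliable.
            dropped = True
        elif etype == "EV_SYN" and code == "SYN_REPORT":
            if dropped:
                down = set(query_state())  # resync from the device
                dropped = False
        elif etype == "EV_KEY" and not dropped:
            if value == 0:
                down.discard(code)
            else:
                down.add(code)
    return down
```

A client that skips the resync step would keep a key "down" forever if
its release was among the dropped events, which would look a lot like
the symptom being reported.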

I'll add Peter and Benjamin to this thread in case they have ideas.

Thanks.

-- 
Dmitry
Re: [BUG] Stuck key syndrome (was: Re: [PATCH net-next v2] net: dsa: microchip: Add SGMII port support to KSZ9477 switch)
Posted by Russell King (Oracle) 7 months, 1 week ago
On Wed, May 07, 2025 at 10:45:48AM -0700, Dmitry Torokhov wrote:
> On Wed, May 07, 2025 at 01:51:26PM +0200, Maxime Chevallier wrote:
> > On Wed, 7 May 2025 12:44:24 +0100
> > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> > 
> > > Hi Maxime,
> > > 
> > > On Wed, May 07, 2025 at 03:32:36PM +0200, Maxime Chevallier wrote:
> > > > Hi Russell,
> > > > 
> > > > On Wed, 7 May 2025 10:59:21 +0100
> > > > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> > > >   
> > > > > On Wed, May 07, 2025 at 10:23:17AM +0100, Russell King (Oracle) wrote:  
> > > > > > [Sorry for going off topic here - changed the Cc list, added Linus,
> > > > > > changed the subject.]
> > > > > > 
> > > > > > On Wed, May 07, 2025 at 10:54:57AM +0200, Maxime Chevallier wrote:    
> > > > > > > On Wed, 7 May 2025 09:31:48 +0100
> > > > > > > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:    
> > > > > > > > [rest of the email got deleted because Linux / X11 / KDE got confused
> > > > > > > > about the state the backspace key and decided it was going to be
> > > > > > > > continuously pressed and doing nothing except shutting the laptop
> > > > > > > > down would stop it.]    
> > > > > > > 
> > > > > > > Funny how I have the same exact issue on my laptop as well...     
> > > > > > 
> > > > > > I've had the "stuck key" behaviour with the HP Pavilion 15-au185sa
> > > > > > laptop I had previously (normally with ctrl-F keys). However, hitting
> > > > > > ctrl/shift/alt would stop it.
> > > > > > 
> > > > > > This is the first time I've seen the behaviour with the Carbon X1
> > > > > > laptop, but this was way more severe. No key would stop it. Trying to
> > > > > > move the focus using the trackpad/nipple had any effect. Meanwhile
> > > > > > the email was being deleted one character at a time. So I shut the
> 
> If we indeed lost a key release event somewhere the way to "restore" it
> is to hit the stuck key again. Then we should get press/release with
> press most likely being ignored and release achieving the desired
> result. Of course that will not help if embedded controller is confused.

I tried pressing every key...

> > > > > The mysterious thing is "Keylock active" - clearly it isn't because I
> > > > > can write this email typing on that very keyboard. However, I wonder
> > > > > if it needs i8042_unlock=1 to set I8042_CTR_IGNKEYLOCK.
> 
> Just ignore this message, it is harmless and trying to flip the bit
> might confuse the emulation even more. Maybe we should lower the
> severity of it to debug.
> 
> That said I do not see it on my Carbon (neither v5 nor v12, can't check
> v9 because it is at home)... What version of Carbon do you have? Do you
> have up-to-date BIOS/EC?

Neither did I see a problem until today, and I've been using the laptop
since October 2024, and this is the first time it's had an issue.

Looking at fwupd, it has an Intel ME update pending (1.32.2418 to
1.35.2557). I can't find a way to get any update history beyond
that out of fwupd and fwupd doesn't seem to log to journald what
it's doing.

> > > > It just happened to me as I was typing this very email (key 'd' got
> > > > stuck, nothing could un-stick it, couldn't move the mouse cursor but
> > > > mouse-click events did work, had to suspend/resume the laptop to fix
> > > > that)
> 
> This is weird and suggests that the breakage happens up the stack from
> the kernel (or down in the firmware). Mouse clicks and mouse movement is
> delivered as part of a mouse packet, so if there are button clicks there
> will also be movement, they are not separate. If the cursor is not
> reacting that means desktop environment is not handling input properly.

So could we be looking at an Xorg bug?

> The kernel does drop input events if userspace is unable to read buffers
> quickly enough. It notifies userspace by queuing special
> EV_SYN/SYN_DROPPED event and userspace is supposed to query the full
> device state upon receiving it to figure out what to do. I doubt we are
> running into this with keyboards, but maybe we should add some logging
> there to rule this out.
> 
> I'll add Peter and Benjamin to this thread in case they have ideas.

I'm thinking of leaving evtest running in a terminal, so its output
can be inspected if it happens again. One issue though is the
timestamps aren't readable, but I'm sure with a bit of perl
post-processing that could be fixed.

That would allow an answer to "is it kernel or firmware" vs
"userspace".

The problem is - if it's taken 7-ish months to show for me, it's
likely that evtest won't be running when it next happens (as there
will be needs to reboot for kernel upgrades/firmware upgrades in
that time.) Really, it needs to be something like an automatically
started at boot e.g. evtest inside a detached screen session.
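
For the timestamp part, a rough filter could look like this (Python
here rather than perl, purely as a sketch). It assumes the evtest line
format "Event: time <secs>.<usecs>, ..." and prints UTC; the function
name is made up.

```python
import re
from datetime import datetime, timezone

# Only the timestamp right after "Event: time " is rewritten.
TIME_RE = re.compile(r"(?<=Event: time )(\d+)\.(\d+)")


def humanize(line):
    """Replace the epoch timestamp in one evtest line with a UTC date/time."""
    def repl(m):
        secs, frac = int(m.group(1)), m.group(2)
        stamp = datetime.fromtimestamp(secs, tz=timezone.utc)
        return stamp.strftime("%Y-%m-%d %H:%M:%S") + "." + frac + "Z"

    return TIME_RE.sub(repl, line, count=1)


# Usage: for line in sys.stdin: print(humanize(line), end="")
```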

Re: [BUG] Stuck key syndrome (was: Re: [PATCH net-next v2] net: dsa: microchip: Add SGMII port support to KSZ9477 switch)
Posted by Peter Hutterer 7 months, 1 week ago
On Wed, May 07, 2025 at 07:18:37PM +0100, Russell King (Oracle) wrote:
> On Wed, May 07, 2025 at 10:45:48AM -0700, Dmitry Torokhov wrote:
> > On Wed, May 07, 2025 at 01:51:26PM +0200, Maxime Chevallier wrote:
> > > On Wed, 7 May 2025 12:44:24 +0100
> > > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> > > 
> > > > Hi Maxime,
> > > > 
> > > > On Wed, May 07, 2025 at 03:32:36PM +0200, Maxime Chevallier wrote:
> > > > > Hi Russell,
> > > > > 
> > > > > On Wed, 7 May 2025 10:59:21 +0100
> > > > > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> > > > >   
> > > > > > On Wed, May 07, 2025 at 10:23:17AM +0100, Russell King (Oracle) wrote:  
> > > > > > > [Sorry for going off topic here - changed the Cc list, added Linus,
> > > > > > > changed the subject.]
> > > > > > > 
> > > > > > > On Wed, May 07, 2025 at 10:54:57AM +0200, Maxime Chevallier wrote:    
> > > > > > > > On Wed, 7 May 2025 09:31:48 +0100
> > > > > > > > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:    
> > > > > > > > > [rest of the email got deleted because Linux / X11 / KDE got confused
> > > > > > > > > about the state the backspace key and decided it was going to be
> > > > > > > > > continuously pressed and doing nothing except shutting the laptop
> > > > > > > > > down would stop it.]    
> > > > > > > > 
> > > > > > > > Funny how I have the same exact issue on my laptop as well...     
> > > > > > > 
> > > > > > > I've had the "stuck key" behaviour with the HP Pavilion 15-au185sa
> > > > > > > laptop I had previously (normally with ctrl-F keys). However, hitting
> > > > > > > ctrl/shift/alt would stop it.
> > > > > > > 
> > > > > > > This is the first time I've seen the behaviour with the Carbon X1
> > > > > > > laptop, but this was way more severe. No key would stop it. Trying to
> > > > > > > move the focus using the trackpad/nipple had any effect. Meanwhile
> > > > > > > the email was being deleted one character at a time. So I shut the
> > 
> > If we indeed lost a key release event somewhere the way to "restore" it
> > is to hit the stuck key again. Then we should get press/release with
> > press most likely being ignored and release achieving the desired
> > result. Of course that will not help if embedded controller is confused.
> 
> I tried pressing every key...

ftr, any key autorepeat in XKB (or libxkbcommon) is cancelled when
another key arrives so this does indeed hint at a firmware/controller
issue.

> > > > > > The mysterious thing is "Keylock active" - clearly it isn't because I
> > > > > > can write this email typing on that very keyboard. However, I wonder
> > > > > > if it needs i8042_unlock=1 to set I8042_CTR_IGNKEYLOCK.
> > 
> > Just ignore this message, it is harmless and trying to flip the bit
> > might confuse the emulation even more. Maybe we should lower the
> > severity of it to debug.
> > 
> > That said I do not see it on my Carbon (neither v5 nor v12, can't check
> > v9 because it is at home)... What version of Carbon do you have? Do you
> > have up-to-date BIOS/EC?
> 
> Neither did I see a problem until today, and I've been using the laptop
> since October 2024, and this is the first time it's had an issue.
> 
> Looking at fwupd, it has an Intel ME update pending (1.32.2418 to
> 1.35.2557). I can't find a way to get any update history beyond
> that out of fwupd and fwupd doesn't seem to log to journald what
> it's doing.
> 
> > > > > It just happened to me as I was typing this very email (key 'd' got
> > > > > stuck, nothing could un-stick it, couldn't move the mouse cursor but
> > > > > mouse-click events did work, had to suspend/resume the laptop to fix
> > > > > that)
> > 
> > This is weird and suggests that the breakage happens up the stack from
> > the kernel (or down in the firmware). Mouse clicks and mouse movement is
> > delivered as part of a mouse packet, so if there are button clicks there
> > will also be movement, they are not separate. If the cursor is not
> > reacting that means desktop environment is not handling input properly.
> 
> So could we be looking at an Xorg bug?
> 
> > The kernel does drop input events if userspace is unable to read buffers
> > quickly enough. It notifies userspace by queuing special
> > EV_SYN/SYN_DROPPED event and userspace is supposed to query the full
> > device state upon receiving it to figure out what to do. I doubt we are
> > running into this with keyboards, but maybe we should add some logging
> > there to rule this out.
> > 
> > I'll add Peter and Benjamin to this thread in case they have ideas.
> 
> I'm thinking of leaving evtest running in a terminal, so its output
> can be inspected if it happens again. One issue though is the
> timestamps aren't readable, but I'm sure with a bit of perl
> post-processing that could be fixed.

$ screen sudo libinput record --autorestart=5 -o keyboard.yml

Select the keyboard device and off you go.
This will create a keyboard.yml.$DATETIME file and after 5 seconds of no
events it will rotate to the next file. This means you can leave this
running for months and when it happens you stop typing for 5 seconds and
then look at the most recent (usually) two files for epiphanies.
Bonus points for ssh-ing in and killing the process remotely because
then you don't have interference of whatever

If the last event you get is a key down event the issue is in the
firmware and userspace does the right thing. In that case your 
logs will be full of EV_KEY KEY_FOO with a value of 2 (kernel repeat).

If the last event you get is a key up but the desktop still repeats
then the issue is in that part of the stack. In that case the log should
be empty after you stopped typing.

If you get a SYN_DROPPED and the desktop does the wrong thing the issue
is in libinput because it should handle that transparently (same with
the Xorg evdev driver if you're using that).
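
Put as a decision table, the three cases above come out roughly as
follows. This is only a sketch: last_event is a (code, value) tuple
you read off the end of the recording by hand, and the function name
and return strings are made up.

```python
def classify(last_event, desktop_still_repeating):
    """Map the last recorded event to the layer that is probably at fault."""
    code, value = last_event
    if code == "SYN_DROPPED":
        # libinput (or the Xorg evdev driver) should handle this transparently.
        return "libinput/evdev driver" if desktop_still_repeating else "handled"
    if value in (1, 2):
        # Key down or kernel repeat: the kernel is still being told "pressed".
        return "firmware"
    if value == 0 and desktop_still_repeating:
        # Release was delivered but the desktop keeps repeating.
        return "userspace"
    return "no fault seen"
```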

Optionally: --show-keycodes but adding this means you will leak
all your key presses into the logs (without this all alphanumeric keys
are recorded as KEY_A). So maybe only use that for entertainment value.

Optionally: provide the device node as arg (see below)

> That would allow an answer to "is it kernel or firmware" vs
> "userspace".
> 
> The problem is - if it's taken 7-ish months to show for me, it's
> likely that evtest won't be running when it next happens (as there
> will be needs to reboot for kernel upgrades/firmware upgrades in
> that time.) Really, it needs to be something like an automatically
> started at boot e.g. evtest inside a detached screen session.
 
Well, that bit I leave to you to figure out because it'll also require
scripting to find the keyboard's device node automatically so you can
pass it to libinput record via the .unit file, or cron job script, or
whatever.
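
As a starting point for that scripting, the device node can be pulled
out of /proc/bus/input/devices with something like the sketch below.
It assumes the usual "N: Name=" / "H: Handlers=" layout of that file
and matches on the "AT Translated Set 2 keyboard" name seen in the
dmesg output earlier in the thread; the function name is made up.

```python
import re


def find_event_nodes(devices_text, name_substring="AT Translated Set 2 keyboard"):
    """Return /dev/input/eventN paths for devices whose name matches."""
    nodes = []
    # Device entries in the file are separated by blank lines.
    for block in devices_text.strip().split("\n\n"):
        name = re.search(r'^N: Name="(.*)"', block, re.M)
        handlers = re.search(r"^H: Handlers=(.*)$", block, re.M)
        if not name or not handlers:
            continue
        if name_substring not in name.group(1):
            continue
        nodes += [
            "/dev/input/" + h
            for h in handlers.group(1).split()
            if h.startswith("event")
        ]
    return nodes


# Usage: find_event_nodes(open("/proc/bus/input/devices").read())
```

The resulting path is what would be passed to libinput record as the
device node argument.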

Either way, this should at least make it easier to catch the issue when
it occurs.

Cheers,
  Peter
Re: [BUG] Stuck key syndrome (was: Re: [PATCH net-next v2] net: dsa: microchip: Add SGMII port support to KSZ9477 switch)
Posted by Linus Torvalds 7 months, 1 week ago
On Wed, 7 May 2025 at 04:51, Maxime Chevallier
<maxime.chevallier@bootlin.com> wrote:
>
> So, same as you, it'll take a long time for me to say with some amount
> of certainty that 'i8042_unlock=1' has a beneficial effect, of
> course unless I see the problem happen again in the meantime.

Christ. You'd expect that any i8042 issues had been fixed long ago,
but the problem is that the chip doesn't necessarily even exist in
modern platforms, and everybody just fakes it.

So the platform presumably still has hardware support for it, but it's
mostly in the form of "take a trap when accessing the legacy keyboard
ports, and fake it in firmware".

Although it doesn't help that there are literally decades of clone
chips and hacky real hardware that extended on the i8042 in various
more-or-less compatible ways.

Which makes all of these things almost entirely undebuggable.

I'm surprised the XPS9510 would be particularly troublesome - I've had
an XPS for years (older version, obviously) with no issues outside of
WiFi sometimes acting up. But random firmware...

I doubt it's "keylock active", but who knows. I get that on my xps
too, it's a random bit that doesn't really mean much. But - because of
all the reasons above - who knows...

One typical problem has been "the interrupt line is wired oddly", but
the fact that it apparently works *most* of the time means that that
is unlikely to be the issue here.

              Linus
Re: [BUG] Stuck key syndrome (was: Re: [PATCH net-next v2] net: dsa: microchip: Add SGMII port support to KSZ9477 switch)
Posted by Dmitry Torokhov 7 months, 1 week ago
On Wed, May 07, 2025 at 07:46:03AM -0700, Linus Torvalds wrote:
> On Wed, 7 May 2025 at 04:51, Maxime Chevallier
> <maxime.chevallier@bootlin.com> wrote:
> >
> > So, same as you, it'll take a long time for me to say with some amount
> > of certainty that 'i8042_unlock=1' has a beneficial effect, of
> > course unless I see the problem happen again in the meantime.
> 
> Christ. You'd expect that any i8042 issues had been fixed long ago,
> but the problem is that the chip doesn't necessarily even exist in
> modern platforms, and everybody just fakes it.

It has not existed as a real chip for more than 20 years I believe. It's
all faked in firmware and embedded controllers that fake it in their
firmwares.

And newer firmware tends to implement less and less of it, just what the
OS that the devices ship with needs.

> 
> So the platform presumably still has hardware support for it, but it's
> mostly in the form of "take a trap when accessing the legacy keyboard
> ports, and fake it in firmware".
> 
> Although it doesn't help that there are literally decades of clone
> chips and hacky real hardware that extended on the i8042 in various
> more-or-less compatible ways.
> 
> Which makes all of these things almost entirely undebuggable.
> 
> I'm surprised the XPS9510 would be particularly troublesome - I've had
> an XPS for years (older version, obviously) with no issues outside of
> WiFi sometimes acting up. But random firmware...
> 
> I doubt it's "keylock active", but who knows. I get that on my xps
> too, it's a random bit that doesn't really mean much. But - because of
> all the reasons above - who knows...

It is typically harmless, and what's more, trying to "unlock" the 8042
when it reports being locked might confuse the 8042 emulation.

Thanks.

-- 
Dmitry
Re: [BUG] Stuck key syndrome (was: Re: [PATCH net-next v2] net: dsa: microchip: Add SGMII port support to KSZ9477 switch)
Posted by Russell King (Oracle) 7 months, 1 week ago
On Wed, May 07, 2025 at 10:23:37AM -0700, Dmitry Torokhov wrote:
> On Wed, May 07, 2025 at 07:46:03AM -0700, Linus Torvalds wrote:
> > On Wed, 7 May 2025 at 04:51, Maxime Chevallier
> > <maxime.chevallier@bootlin.com> wrote:
> > >
> > > So, same as you, it'll take a long time for me to say with some amount
> > > of certainty that 'i8042_unlock=1' has a beneficial effect, of
> > > course unless I see the problem happen again in the meantime.
> > 
> > Christ. You'd expect that any i8042 issues had been fixed long ago,
> > but the problem is that the chip doesn't necessarily even exist in
> > modern platforms, and everybody just fakes it.
> 
> It has not existed as a real chip for more than 20 years, I believe. It's
> all faked, either in platform firmware or in embedded controllers that
> emulate it in their own firmware.
> 
> And newer firmware tends to implement less and less of it, just what the
> OS that the devices ship with needs.
> 
> > 
> > So the platform presumably still has hardware support for it, but it's
> > mostly in the form of "take a trap when accessing the legacy keyboard
> > ports, and fake it in firmware".
> > 
> > Although it doesn't help that there are literally decades of clone
> > chips and hacky real hardware that extended on the i8042 in various
> > more-or-less compatible ways.
> > 
> > Which makes all of these things almost entirely undebuggable.
> > 
> > I'm surprised the XPS9510 would be particularly troublesome - I've had
> > an XPS for years (older version, obviously) with no issues outside of
> > WiFi sometimes acting up. But random firmware...
> > 
> > I doubt it's "keylock active", but who knows. I get that on my xps
> > too, it's a random bit that doesn't really mean much. But - because of
> > all the reasons above - who knows...
> 
> It is typically harmless, and what's more, trying to "unlock" the 8042
> when it reports being locked might confuse the 8042 emulation.

So I think the summary of this thread is... laptop keyboards are
unreliable, and it's a lottery whether any particular laptop works
or remains working over firmware updates.

Surely that essentially means laptops are basically unreliable
devices?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!