This series addresses a silent data corruption issue triggered when ynl
retrieves string sets from NICs with a large number of statistics entries
(e.g. mlx5_core with thousands of ETH_SS_STATS strings).

The root cause is that nla_len in struct nlattr is a __u16, so a single
attribute, header included, can be at most 65535 bytes long. When a NIC
exports enough statistics strings, the ETHTOOL_A_STRINGSET_STRINGS nest
built by strset_fill_set() exceeds this limit. nla_nest_end() silently
truncates the length on assignment, producing a corrupted netlink
message. In the doit path the corrupted message is delivered to
userspace without any error; in the dumpit path -EMSGSIZE may be
returned if the data does not fit in the dump skb.

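To make the truncation concrete, here is a minimal userspace sketch of
the arithmetic. It only models a 16-bit length field; the string count
and per-string size are made-up numbers, and the attribute type is a
placeholder, not the real ETHTOOL_A_STRINGSET_STRINGS value.

```python
import struct

NLA_HDRLEN = 4      # sizeof(struct nlattr): __u16 nla_len + __u16 nla_type
U16_MAX = 0xFFFF


def stored_nla_len(nest_bytes: int) -> int:
    """Value that ends up in a 16-bit nla_len field for a nest this long."""
    return nest_bytes & U16_MAX


# Hypothetical nest: 3000 strings at 48 bytes each, plus the outer header.
nest_bytes = NLA_HDRLEN + 3000 * 48

# Pack a header the way a 16-bit field would hold it, then read it back.
# The type value 0 is a placeholder for illustration only.
header = struct.pack("=HH", stored_nla_len(nest_bytes), 0)
parsed_len, _ = struct.unpack("=HH", header)

print(f"real nest length: {nest_bytes}, length seen by parser: {parsed_len}")
```

A parser that trusts the stored length stops well before the real end
of the nest and then interprets the remaining bytes as unrelated
attributes.
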
Patch 1 reworks the userspace tool: it renames the doit/dumpit helpers
to do_set/do_get and converts do_get to use ynl.do() with an explicit
device header instead of a full dump with client-side filtering.

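The difference between the two request styles can be sketched roughly
as follows. FakeYnl is a hypothetical stand-in so the example runs
without the kernel tree; it is not the real pyynl class, and
get_by_dump/get_by_do are illustrative names, not the helpers in
ethtool.py. The device names and reply contents are invented.

```python
class FakeYnl:
    """Hypothetical stand-in for a YNL family object (assumption)."""

    def __init__(self, devices):
        self._devices = devices     # maps dev-name -> reply attributes

    def dump(self, op_name, req):
        # A dump yields one message per device known to the kernel.
        return [{"header": {"dev-name": name}, **attrs}
                for name, attrs in self._devices.items()]

    def do(self, op_name, req):
        # A do request names exactly one device in its header.
        name = req["header"]["dev-name"]
        return {"header": {"dev-name": name}, **self._devices[name]}


def get_by_dump(ynl, device, op_name):
    """Old style: dump every device, then filter on the client."""
    for msg in ynl.dump(op_name, {"header": {}}):
        if msg["header"]["dev-name"] == device:
            return msg
    return None


def get_by_do(ynl, device, op_name):
    """New style: ask only for the device we care about."""
    return ynl.do(op_name, {"header": {"dev-name": device}})


ynl = FakeYnl({"eth0": {"rx-count": 4}, "eth1": {"rx-count": 8}})
print(get_by_do(ynl, "eth1", "channels-get"))
```

Both paths return the same reply for a given device, but the do variant
avoids transferring and discarding the replies for every other device.
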
Patch 2 adds a --dbg-small-recv option to the YNL ethtool tool,
matching the option already present in cli.py, to aid debugging of
netlink message size issues.

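For illustration, an option of this shape could be declared with
argparse as below. This is a sketch under assumptions: the default of 0
(feature off) and the fallback of 4000 bytes when the flag is given
without a value are illustrative, and how the tool passes the value on
to the YNL library is not shown.

```python
import argparse


def build_parser():
    """Parser with an optional-value debug flag (values are assumptions)."""
    parser = argparse.ArgumentParser(description="debug option sketch")
    parser.add_argument(
        "--dbg-small-recv",
        default=0,      # assumed: 0 means the option is not in effect
        const=4000,     # assumed size used when the flag has no value
        nargs="?",
        type=int,
        help="use a small receive buffer to exercise message size limits",
    )
    return parser


args = build_parser().parse_args(["--dbg-small-recv", "512"])
print(args.dbg_small_recv)
```
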
Patch 3 is the kernel fix: it checks whether the strings_attr nest
would exceed U16_MAX before calling nla_nest_end() in
strset_fill_set(), and returns -EMSGSIZE early if so.

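The shape of that check, modelled in userspace Python rather than
kernel C, is roughly the following. close_nest_checked() is a
hypothetical helper for illustration; the actual patch performs the
comparison inline on the skb in strset_fill_set().

```python
import errno

NLA_HDRLEN = 4      # sizeof(struct nlattr)
U16_MAX = 0xFFFF


def close_nest_checked(payload_len: int) -> int:
    """Return the nest length to store, or -EMSGSIZE if it cannot fit.

    Models "check before nla_nest_end()": the length is validated against
    the 16-bit limit first, so it is never truncated on assignment.
    """
    nest_len = NLA_HDRLEN + payload_len
    if nest_len > U16_MAX:
        return -errno.EMSGSIZE
    return nest_len


print(close_nest_checked(100))
print(close_nest_checked(70000) == -errno.EMSGSIZE)
```
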
Patch 4 adds a generic WARN_ON_ONCE() in nla_nest_end() itself, so
that any future caller hitting the same overflow shows up in the
kernel log instead of silently corrupting data.
---
Hangbin Liu (4):
      tools: ynl: ethtool: use doit instead of dumpit for per-device GET
      tools: ynl: ethtool: add --dbg-small-recv option
      ethtool: strset: check nla_len overflow before nla_nest_end
      netlink: warn on nla_len overflow in nla_nest_end()

 include/net/netlink.h          |  1 +
 net/ethtool/strset.c           |  4 +++
 tools/net/ynl/pyynl/ethtool.py | 77 ++++++++++++++++++++++--------------------
 3 files changed, 46 insertions(+), 36 deletions(-)
---
base-commit: dc9e9d61e301c087bcd990dbf2fa18ad3e2e1429
change-id: 20260324-b4-ynl_ethtool-f87cd42f572c

Best regards,
--
Hangbin Liu <liuhangbin@gmail.com>