The blamed commit changed struct ocelot_stat_layout :: "u32 offset" to
"u32 reg".
However, "u32 reg" is not quite a register address, but an enum
ocelot_reg, which in itself encodes an enum ocelot_target target in the
upper bits, and an index into the ocelot->map[target][] array in the
lower bits.
So, whereas the previous code comparison between stats_layout[i].offset
and last + 1 was correct (because those "offsets" at the time were
32-bit relative addresses), the new code, comparing layout[i].reg to
last + 4 is not correct, because the "reg" here is an enum/index, not an
actual register address.
What we want to compare are indeed register addresses, but to do that,
we need to actually go through the same motions as
__ocelot_bulk_read_ix() itself.
With this bug, all statistics counters are deemed by
ocelot_prepare_stats_regions() as constituting their own region.
(Truncated) log on VSC9959 (Felix) below (prints added by me):
Before:
region of 1 contiguous counters starting with SYS:STAT:CNT[0x000]
region of 1 contiguous counters starting with SYS:STAT:CNT[0x001]
region of 1 contiguous counters starting with SYS:STAT:CNT[0x002]
...
region of 1 contiguous counters starting with SYS:STAT:CNT[0x041]
region of 1 contiguous counters starting with SYS:STAT:CNT[0x042]
region of 1 contiguous counters starting with SYS:STAT:CNT[0x080]
region of 1 contiguous counters starting with SYS:STAT:CNT[0x081]
...
region of 1 contiguous counters starting with SYS:STAT:CNT[0x0ac]
region of 1 contiguous counters starting with SYS:STAT:CNT[0x100]
region of 1 contiguous counters starting with SYS:STAT:CNT[0x101]
...
region of 1 contiguous counters starting with SYS:STAT:CNT[0x111]
After:
region of 67 contiguous counters starting with SYS:STAT:CNT[0x000]
region of 45 contiguous counters starting with SYS:STAT:CNT[0x080]
region of 18 contiguous counters starting with SYS:STAT:CNT[0x100]
Since commit d87b1c08f38a ("net: mscc: ocelot: use bulk reads for
stats") intended bulking as a performance improvement, and since now,
with trivial-sized regions, performance is even worse than without
bulking at all, this could easily qualify as a performance regression.
Fixes: d4c367650704 ("net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
drivers/net/ethernet/mscc/ocelot_stats.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mscc/ocelot_stats.c b/drivers/net/ethernet/mscc/ocelot_stats.c
index bdb893476832..096c81ec9dd6 100644
--- a/drivers/net/ethernet/mscc/ocelot_stats.c
+++ b/drivers/net/ethernet/mscc/ocelot_stats.c
@@ -899,7 +899,8 @@ static int ocelot_prepare_stats_regions(struct ocelot *ocelot)
if (!layout[i].reg)
continue;
- if (region && layout[i].reg == last + 4) {
+ if (region && ocelot->map[SYS][layout[i].reg & REG_MASK] ==
+ ocelot->map[SYS][last & REG_MASK] + 4) {
region->count++;
} else {
region = devm_kzalloc(ocelot->dev, sizeof(*region),
--
2.34.1
On Tue, Mar 21, 2023 at 03:03:23AM +0200, Vladimir Oltean wrote: > The blamed commit changed struct ocelot_stat_layout :: "u32 offset" to > "u32 reg". > > However, "u32 reg" is not quite a register address, but an enum > ocelot_reg, which in itself encodes an enum ocelot_target target in the > upper bits, and an index into the ocelot->map[target][] array in the > lower bits. > > So, whereas the previous code comparison between stats_layout[i].offset > and last + 1 was correct (because those "offsets" at the time were > 32-bit relative addresses), the new code, comparing layout[i].reg to > last + 4 is not correct, because the "reg" here is an enum/index, not an > actual register address. > > What we want to compare are indeed register addresses, but to do that, > we need to actually go through the same motions as > __ocelot_bulk_read_ix() itself. > > With this bug, all statistics counters are deemed by > ocelot_prepare_stats_regions() as constituting their own region. > (Truncated) log on VSC9959 (Felix) below (prints added by me): > > Before: > > region of 1 contiguous counters starting with SYS:STAT:CNT[0x000] > region of 1 contiguous counters starting with SYS:STAT:CNT[0x001] > region of 1 contiguous counters starting with SYS:STAT:CNT[0x002] > ... > region of 1 contiguous counters starting with SYS:STAT:CNT[0x041] > region of 1 contiguous counters starting with SYS:STAT:CNT[0x042] > region of 1 contiguous counters starting with SYS:STAT:CNT[0x080] > region of 1 contiguous counters starting with SYS:STAT:CNT[0x081] > ... > region of 1 contiguous counters starting with SYS:STAT:CNT[0x0ac] > region of 1 contiguous counters starting with SYS:STAT:CNT[0x100] > region of 1 contiguous counters starting with SYS:STAT:CNT[0x101] > ... > region of 1 contiguous counters starting with SYS:STAT:CNT[0x111] > > After: > > region of 67 contiguous counters starting with SYS:STAT:CNT[0x000] > region of 45 contiguous counters starting with SYS:STAT:CNT[0x080] > region of 18 contiguous counters starting with SYS:STAT:CNT[0x100] Yes, I verified this with: `trace-cmd record -p function_graph -l ocelot_* sleep 3` Before the patch series, on the VSC7512 a call to ocelot_port_update_stats() takes about 14ms, with many calls to ocelot_spi_regmap_bus_read(). After the patch series, the calls take about 2ms, with four calls to ocelot_spi_regmap_bus_read(). Acked-by: Colin Foster <colin.foster@in-advantage.com> Tested-by: Colin Foster <colin.foster@in-advantage.com>
© 2016 - 2026 Red Hat, Inc.