[v3] hexdump: Allow skipping identical lines

[PATCH v3 2/3] hexdump: Allow skipping identical lines

Posted by Miquel Raynal 10 months, 3 weeks ago

When dumping long buffers (especially for debug purposes) it can be very
convenient to avoid printing every line of the buffer when consecutive
lines are identical. Typically on embedded devices, the console is wired
to a UART running at 115200 baud, which makes the dumps very (very)
slow. In this case, having a flag to avoid printing duplicated lines is
handy.

Example of a made-up repetitive output:
0f 53 63 47 56 55 78 7a aa b7 8c ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff 01 2a 39 eb

Same but with the flag enabled:
0f 53 63 47 56 55 78 7a aa b7 8c ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
ff ff ff ff ff ff ff ff ff ff ff ff 01 2a 39 eb

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 Documentation/core-api/printk-formats.rst |  4 +++-
 include/linux/printk.h                    |  1 +
 lib/hexdump.c                             | 20 +++++++++++++++++++-
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst
index f80b5e262e9933992d1291f1d78fba97589d5631..820f92c65dc64e7d24af5c0031ee8c8d6bb0f931 100644
--- a/Documentation/core-api/printk-formats.rst
+++ b/Documentation/core-api/printk-formats.rst
@@ -310,7 +310,9 @@ Raw buffer as a hex string
 
 For printing small buffers (up to 64 bytes long) as a hex string with a
 certain separator. For larger buffers consider using
-:c:func:`print_hex`.
+:c:func:`print_hex`, especially since duplicated lines can be
+skipped automatically to reduce the overhead with the
+``DUMP_SKIP_IDENTICAL_LINES`` flag.
 
 MAC/FDDI addresses
 ------------------
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 7dca2270c82c0ed788cd706274f1c1b14ed9a7fe..d9e3e4b0bab8d3ff6a49600abbdbc9b1e6320a60 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -757,6 +757,7 @@ enum {
 	DUMP_PREFIX_NONE = 0, /* Legacy definition for print_hex_dump() */
 	DUMP_PREFIX_ADDRESS = BIT(1),
 	DUMP_PREFIX_OFFSET = BIT(2),
+	DUMP_SKIP_IDENTICAL_LINES = BIT(3),
 };
 
 extern int hex_dump_to_buffer(const void *buf, size_t len, int rowsize,
diff --git a/lib/hexdump.c b/lib/hexdump.c
index 74fdcb4566d27f257a0e1288c261d81d231b06bf..f0d1a7f1ce817fd53a7ffd259fbe9b9c8348db16 100644
--- a/lib/hexdump.c
+++ b/lib/hexdump.c
@@ -8,6 +8,7 @@
 #include <linux/errno.h>
 #include <linux/kernel.h>
 #include <linux/minmax.h>
+#include <linux/string.h>
 #include <linux/export.h>
 #include <linux/unaligned.h>
 
@@ -240,6 +241,8 @@ EXPORT_SYMBOL(hex_dump_to_buffer);
  *   - %DUMP_PREFIX_OFFSET shows the offset in front of each line
  *   - %DUMP_PREFIX_ADDRESS shows the address in front of each line
  *   - %DUMP_ASCII prints the ascii equivalent after the hex output
+ *   - %DUMP_SKIP_IDENTICAL_LINES will display a single '*' instead of
+ *     duplicated lines.
  *
  * Given a buffer of u8 data, print_hex() prints a hex + ASCII dump
  * to the kernel log at the specified kernel log level, with an optional
@@ -263,8 +266,9 @@ void print_hex(const char *level, const char *prefix_str, int rowsize, int group
 	       const void *buf, size_t len, unsigned int dump_flags)
 {
 	const u8 *ptr = buf;
-	int i, linelen, remaining = len;
+	int i, prev_i, linelen, remaining = len;
 	unsigned char linebuf[32 * 3 + 2 + 32 + 1];
+	bool same_line = false;
 
 	if (rowsize != 16 && rowsize != 32)
 		rowsize = 16;
@@ -273,6 +277,20 @@ void print_hex(const char *level, const char *prefix_str, int rowsize, int group
 		linelen = min(remaining, rowsize);
 		remaining -= rowsize;
 
+		if (dump_flags & DUMP_SKIP_IDENTICAL_LINES) {
+			if (i && !memcmp(ptr + i, ptr + prev_i, linelen)) {
+				prev_i = i;
+				if (same_line)
+					continue;
+				same_line = true;
+				printk("%s*\n", level);
+				continue;
+			} else {
+				prev_i = i;
+				same_line = false;
+			}
+		}
+
 		hex_dump_to_buffer(ptr + i, linelen, rowsize, groupsize,
 				   linebuf, sizeof(linebuf),
 				   dump_flags & DUMP_ASCII);

-- 
2.48.1
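
For readers who want to try the behaviour outside the kernel, here is a
minimal userspace sketch of the duplicate-row suppression described in
the commit message. It is not the kernel code: the function names
dump_skip_identical() and append_str() are made up for this example, the
output goes to a caller-provided buffer instead of printk(), and no
prefix, grouping or ASCII column is handled. Two choices differ from the
posted hunk and are assumptions of this sketch: the previous row is
addressed as "i - rowsize" instead of through a separate prev_i variable
(equivalent as long as the loop advances by rowsize), and a short final
row is always printed instead of being compared over linelen bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Append s to out if it fits (including the NUL); return the new length. */
static size_t append_str(char *out, size_t outsz, size_t used, const char *s)
{
	size_t n = strlen(s);

	if (used + n + 1 > outsz)
		return used;	/* silently drop text that does not fit */
	memcpy(out + used, s, n + 1);
	return used + n;
}

/*
 * Hypothetical helper: format buf as hex rows of rowsize bytes into out.
 * Each run of full rows identical to the row before it is replaced by a
 * single "*" line. Returns the number of lines produced.
 */
static size_t dump_skip_identical(const uint8_t *buf, size_t len,
				  size_t rowsize, char *out, size_t outsz)
{
	size_t lines = 0, used = 0;
	int in_run = 0;

	assert(rowsize > 0 && outsz > 0);
	out[0] = '\0';

	for (size_t i = 0; i < len; i += rowsize) {
		size_t linelen = len - i < rowsize ? len - i : rowsize;

		/* Only a full row can repeat the (full) row before it. */
		if (i && linelen == rowsize &&
		    !memcmp(buf + i, buf + i - rowsize, rowsize)) {
			if (!in_run) {
				/* First duplicate of a run: print the marker once. */
				used = append_str(out, outsz, used, "*\n");
				lines++;
				in_run = 1;
			}
			continue;
		}
		in_run = 0;

		for (size_t j = 0; j < linelen; j++) {
			char tmp[4];

			snprintf(tmp, sizeof(tmp), "%02x", (unsigned int)buf[i + j]);
			used = append_str(out, outsz, used, tmp);
			used = append_str(out, outsz, used,
					  j + 1 < linelen ? " " : "\n");
		}
		lines++;
	}
	return lines;
}
```

Fed with the 112-byte buffer from the commit message and a rowsize of
16, this sketch produces four lines: the first row, one row of ff, a
single "*", and the last row.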

Re: [PATCH v3 2/3] hexdump: Allow skipping identical lines

Posted by Andy Shevchenko 10 months, 3 weeks ago

On Wed, Mar 19, 2025 at 05:08:11PM +0100, Miquel Raynal wrote:
> When dumping long buffers (especially for debug purposes) it can be very
> convenient to avoid printing every line of the buffer when consecutive
> lines are identical. Typically on embedded devices, the console is wired
> to a UART running at 115200 baud, which makes the dumps very (very)
> slow. In this case, having a flag to avoid printing duplicated lines is
> handy.
> 
> Example of a made-up repetitive output:
> 0f 53 63 47 56 55 78 7a aa b7 8c ff ff ff ff ff
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ff ff ff ff ff ff ff ff ff ff ff ff 01 2a 39 eb
> 
> Same but with the flag enabled:
> 0f 53 63 47 56 55 78 7a aa b7 8c ff ff ff ff ff
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> *
> ff ff ff ff ff ff ff ff ff ff ff ff 01 2a 39 eb

...

>  For printing small buffers (up to 64 bytes long) as a hex string with a
>  certain separator. For larger buffers consider using
> -:c:func:`print_hex`.

Instead of fixing this (see also the comment on the previous patch), just add
text like

:c:func:`print_hex` is especially useful since duplicated lines can be skipped
automatically to reduce the overhead with the ``DUMP_SKIP_IDENTICAL_LINES`` flag.

> +:c:func:`print_hex`, especially since duplicated lines can be
> +skipped automatically to reduce the overhead with the
> +``DUMP_SKIP_IDENTICAL_LINES`` flag.

Also, can we give the flags sub-namespaces, like for HEX/ASCII:

DUMP_DATA_HEX
DUMP_DATA_ASCII

This SKIP will start a new sub-namespace.

...

>  #include <linux/errno.h>
>  #include <linux/kernel.h>
>  #include <linux/minmax.h>

> +#include <linux/string.h>
>  #include <linux/export.h>

It's more natural to put it here; given the context, that keeps the order
better (speaking of the alphabetical one).

-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH v3 2/3] hexdump: Allow skipping identical lines

Posted by Miquel Raynal 10 months, 3 weeks ago

>>  For printing small buffers (up to 64 bytes long) as a hex string with a
>>  certain separator. For larger buffers consider using
>> -:c:func:`print_hex`.
>
> Instead of fixing this (see also the comment on the previous patch), just add
> text like
>
> :c:func:`print_hex` is especially useful since duplicated lines can be skipped
> automatically to reduce the overhead with the
> ``DUMP_SKIP_IDENTICAL_LINES`` flag.

Why are you bothered here? I am sorry but this part of the review is
absolutely pointless. I have no words to emphasize how high the
nitpicking level is. Please restrain yourself.

>> +:c:func:`print_hex`, especially since duplicated lines can be
>> +skipped automatically to reduce the overhead with the
>> +``DUMP_SKIP_IDENTICAL_LINES`` flag.
>
> Also, can we give the flags sub-namespaces, like for HEX/ASCII:
>
> DUMP_DATA_HEX
> DUMP_DATA_ASCII

They just refer to the way the data is dumped, so "data" is in the name,
that's true. The fact that they share the same namespace is fortunate
but not super relevant either. I am following the hints discussed in v2
regarding the naming, I am not too attached to these, but they felt
correct, so I use them.

> This SKIP will start a new sub-namespace.

Yes. And?

I would probably have chosen a common namespace from day 1 if I were
creating these now, but there are already two enums named
"DUMP_PREFIX". I guess we all agree it would be stupid to prefix all
enums "DUMP_PREFIX" anyway? And because you discouraged me from attempting
brave tree-wide changes, I will not rename them.

>
> ...
>
>>  #include <linux/errno.h>
>>  #include <linux/kernel.h>
>>  #include <linux/minmax.h>
>
>> +#include <linux/string.h>
>>  #include <linux/export.h>
>
> It's more natural to put it here; given the context, that keeps the order
> better (speaking of the alphabetical one).

If I learned something on these mailing lists, it is that what is
natural for me might not be for others. The alphabetical order is not
respected. You already pointed that out in your first review, so I chose
another place which felt more relevant... to me.
e, k, m, s, u. This is the alphabetical order. export.h is not at the
correct place, I am very sorry.

[PATCH v3 1/3] hexdump: Simplify print_hex_dump()
[PATCH v3 2/3] hexdump: Allow skipping identical lines
[PATCH v3 3/3] hexdump: Print the prefix after the last line to show the dump is over