[PATCH 2/2] hexdump: Allow skipping identical lines

Miquel Raynal posted 2 patches 1 year, 3 months ago
There is a newer version of this series
[PATCH 2/2] hexdump: Allow skipping identical lines
Posted by Miquel Raynal 1 year, 3 months ago
When dumping long buffers (especially for debug purposes) it may be very
convenient to sometimes avoid spitting all the lines of the buffer if
the lines are identical. Typically on embedded devices, the console
would be wired to a UART running at 115200 bauds, which makes the dumps
very (very) slow. In this case, having a flag to avoid printing
duplicated lines is handy.

Example of a made up repetitive output:
0f 53 63 47 56 55 78 7a aa b7 8c ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff 01 2a 39 eb

Same but with the flag enabled:
0f 53 63 47 56 55 78 7a aa b7 8c ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
ff ff ff ff ff ff ff ff ff ff ff ff 01 2a 39 eb

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 Documentation/core-api/printk-formats.rst |  4 +++-
 include/linux/printk.h                    |  1 +
 lib/hexdump.c                             | 21 +++++++++++++++++++--
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst
index 4451ef501936..917b7b401858 100644
--- a/Documentation/core-api/printk-formats.rst
+++ b/Documentation/core-api/printk-formats.rst
@@ -292,7 +292,9 @@ Raw buffer as a hex string
 
 For printing small buffers (up to 64 bytes long) as a hex string with a
 certain separator. For larger buffers consider using
-:c:func:`print_hex_dump`.
+:c:func:`print_hex_dump`, especially since dupplicated lines can be
+skipped automatically to reduce the overhead with the
+``DUMP_FLAG_SKIP_IDENTICAL_LINES`` flag.
 
 MAC/FDDI addresses
 ------------------
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 5981f29c79c4..33a74a536780 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -719,6 +719,7 @@ enum {
 
 enum {
 	DUMP_FLAG_ASCII,
+	DUMP_FLAG_SKIP_IDENTICAL_LINES,
 };
 
 extern int hex_dump_to_buffer(const void *buf, size_t len, int rowsize,
diff --git a/lib/hexdump.c b/lib/hexdump.c
index f485b66aebbb..8749349a4fae 100644
--- a/lib/hexdump.c
+++ b/lib/hexdump.c
@@ -4,6 +4,7 @@
  */
 
 #include <linux/types.h>
+#include <linux/string.h>
 #include <linux/ctype.h>
 #include <linux/errno.h>
 #include <linux/kernel.h>
@@ -239,7 +240,8 @@ EXPORT_SYMBOL(hex_dump_to_buffer);
  * @buf: data blob to dump
  * @len: number of bytes in the @buf
  * @flags: controls the output, typically %DUMP_FLAG_ASCII will print the ascii
- * equivalent after the hex output.
+ * equivalent after the hex output, %DUMP_FLAG_SKIP_IDENTICAL_LINES will display
+ * a single '*' instead of duplicated lines.
  *
  * Given a buffer of u8 data, print_hex_dump() prints a hex + ASCII dump
  * to the kernel log at the specified kernel log level, with an optional
@@ -264,8 +266,9 @@ void print_hex_dump(const char *level, const char *prefix_str, int prefix_type,
 		    const void *buf, size_t len, unsigned int flags)
 {
 	const u8 *ptr = buf;
-	int i, linelen, remaining = len;
+	int i, prev_i, linelen, remaining = len;
 	unsigned char linebuf[32 * 3 + 2 + 32 + 1];
+	bool same_line = false;
 
 	if (rowsize != 16 && rowsize != 32)
 		rowsize = 16;
@@ -274,6 +277,20 @@ void print_hex_dump(const char *level, const char *prefix_str, int prefix_type,
 		linelen = min(remaining, rowsize);
 		remaining -= rowsize;
 
+		if (flags & DUMP_FLAG_SKIP_IDENTICAL_LINES) {
+			if (i && !memcmp(ptr + i, ptr + prev_i, linelen)) {
+				prev_i = i;
+				if (same_line)
+					continue;
+				same_line = true;
+				printk("%s*\n", level);
+				continue;
+			} else {
+				prev_i = i;
+				same_line = false;
+			}
+		}
+
 		hex_dump_to_buffer(ptr + i, linelen, rowsize, groupsize,
 				   linebuf, sizeof(linebuf),
 				   flags & DUMP_FLAG_ASCII);
-- 
2.43.0
Re: [PATCH 2/2] hexdump: Allow skipping identical lines
Posted by Andy Shevchenko 1 year, 3 months ago
On Mon, Aug 26, 2024 at 06:24:16PM +0200, Miquel Raynal wrote:
> When dumping long buffers (especially for debug purposes) it may be very
> convenient to sometimes avoid spitting all the lines of the buffer if
> the lines are identical. Typically on embedded devices, the console
> would be wired to a UART running at 115200 bauds, which makes the dumps
> very (very) slow. In this case, having a flag to avoid printing
> duplicated lines is handy.
> 
> Example of a made up repetitive output:
> 0f 53 63 47 56 55 78 7a aa b7 8c ff ff ff ff ff
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ff ff ff ff ff ff ff ff ff ff ff ff 01 2a 39 eb
> 
> Same but with the flag enabled:
> 0f 53 63 47 56 55 78 7a aa b7 8c ff ff ff ff ff
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> *
> ff ff ff ff ff ff ff ff ff ff ff ff 01 2a 39 eb

The problem here is that without offset we can't see how many lines were
skipped.

Two ways to solve (that come to my mind immediately, maybe more and better):
1) make sure that new flag implies or expects (otherwise BUILD_BUG_ON() or so)
  the OFFSET to be set;
2) [OR] add number of lines skipped in that * line.

Personally I prefer the 1) as I think that you tried to follow the existing
format of user space tools and there is a chance that there are other tools or
scripts that parse the dump to restore the binary contents.

-- 
With Best Regards,
Andy Shevchenko
Re: [PATCH 2/2] hexdump: Allow skipping identical lines
Posted by Miquel Raynal 1 year, 3 months ago
Hi Andy,

andriy.shevchenko@linux.intel.com wrote on Mon, 26 Aug 2024 20:35:36
+0300:

> On Mon, Aug 26, 2024 at 06:24:16PM +0200, Miquel Raynal wrote:
> > When dumping long buffers (especially for debug purposes) it may be very
> > convenient to sometimes avoid spitting all the lines of the buffer if
> > the lines are identical. Typically on embedded devices, the console
> > would be wired to a UART running at 115200 bauds, which makes the dumps
> > very (very) slow. In this case, having a flag to avoid printing
> > duplicated lines is handy.
> > 
> > Example of a made up repetitive output:
> > 0f 53 63 47 56 55 78 7a aa b7 8c ff ff ff ff ff
> > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > ff ff ff ff ff ff ff ff ff ff ff ff 01 2a 39 eb
> > 
> > Same but with the flag enabled:
> > 0f 53 63 47 56 55 78 7a aa b7 8c ff ff ff ff ff
> > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > *
> > ff ff ff ff ff ff ff ff ff ff ff ff 01 2a 39 eb  
> 
> The problem here is that without offset we can't see how many lines were
> skipped.

Yes, this is intended, I prefer to mimic userspace tools behavior.

> Two ways to solve (that come to my mind immediately, maybe more and better):
> 1) make sure that new flag implies or expects (otherwise BUILD_BUG_ON() or so)
>   the OFFSET to be set;

It depends what you are looking for. When I print a 2kiB page and want
to compare the output with some other dump, I will immediately see if
there are more or less skipped lines in the diff. When I want to just
grab the UBI header and skip all the ff's following while asking a full
buffer to be dumped (for kernel development reasons), the amount of
skipped lines is not of interest to me either. Of course this is my own
use case, but I guess there are others.

However this is true it is sometimes also useful to know where we are in
the dump, but the hexdump helpers already include all the interesting
bits for that through the 'prefix_type' parameter :

enum {
	DUMP_PREFIX_NONE,
	DUMP_PREFIX_ADDRESS,
	DUMP_PREFIX_OFFSET
};

See https://elixir.bootlin.com/linux/v4.20.17/source/include/linux/printk.h

I anyway understand the request and will change the example with
something more common, probably, by using one of the two other
prefixes.

> 2) [OR] add number of lines skipped in that * line.

As mentioned above, this is not the intended output.

> Personally I prefer the 1) as I think that you tried to follow the existing
> format of user space tools and there is a chance that there are other tools or
> scripts that parse the dump to restore the binary contents.

Exactly. Also, just simply using the diff command over two dumps
without being polluted by any additions on one side or the other is very
convenient.

Thanks,
Miquèl
Re: [PATCH 2/2] hexdump: Allow skipping identical lines
Posted by Andy Shevchenko 1 year, 3 months ago
On Tue, Aug 27, 2024 at 11:13:53AM +0200, Miquel Raynal wrote:
> andriy.shevchenko@linux.intel.com wrote on Mon, 26 Aug 2024 20:35:36
> +0300:
> > On Mon, Aug 26, 2024 at 06:24:16PM +0200, Miquel Raynal wrote:
> > > When dumping long buffers (especially for debug purposes) it may be very
> > > convenient to sometimes avoid spitting all the lines of the buffer if
> > > the lines are identical. Typically on embedded devices, the console
> > > would be wired to a UART running at 115200 bauds, which makes the dumps
> > > very (very) slow. In this case, having a flag to avoid printing
> > > duplicated lines is handy.
> > > 
> > > Example of a made up repetitive output:
> > > 0f 53 63 47 56 55 78 7a aa b7 8c ff ff ff ff ff
> > > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > ff ff ff ff ff ff ff ff ff ff ff ff 01 2a 39 eb
> > > 
> > > Same but with the flag enabled:
> > > 0f 53 63 47 56 55 78 7a aa b7 8c ff ff ff ff ff
> > > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > *
> > > ff ff ff ff ff ff ff ff ff ff ff ff 01 2a 39 eb  
> > 
> > The problem here is that without offset we can't see how many lines were
> > skipped.
> 
> Yes, this is intended, I prefer to mimic userspace tools behavior.
> 
> > Two ways to solve (that come to my mind immediately, maybe more and better):
> > 1) make sure that new flag implies or expects (otherwise BUILD_BUG_ON() or so)
> >   the OFFSET to be set;
> 
> It depends what you are looking for. When I print a 2kiB page and want
> to compare the output with some other dump, I will immediately see if
> there are more or less skipped lines in the diff. When I want to just
> grab the UBI header and skip all the ff's following while asking a full
> buffer to be dumped (for kernel development reasons), the amount of
> skipped lines is not of interest to me either. Of course this is my own
> use case, but I guess there are others.
> 
> However this is true it is sometimes also useful to know where we are in
> the dump, but the hexdump helpers already include all the interesting
> bits for that through the 'prefix_type' parameter :
> 
> enum {
> 	DUMP_PREFIX_NONE,
> 	DUMP_PREFIX_ADDRESS,
> 	DUMP_PREFIX_OFFSET
> };
> 
> See https://elixir.bootlin.com/linux/v4.20.17/source/include/linux/printk.h
> 
> I anyway understand the request and will change the example with
> something more common, probably, by using one of the two other
> prefixes.
> 
> > 2) [OR] add number of lines skipped in that * line.
> 
> As mentioned above, this is not the intended output.
> 
> > Personally I prefer the 1) as I think that you tried to follow the existing
> > format of user space tools and there is a chance that there are other tools or
> > scripts that parse the dump to restore the binary contents.
> 
> Exactly. Also, just simply using the diff command over two dumps
> without being polluted by any additions on one side or the other is very
> convenient.

I got it, then provide a good examples in the cover letter / commit message,
documentation, and test cases.

After thinking more about this, if the caller asked for DUMP_PREFIX_NONE,
that's what they get if they add also SKIP flag. So, maybe here is no
problem after all :-)

-- 
With Best Regards,
Andy Shevchenko