When dumping long buffers (especially for debug purposes) it may be very
convenient to sometimes avoid spitting all the lines of the buffer if
the lines are identical. Typically on embedded devices, the console
would be wired to a UART running at 115200 bauds, which makes the dumps
very (very) slow. In this case, having a flag to avoid printing
duplicated lines is handy.
Example of a made up repetitive output:
0f 53 63 47 56 55 78 7a aa b7 8c ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff 01 2a 39 eb
Same but with the flag enabled:
0f 53 63 47 56 55 78 7a aa b7 8c ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
ff ff ff ff ff ff ff ff ff ff ff ff 01 2a 39 eb
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
Documentation/core-api/printk-formats.rst | 4 +++-
include/linux/printk.h | 1 +
lib/hexdump.c | 21 +++++++++++++++++++--
3 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst
index ecccc0473da9c10f45f2464566f690472c61401e..90e6616284d1faf5882019eba8de6bebffe4883a 100644
--- a/Documentation/core-api/printk-formats.rst
+++ b/Documentation/core-api/printk-formats.rst
@@ -310,7 +310,9 @@ Raw buffer as a hex string
For printing small buffers (up to 64 bytes long) as a hex string with a
certain separator. For larger buffers consider using
-:c:func:`print_hex_dump`.
+:c:func:`print_hex_dump`, especially since dupplicated lines can be
+skipped automatically to reduce the overhead with the
+``DUMP_FLAG_SKIP_IDENTICAL_LINES`` flag.
MAC/FDDI addresses
------------------
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 59e9e4c445108d66a3df422cfeaf79920e2ff08f..f89b4117483dce34d2da2f699848f16304deb942 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -755,6 +755,7 @@ enum {
enum {
DUMP_FLAG_ASCII,
+ DUMP_FLAG_SKIP_IDENTICAL_LINES,
};
extern int hex_dump_to_buffer(const void *buf, size_t len, int rowsize,
diff --git a/lib/hexdump.c b/lib/hexdump.c
index 4ac9c32c28a046d2ca037eaef95c785c1a866627..eaacd3f95b0442c0cebe884b26d9fc12e237eb68 100644
--- a/lib/hexdump.c
+++ b/lib/hexdump.c
@@ -4,6 +4,7 @@
*/
#include <linux/types.h>
+#include <linux/string.h>
#include <linux/ctype.h>
#include <linux/errno.h>
#include <linux/kernel.h>
@@ -239,7 +240,8 @@ EXPORT_SYMBOL(hex_dump_to_buffer);
* @buf: data blob to dump
* @len: number of bytes in the @buf
* @flags: controls the output, typically %DUMP_FLAG_ASCII will print the ascii
- * equivalent after the hex output.
+ * equivalent after the hex output, %DUMP_FLAG_SKIP_IDENTICAL_LINES will display
+ * a single '*' instead of duplicated lines.
*
* Given a buffer of u8 data, print_hex_dump() prints a hex + ASCII dump
* to the kernel log at the specified kernel log level, with an optional
@@ -264,8 +266,9 @@ void print_hex_dump(const char *level, const char *prefix_str, int prefix_type,
const void *buf, size_t len, unsigned int flags)
{
const u8 *ptr = buf;
- int i, linelen, remaining = len;
+ int i, prev_i, linelen, remaining = len;
unsigned char linebuf[32 * 3 + 2 + 32 + 1];
+ bool same_line = false;
if (rowsize != 16 && rowsize != 32)
rowsize = 16;
@@ -274,6 +277,20 @@ void print_hex_dump(const char *level, const char *prefix_str, int prefix_type,
linelen = min(remaining, rowsize);
remaining -= rowsize;
+ if (flags & DUMP_FLAG_SKIP_IDENTICAL_LINES) {
+ if (i && !memcmp(ptr + i, ptr + prev_i, linelen)) {
+ prev_i = i;
+ if (same_line)
+ continue;
+ same_line = true;
+ printk("%s*\n", level);
+ continue;
+ } else {
+ prev_i = i;
+ same_line = false;
+ }
+ }
+
hex_dump_to_buffer(ptr + i, linelen, rowsize, groupsize,
linebuf, sizeof(linebuf),
flags & DUMP_FLAG_ASCII);
--
2.47.0
On Fri, Jan 10, 2025 at 07:42:05PM +0100, Miquel Raynal wrote:
> When dumping long buffers (especially for debug purposes) it may be very
> convenient to sometimes avoid spitting all the lines of the buffer if
> the lines are identical. Typically on embedded devices, the console
> would be wired to a UART running at 115200 bauds, which makes the dumps
> very (very) slow. In this case, having a flag to avoid printing
> duplicated lines is handy.
>
> Example of a made up repetitive output:
> 0f 53 63 47 56 55 78 7a aa b7 8c ff ff ff ff ff
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ff ff ff ff ff ff ff ff ff ff ff ff 01 2a 39 eb
>
> Same but with the flag enabled:
> 0f 53 63 47 56 55 78 7a aa b7 8c ff ff ff ff ff
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> *
> ff ff ff ff ff ff ff ff ff ff ff ff 01 2a 39 eb
Still thinking that it's not okay to leave the cases where hex_dump_to_buffer()
is being used for the similar. I would expect that to be modified as well.
As told in v1 thread this can be achieved using a context data, instead of
providing zillion fields, one of which may be a kind of CRC32 checksum that
makes this work without any additional allocation.
But I won't prevent you to go with this if you get a blessing from other
PRINTK/PRINTF maintainers/reviewers.
...
> #include <linux/types.h>
> +#include <linux/string.h>
Can we keep it ordered (to some extent)? I know that types.h is misplaced here.
> #include <linux/ctype.h>
> #include <linux/errno.h>
> #include <linux/kernel.h>
...
> + if (flags & DUMP_FLAG_SKIP_IDENTICAL_LINES) {
> + if (i && !memcmp(ptr + i, ptr + prev_i, linelen)) {
> + prev_i = i;
Can we rather use a hash function or so instead of memcmp()?
> + if (same_line)
> + continue;
> + same_line = true;
> + printk("%s*\n", level);
> + continue;
> + } else {
Redundant 'else'.
> + prev_i = i;
> + same_line = false;
> + }
> + }
Something like
unsigned long hcur, hprev = ~0; // any unrealistic init value
...
if (flags & DUMP_FLAG_SKIP_IDENTICAL_LINES) {
hcur = $HASH($LINE);
if (hcur == hprev) {
...
continue;
}
hprev = hcur;
}
--
With Best Regards,
Andy Shevchenko
On Mon 2025-01-13 14:35:41, Andy Shevchenko wrote:
> On Fri, Jan 10, 2025 at 07:42:05PM +0100, Miquel Raynal wrote:
> > When dumping long buffers (especially for debug purposes) it may be very
> > convenient to sometimes avoid spitting all the lines of the buffer if
> > the lines are identical. Typically on embedded devices, the console
> > would be wired to a UART running at 115200 bauds, which makes the dumps
> > very (very) slow. In this case, having a flag to avoid printing
> > duplicated lines is handy.
> >
> > Example of a made up repetitive output:
> > 0f 53 63 47 56 55 78 7a aa b7 8c ff ff ff ff ff
> > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > ff ff ff ff ff ff ff ff ff ff ff ff 01 2a 39 eb
> >
> > Same but with the flag enabled:
> > 0f 53 63 47 56 55 78 7a aa b7 8c ff ff ff ff ff
> > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > *
> > ff ff ff ff ff ff ff ff ff ff ff ff 01 2a 39 eb
>
> Still thinking that it's not okay to leave the cases where hex_dump_to_buffer()
> is being used for the similar. I would expect that to be modified as well.
> As told in v1 thread this can be achieved using a context data, instead of
> providing zillion fields, one of which may be a kind of CRC32 checksum that
> makes this work without any additional allocation.
>
> But I won't prevent you to go with this if you get a blessing from other
> PRINTK/PRINTF maintainers/reviewers.
Honestly, I never felt as a maintainer of the hexdump API.
I reviewed patches when time permitted but the changes always went
in by Andrew ;-)
Also I do not know the history of the two APIs. It seems that
hex_dump_to_buffer() is capable of writing more lines but
it seems to be primary used to fill one line.
This might explain why it does not handle the prefix...
=> hex_dump_to_buffer() is not much useful for dumping more
lines because they would be hard to analyze without the prefix,
...
=> print_hex_dump() is the API for dumping more lines
IMHO, it is perfectly fine to add support for skipping identical lines
only to print_hex_dump(). And I would go even further and replace
void print_hex_dump(const char *level, const char *prefix_str, int prefix_type,
int rowsize, int groupsize,
const void *buf, size_t len, bool ascii)
with
void print_hex_dump(const char *level, const char *prefix_str,
enum hex_dump_type,
int rowsize, int groupsize,
const void *buf, size_t len)
and combine all the flags into the one enum:
enum hex_dump_type {
DUMP_HEX_ONLY = 0,
DUMP_HEX_AND_ASCII = BIT(1),
DUMP_PREFIX_ADDRESS = BIT(2),
DUMP_PREFIX_OFFSET = BIT(3),
DUMP_SKIP_IDENTICAL_LINES = BIT(4),
};
How does that sound, please?
Best Regards,
Petr
On Fri, 17 Jan 2025 17:27:26 +0100
Petr Mladek <pmladek@suse.com> wrote:
...
> IMHO, it is perfectly fine to add support for skipping identical lines
> only to print_hex_dump(). And I would go even further and replace
>
> void print_hex_dump(const char *level, const char *prefix_str, int prefix_type,
> int rowsize, int groupsize,
> const void *buf, size_t len, bool ascii)
>
> with
>
> void print_hex_dump(const char *level, const char *prefix_str,
> enum hex_dump_type,
> int rowsize, int groupsize,
> const void *buf, size_t len)
>
> and combine all the flags into the one enum:
>
> enum hex_dump_type {
> DUMP_HEX_ONLY = 0,
> DUMP_HEX_AND_ASCII = BIT(1),
> DUMP_PREFIX_ADDRESS = BIT(2),
> DUMP_PREFIX_OFFSET = BIT(3),
> DUMP_SKIP_IDENTICAL_LINES = BIT(4),
> };
>
> How does that sound, please?
Rename it as (say) print_hex() and add wrappers for the existing callers?
David
>
> Best Regards,
> Petr
>
Hello David & Petr,
On 17/01/2025 at 19:25:22 GMT, David Laight <david.laight.linux@gmail.com> wrote:
> On Fri, 17 Jan 2025 17:27:26 +0100
> Petr Mladek <pmladek@suse.com> wrote:
>
> ...
>> IMHO, it is perfectly fine to add support for skipping identical lines
>> only to print_hex_dump(). And I would go even further and replace
>>
>> void print_hex_dump(const char *level, const char *prefix_str, int prefix_type,
>> int rowsize, int groupsize,
>> const void *buf, size_t len, bool ascii)
>>
>> with
>>
>> void print_hex_dump(const char *level, const char *prefix_str,
>> enum hex_dump_type,
>> int rowsize, int groupsize,
>> const void *buf, size_t len)
>>
>> and combine all the flags into the one enum:
>>
>> enum hex_dump_type {
>> DUMP_HEX_ONLY = 0,
>> DUMP_HEX_AND_ASCII = BIT(1),
>> DUMP_PREFIX_ADDRESS = BIT(2),
>> DUMP_PREFIX_OFFSET = BIT(3),
>> DUMP_SKIP_IDENTICAL_LINES = BIT(4),
>> };
Would a single enum (in the prototype of the function) work? I like the
idea but we need some kind of OR combination to be supported, typically:
DUMP_HEX_AND_ASCII | DUMP_PREFIX_OFFSET | DUMP_SKIP_IDENTICAL_LINES
Maybe something like:
void print_hex(const char *level, const char *prefix_str,
int rowsize, int groupsize,
const void *buf, size_t len,
unsigned int dump_flags) // flags instead of enum?
enum hex_dump_flags {
// I'm not sure what to do with the default value?
DUMP_ASCII = BIT(0), // renamed?
DUMP_PREFIX_ADDRESS = BIT(1),
DUMP_PREFIX_OFFSET = BIT(2),
DUMP_SKIP_IDENTICAL_LINES = BIT(3),
};
>> How does that sound, please?
>
> Rename it as (say) print_hex() and add wrappers for the existing callers?
That would avoid the treewide changes, so yes I can try that, definitely.
Thanks,
Miquèl
On Mon 2025-01-20 10:29:27, Miquel Raynal wrote:
> Hello David & Petr,
>
> On 17/01/2025 at 19:25:22 GMT, David Laight <david.laight.linux@gmail.com> wrote:
>
> > On Fri, 17 Jan 2025 17:27:26 +0100
> > Petr Mladek <pmladek@suse.com> wrote:
> >
> > ...
> >> IMHO, it is perfectly fine to add support for skipping identical lines
> >> only to print_hex_dump(). And I would go even further and replace
> >>
> >> void print_hex_dump(const char *level, const char *prefix_str, int prefix_type,
> >> int rowsize, int groupsize,
> >> const void *buf, size_t len, bool ascii)
> >>
> >> with
> >>
> >> void print_hex_dump(const char *level, const char *prefix_str,
> >> enum hex_dump_type,
> >> int rowsize, int groupsize,
> >> const void *buf, size_t len)
> >>
> >> and combine all the flags into the one enum:
> >>
> >> enum hex_dump_type {
> >> DUMP_HEX_ONLY = 0,
> >> DUMP_HEX_AND_ASCII = BIT(1),
> >> DUMP_PREFIX_ADDRESS = BIT(2),
> >> DUMP_PREFIX_OFFSET = BIT(3),
> >> DUMP_SKIP_IDENTICAL_LINES = BIT(4),
> >> };
>
> Would a single enum (in the prototype of the function) work? I like the
> idea but we need some kind of OR combination to be supported, typically:
>
> DUMP_HEX_AND_ASCII | DUMP_PREFIX_OFFSET | DUMP_SKIP_IDENTICAL_LINES
>
> Maybe something like:
>
> void print_hex(const char *level, const char *prefix_str,
> int rowsize, int groupsize,
> const void *buf, size_t len,
> unsigned int dump_flags) // flags instead of enum?
Yes, the parameter would need to be an unsigned int or unsigned long
type so that we could use the logical OR operation.
We could also define the bits without the enum type because the enum
type won't be used anywhere.
I just thought that you wanted to have it "enum". AFAIK, workqueues
code uses enum because it allows to use the names of the bits in
crash/gdb. The enum will cause that the names of the values will
appear in the debug info...
> enum hex_dump_flags {
> // I'm not sure what to do with the default value?
I would define
DUMP_HEX_ONLY = 0,
or
DUMP_HEX_DATA = 0,
It would be used only when the caller wants only the plain hex data.
> DUMP_ASCII = BIT(0), // renamed?
You are right that DUMP_ASCII might be enough. The names of the flags
would mean what the caller wants on top of the plain hex data.
> DUMP_PREFIX_ADDRESS = BIT(1),
> DUMP_PREFIX_OFFSET = BIT(2),
> DUMP_SKIP_IDENTICAL_LINES = BIT(3),
> };
> >> How does that sound, please?
> >
> > Rename it as (say) print_hex() and add wrappers for the existing callers?
>
> That would avoid the treewide changes, so yes I can try that, definitely.
I am fine with it. But a treewide clean up (2 parameters -> one flags)
would be better from my POV.
Best Regards,
Petr
Hi,
On 1/10/25 10:42 AM, Miquel Raynal wrote:
> diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst
> index ecccc0473da9c10f45f2464566f690472c61401e..90e6616284d1faf5882019eba8de6bebffe4883a 100644
> --- a/Documentation/core-api/printk-formats.rst
> +++ b/Documentation/core-api/printk-formats.rst
> @@ -310,7 +310,9 @@ Raw buffer as a hex string
>
> For printing small buffers (up to 64 bytes long) as a hex string with a
> certain separator. For larger buffers consider using
> -:c:func:`print_hex_dump`.
> +:c:func:`print_hex_dump`, especially since dupplicated lines can be
duplicated
> +skipped automatically to reduce the overhead with the
> +``DUMP_FLAG_SKIP_IDENTICAL_LINES`` flag.
>
> MAC/FDDI addresses
> ------------------
--
~Randy
On Fri, 10 Jan 2025 19:42:05 +0100
Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> When dumping long buffers (especially for debug purposes) it may be very
> convenient to sometimes avoid spitting all the lines of the buffer if
> the lines are identical. Typically on embedded devices, the console
> would be wired to a UART running at 115200 bauds, which makes the dumps
> very (very) slow. In this case, having a flag to avoid printing
> duplicated lines is handy.
...
> enum {
> DUMP_FLAG_ASCII,
> + DUMP_FLAG_SKIP_IDENTICAL_LINES,
> };
...
> + if (flags & DUMP_FLAG_SKIP_IDENTICAL_LINES) {
That doesn't look right to me.
You want:
enum {
DUMP_FLAG_HEX_ONLY = false,
DUMP_FLAG_ASCII = true,
DUMP_FLAG_SKIP_IDENTICAL_LINES = BIT(1),
};
and maybe you can get away with not changing all the other files.
David
Hi David,
On 10/01/2025 at 19:39:30 GMT, David Laight <david.laight.linux@gmail.com> wrote:
> On Fri, 10 Jan 2025 19:42:05 +0100
> Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
>> When dumping long buffers (especially for debug purposes) it may be very
>> convenient to sometimes avoid spitting all the lines of the buffer if
>> the lines are identical. Typically on embedded devices, the console
>> would be wired to a UART running at 115200 bauds, which makes the dumps
>> very (very) slow. In this case, having a flag to avoid printing
>> duplicated lines is handy.
> ...
>> enum {
>> DUMP_FLAG_ASCII,
>> + DUMP_FLAG_SKIP_IDENTICAL_LINES,
>> };
> ...
>> + if (flags & DUMP_FLAG_SKIP_IDENTICAL_LINES) {
>
>
> That doesn't look right to me.
> You want:
> enum {
> DUMP_FLAG_HEX_ONLY = false,
> DUMP_FLAG_ASCII = true,
> DUMP_FLAG_SKIP_IDENTICAL_LINES = BIT(1),
> };
>
> and maybe you can get away with not changing all the other files.
I'm a bit sad all the time spent on these changes will go to trash :),
they kind of looked "nicer", but for sure this approach would be
transparent. I can definitely try that.
Thanks,
Miquèl
On Sat, Jan 11, 2025 at 10:54:58AM +0100, Miquel Raynal wrote:
> On 10/01/2025 at 19:39:30 GMT, David Laight <david.laight.linux@gmail.com> wrote:
> > On Fri, 10 Jan 2025 19:42:05 +0100
> > Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >> When dumping long buffers (especially for debug purposes) it may be very
> >> convenient to sometimes avoid spitting all the lines of the buffer if
> >> the lines are identical. Typically on embedded devices, the console
> >> would be wired to a UART running at 115200 bauds, which makes the dumps
> >> very (very) slow. In this case, having a flag to avoid printing
> >> duplicated lines is handy.
> > ...
> >> enum {
> >> DUMP_FLAG_ASCII,
> >> + DUMP_FLAG_SKIP_IDENTICAL_LINES,
> >> };
> > ...
> >> + if (flags & DUMP_FLAG_SKIP_IDENTICAL_LINES) {
> >
> >
> > That doesn't look right to me.
> > You want:
> > enum {
> > DUMP_FLAG_HEX_ONLY = false,
> > DUMP_FLAG_ASCII = true,
> > DUMP_FLAG_SKIP_IDENTICAL_LINES = BIT(1),
> > };
> >
> > and maybe you can get away with not changing all the other files.
>
> I'm a bit sad all the time spent on these changes will go to trash :),
Oh, you can imagine my frustration when I contribute and it goes to trash.
I have an experience with that kind of events :-)
> they kind of looked "nicer", but for sure this approach would be
> transparent. I can definitely try that.
--
With Best Regards,
Andy Shevchenko
On Sat, 11 Jan 2025 10:54:58 +0100
Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> Hi David,
>
> On 10/01/2025 at 19:39:30 GMT, David Laight <david.laight.linux@gmail.com> wrote:
>
> > On Fri, 10 Jan 2025 19:42:05 +0100
> > Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> >> When dumping long buffers (especially for debug purposes) it may be very
> >> convenient to sometimes avoid spitting all the lines of the buffer if
> >> the lines are identical. Typically on embedded devices, the console
> >> would be wired to a UART running at 115200 bauds, which makes the dumps
> >> very (very) slow. In this case, having a flag to avoid printing
> >> duplicated lines is handy.
> > ...
> >> enum {
> >> DUMP_FLAG_ASCII,
> >> + DUMP_FLAG_SKIP_IDENTICAL_LINES,
> >> };
> > ...
> >> + if (flags & DUMP_FLAG_SKIP_IDENTICAL_LINES) {
> >
> >
> > That doesn't look right to me.
> > You want:
> > enum {
> > DUMP_FLAG_HEX_ONLY = false,
> > DUMP_FLAG_ASCII = true,
> > DUMP_FLAG_SKIP_IDENTICAL_LINES = BIT(1),
> > };
> >
> > and maybe you can get away with not changing all the other files.
>
> I'm a bit sad all the time spent on these changes will go to trash :),
> they kind of looked "nicer", but for sure this approach would be
> transparent. I can definitely try that.
The only way the big patch would ever get applied is if Linus himself
decided it was a good idea and 'just applied it'.
Otherwise it needs an ack from all the maintainers.
The other way to avoid having to change all the files is to rename the
function and add a compile-time wrapper for the old users.
So you'd end up with (something like):
#define hexdump(args, ascii) hexdump_new(args, ascii ? DUMP_FLAG_ASCII : DUMP_FLAG_HEX_ONLY)
But in this case you might be away with binary compatibility.
David
© 2016 - 2026 Red Hat, Inc.