This series fixes the issue where data blocks would wrap in cases where the last
data block perfectly fits the ring. This caused whatever was at the beginning of
the ring to get discarded in this case, and the data block would get put there
even though it could be put at the end of the data ring just fine without
discarding anything.

Fixing this issue also allows to simplify the check in data_check_size,
previously it would ensure there's space for a trailing id, which we
don't need anymore.

v0->v1:
- Fix severely broken code alignment

v1->v2:
- Rename & invert get_next_lpos -> is_blk_wrapped
- Add a new commit for changing the logic in data_check_size

Daniil Tatianin (2):
  printk_ringbuffer: don't needlessly wrap data blocks around
  printk_ringbuffer: allow one data block to occupy the entire data ring

 kernel/printk/printk_ringbuffer.c | 38 +++++++++++++++++--------------
 1 file changed, 21 insertions(+), 17 deletions(-)

--
2.43.0
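[Editorial note: the following is a minimal standalone sketch of the wrap decision being changed, not the actual patch. The toy ring model and the names ring_size/begin/blk_size are illustrative assumptions. The point: a data block that ends exactly at the ring boundary fits without wrapping, so nothing at the start of the ring needs to be discarded.]

#include <stdbool.h>
#include <stdio.h>

/*
 * Editorial sketch only (not the kernel code): old behavior treated a block
 * ending exactly at the ring boundary as wrapping; the fixed behavior wraps
 * only when the block would actually cross the boundary.
 */
static bool blk_wraps_old(unsigned long ring_size, unsigned long begin,
			  unsigned long blk_size)
{
	/* Wrap counts of begin and begin + blk_size differ. */
	return (begin / ring_size) != ((begin + blk_size) / ring_size);
}

static bool blk_wraps_fixed(unsigned long ring_size, unsigned long begin,
			    unsigned long blk_size)
{
	return (begin % ring_size) + blk_size > ring_size;
}

int main(void)
{
	/* A 256-byte ring with a 64-byte block ending exactly at the boundary. */
	unsigned long ring = 256, begin = 192, size = 64;

	printf("old: wraps=%d, fixed: wraps=%d\n",
	       blk_wraps_old(ring, begin, size),
	       blk_wraps_fixed(ring, begin, size));
	return 0;
}

With the old rule the exact-fit block is pushed to the start of the next wrap, discarding whatever was there; with the fixed rule it stays at the end of the ring.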
On Fri 2025-09-05 17:41:50, Daniil Tatianin wrote:
> This series fixes the issue where data blocks would wrap in cases where the last
> data block perfectly fits the ring. This caused whatever was at the beginning of
> the ring to get discarded in this case, and the data block would get put there
> even though it could be put at the end of the data ring just fine without
> discarding anything.
>
> Fixing this issue also allows to simplify the check in data_check_size,
> previously it would ensure there's space for a trailing id, which we
> don't need anymore.
>
> v0->v1:
> - Fix severely broken code alignment
>
> v1->v2:
> - Rename & invert get_next_lpos -> is_blk_wrapped
> - Add a new commit for changing the logic in data_check_size
The patchset looks good to me. But I wanted to do some tests
and it failed. I did the following:
1. Applied this patchset on top of printk/linux.git, branch for-next,
https://git.kernel.org/pub/scm/linux/kernel/git/printk/linux.git/log/?h=for-next
I used this branch because it contains a new KUnit test for the printk
ring buffer.
2. I applied the following patch:
+ It reduces the size of the data ring. If I count it correctly
it should be 256 (1 << 8).
+ It increases the maximal size of the text so that the maximal
record size is 256.
3. I built it with Kasan enabled:
$> grep KASAN .config
CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_KASAN_SW_TAGS=y
CONFIG_KASAN=y
CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX=y
CONFIG_KASAN_GENERIC=y
# CONFIG_KASAN_OUTLINE is not set
CONFIG_KASAN_INLINE=y
CONFIG_KASAN_STACK=y
CONFIG_KASAN_VMALLOC=y
# CONFIG_KASAN_KUNIT_TEST is not set
# CONFIG_KASAN_EXTRA_INFO is not set
4. I loaded the test module:
# depmod
# modprobe printk_ringbuffer_kunit_test
And I got the following:
[ 128.983916] printk_ringbuffer_kunit_test: module verification failed: signature and/or required key missing - tainting kernel
[ 128.993363] KTAP version 1
[ 128.993643] 1..1
[ 128.994282] KTAP version 1
[ 128.994579] # Subtest: printk-ringbuffer
[ 128.994908] # module: printk_ringbuffer_kunit_test
[ 128.994938] 1..1
[ 128.996142] # test_readerwriter: running for 10000 ms
[ 128.997243] # test_readerwriter: start thread 001 (writer)
[ 128.997545] # test_readerwriter: start thread 002 (writer)
[ 128.997767] # test_readerwriter: start thread 003 (writer)
[ 129.000142] # test_readerwriter: start thread 004 (writer)
[ 129.001086] # test_readerwriter: start thread 005 (writer)
[ 129.003865] # test_readerwriter: start thread 006 (writer)
[ 129.004125] # test_readerwriter: start thread 007 (writer)
[ 129.004329] # test_readerwriter: start thread 008 (writer)
[ 129.004584] # test_readerwriter: start thread 009 (writer)
[ 129.004896] # test_readerwriter: start thread 010 (writer)
[ 129.005089] # test_readerwriter: starting test
[ 129.005092] # test_readerwriter: start thread 011 (writer)
[ 129.011876] # test_readerwriter: start reader
[ 129.012639] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=5 size=1145324612 text=
[ 129.014417] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=6 size=165 text=DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
[ 129.017373] ------------[ cut here ]------------
[ 129.017927] WARNING: CPU: 0 PID: 1497 at kernel/printk/printk_ringbuffer.c:1266 get_data+0x240/0x350
[ 129.018920] Modules linked in: printk_ringbuffer_kunit_test(E)
[ 129.019628] CPU: 0 UID: 0 PID: 1497 Comm: kunit_try_catch Tainted: G E N 6.16.0-default+ #159 PREEMPT(full) 20fb7e6f689e04f2e29367f5d7a03c64ac29a9f9
[ 129.021181] Tainted: [E]=UNSIGNED_MODULE, [N]=TEST
[ 129.021746] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014
[ 129.022898] RIP: 0010:get_data+0x240/0x350
[ 129.023397] Code: 89 4c 24 08 4c 89 1c 24 e8 0d 21 81 00 4c 8b 44 24 20 48 8b 74 24 18 4c 8b 54 24 10 4c 8b 4c 24 08 4c 8b 1c 24 e9 9a fe ff ff <0f> 0b 31 c0 e9 6a ff ff ff 48 89 7c 24 08 48 89 f7 4c 89 44 24 10
[ 129.025323] RSP: 0018:ffffc90004d2f508 EFLAGS: 00010297
[ 129.025935] RAX: ffffffffc02194c0 RBX: ffffffffc0217780 RCX: 0000000000000b00
[ 129.026729] RDX: 00000000000000ff RSI: 0000000000000b00 RDI: 00000000000009b0
[ 129.027522] RBP: 00000000000000f5 R08: ffffc90004d2f5d0 R09: 00000000000009b0
[ 129.028328] R10: 0000000000000000 R11: ffffffffc02177b0 R12: ffffc90004d2f770
[ 129.029122] R13: dffffc0000000000 R14: ffffc90004d2f6b8 R15: 000000000000000c
[ 129.029915] FS: 0000000000000000(0000) GS:ffff888193898000(0000) knlGS:0000000000000000
[ 129.030800] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 129.031457] CR2: 000056180c2595a8 CR3: 000000011caee000 CR4: 0000000000750ef0
[ 129.032271] PKRU: 55555554
[ 129.032637] Call Trace:
[ 129.032976] <TASK>
[ 129.033280] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.033849] ? __asan_memcpy+0x3c/0x60
[ 129.034320] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.034889] _prb_read_valid+0x27e/0x620
[ 129.035397] ? __pfx__prb_read_valid+0x10/0x10
[ 129.035956] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.036531] ? __pfx___kunit_do_failed_assertion+0x10/0x10
[ 129.037176] ? kunit_log_append+0xb3/0xf0
[ 129.037672] ? __pfx_kunit_log_append+0x10/0x10
[ 129.038220] prb_read_valid+0x69/0x90
[ 129.038688] ? __pfx_prb_read_valid+0x10/0x10
[ 129.039214] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.039808] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.040381] prbtest_reader.isra.0+0x2ae/0x75c [printk_ringbuffer_kunit_test c283de0751f9c111bb1c6a71f2f57c1c47bab8d5]
[ 129.041538] ? __pfx_prbtest_reader.isra.0+0x10/0x10 [printk_ringbuffer_kunit_test c283de0751f9c111bb1c6a71f2f57c1c47bab8d5]
[ 129.042779] ? lock_acquire.part.0+0xc7/0x250
[ 129.043308] ? local_clock_noinstr+0xf/0x100
[ 129.043862] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.044428] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.044990] ? __lock_release.isra.0+0x1cb/0x300
[ 129.045547] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.046109] ? rcu_is_watching+0x15/0xb0
[ 129.046600] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.047159] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.047721] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.048324] ? __pfx_wait_for_completion+0x10/0x10
[ 129.048902] ? __pfx_prbtest_wakeup_callback+0x10/0x10 [printk_ringbuffer_kunit_test c283de0751f9c111bb1c6a71f2f57c1c47bab8d5]
[ 129.050107] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.050728] ? wait_for_completion+0x99/0x360
[ 129.051253] ? __pfx_migration_cpu_stop+0x10/0x10
[ 129.051833] ? affine_move_task+0xb41/0x1020
[ 129.052356] ? cpumask_any_and_distribute+0xee/0x130
[ 129.052943] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.053504] ? __set_cpus_allowed_ptr_locked+0x362/0x610
[ 129.054127] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.054689] ? __set_cpus_allowed_ptr+0x1b1/0x2c0
[ 129.055240] ? __pfx_prbtest_kthread_cleanup+0x10/0x10 [printk_ringbuffer_kunit_test c283de0751f9c111bb1c6a71f2f57c1c47bab8d5]
[ 129.056461] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.057019] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.057580] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.058141] ? __pfx___set_cpus_allowed_ptr+0x10/0x10
[ 129.058738] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.059297] ? kunit_log_append+0xb3/0xf0
[ 129.059820] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.060379] ? set_cpus_allowed_ptr+0x80/0xb0
[ 129.060901] ? __pfx_set_cpus_allowed_ptr+0x10/0x10
[ 129.061471] ? kthread_bind+0x14c/0x1a0
[ 129.061962] test_readerwriter.cold+0xd3/0x244 [printk_ringbuffer_kunit_test c283de0751f9c111bb1c6a71f2f57c1c47bab8d5]
[ 129.063106] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.063668] ? lock_acquire.part.0+0xc7/0x250
[ 129.064213] ? __pfx_test_readerwriter+0x10/0x10 [printk_ringbuffer_kunit_test c283de0751f9c111bb1c6a71f2f57c1c47bab8d5]
[ 129.065377] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.065939] ? __lock_release.isra.0+0x1cb/0x300
[ 129.066494] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.067061] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.067627] ? lockdep_hardirqs_on+0x8c/0x130
[ 129.068168] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.068732] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.069293] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.069858] ? ktime_get_ts64+0x80/0x250
[ 129.070356] kunit_try_run_case+0x179/0x290
[ 129.070865] ? __pfx_kunit_try_run_case+0x10/0x10
[ 129.071420] ? __lock_release.isra.0+0x1cb/0x300
[ 129.071995] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.072607] ? rcu_is_watching+0x15/0xb0
[ 129.073088] ? __pfx_kunit_generic_run_threadfn_adapter+0x10/0x10
[ 129.073779] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.074341] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.074906] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.075471] ? lockdep_hardirqs_on+0x8c/0x130
[ 129.076016] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.076588] ? __pfx_kunit_generic_run_threadfn_adapter+0x10/0x10
[ 129.077270] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.077840] ? __pfx_kunit_generic_run_threadfn_adapter+0x10/0x10
[ 129.078529] kunit_generic_run_threadfn_adapter+0x82/0xf0
[ 129.079159] kthread+0x3b0/0x750
[ 129.079576] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.080165] ? local_clock_noinstr+0xf/0x100
[ 129.080689] ? __pfx_kthread+0x10/0x10
[ 129.081157] ? __lock_release.isra.0+0x1cb/0x300
[ 129.081716] ? srso_alias_return_thunk+0x5/0xfbef5
[ 129.082282] ? rcu_is_watching+0x15/0xb0
[ 129.082774] ? __pfx_kthread+0x10/0x10
[ 129.083246] ret_from_fork+0x38e/0x4b0
[ 129.083719] ? __pfx_kthread+0x10/0x10
[ 129.084207] ? __pfx_kthread+0x10/0x10
[ 129.084680] ret_from_fork_asm+0x1a/0x30
[ 129.085195] </TASK>
[ 129.085512] irq event stamp: 1867
[ 129.085930] hardirqs last enabled at (1877): [<ffffffffb56f82de>] __up_console_sem+0x5e/0x70
[ 129.086858] hardirqs last disabled at (1886): [<ffffffffb56f82c3>] __up_console_sem+0x43/0x70
[ 129.087783] softirqs last enabled at (1134): [<ffffffffb55416b2>] handle_softirqs+0x502/0x6b0
[ 129.088744] softirqs last disabled at (1129): [<ffffffffb5541982>] __irq_exit_rcu+0x112/0x230
[ 129.089668] ---[ end trace 0000000000000000 ]---
[ 139.039960] # test_readerwriter: end reader: read=13 seq=15
[ 139.040720] # test_readerwriter: completed test
[ 139.042049] # test_readerwriter: end thread 011: wrote=13124589
[ 139.042860] # test_readerwriter: end thread 010: wrote=13101241
[ 139.043667] # test_readerwriter: end thread 009: wrote=16785932
[ 139.044803] # test_readerwriter: end thread 008: wrote=13123788
[ 139.045626] # test_readerwriter: end thread 007: wrote=16778106
[ 139.046259] # test_readerwriter: end thread 006: wrote=13119340
[ 139.046857] # test_readerwriter: end thread 005: wrote=16800807
[ 139.047457] # test_readerwriter: end thread 004: wrote=13136578
[ 139.048084] # test_readerwriter: end thread 003: wrote=16805445
[ 139.048665] # test_readerwriter: end thread 002: wrote=13156382
[ 139.049232] # test_readerwriter: end thread 001: wrote=16847574
[ 139.050000] # test_readerwriter.speed: slow
[ 139.050034] not ok 1 test_readerwriter
[ 139.050497] not ok 1 printk-ringbuffer
Observations:
1. The test failed very early. The sequence numbers of the broken
records are 5 and 6.
2. The record with the sequence number 5 has a very large size:
BAD RECORD: seq=5 size=1145324612 text=
3. It triggered an internal ring buffer warning:
WARNING: CPU: 0 PID: 1497 at
kernel/printk/printk_ringbuffer.c:1266 get_data+0x240/0x350
It is the following one:
static const char *get_data(struct prb_data_ring *data_ring,
struct prb_data_blk_lpos *blk_lpos,
unsigned int *data_size)
{
[...]
/* A valid data block will always have at least an ID. */
if (WARN_ON_ONCE(*data_size < sizeof(db->id)))
return NULL;
It looks to me that the ring buffer got corrupted.
I am not sure if it is caused by this patchset or
if the problem was there even before or
if it is a bug in the KUnit test.
It has to be investigated.
Best Regards,
Petr
On Thu 2025-09-11 17:30:36, Petr Mladek wrote:
> On Fri 2025-09-05 17:41:50, Daniil Tatianin wrote:
> > This series fixes the issue where data blocks would wrap in cases where the last
> > data block perfectly fits the ring. This caused whatever was at the beginning of
> > the ring to get discarded in this case, and the data block would get put there
> > even though it could be put at the end of the data ring just fine without
> > discarding anything.
> >
> > Fixing this issue also allows to simplify the check in data_check_size,
> > previously it would ensure there's space for a trailing id, which we
> > don't need anymore.
> >
> > v0->v1:
> > - Fix severely broken code alignment
> >
> > v1->v2:
> > - Rename & invert get_next_lpos -> is_blk_wrapped
> > - Add a new commit for changing the logic in data_check_size
>
> The patchset looks good to me. But I wanted to do some tests
> and it failed. I did the following:
>
> 1. Applied this patchset on top of printk/linux.git, branch for-next,
> https://git.kernel.org/pub/scm/linux/kernel/git/printk/linux.git/log/?h=for-next
>
> I used this branch because it contains a new KUnit test for the printk
> ring buffer.
I forgot to add the patch:
diff --git a/kernel/printk/printk_ringbuffer_kunit_test.c b/kernel/printk/printk_ringbuffer_kunit_test.c
index 2282348e869a..241f7ef49ac6 100644
--- a/kernel/printk/printk_ringbuffer_kunit_test.c
+++ b/kernel/printk/printk_ringbuffer_kunit_test.c
@@ -56,7 +56,7 @@ struct prbtest_rbdata {
char text[] __counted_by(size);
};
-#define MAX_RBDATA_TEXT_SIZE 0x80
+#define MAX_RBDATA_TEXT_SIZE (0x256 - sizeof(struct prbtest_rbdata))
#define MAX_PRB_RECORD_SIZE (sizeof(struct prbtest_rbdata) + MAX_RBDATA_TEXT_SIZE)
struct prbtest_data {
@@ -251,7 +251,7 @@ static inline void prbtest_prb_reinit(struct printk_ringbuffer *rb)
static void test_readerwriter(struct kunit *test)
{
/* Equivalent to CONFIG_LOG_BUF_SHIFT=13 */
- DEFINE_PRINTKRB(test_rb, 8, 5);
+ DEFINE_PRINTKRB(test_rb, 4, 4);
struct prbtest_thread_data *thread_data;
struct prbtest_data *test_data;
--
2.51.0
Best Regards,
Petr
On 2025-09-11, Petr Mladek <pmladek@suse.com> wrote:
> diff --git a/kernel/printk/printk_ringbuffer_kunit_test.c b/kernel/printk/printk_ringbuffer_kunit_test.c
> index 2282348e869a..241f7ef49ac6 100644
> --- a/kernel/printk/printk_ringbuffer_kunit_test.c
> +++ b/kernel/printk/printk_ringbuffer_kunit_test.c
> @@ -56,7 +56,7 @@ struct prbtest_rbdata {
> char text[] __counted_by(size);
> };
>
> -#define MAX_RBDATA_TEXT_SIZE 0x80
> +#define MAX_RBDATA_TEXT_SIZE (0x256 - sizeof(struct prbtest_rbdata))
I guess this should be:
#define MAX_RBDATA_TEXT_SIZE (256 - sizeof(struct prbtest_rbdata))
John
On Thu 2025-09-11 18:18:32, John Ogness wrote:
> On 2025-09-11, Petr Mladek <pmladek@suse.com> wrote:
> > diff --git a/kernel/printk/printk_ringbuffer_kunit_test.c b/kernel/printk/printk_ringbuffer_kunit_test.c
> > index 2282348e869a..241f7ef49ac6 100644
> > --- a/kernel/printk/printk_ringbuffer_kunit_test.c
> > +++ b/kernel/printk/printk_ringbuffer_kunit_test.c
> > @@ -56,7 +56,7 @@ struct prbtest_rbdata {
> > char text[] __counted_by(size);
> > };
> >
> > -#define MAX_RBDATA_TEXT_SIZE 0x80
> > +#define MAX_RBDATA_TEXT_SIZE (0x256 - sizeof(struct prbtest_rbdata))
>
> I guess this should be:
>
> #define MAX_RBDATA_TEXT_SIZE (256 - sizeof(struct prbtest_rbdata))
Great catch!
But the KUnit test fails even with this change, see below. And I am
not surprised. The test should work even with larger-than-allowed
messages. prbtest_writer() should skip them because prb_reserve()
should fail.
Here is test result with:
#define MAX_RBDATA_TEXT_SIZE (256 - sizeof(struct prbtest_rbdata))
#define MAX_PRB_RECORD_SIZE (sizeof(struct prbtest_rbdata) + MAX_RBDATA_TEXT_SIZE)
DEFINE_PRINTKRB(test_rb, 4, 4);
and with this patchset reverted, aka, sources from
printk/linux.git, branch for-next:
[ 229.020133] printk_ringbuffer_kunit_test: module verification failed: signature and/or required key missing - tainting kernel
[ 229.029453] KTAP version 1
[ 229.029793] 1..1
[ 229.030461] KTAP version 1
[ 229.030771] # Subtest: printk-ringbuffer
[ 229.031111] # module: printk_ringbuffer_kunit_test
[ 229.031142] 1..1
[ 229.032371] # test_readerwriter: running for 10000 ms
[ 229.034348] # test_readerwriter: start thread 001 (writer)
[ 229.034721] # test_readerwriter: start thread 002 (writer)
[ 229.035398] # test_readerwriter: start thread 003 (writer)
[ 229.035697] # test_readerwriter: start thread 004 (writer)
[ 229.038006] # test_readerwriter: start thread 005 (writer)
[ 229.041965] # test_readerwriter: start thread 006 (writer)
[ 229.042244] # test_readerwriter: start thread 007 (writer)
[ 229.042559] # test_readerwriter: start thread 008 (writer)
[ 229.042748] # test_readerwriter: start thread 009 (writer)
[ 229.042996] # test_readerwriter: start thread 010 (writer)
[ 229.043288] # test_readerwriter: starting test
[ 229.043299] # test_readerwriter: start thread 011 (writer)
[ 229.043370] # test_readerwriter: start reader
[ 229.043574] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=15 size=1111638594 text=
[ 239.133686] # test_readerwriter: end reader: read=14 seq=24
[ 239.134417] # test_readerwriter: completed test
[ 239.136019] # test_readerwriter: end thread 011: wrote=10466402
[ 239.136856] # test_readerwriter: end thread 010: wrote=10461343
[ 239.137825] # test_readerwriter: end thread 009: wrote=13686189
[ 239.138478] # test_readerwriter: end thread 008: wrote=10475551
[ 239.139106] # test_readerwriter: end thread 007: wrote=13652622
[ 239.139765] # test_readerwriter: end thread 006: wrote=10473282
[ 239.140376] # test_readerwriter: end thread 005: wrote=10444451
[ 239.140989] # test_readerwriter: end thread 004: wrote=10466857
[ 239.141652] # test_readerwriter: end thread 003: wrote=13733553
[ 239.142243] # test_readerwriter: end thread 002: wrote=13710538
[ 239.142859] # test_readerwriter: end thread 001: wrote=13738411
[ 239.143738] # test_readerwriter.speed: slow
[ 239.143771] not ok 1 test_readerwriter
[ 239.144161] not ok 1 printk-ringbuffer
It is well reproducible. It always fails after reading a few records.
Here are results from a few other runs:
[ 1617.607865] KTAP version 1
[ 1617.608377] # Subtest: printk-ringbuffer
[ 1617.608891] # module: printk_ringbuffer_kunit_test
[ 1617.609041] 1..1
[ 1617.613633] # test_readerwriter: running for 10000 ms
[ 1617.614796] # test_readerwriter: start thread 001 (writer)
[ 1617.615562] # test_readerwriter: start thread 002 (writer)
[ 1617.616399] # test_readerwriter: start thread 003 (writer)
[ 1617.617911] # test_readerwriter: start thread 004 (writer)
[ 1617.618763] # test_readerwriter: start thread 005 (writer)
[ 1617.619067] # test_readerwriter: start thread 006 (writer)
[ 1617.619368] # test_readerwriter: start thread 007 (writer)
[ 1617.619650] # test_readerwriter: start thread 008 (writer)
[ 1617.619921] # test_readerwriter: start thread 009 (writer)
[ 1617.620211] # test_readerwriter: start thread 010 (writer)
[ 1617.620462] # test_readerwriter: starting test
[ 1617.623169] # test_readerwriter: start thread 011 (writer)
[ 1617.623175] # test_readerwriter: start reader
[ 1617.624508] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=9 size=1111638594 text=
[ 1617.626400] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=13 size=1111638594 text=
[ 1617.628225] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=17 size=1111638594 text=
[ 1627.681624] # test_readerwriter: end reader: read=14 seq=17
[ 1627.682370] # test_readerwriter: completed test
[ 1627.683960] # test_readerwriter: end thread 011: wrote=10452656
[ 1627.684779] # test_readerwriter: end thread 010: wrote=13707777
[ 1627.685412] # test_readerwriter: end thread 009: wrote=10480695
[ 1627.686274] # test_readerwriter: end thread 008: wrote=10470822
[ 1627.687081] # test_readerwriter: end thread 007: wrote=13697351
[ 1627.687716] # test_readerwriter: end thread 006: wrote=10464543
[ 1627.688317] # test_readerwriter: end thread 005: wrote=10488791
[ 1627.688919] # test_readerwriter: end thread 004: wrote=10488529
[ 1627.689492] # test_readerwriter: end thread 003: wrote=13738659
[ 1627.690321] # test_readerwriter: end thread 002: wrote=13718856
[ 1627.690915] # test_readerwriter: end thread 001: wrote=13675859
[ 1627.691694] # test_readerwriter.speed: slow
[ 1627.691753] not ok 1 test_readerwriter
[ 1627.692512] not ok 1 printk-ringbuffer
[ 1627.693708] KTAP version 1
[ 1627.694046] # Subtest: printk-ringbuffer
[ 1627.694385] # module: printk_ringbuffer_kunit_test
[ 1627.694410] 1..1
[ 1627.695698] # test_readerwriter: running for 10000 ms
[ 1627.698351] # test_readerwriter: start thread 001 (writer)
[ 1627.698935] # test_readerwriter: start thread 002 (writer)
[ 1627.699341] # test_readerwriter: start thread 003 (writer)
[ 1627.699657] # test_readerwriter: start thread 004 (writer)
[ 1627.699891] # test_readerwriter: start thread 005 (writer)
[ 1627.700111] # test_readerwriter: start thread 006 (writer)
[ 1627.700337] # test_readerwriter: start thread 007 (writer)
[ 1627.700620] # test_readerwriter: start thread 008 (writer)
[ 1627.700885] # test_readerwriter: start thread 009 (writer)
[ 1627.701134] # test_readerwriter: start thread 010 (writer)
[ 1627.701371] # test_readerwriter: starting test
[ 1627.701374] # test_readerwriter: start thread 011 (writer)
[ 1627.705369] # test_readerwriter: start reader
[ 1627.706115] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=16 size=1111638594 text=
[ 1627.707966] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=20 size=1111638594 text=
[ 1627.710886] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=24 size=16962 text=
[ 1637.917622] # test_readerwriter: end reader: read=14 seq=27
[ 1637.918372] # test_readerwriter: completed test
[ 1637.919848] # test_readerwriter: end thread 011: wrote=10559459
[ 1637.920670] # test_readerwriter: end thread 010: wrote=13811125
[ 1637.921309] # test_readerwriter: end thread 009: wrote=10572663
[ 1637.922176] # test_readerwriter: end thread 008: wrote=10570916
[ 1637.922818] # test_readerwriter: end thread 007: wrote=13832304
[ 1637.923456] # test_readerwriter: end thread 006: wrote=10571421
[ 1637.924082] # test_readerwriter: end thread 005: wrote=10579437
[ 1637.924703] # test_readerwriter: end thread 004: wrote=13861481
[ 1637.925318] # test_readerwriter: end thread 003: wrote=10590865
[ 1637.925977] # test_readerwriter: end thread 002: wrote=13852926
[ 1637.926570] # test_readerwriter: end thread 001: wrote=13845110
[ 1637.927172] # test_readerwriter.speed: slow
[ 1637.927202] not ok 1 test_readerwriter
[ 1637.927611] not ok 1 printk-ringbuffer
[ 1658.412099] KTAP version 1
[ 1658.412409] # Subtest: printk-ringbuffer
[ 1658.412784] # module: printk_ringbuffer_kunit_test
[ 1658.412813] 1..1
[ 1658.416372] # test_readerwriter: running for 10000 ms
[ 1658.417927] # test_readerwriter: start thread 001 (writer)
[ 1658.418309] # test_readerwriter: start thread 002 (writer)
[ 1658.418785] # test_readerwriter: start thread 003 (writer)
[ 1658.418996] # test_readerwriter: start thread 004 (writer)
[ 1658.419262] # test_readerwriter: start thread 005 (writer)
[ 1658.419531] # test_readerwriter: start thread 006 (writer)
[ 1658.419758] # test_readerwriter: start thread 007 (writer)
[ 1658.420028] # test_readerwriter: start thread 008 (writer)
[ 1658.420239] # test_readerwriter: start thread 009 (writer)
[ 1658.420516] # test_readerwriter: start thread 010 (writer)
[ 1658.420737] # test_readerwriter: starting test
[ 1658.420739] # test_readerwriter: start thread 011 (writer)
[ 1658.420791] # test_readerwriter: start reader
[ 1658.420879] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=8 size=217 text=DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD\x10\xa2\xff\xff\xff\xff#\x8b^M\xa4\xff\xff\xff\xff\xb3\x8a\xb5A
[ 1658.421015] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=9 size=217 text=DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD\x10\xa2\xff\xff\xff\xff#\x8b^M\xa4\xff\xff\xff\xff\xb3\x8a\xb5A
[ 1658.421152] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=10 size=1145324612 text=
[ 1658.431961] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=18 size=1145324612 text=
[ 1668.637609] # test_readerwriter: end reader: read=13 seq=20
[ 1668.638384] # test_readerwriter: completed test
[ 1668.639464] # test_readerwriter: end thread 011: wrote=13482649
[ 1668.640116] # test_readerwriter: end thread 010: wrote=10354440
[ 1668.640934] # test_readerwriter: end thread 009: wrote=10388436
[ 1668.642245] # test_readerwriter: end thread 008: wrote=13621613
[ 1668.642886] # test_readerwriter: end thread 007: wrote=10072648
[ 1668.643536] # test_readerwriter: end thread 006: wrote=13582778
[ 1668.644171] # test_readerwriter: end thread 005: wrote=10397092
[ 1668.644792] # test_readerwriter: end thread 004: wrote=13499818
[ 1668.645415] # test_readerwriter: end thread 003: wrote=10362779
[ 1668.646319] # test_readerwriter: end thread 002: wrote=10348508
[ 1668.646924] # test_readerwriter: end thread 001: wrote=13616849
[ 1668.647584] # test_readerwriter.speed: slow
[ 1668.647618] not ok 1 test_readerwriter
[ 1668.648311] not ok 1 printk-ringbuffer
Best Regards,
Petr
On Fri 2025-09-12 11:25:09, Petr Mladek wrote:
> On Thu 2025-09-11 18:18:32, John Ogness wrote:
> > On 2025-09-11, Petr Mladek <pmladek@suse.com> wrote:
> > > diff --git a/kernel/printk/printk_ringbuffer_kunit_test.c b/kernel/printk/printk_ringbuffer_kunit_test.c
> > > index 2282348e869a..241f7ef49ac6 100644
> > > --- a/kernel/printk/printk_ringbuffer_kunit_test.c
> > > +++ b/kernel/printk/printk_ringbuffer_kunit_test.c
> > > @@ -56,7 +56,7 @@ struct prbtest_rbdata {
> > > char text[] __counted_by(size);
> > > };
> > >
> > > -#define MAX_RBDATA_TEXT_SIZE 0x80
> > > +#define MAX_RBDATA_TEXT_SIZE (0x256 - sizeof(struct prbtest_rbdata))
> >
> > I guess this should be:
> >
> > #define MAX_RBDATA_TEXT_SIZE (256 - sizeof(struct prbtest_rbdata))
>
> Great catch!
>
> But the KUnit test fails even with this change, see below. And I am
> not surprised. The test should work even with larger-than-allowed
> messages. prbtest_writer() should skip then because prb_reserve()
> should fail.
>
> Here is test result with:
>
> #define MAX_RBDATA_TEXT_SIZE (256 - sizeof(struct prbtest_rbdata))
> #define MAX_PRB_RECORD_SIZE (sizeof(struct prbtest_rbdata) + MAX_RBDATA_TEXT_SIZE)
>
> DEFINE_PRINTKRB(test_rb, 4, 4);
>
> and with this patchset reverted, aka, sources from
> printk/linux.git, branch for-next:
>
> It is well reproducible. It always fails after reading few records.
> Here are results from few other runs:
And I am no longer able to reproduce it after limiting the size
of the record to 1/4 of the data buffer size. I did it with
the following change:
diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index bc811de18316..2f02254705aa 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -398,8 +398,6 @@ static unsigned int to_blk_size(unsigned int size)
*/
static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
{
- struct prb_data_block *db = NULL;
-
if (size == 0)
return true;
@@ -409,7 +407,7 @@ static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
* at least the ID of the next block.
*/
size = to_blk_size(size);
- if (size > DATA_SIZE(data_ring) - sizeof(db->id))
+ if (size > DATA_SIZE(data_ring) / 4)
return false;
return true;
I guess that there is a race when we need to make all existing records
reusable when making space for the next one.
Another aspect might be the very small number of descriptors (16).
They are quickly recycled. But it is not a problem after
limiting the size of the record to 1/4.
Note that my test system is using 12 CPUs in KVM.
And it is x86_64.
Best Regards,
Petr
On Fri 2025-09-12 11:55:02, Petr Mladek wrote:
> On Fri 2025-09-12 11:25:09, Petr Mladek wrote:
> > On Thu 2025-09-11 18:18:32, John Ogness wrote:
> > > On 2025-09-11, Petr Mladek <pmladek@suse.com> wrote:
> > > > diff --git a/kernel/printk/printk_ringbuffer_kunit_test.c b/kernel/printk/printk_ringbuffer_kunit_test.c
> > > > index 2282348e869a..241f7ef49ac6 100644
> > > > --- a/kernel/printk/printk_ringbuffer_kunit_test.c
> > > > +++ b/kernel/printk/printk_ringbuffer_kunit_test.c
> > > > @@ -56,7 +56,7 @@ struct prbtest_rbdata {
> > > > char text[] __counted_by(size);
> > > > };
> > > >
> > > > -#define MAX_RBDATA_TEXT_SIZE 0x80
> > > > +#define MAX_RBDATA_TEXT_SIZE (0x256 - sizeof(struct prbtest_rbdata))
> > >
> > > I guess this should be:
> > >
> > > #define MAX_RBDATA_TEXT_SIZE (256 - sizeof(struct prbtest_rbdata))
> >
> > Great catch!
> >
> > But the KUnit test fails even with this change, see below. And I am
> > not surprised. The test should work even with larger-than-allowed
> > messages. prbtest_writer() should skip then because prb_reserve()
> > should fail.
> >
> > Here is test result with:
> >
> > #define MAX_RBDATA_TEXT_SIZE (256 - sizeof(struct prbtest_rbdata))
> > #define MAX_PRB_RECORD_SIZE (sizeof(struct prbtest_rbdata) + MAX_RBDATA_TEXT_SIZE)
> >
> > DEFINE_PRINTKRB(test_rb, 4, 4);
> >
> > and with this patchset reverted, aka, sources from
> > printk/linux.git, branch for-next:
> >
> > It is well reproducible. It always fails after reading few records.
> > Here are results from few other runs:
>
> And I am not longer able to reproduce it after limiting the size
> of the record to 1/4 of the data buffer size. I did it with
> the following change:
>
> diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
> index bc811de18316..2f02254705aa 100644
> --- a/kernel/printk/printk_ringbuffer.c
> +++ b/kernel/printk/printk_ringbuffer.c
> @@ -398,8 +398,6 @@ static unsigned int to_blk_size(unsigned int size)
> */
> static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
> {
> - struct prb_data_block *db = NULL;
> -
> if (size == 0)
> return true;
>
> @@ -409,7 +407,7 @@ static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
> * at least the ID of the next block.
> */
> size = to_blk_size(size);
> - if (size > DATA_SIZE(data_ring) - sizeof(db->id))
> + if (size > DATA_SIZE(data_ring) / 4)
> return false;
>
> return true;
>
>
> I guess that there is a race when we need to make all existing records
> reusable when making space for the next one.
It looks to me that the reader API is not ready to handle the
situation when all records are "reusable".
At least, it looks like prb_next_seq() might end up in an "infinite"
loop because it blindly increments "seq" until finding a valid record...
Honestly, I would really like to limit the maximal record size to
1/4 of the buffer size. I do not want to make the design more
complicated just to be able to fill just one record, definitely.
That said, I am still a bit nervous because I do not understand why
the KUnit test fails. It does not depend on prb_next_seq().
It seems that prb_read_valid() returns success even when
returning garbage. I have added some debug output using
trace_printk():
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -1856,6 +1856,10 @@ static bool copy_data(struct prb_data_ring *data_ring,
data_size = min_t(unsigned int, buf_size, len);
memcpy(&buf[0], data, data_size); /* LMM(copy_data:A) */
+ if (data_ring != &prb->text_data_ring) {
+ trace_printk(" %s: reading from 0x%lx (%d bytes)\n",
+ __func__, (unsigned long)data, data_size);
+ }
return true;
}
@@ -1939,6 +1943,9 @@ static int prb_read(struct printk_ringbuffer *rb, u64 seq,
if (r->info)
memcpy(r->info, info, sizeof(*(r->info)));
+ if (rb != prb)
+ trace_printk("%s: Calling copy_data() for seq=%llu\n", __func__, seq);
+
/* Copy text data. If it fails, this is a data-less record. */
if (!copy_data(&rb->text_data_ring, &desc.text_blk_lpos, info->text_len,
r->text_buf, r->text_buf_size, line_count)) {
@@ -2160,6 +2167,11 @@ static bool _prb_read_valid(struct printk_ringbuffer *rb, u64 *seq,
}
}
+ if (rb != prb && r && r->text_buf) {
+ trace_printk("%s: Successfully read record with seq=%llu\n",
+ __func__, *seq);
+ }
+
return true;
}
The test failed with the output:
[ 57.654444] KTAP version 1
[ 57.654739] 1..1
[ 57.655363] KTAP version 1
[ 57.655766] # Subtest: printk-ringbuffer
[ 57.656105] # module: printk_ringbuffer_kunit_test
[ 57.656137] 1..1
[ 57.657997] # test_readerwriter: running for 10000 ms
[ 57.659135] # test_readerwriter: start thread 001 (writer)
[ 57.659328] # test_readerwriter: start thread 002 (writer)
[ 57.659626] # test_readerwriter: start thread 003 (writer)
[ 57.659827] # test_readerwriter: start thread 004 (writer)
[ 57.660008] # test_readerwriter: start thread 005 (writer)
[ 57.660209] # test_readerwriter: start thread 006 (writer)
[ 57.661648] # test_readerwriter: start thread 007 (writer)
[ 57.662040] # test_readerwriter: start thread 008 (writer)
[ 57.662283] # test_readerwriter: start thread 009 (writer)
[ 57.662539] # test_readerwriter: start thread 010 (writer)
[ 57.662833] # test_readerwriter: start thread 011 (writer)
[ 57.662841] # test_readerwriter: starting test
[ 57.662945] # test_readerwriter: start reader
[ 57.663072] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=1519 size=1145324612 text=
[ 57.663253] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=1520 size=4473924 text=
[ 57.663432] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=1524 size=1145324612 text=
[ 57.672125] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=1526 size=4473924 text=
[ 57.672322] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=1528 size=1145324612 text=
[ 57.675564] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=1529 size=1145324612 text=
[ 57.675761] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=1531 size=1145324612 text=
[ 67.904538] # test_readerwriter: end reader: read=14 seq=1531
[ 67.905316] # test_readerwriter: completed test
[ 67.907023] # test_readerwriter: end thread 011: wrote=0 by attempts=10260578
[ 67.907974] # test_readerwriter: end thread 010: wrote=0 by attempts=10271385
[ 67.908929] # test_readerwriter: end thread 009: wrote=0 by attempts=10281631
[ 67.909675] # test_readerwriter: end thread 008: wrote=0 by attempts=10266089
[ 67.910571] # test_readerwriter: end thread 007: wrote=0 by attempts=10278468
[ 67.911294] # test_readerwriter: end thread 006: wrote=0 by attempts=13344945
[ 67.912009] # test_readerwriter: end thread 005: wrote=160 by attempts=10290490
[ 67.912800] # test_readerwriter: end thread 004: wrote=247 by attempts=13351217
[ 67.913499] # test_readerwriter: end thread 003: wrote=294 by attempts=13337712
[ 67.914167] # test_readerwriter: end thread 002: wrote=281 by attempts=13373189
[ 67.914858] # test_readerwriter: end thread 001: wrote=439 by attempts=13283005
[ 67.915980] # test_readerwriter.speed: slow
[ 67.916068] not ok 1 test_readerwriter
[ 67.916930] not ok 1 printk-ringbuffer
with the matching trace output:
kunit_try_catch-1487 [000] ..... 57.792815: _prb_read_valid: prb_read: Calling copy_data() for seq=1518
kunit_try_catch-1487 [000] ..... 57.792818: _prb_read_valid: copy_data: reading from 0xffffffffc02554c8 (78 bytes)
kunit_try_catch-1487 [000] ..... 57.792819: _prb_read_valid: _prb_read_valid: Successfully read record with seq=1518
kunit_try_catch-1487 [000] ..... 57.792821: _prb_read_valid: prb_read: Calling copy_data() for seq=1519
kunit_try_catch-1487 [000] ..... 57.792822: _prb_read_valid: copy_data: reading from 0xffffffffc0255520 (17 bytes)
kunit_try_catch-1487 [000] ..... 57.792823: _prb_read_valid: _prb_read_valid: Successfully read record with seq=1519
kunit_try_catch-1487 [000] ..... 57.793011: _prb_read_valid: prb_read: Calling copy_data() for seq=1520
kunit_try_catch-1487 [000] ..... 57.793012: _prb_read_valid: copy_data: reading from 0xffffffffc0255558 (15 bytes)
kunit_try_catch-1487 [000] ..... 57.793013: _prb_read_valid: _prb_read_valid: Successfully read record with seq=1520
kunit_try_catch-1487 [000] ..... 57.793182: _prb_read_valid: prb_read: Calling copy_data() for seq=1521
kunit_try_catch-1487 [000] ..... 57.793183: _prb_read_valid: copy_data: reading from 0xffffffffc02554c8 (131 bytes)
kunit_try_catch-1487 [000] ..... 57.793184: _prb_read_valid: _prb_read_valid: Successfully read record with seq=1521
kunit_try_catch-1487 [000] ..... 57.793185: _prb_read_valid: prb_read: Calling copy_data() for seq=1522
kunit_try_catch-1487 [000] ..... 57.793186: _prb_read_valid: copy_data: reading from 0xffffffffc02554c8 (95 bytes)
kunit_try_catch-1487 [000] ..... 57.793186: _prb_read_valid: _prb_read_valid: Successfully read record with seq=1522
kunit_try_catch-1487 [000] ..... 57.793187: _prb_read_valid: prb_read: Calling copy_data() for seq=1523
kunit_try_catch-1487 [000] ..... 57.793188: _prb_read_valid: copy_data: reading from 0xffffffffc02554c8 (131 bytes)
kunit_try_catch-1487 [000] ..... 57.793189: _prb_read_valid: _prb_read_valid: Successfully read record with seq=1523
kunit_try_catch-1487 [000] ..... 57.793190: _prb_read_valid: prb_read: Calling copy_data() for seq=1524
kunit_try_catch-1487 [000] ..... 57.793191: _prb_read_valid: copy_data: reading from 0xffffffffc0255530 (87 bytes)
kunit_try_catch-1487 [000] ..... 57.793192: _prb_read_valid: _prb_read_valid: Successfully read record with seq=1524
kunit_try_catch-1487 [000] ..... 57.801868: _prb_read_valid: prb_read: Calling copy_data() for seq=1525
kunit_try_catch-1487 [000] ..... 57.801869: _prb_read_valid: copy_data: reading from 0xffffffffc02554c8 (107 bytes)
kunit_try_catch-1487 [000] ..... 57.801870: _prb_read_valid: _prb_read_valid: Successfully read record with seq=1525
kunit_try_catch-1487 [000] ..... 57.801872: _prb_read_valid: prb_read: Calling copy_data() for seq=1526
kunit_try_catch-1487 [000] ..... 57.801873: _prb_read_valid: copy_data: reading from 0xffffffffc0255558 (60 bytes)
kunit_try_catch-1487 [000] ..... 57.801873: _prb_read_valid: _prb_read_valid: Successfully read record with seq=1526
kunit_try_catch-1487 [000] ..... 57.802067: _prb_read_valid: prb_read: Calling copy_data() for seq=1527
kunit_try_catch-1487 [000] ..... 57.802068: _prb_read_valid: copy_data: reading from 0xffffffffc02554c8 (51 bytes)
kunit_try_catch-1487 [000] ..... 57.802068: _prb_read_valid: _prb_read_valid: Successfully read record with seq=1527
kunit_try_catch-1487 [000] ..... 57.802070: _prb_read_valid: prb_read: Calling copy_data() for seq=1528
kunit_try_catch-1487 [000] ..... 57.802070: _prb_read_valid: copy_data: reading from 0xffffffffc0255540 (67 bytes)
kunit_try_catch-1487 [000] ..... 57.802071: _prb_read_valid: _prb_read_valid: Successfully read record with seq=1528
kunit_try_catch-1487 [000] ..... 57.805307: _prb_read_valid: prb_read: Calling copy_data() for seq=1529
kunit_try_catch-1487 [000] ..... 57.805308: _prb_read_valid: copy_data: reading from 0xffffffffc0255508 (57 bytes)
kunit_try_catch-1487 [000] ..... 57.805309: _prb_read_valid: _prb_read_valid: Successfully read record with seq=1529
kunit_try_catch-1487 [000] ..... 57.805505: _prb_read_valid: prb_read: Calling copy_data() for seq=1530
kunit_try_catch-1487 [000] ..... 57.805506: _prb_read_valid: copy_data: reading from 0xffffffffc02554c8 (45 bytes)
kunit_try_catch-1487 [000] ..... 57.805507: _prb_read_valid: _prb_read_valid: Successfully read record with seq=1530
kunit_try_catch-1487 [000] ..... 57.805509: _prb_read_valid: prb_read: Calling copy_data() for seq=1531
kunit_try_catch-1487 [000] ..... 57.805509: _prb_read_valid: copy_data: reading from 0xffffffffc0255550 (109 bytes)
kunit_try_catch-1487 [000] ..... 57.805510: _prb_read_valid: _prb_read_valid: Successfully read record with seq=1531
It seems that the first successfully read record was OK.
BTW: I have added the trace_printk() "Calling copy_data()..."
because earlier I saw that the test failed more times than
_prb_read_valid() returned success. So I was curious
whether it at least copied the data for the seq number.
BTW2: It seems that even the writers had problems reserving space this
time. Only a few of them succeeded, a few times out of many attempts.
I got the numbers with this change:
--- a/kernel/printk/printk_ringbuffer_kunit_test.c
+++ b/kernel/printk/printk_ringbuffer_kunit_test.c
@@ -112,6 +112,7 @@ static int prbtest_writer(void *data)
struct prbtest_rbdata *dat;
u32 record_size, text_size;
unsigned long count = 0;
+ unsigned long success_count = 0;
struct printk_record r;
kunit_info(tr->test_data->test, "start thread %03lu (writer)\n", tr->num);
@@ -154,6 +155,7 @@ static int prbtest_writer(void *data)
prb_commit(&e);
wake_up_interruptible(&tr->test_data->new_record_wait);
+ success_count++;
}
if ((count++ & 0x3fff) == 0)
@@ -163,7 +165,9 @@ static int prbtest_writer(void *data)
break;
}
- kunit_info(tr->test_data->test, "end thread %03lu: wrote=%lu\n", tr->num, count);
+ kunit_info(tr->test_data->test,
+ "end thread %03lu: wrote=%lu by attempts=%lu\n",
+ tr->num, success_count, count);
return 0;
}
Best Regards,
Petr
Hi Petr,

Summary: printk() is not in danger but we should correct a loose bounds
check.

On 2025-09-12, Petr Mladek <pmladek@suse.com> wrote:
> Honestly, I would really like to limit the maximal record size to
> 1/4 of the buffer size. I do not want to make the design more
> complicated just to be able to fill just one record, definitely.

So I was able to track this down. Your usage of

DEFINE_PRINTKRB(test_rb, 4, 4);

actually made it relatively easy because there are only 16 descriptors.
All I needed to do was dump the descriptors before each reserve, between
reserve and commit, after commit, and when reserve fails. This allowed me
to easily see exactly how the ringbuffer is behaving.

The problem can be reproduced with a single writer, no reader needed. Using

#define MAX_RBDATA_TEXT_SIZE (0x256 - sizeof(struct prbtest_rbdata))

provides a wild range of attempts that trigger the problem within about
20 write cycles.

The problem comes from the function data_make_reusable(). The job of this
function is to push the data_ring tail forward, one data block at a time,
while setting the related descriptors to reusable.

After pushing the tail forward, if it still has not pushed it far enough
for the new requested reservation, it must push it further. For this it
_assumes the current position of the tail is a descriptor ID for the next
data block_. But what if the tail was pushed all the way to the head?
Then there is no next data block and it will read in garbage, thinking it
is the next descriptor ID to set reusable. And from there it just goes
crazy because it is reading garbage to determine how big the data block
is so that it can continue pushing the tail (beyond the head!).

Example: Assume the 96 byte ringbuffer has a single message of 64 bytes.
Then we try to reserve space for a 72-byte message. data_make_reusable()
will first set the descriptor of the 64-byte message to reusable and push
the tail forward to index 64. But the new message needs 72 bytes, so
data_make_reusable() will keep going and read the descriptor ID at index
64, but there is only random garbage at that position. 64 is the head and
there is nothing valid after it.

This situation can never happen for printk because of your 1/4 limit
(MAX_LOG_TAKE_PART), although it is over-conservative. It is enough to
limit messages to 1/2 of the data ring (with Daniil's series). Otherwise
the limit must be "1/2 - sizeof(long)" to also leave room for the
trailing ID of a wrapping data block.

I am still positive about Daniil's series. And we should fix
data_check_size() to provide a proper limit as well as describe the
critical relationship between data_check_size() and data_make_reusable().

I prefer not to modify data_make_reusable() to handle this case.
Currently data_make_reusable() does nothing with the head, so it would
introduce new memory barriers. Also, the "push tail beyond head" scenario
is a bit odd to handle. It is better just to document the assumption and
put in the correct bounds checks.

I will put together a patch without Daniil's series that improves the
comments and puts a proper bound on data_check_size().

John Ogness
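[Editorial note: the example above can be checked with a standalone arithmetic sketch. The 96-byte ring and the variable names are illustrative assumptions that mirror the described data_alloc()/data_push_tail() behavior; this is not the kernel code itself.]

#include <stdio.h>

int main(void)
{
	unsigned long ring = 96;	/* simplified ring size from the example */
	unsigned long head = 64;	/* one committed 64-byte block at [0, 64) */
	unsigned long want = 72;	/* size of the new reservation */
	unsigned long next_lpos, tail_target;

	/*
	 * 64 + 72 crosses the ring end, so a wrapping block is placed at the
	 * start of the next wrap: next_lpos = 96 + 72 = 168.
	 */
	if ((head % ring) + want > ring)
		next_lpos = ((head / ring) + 1) * ring + want;
	else
		next_lpos = head + want;

	/* The allocation asks to push the tail up to next_lpos - ring. */
	tail_target = next_lpos - ring;

	printf("next_lpos=%lu tail_target=%lu head=%lu -> tail pushed past head: %s\n",
	       next_lpos, tail_target, head, tail_target > head ? "yes" : "no");
	return 0;
}

The tail target (72) lies beyond the head (64), which is exactly the point where data_make_reusable() starts interpreting never-written bytes as a descriptor ID.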
On Fri 2025-09-12 20:49:37, John Ogness wrote:
> Hi Petr,
>
> Summary: printk() is not in danger but we should correct a loose bounds
> check.
>
> On 2025-09-12, Petr Mladek <pmladek@suse.com> wrote:
> > Honestly, I would really like to limit the maximal record size to
> > 1/4 of the buffer size. I do not want to make the design more
> > complicated just to be able to fill just one record, definitely.
>
> So I was able to track this down. Your usage of
>
> DEFINE_PRINTKRB(test_rb, 4, 4);
>
> actually made it relatively easy because there are only 16
> descriptors. All I needed to do was dump the descriptors before each
> reserve, between reserve and commit, after commit, and when reserve
> fails. This allowed me to easily see exactly how the ringbuffer is
> behaving.
>
> The problem can be reproduced with a single writer, no reader
> needed. Using
>
> #define MAX_RBDATA_TEXT_SIZE (0x256 - sizeof(struct prbtest_rbdata))
>
> provides a wild range of attempts that trigger the problem within about
> 20 write cycles.
>
> The problem comes from the function data_make_reusable(). The job of
> this function is to push the data_ring tail forward, one data block at a
> time, while setting the related descriptors to reusable.
>
> After pushing the tail forward, if it still has not pushed it far enough
> for new requested reservation, it must push it further. For this it
> _assumes the current position of the tail is a descriptor ID for the
> next data block_. But what if the tail was pushed all the way to the
> head? Then there is no next data block and it will read in garbage,
> thinking it is the next descriptor ID to set reusable. And from there it
> just goes crazy because it is reading garbage to determine how big the
> data block is so that it can continue pushing the tail (beyond the head!).
>
> Example: Assume the 96 byte ringbuffer has a single message of 64
> bytes. Then we try to reserve space for a 72-byte
> message. data_make_reusable() will first set the descriptor of the
> 64-byte message to reusable and push the tail forward to index 64. But
> the new message needs 72 bytes, so data_make_reusable() will keep going
> and read the descriptor ID at index 64, but there is only random garbage
> at that position. 64 is the head and there is nothing valid after it.
Great catch and example!
I wondered why data_make_reusable() needed to push the tail that far.
The buffer was empty after making the 64-byte message free.
My understanding is that it is a combination of the following effects:
1. The message is wrapped.
2. The ring buffer does not support proper wrapping. Instead,
the insufficient space at the end of the buffer stays
unused (last wrap), and the message is written
from the beginning of the buffer (next wrap).
=> the message will occupy more space than expected
unused space from last wrap + full message size in new wrap
In our case:
+ size of the buffer: 96
+ unused space in old wrap: 96 - 64 = 32
+ occupied space in new wrap: 72
=> total occupied space: 32 + 72 = 104 > 96
=> lpos passed to data_push_tail() is in never-used space
=> This is why data_push_tail() tries to read a
descriptor from never-used space and reads garbage
> This situation can never happen for printk because of your 1/4 limit
> (MAX_LOG_TAKE_PART), although it is over-conservative.
I would say that it is conservative. It would survive mistakes from the
off-by-one family, ... ;-)
And it is still far from practical limits, because having this
powerful ring buffer for 1, 2, or 4 messages looks like overkill.
> It is enough to limit messages to 1/2 of the data ring
> (with Daniil's series). Otherwise the limit must be
> "1/2 - sizeof(long)" to also leave room for the
> trailing ID of a wrapping data block.
I am not sure why it is important to push it to the limits.
That said, I could live with it. Especially now that we
understand what happened.
> I am still positive about Daniil's series.
Yes, the patch which prevents wrapping for perfectly fitting messages
looks good to me.
> And we should fix
> data_check_size() to provide a proper limit as well as describe the
> critical relationship between data_check_size() and
> data_make_reusable().
Yup.
> I prefer not to modify data_make_reusable() to handle this case. Currently
> data_make_reusable() does nothing with the head, so it would introduce
> new memory barriers. Also, the "push tail beyond head" scenario is a bit
> odd to handle. It is better just to document the assumption and put in
> the correct bounds checks.
It might be possible to catch this either in data_alloc()
or in get_next_lpos(). They could ignore it, or yell about it, when
the really occupied space would be bigger than DATA_SIZE(data_ring).
Something like:
diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 17b741b2eccd..d7ba4c0d8c3b 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -1056,8 +1056,16 @@ static char *data_alloc(struct printk_ringbuffer *rb, unsigned int size,
do {
next_lpos = get_next_lpos(data_ring, begin_lpos, size);
- if (!data_push_tail(rb, next_lpos - DATA_SIZE(data_ring))) {
- /* Failed to allocate, specify a data-less block. */
+ /*
+ * Double check that the really used space won't be bigger than
+ * the ring buffer. Wrapped messages need to reserve more space,
+ * see get_next_lpos.
+ *
+ * Specify a data-less block when the check or the allocation
+ * fails.
+ */
+ if (WARN_ON_ONCE(next_lpos - begin_lpos > DATA_SIZE(data_ring)) ||
+ !data_push_tail(rb, next_lpos - DATA_SIZE(data_ring))) {
blk_lpos->begin = FAILED_LPOS;
blk_lpos->next = FAILED_LPOS;
return NULL;
A similar check would also need to be done in data_realloc().
I am not sure if it is worth it. Maybe we could rule this out
when we limit the allocated size to 1/2 or 1/4 of the ring buffer size.
Best Regards,
Petr
On 2025-09-15, Petr Mladek <pmladek@suse.com> wrote:
> It might be possible to catch this in either in data_alloc().
> or in get_next_lpos(). They could ignore/yell about when
> the really occupied space would be bigger than DATA_SIZE(data_ring).
>
> Something like:
>
> diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
> index 17b741b2eccd..d7ba4c0d8c3b 100644
> --- a/kernel/printk/printk_ringbuffer.c
> +++ b/kernel/printk/printk_ringbuffer.c
> @@ -1056,8 +1056,16 @@ static char *data_alloc(struct printk_ringbuffer *rb, unsigned int size,
> do {
> next_lpos = get_next_lpos(data_ring, begin_lpos, size);
>
> - if (!data_push_tail(rb, next_lpos - DATA_SIZE(data_ring))) {
> - /* Failed to allocate, specify a data-less block. */
> + /*
> + * Double check that the really used space won't be bigger than
> + * the ring buffer. Wrapped messages need to reserve more space,
> + * see get_next_lpos.
> + *
> + * Specify a data-less block when the check or the allocation
> + * fails.
> + */
> + if (WARN_ON_ONCE(next_lpos - begin_lpos > DATA_SIZE(data_ring)) ||
> + !data_push_tail(rb, next_lpos - DATA_SIZE(data_ring))) {
> blk_lpos->begin = FAILED_LPOS;
> blk_lpos->next = FAILED_LPOS;
> return NULL;
>
>
> Similar check would need to be done also in data_realloc().
I like this. It is an important sanity check and safe error handling in
case (for whatever reason) the data ring gets corrupted.
John
On 9/12/25 9:43 PM, John Ogness wrote:
> Hi Petr,
>
> Summary: printk() is not in danger but we should correct a loose bounds
> check.
>
> On 2025-09-12, Petr Mladek <pmladek@suse.com> wrote:
>> Honestly, I would really like to limit the maximal record size to
>> 1/4 of the buffer size. I do not want to make the design more
>> complicated just to be able to fill just one record, definitely.
> So I was able to track this down. Your usage of
>
> DEFINE_PRINTKRB(test_rb, 4, 4);
>
> actually made it relatively easy because there are only 16
> descriptors. All I needed to do was dump the descriptors before each
> reserve, between reserve and commit, after commit, and when reserve
> fails. This allowed me to easily see exactly how the ringbuffer is
> behaving.
>
> The problem can be reproduced with a single writer, no reader
> needed. Using
>
> #define MAX_RBDATA_TEXT_SIZE (0x256 - sizeof(struct prbtest_rbdata))
>
> provides a wild range of attempts that trigger the problem within about
> 20 write cycles.
>
> The problem comes from the function data_make_reusable(). The job of
> this function is to push the data_ring tail forward, one data block at a
> time, while setting the related descriptors to reusable.
>
> After pushing the tail forward, if it still has not pushed it far enough
> for the new requested reservation, it must push it further. For this it
> _assumes the current position of the tail is a descriptor ID for the
> next data block_. But what if the tail was pushed all the way to the
> head? Then there is no next data block and it will read in garbage,
> thinking it is the next descriptor ID to set reusable. And from there it
> just goes crazy because it is reading garbage to determine how big the
> data block is so that it can continue pushing the tail (beyond the
> head!).
>
> Example: Assume the 96-byte ringbuffer has a single message of 64
> bytes. Then we try to reserve space for a 72-byte
> message. data_make_reusable() will first set the descriptor of the
> 64-byte message to reusable and push the tail forward to index 64. But
> the new message needs 72 bytes, so data_make_reusable() will keep going
> and read the descriptor ID at index 64, but there is only random garbage
> at that position. 64 is the head and there is nothing valid after it.

Good catch, although I'm not sure I understand why this produces the
error Petr is seeing.

Why would it see garbage in a zeroed .bss buffer? Is this because of
the previous test runs dirtying the same data ring or something?

At least in my tests it seems like it would try to free a descriptor
with an id of 0, which would result in a "desc_miss", so it would
basically fail to allocate anything.

Either way, I guess after your patch is accepted I'm going to resend
mine to only strip the trailing id, but the records must still be less
than half of the data ring size so that data_make_reusable never
invalidates past the current head.

> This situation can never happen for printk because of your 1/4 limit
> (MAX_LOG_TAKE_PART), although it is over-conservative. It is enough to
> limit messages to 1/2 of the data ring (with Daniil's series). Otherwise
> the limit must be "1/2 - sizeof(long)" to also leave room for the
> trailing ID of a wrapping data block.
>
> I am still positive about Daniil's series. And we should fix
> data_check_size() to provide a proper limit as well as describe the
> critical relationship between data_check_size() and
> data_make_reusable().
>
> I prefer not to modify data_make_reusable() to handle this case.
> Currently data_make_reusable() does nothing with the head, so it would
> introduce new memory barriers. Also, the "push tail beyond head"
> scenario is a bit odd to handle. It is better just to document the
> assumption and put in the correct bounds checks.
>
> I will put together a patch without Daniil's series that improves the
> comments and puts a proper bound on data_check_size().
>
> John Ogness
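To make the arithmetic of the example above concrete, here is a small
stand-alone toy model of the logical-position bookkeeping (plain C for
illustration only, not the kernel code; the 96/64/72 numbers come from
the example, everything else is made up):

/*
 * Toy model of the example above: a 96-byte data ring holding one
 * committed 64-byte data block, with a 72-byte reservation coming in.
 * Logical positions (lpos) grow without bound; the array index is
 * lpos % DATA_SIZE. This is stand-alone illustration code, not the
 * kernel implementation.
 */
#include <stdio.h>

#define DATA_SIZE 96UL

int main(void)
{
	unsigned long head = 64;	/* first free lpos after the only block */
	unsigned long tail = 0;		/* oldest data block starts here */
	unsigned long blk_size = 72;	/* size of the new data block */
	unsigned long begin = head;
	unsigned long next, target;

	/* The new block does not fit in the remaining 32 bytes, so it wraps. */
	if (begin % DATA_SIZE + blk_size > DATA_SIZE)
		begin += DATA_SIZE - (begin % DATA_SIZE);	/* begin = 96 */
	next = begin + blk_size;				/* next = 168 */

	/* The tail must be pushed to at least next - DATA_SIZE = 72 ... */
	target = next - DATA_SIZE;

	/* ... but freeing the only real block only gets it to 64 == head. */
	tail += 64;
	if (tail < target && tail == head)
		printf("tail stuck at %lu (== head), still need %lu: the next 'ID' read here is garbage\n",
		       tail, target);
	return 0;
}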
On 2025-09-13, Daniil Tatianin <d-tatianin@yandex-team.ru> wrote:
>> The problem comes from the function data_make_reusable(). The job of
>> this function is to push the data_ring tail forward, one data block at a
>> time, while setting the related descriptors to reusable.
>>
>> After pushing the tail forward, if it still has not pushed it far enough
>> for the new requested reservation, it must push it further. For this it
>> _assumes the current position of the tail is a descriptor ID for the
>> next data block_. But what if the tail was pushed all the way to the
>> head? Then there is no next data block and it will read in garbage,
>> thinking it is the next descriptor ID to set reusable. And from there it
>> just goes crazy because it is reading garbage to determine how big the
>> data block is so that it can continue pushing the tail (beyond the
>> head!).
>>
>> Example: Assume the 96-byte ringbuffer has a single message of 64
>> bytes. Then we try to reserve space for a 72-byte
>> message. data_make_reusable() will first set the descriptor of the
>> 64-byte message to reusable and push the tail forward to index 64. But
>> the new message needs 72 bytes, so data_make_reusable() will keep going
>> and read the descriptor ID at index 64, but there is only random garbage
>> at that position. 64 is the head and there is nothing valid after it.
>
> Good catch, although I'm not sure I understand why this produces the
> error Petr is seeing.
>
> Why would it see garbage in a zeroed .bss buffer? Is this because of
> the previous test runs dirtying the same data ring or something?

Correct. The explosions don't start happening until after about 5-6
wraps. So the data ring is full of somewhat random data that will then
be randomly interpreted once it attempts to push the tail beyond the
head.

> Either way, I guess after your patch is accepted I'm going to resend
> mine to only strip the trailing id, but the records must still be less
> than half of the data ring size so that data_make_reusable never
> invalidates past the current head.

After applying your patch, can you provide an example where a maximum
size of exactly half causes the tail to be pushed beyond the head? Keep
in mind that data_check_size() accounts for the meta-data. It only
doesn't account for the extra ID on wrapping data blocks.

John
On 9/14/25 12:23 PM, John Ogness wrote:
> On 2025-09-13, Daniil Tatianin <d-tatianin@yandex-team.ru> wrote:
>>> The problem comes from the function data_make_reusable(). The job of
>>> this function is to push the data_ring tail forward, one data block at a
>>> time, while setting the related descriptors to reusable.
>>>
>>> After pushing the tail forward, if it still has not pushed it far enough
>>> for the new requested reservation, it must push it further. For this it
>>> _assumes the current position of the tail is a descriptor ID for the
>>> next data block_. But what if the tail was pushed all the way to the
>>> head? Then there is no next data block and it will read in garbage,
>>> thinking it is the next descriptor ID to set reusable. And from there it
>>> just goes crazy because it is reading garbage to determine how big the
>>> data block is so that it can continue pushing the tail (beyond the
>>> head!).
>>>
>>> Example: Assume the 96-byte ringbuffer has a single message of 64
>>> bytes. Then we try to reserve space for a 72-byte
>>> message. data_make_reusable() will first set the descriptor of the
>>> 64-byte message to reusable and push the tail forward to index 64. But
>>> the new message needs 72 bytes, so data_make_reusable() will keep going
>>> and read the descriptor ID at index 64, but there is only random garbage
>>> at that position. 64 is the head and there is nothing valid after it.
>> Good catch, although I'm not sure I understand why this produces the
>> error Petr is seeing.
>>
>> Why would it see garbage in a zeroed .bss buffer? Is this because of
>> the previous test runs dirtying the same data ring or something?
> Correct. The explosions don't start happening until after about 5-6
> wraps. So the data ring is full of somewhat random data that will then
> be randomly interpreted once it attempts to push the tail beyond the
> head.

Ah ok, yeah that makes sense.

>> Either way, I guess after your patch is accepted I'm going to resend
>> mine to only strip the trailing id, but the records must still be less
>> than half of the data ring size so that data_make_reusable never
>> invalidates past the current head.
> After applying your patch, can you provide an example where a maximum
> size of exactly half causes the tail to be pushed beyond the head? Keep
> in mind that data_check_size() accounts for the meta-data. It only
> doesn't account for the extra ID on wrapping data blocks.

Sorry, I think exactly half is fine, basically we can keep it half, but
only remove the trailing id check with my patch.

>
> John
On 2025-09-14, Daniil Tatianin <d-tatianin@yandex-team.ru> wrote:
>> After applying your patch, can you provide an example where a maximum
>> size of exactly half causes the tail to be pushed beyond the head? Keep
>> in mind that data_check_size() accounts for the meta-data. It only
>> doesn't account for the extra ID on wrapping data blocks.
>
> Sorry, I think exactly half is fine, basically we can keep it half, but
> only remove the trailing id check with my patch.
I have been investigating this further. Even _without_ your patches, I
cannot find (either by using my brain or through testing) a problem with
limiting it to exactly half:
diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index bc811de18316b..9d47c1b94b71f 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -398,8 +398,6 @@ static unsigned int to_blk_size(unsigned int size)
*/
static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
{
- struct prb_data_block *db = NULL;
-
if (size == 0)
return true;
@@ -408,11 +406,7 @@ static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
* array. The largest possible data block must still leave room for
* at least the ID of the next block.
*/
- size = to_blk_size(size);
- if (size > DATA_SIZE(data_ring) - sizeof(db->id))
- return false;
-
- return true;
+ return (to_blk_size(size) <= (DATA_SIZE(data_ring) / 2));
}
/* Query the state of a descriptor. */
When invalidating a data block (pushing the tail) it only must be
certain that the newly created space is large enough to fit the new data
block.
With a maximum of half, a new non-wrapping data block will always
fit. If it is a wrapping data block the worst case is if it is maximally
sized and ends exactly at the end of the array. In that case, it is
placed at index 0. But there it will only free up until the head
value. (If the head value was less, the data block would not have
wrapped.)
Your series handles the "ends exactly at the end of the array" case by
avoiding the need to wrap and thus invalidate up to half the
ringbuffer. But your series does not affect the maximum record size.
I will submit an official patch that also improves the comments to
clarify exactly why the limit exists.
@Petr: I am fine with you keeping our 1/4 limit in printk.c. But I would
like the ringbuffer code to be exactly proper here.
John
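The worst-case argument above can also be checked by brute force. The
following stand-alone toy program (illustration only, not kernel code;
DATA_SIZE is an arbitrary power of two) walks every head offset and
every block size up to half the ring and confirms that the lpos the
tail would have to reach never passes the head:

/*
 * Stand-alone brute-force check of the reasoning above: if no data
 * block is larger than DATA_SIZE / 2, then for any head position the
 * lpos that the tail must be pushed to never exceeds the head. The
 * wrap rule matches the "perfectly fitting block does not wrap"
 * behavior of the first patch in the series.
 */
#include <assert.h>
#include <stdio.h>

#define DATA_SIZE 256UL

int main(void)
{
	unsigned long off, size;

	for (off = 0; off < DATA_SIZE; off++) {
		for (size = 1; size <= DATA_SIZE / 2; size++) {
			/* Put the head somewhere in the second wrap of the ring. */
			unsigned long head = DATA_SIZE + off;
			unsigned long begin = head;
			unsigned long target;

			if (off + size > DATA_SIZE)		/* block would wrap ... */
				begin += DATA_SIZE - off;	/* ... place it at offset 0 */

			/* The tail must reach the end of the block minus one wrap. */
			target = begin + size - DATA_SIZE;
			assert(target <= head);			/* never past the head */
		}
	}
	printf("ok: no block of <= DATA_SIZE/2 bytes pushes the tail past the head\n");
	return 0;
}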
On 9/15/25 6:07 PM, John Ogness wrote:
> On 2025-09-14, Daniil Tatianin <d-tatianin@yandex-team.ru> wrote:
>>> After applying your patch, can you provide an example where a maximum
>>> size of exactly half causes the tail to be pushed beyond the head? Keep
>>> in mind that data_check_size() accounts for the meta-data. It only
>>> doesn't account for the extra ID on wrapping data blocks.
>> Sorry, I think exactly half is fine, basically we can keep it half, but
>> only remove the trailing id check with my patch.
> I have been investigating this further. Even _without_ your patches, I
> cannot find (either by using my brain or through testing) a problem with
> limiting it to exactly half:
>
> diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
> index bc811de18316b..9d47c1b94b71f 100644
> --- a/kernel/printk/printk_ringbuffer.c
> +++ b/kernel/printk/printk_ringbuffer.c
> @@ -398,8 +398,6 @@ static unsigned int to_blk_size(unsigned int size)
> */
> static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
> {
> - struct prb_data_block *db = NULL;
> -
> if (size == 0)
> return true;
>
> @@ -408,11 +406,7 @@ static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
> * array. The largest possible data block must still leave room for
> * at least the ID of the next block.
> */
> - size = to_blk_size(size);
> - if (size > DATA_SIZE(data_ring) - sizeof(db->id))
> - return false;
> -
> - return true;
> + return (to_blk_size(size) <= (DATA_SIZE(data_ring) / 2));
> }
>
> /* Query the state of a descriptor. */
>
> When invalidating a data block (pushing the tail) it only must be
> certain that the newly created space is large enough to fit the new data
> block.
>
> With a maximum of half, a new non-wrapping data block will always
> fit. If it is a wrapping data block the worst case is if it is maximally
> sized and ends exactly at the end of the array. In that case, it is
> placed at index 0. But there it will only free up until the head
> value. (If the head value was less, the data block would not have
> wrapped.)
>
> Your series handles the "ends exactly at the end of the array" case by
> avoiding the need to wrap and thus invalidate up to half the
> ringbuffer. But your series does not affect the maximum record size.
Yeah, I think you're right. I guess the second patch can be dropped now,
which is even better.
>
> I will submit an official patch that also improves the comments to
> clarify exactly why the limit exists.
>
> @Petr: I am fine with you keeping our 1/4 limit in printk.c. But I would
> like the ringbuffer code to be exactly proper here.
>
> John
On Mon 2025-09-15 17:13:03, John Ogness wrote:
> On 2025-09-14, Daniil Tatianin <d-tatianin@yandex-team.ru> wrote:
> >> After applying your patch, can you provide an example where a maximum
> >> size of exactly half causes the tail to be pushed beyond the head? Keep
> >> in mind that data_check_size() accounts for the meta-data. It only
> >> doesn't account for the extra ID on wrapping data blocks.
> >
> > Sorry, I think exactly half is fine, basically we can keep it half, but
> > only remove the trailing id check with my patch.
>
> I have been investigating this further. Even _without_ your patches, I
> cannot find (either by using my brain or through testing) a problem with
> limiting it to exactly half:
>
> diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
> index bc811de18316b..9d47c1b94b71f 100644
> --- a/kernel/printk/printk_ringbuffer.c
> +++ b/kernel/printk/printk_ringbuffer.c
> @@ -398,8 +398,6 @@ static unsigned int to_blk_size(unsigned int size)
> */
> static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
> {
> - struct prb_data_block *db = NULL;
> -
> if (size == 0)
> return true;
>
> @@ -408,11 +406,7 @@ static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
> * array. The largest possible data block must still leave room for
> * at least the ID of the next block.
> */
> - size = to_blk_size(size);
> - if (size > DATA_SIZE(data_ring) - sizeof(db->id))
> - return false;
> -
> - return true;
> + return (to_blk_size(size) <= (DATA_SIZE(data_ring) / 2));
> }
>
> /* Query the state of a descriptor. */
>
> When invalidating a data block (pushing the tail) it only must be
> certain that the newly created space is large enough to fit the new data
> block.
>
> With a maximum of half, a new non-wrapping data block will always
> fit. If it is a wrapping data block the worst case is if it is maximally
> sized and ends exactly at the end of the array. In that case, it is
> placed at index 0. But there it will only free up until the head
> value. (If the head value was less, the data block would not have
> wrapped.)
I could confirm this with my findings. In other words, a record which
needs half of the ring buffer will always fit into the ring buffer
even when it is wrapped and also occupies some unused space from
the previous wrap.
In the worst case, the unused space might be half of the ring buffer.
In this case, it would be the perfectly fitting record which would
be wrapped now. But it would be stored in the 2nd half of the ring
buffer with the fix from the 1st patch of this patchset.
In other words, it would be enough to invalidate all existing
entries. data_alloc()/data_realloc() would never ask
data_push_tail() to push tail_lpos past head_lpos.
> Your series handles the "ends exactly at the end of the array" case by
> avoiding the need to wrap and thus invalidate up to half the
> ringbuffer. But your series does not affect the maximum record size.
>
> I will submit an official patch that also improves the comments to
> clarify exactly why the limit exists.
>
> @Petr: I am fine with you keeping our 1/4 limit in printk.c. But I would
> like the ringbuffer code to be exactly proper here.
Fair enough.
To make it complete: I wondered about the reader API. But it seems
to be safe as well.
The reader API seems to be safe as long as there is at least
one descriptor in a reserved state (an empty ring buffer is handled
in a special way).
And prb_reserve() _does reserve_ a descriptor before allocating
the data. It means that a descriptor is in a reserved state
before other descriptors are made reusable to free the space.
It means that at least one descriptor is in a reserved state
even when others are reusable. And it means that _prb_read_valid()
exits at this descriptor because prb_read() returns -EINVAL...
Best Regards,
Petr
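For reference, the writer sequence Petr describes is the
reserve-then-commit pattern from the sample code in the
printk_ringbuffer.c comments; a minimal sketch from memory (treat the
exact calls and field names as illustrative; test_rb, text and
text_len are assumptions here):

/*
 * Minimal writer-side sketch, loosely based on the sample usage in the
 * printk_ringbuffer.c comments. The descriptor is reserved by
 * prb_reserve() before any older data is invalidated, which is why at
 * least one descriptor is always in the reserved state while the tail
 * is being pushed.
 */
struct prb_reserved_entry e;
struct printk_record r;

prb_rec_init_wr(&r, text_len);		/* how much text space to allocate */

if (prb_reserve(&e, &test_rb, &r)) {	/* reserves a descriptor first */
	memcpy(&r.text_buf[0], text, text_len);
	r.info->text_len = text_len;
	prb_final_commit(&e);		/* commit and finalize the record */
}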
On Fri 2025-09-12 16:49:53, Petr Mladek wrote:
> On Fri 2025-09-12 11:55:02, Petr Mladek wrote:
> > On Fri 2025-09-12 11:25:09, Petr Mladek wrote:
> > > On Thu 2025-09-11 18:18:32, John Ogness wrote:
> > > > On 2025-09-11, Petr Mladek <pmladek@suse.com> wrote:
> > > > > diff --git a/kernel/printk/printk_ringbuffer_kunit_test.c b/kernel/printk/printk_ringbuffer_kunit_test.c
> > > > > index 2282348e869a..241f7ef49ac6 100644
> > > > > --- a/kernel/printk/printk_ringbuffer_kunit_test.c
> > > > > +++ b/kernel/printk/printk_ringbuffer_kunit_test.c
> > > > > @@ -56,7 +56,7 @@ struct prbtest_rbdata {
> > > > > char text[] __counted_by(size);
> > > > > };
> > > > >
> > > > > -#define MAX_RBDATA_TEXT_SIZE 0x80
> > > > > +#define MAX_RBDATA_TEXT_SIZE (0x256 - sizeof(struct prbtest_rbdata))
> > > >
> > > > I guess this should be:
> > > >
> > > > #define MAX_RBDATA_TEXT_SIZE (256 - sizeof(struct prbtest_rbdata))
> > >
> > > Great catch!
> > >
> > > But the KUnit test fails even with this change, see below. And I am
> > > not surprised. The test should work even with larger-than-allowed
> > > messages. prbtest_writer() should skip then because prb_reserve()
> > > should fail.
> > >
> > > Here is test result with:
> > >
> > > #define MAX_RBDATA_TEXT_SIZE (256 - sizeof(struct prbtest_rbdata))
> > > #define MAX_PRB_RECORD_SIZE (sizeof(struct prbtest_rbdata) + MAX_RBDATA_TEXT_SIZE)
> > >
> > > DEFINE_PRINTKRB(test_rb, 4, 4);
> > >
> > > and with this patchset reverted, aka, sources from
> > > printk/linux.git, branch for-next:
> > >
> > > It is well reproducible. It always fails after reading a few records.
> > > Here are results from few other runs:
> >
> > And I am no longer able to reproduce it after limiting the size
> > of the record to 1/4 of the data buffer size. I did it with
> > the following change:
> >
> > diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
> > index bc811de18316..2f02254705aa 100644
> > --- a/kernel/printk/printk_ringbuffer.c
> > +++ b/kernel/printk/printk_ringbuffer.c
> > @@ -398,8 +398,6 @@ static unsigned int to_blk_size(unsigned int size)
> > */
> > static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
> > {
> > - struct prb_data_block *db = NULL;
> > -
> > if (size == 0)
> > return true;
> >
> > @@ -409,7 +407,7 @@ static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
> > * at least the ID of the next block.
> > */
> > size = to_blk_size(size);
> > - if (size > DATA_SIZE(data_ring) - sizeof(db->id))
> > + if (size > DATA_SIZE(data_ring) / 4)
> > return false;
> >
> > return true;
> >
> >
> > I guess that there is a race when we need to make all existing records
> > reusable when making space for the next one.
>
> It looks to me that the reader API is not ready to handle the
> situation when all records are "reusable".
>
> At least, it looks like prb_next_seq() might end up in an "infinite"
> loop because it blindly increments "seq" until finding a valid record...
>
> Honestly, I would really like to limit the maximal record size to
> 1/4 of the buffer size. I do not want to make the design more
> complicated just to be able to fill just one record, definitely.
>
>
> That said, I am still a bit nervous because I do not understand why
> the KUnit test fails. It does not depend on prb_next_seq().
>
> It seems that prb_read_valid() returns success even when
> returning a garbage. I have added some debug output using
> trace_printk():
>
> BTW2: It seems that even the writers had problems reserving space this
> time. Only a few of them succeeded a few times out of many attempts.
> I the numbers by this change:
This actually helped. It seems that even prb_read_valid() is blindly
incrementing seq when the last entry is reusable
(desc_read_finalized_seq() returns -ENOENT).
And the failed entries are entries which have not been written.
Here is another output where it is easier to see:
[ 241.991189] KTAP version 1
[ 241.991511] # Subtest: printk-ringbuffer
[ 241.991912] # module: printk_ringbuffer_kunit_test
[ 241.991941] 1..1
[ 241.993854] # test_readerwriter: running for 10000 ms
[ 241.995189] # test_readerwriter: start thread 001 (writer)
[ 241.995451] # test_readerwriter: start thread 002 (writer)
[ 241.995722] # test_readerwriter: start thread 003 (writer)
[ 241.997300] # test_readerwriter: start thread 004 (writer)
[ 241.997947] # test_readerwriter: start thread 005 (writer)
[ 241.998351] # test_readerwriter: start thread 006 (writer)
[ 241.998686] # test_readerwriter: start thread 007 (writer)
[ 241.998971] # test_readerwriter: start thread 008 (writer)
[ 242.001305] # test_readerwriter: start thread 009 (writer)
[ 242.001984] # test_readerwriter: start thread 010 (writer)
[ 242.002680] # test_readerwriter: start thread 011 (writer)
[ 242.002686] # test_readerwriter: starting test
[ 242.009239] # test_readerwriter: start reader
[ 242.009940] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=485 size=4325444 text=
[ 242.011676] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=487 size=1145324612 text=
[ 242.013569] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=490 size=4325444 text=
[ 242.015316] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=491 size=61 text=DDDDDDDDDDDDDDDDDDDBBBBBBBBBBB
[ 242.017483] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=492 size=1145324612 text=
[ 252.221105] # test_readerwriter: end reader: read=14 seq=496
[ 252.221918] # test_readerwriter: completed test
[ 252.225842] # test_readerwriter: end thread 011: wrote=0 by attempts=9760048
[ 252.226808] # test_readerwriter: end thread 010: wrote=0 by attempts=9764865
[ 252.227766] # test_readerwriter: end thread 009: wrote=0 by attempts=9768991
[ 252.228507] # test_readerwriter: end thread 008: wrote=0 by attempts=12716826
[ 252.229283] # test_readerwriter: end thread 007: wrote=0 by attempts=12674260
[ 252.230046] # test_readerwriter: end thread 006: wrote=0 by attempts=9769229
[ 252.230880] # test_readerwriter: end thread 005: wrote=0 by attempts=12716512
[ 252.231639] # test_readerwriter: end thread 004: wrote=0 by attempts=12627682
[ 252.232390] # test_readerwriter: end thread 003: wrote=364 by attempts=9766282
[ 252.233222] # test_readerwriter: end thread 002: wrote=0 by attempts=12710049
[ 252.233970] # test_readerwriter: end thread 001: wrote=123 by attempts=9786928
[ 252.234758] # test_readerwriter.speed: slow
[ 252.234792] not ok 1 test_readerwriter
Only two write threads succeeded in writing a few messages:
364 + 123 = 487
And the failed records have sequence numbers: 485, 487, 490, 491, 492.
I am not sure about 485 and 487. But the other three read entries
were never written.
It is possible the 485 and 487 entries were read before they were
written. I mean that the reader failed before writers failed.
I am not sure why writers failed. But I guess that they also
somehow depend on the fact that at least one oldest entry is
always finalized. I vaguely recall that we operated with this
condition at some point.
Best Regards,
Petr
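For comparison, here is a minimal reader-side sketch, again loosely
based on the sample code in the printk_ringbuffer.c comments
(illustrative only; test_rb and the buffer size are assumptions). The
point is that prb_read_valid() hands back the next record that still
looks valid and the caller follows r.info->seq to detect dropped
records, which is presumably roughly what the KUnit test reader does:

/*
 * Minimal reader-side sketch. prb_read_valid() skips forward over
 * records that are already reusable; the reader notices the jump by
 * comparing its own seq with r.info->seq.
 */
struct printk_info info;
struct printk_record r;
char text_buf[256];
u64 seq = 0;

prb_rec_init_rd(&r, &info, &text_buf[0], sizeof(text_buf));

while (prb_read_valid(&test_rb, seq, &r)) {
	if (seq != r.info->seq)
		pr_info("records %llu-%llu were dropped\n", seq, r.info->seq - 1);
	/* consume r.text_buf / r.info here */
	seq = r.info->seq + 1;
}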
On Thu 2025-09-11 17:30:36, Petr Mladek wrote:
> On Fri 2025-09-05 17:41:50, Daniil Tatianin wrote:
> > This series fixes the issue where data blocks would wrap in cases where the last
> > data block perfectly fits the ring. This caused whatever was at the beginning of
> > the ring to get discarded in this case, and the data block would get put there
> > even though it could be put at the end of the data ring just fine without
> > discarding anything.
> >
> > Fixing this issue also allows to simplify the check in data_check_size,
> > previously it would ensure there's space for a trailing id, which we
> > don't need anymore.
> >
> > v0->v1:
> > - Fix severely broken code alignment
> >
> > v1->v2:
> > - Rename & invert get_next_lpos -> is_blk_wrapped
> > - Add a new commit for changing the logic in data_check_size
>
> The patchset looks good to me. But I wanted to do some tests
> and it failed. I did the following:
>
> 1. Applied this patchset on top of printk/linux.git, branch for-next,
> https://git.kernel.org/pub/scm/linux/kernel/git/printk/linux.git/log/?h=for-next
>
> > I used this branch because it contains a new KUnit test for the printk
> ring buffer.
The KUnit test fails even without this patchset, see below.
> 2. I applied the following patch:
>
> + It reduces the size of the data ring. If I count it correctly
> it should be 256 (2 << 8).
>
> + It increases the maximal size of the text so that the maximal
> record size is 256.
>
> 3. I built it with Kasan enabled:
>
> $> grep KASAN .config
> CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
> CONFIG_HAVE_ARCH_KASAN=y
> CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
> CONFIG_CC_HAS_KASAN_GENERIC=y
> CONFIG_CC_HAS_KASAN_SW_TAGS=y
> CONFIG_KASAN=y
> CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX=y
> CONFIG_KASAN_GENERIC=y
> # CONFIG_KASAN_OUTLINE is not set
> CONFIG_KASAN_INLINE=y
> CONFIG_KASAN_STACK=y
> CONFIG_KASAN_VMALLOC=y
> # CONFIG_KASAN_KUNIT_TEST is not set
> # CONFIG_KASAN_EXTRA_INFO is not set
>
>
> 4. I loaded the test module:
>
> # depmod
> # modprobe printk_ringbuffer_kunit_test
>
> I am not sure if it is caused by this patchset or
Hmm, the KUnit test fails even after reverting this patchset.
I haven't hit the WARN_ON_ONCE() in get_data() yet.
But the data seem to be corrupted.
This is from two runs:
[ 45.808363] printk_ringbuffer_kunit_test: module verification failed: signature and/or required key missing - tainting kernel
[ 45.820389] KTAP version 1
[ 45.820682] 1..1
[ 45.822088] KTAP version 1
[ 45.822387] # Subtest: printk-ringbuffer
[ 45.822729] # module: printk_ringbuffer_kunit_test
[ 45.822759] 1..1
[ 45.824001] # test_readerwriter: running for 10000 ms
[ 45.825762] # test_readerwriter: start thread 001 (writer)
[ 45.826187] # test_readerwriter: start thread 002 (writer)
[ 45.826607] # test_readerwriter: start thread 003 (writer)
[ 45.826881] # test_readerwriter: start thread 004 (writer)
[ 45.827193] # test_readerwriter: start thread 005 (writer)
[ 45.827512] # test_readerwriter: start thread 006 (writer)
[ 45.827724] # test_readerwriter: start thread 007 (writer)
[ 45.827951] # test_readerwriter: start thread 008 (writer)
[ 45.828200] # test_readerwriter: start thread 009 (writer)
[ 45.828353] # test_readerwriter: start thread 010 (writer)
[ 45.828548] # test_readerwriter: starting test
[ 45.828553] # test_readerwriter: start thread 011 (writer)
[ 45.828575] # test_readerwriter: start reader
[ 45.828662] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=15 size=217 text=BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB\x80O\x81\x88\xff\xff
[ 45.835013] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=16 size=217 text=BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB\x01
[ 45.837261] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=17 size=217 text=BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB\x01
[ 45.837415] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=18 size=217 text=BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
[ 45.841872] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=19 size=217 text=BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
[ 45.844515] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=20 size=217 text=BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
[ 45.844663] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=21 size=1111638594 text=
[ 45.844816] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=22 size=1111638594 text=
[ 45.849842] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=23 size=217 text=BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
[ 45.849990] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=24 size=1111638594 text=
[ 55.841113] # test_readerwriter: end reader: read=14 seq=28
[ 55.841869] # test_readerwriter: completed test
[ 55.846197] # test_readerwriter: end thread 011: wrote=15338812
[ 55.847027] # test_readerwriter: end thread 010: wrote=13113829
[ 55.847838] # test_readerwriter: end thread 009: wrote=16786746
[ 55.848469] # test_readerwriter: end thread 008: wrote=14555662
[ 55.849240] # test_readerwriter: end thread 007: wrote=13117536
[ 55.849873] # test_readerwriter: end thread 006: wrote=13147885
[ 55.850478] # test_readerwriter: end thread 005: wrote=16799892
[ 55.851085] # test_readerwriter: end thread 004: wrote=14468908
[ 55.851692] # test_readerwriter: end thread 003: wrote=16795798
[ 55.852311] # test_readerwriter: end thread 002: wrote=15430923
[ 55.852918] # test_readerwriter: end thread 001: wrote=13106357
[ 55.853645] # test_readerwriter.speed: slow
[ 55.853681] not ok 1 test_readerwriter
[ 55.854080] not ok 1 printk-ringbuffer
[ 202.098871] KTAP version 1
[ 202.099389] # Subtest: printk-ringbuffer
[ 202.099901] # module: printk_ringbuffer_kunit_test
[ 202.100039] 1..1
[ 202.103458] # test_readerwriter: running for 10000 ms
[ 202.105324] # test_readerwriter: start thread 001 (writer)
[ 202.106870] # test_readerwriter: start thread 002 (writer)
[ 202.107413] # test_readerwriter: start thread 003 (writer)
[ 202.107805] # test_readerwriter: start thread 004 (writer)
[ 202.108256] # test_readerwriter: start thread 005 (writer)
[ 202.108486] # test_readerwriter: start thread 006 (writer)
[ 202.108777] # test_readerwriter: start thread 007 (writer)
[ 202.109017] # test_readerwriter: start thread 008 (writer)
[ 202.109608] # test_readerwriter: start thread 009 (writer)
[ 202.110028] # test_readerwriter: starting test
[ 202.110039] # test_readerwriter: start thread 011 (writer)
[ 202.110054] # test_readerwriter: start reader
[ 202.110138] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=5 size=1111638594 text=
[ 202.110279] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=8 size=1111638594 text=
[ 202.110407] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=9 size=94 text=BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
[ 202.110792] # test_readerwriter: start thread 010 (writer)
[ 202.111323] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=10 size=1111638594 text=
[ 202.120582] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=12 size=1111638594 text=
[ 212.257479] # test_readerwriter: end reader: read=12 seq=16
[ 212.258233] # test_readerwriter: completed test
[ 212.259613] # test_readerwriter: end thread 011: wrote=13062252
[ 212.260418] # test_readerwriter: end thread 010: wrote=13059061
[ 212.261766] # test_readerwriter: end thread 009: wrote=16683360
[ 212.262379] # test_readerwriter: end thread 008: wrote=16680568
[ 212.263016] # test_readerwriter: end thread 007: wrote=16687096
[ 212.263644] # test_readerwriter: end thread 006: wrote=16646963
[ 212.264265] # test_readerwriter: end thread 005: wrote=16708452
[ 212.264878] # test_readerwriter: end thread 004: wrote=13065967
[ 212.265531] # test_readerwriter: end thread 003: wrote=13090840
[ 212.266098] # test_readerwriter: end thread 002: wrote=13074259
[ 212.266695] # test_readerwriter: end thread 001: wrote=13083747
[ 212.267471] # test_readerwriter.speed: slow
[ 212.267502] not ok 1 test_readerwriter
[ 212.268136] not ok 1 printk-ringbuffer
It is interesting that it did read the same size=XXX values for
several seq= numbers. It was 1111638594 in two independent runs.
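As a side note, 1111638594 is 0x42424242, i.e. four ASCII 'B' bytes,
and the 1145324612 reported earlier in the thread is 0x44444444, four
'D' bytes. That fits the picture of the size field being read out of
stale 'B'/'D' test payload data rather than from a real descriptor.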
1st run:
BAD RECORD: seq=15 size=217 text=BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB\x80O\x81\x88\xff\xff
[ 45.835013] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=16 size=217 text=BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB\x01
[ 45.837261] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=17 size=217 text=BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB\x01
[ 45.837415] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=18 size=217 text=BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
[ 45.841872] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=19 size=217 text=BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
[ 45.844515] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=20 size=217 text=BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
[ 45.844663] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=21 size=1111638594 text=
[ 45.844816] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=22 size=1111638594 text=
[ 45.849842] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=23 size=217 text=BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
[ 45.849990] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=24 size=1111638594 text=
2nd run: triggered by: echo 1 > /sys/kernel/debug/kunit/printk-ringbuffer/run
BAD RECORD: seq=5 size=1111638594 text=
[ 202.110279] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=8 size=1111638594 text=
[ 202.110407] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=9 size=94 text=BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
[ 202.110792] # test_readerwriter: start thread 010 (writer)
[ 202.111323] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=10 size=1111638594 text=
[ 202.120582] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=12 size=1111638594 text=
Best Regards,
Petr
On Thu 2025-09-11 17:55:49, Petr Mladek wrote:
> On Thu 2025-09-11 17:30:36, Petr Mladek wrote:
> > On Fri 2025-09-05 17:41:50, Daniil Tatianin wrote:
> > > This series fixes the issue where data blocks would wrap in cases where the last
> > > data block perfectly fits the ring. This caused whatever was at the beginning of
> > > the ring to get discarded in this case, and the data block would get put there
> > > even though it could be put at the end of the data ring just fine without
> > > discarding anything.
> > >
> > > Fixing this issue also allows to simplify the check in data_check_size,
> > > previously it would ensure there's space for a trailing id, which we
> > > don't need anymore.
> > >
> > > v0->v1:
> > > - Fix severely broken code alignment
> > >
> > > v1->v2:
> > > - Rename & invert get_next_lpos -> is_blk_wrapped
> > > - Add a new commit for changing the logic in data_check_size
> >
> > The patchset looks good to me. But I wanted to do some tests
> > and it failed. I did the following:
> >
> > 1. Applied this patchset on top of printk/linux.git, branch for-next,
> > https://git.kernel.org/pub/scm/linux/kernel/git/printk/linux.git/log/?h=for-next
> >
> > I used this branch because it contains a new KUnit test for the printk
> > ring buffer.
>
> The KUnit test fails even without this patchset, see below.
>
> > 2. I applied the following patch:
> >
> > + It reduces the size of the data ring. If I count it correctly
> > it should be 256 (2 << 8).
> >
> > + It increases the maximal size of the text so that the maximal
> > record size is 256.
> >
> > 3. I built it with Kasan enabled:
> >
> > $> grep KASAN .config
> > CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
> > CONFIG_HAVE_ARCH_KASAN=y
> > CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
> > CONFIG_CC_HAS_KASAN_GENERIC=y
> > CONFIG_CC_HAS_KASAN_SW_TAGS=y
> > CONFIG_KASAN=y
> > CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX=y
> > CONFIG_KASAN_GENERIC=y
> > # CONFIG_KASAN_OUTLINE is not set
> > CONFIG_KASAN_INLINE=y
> > CONFIG_KASAN_STACK=y
> > CONFIG_KASAN_VMALLOC=y
> > # CONFIG_KASAN_KUNIT_TEST is not set
> > # CONFIG_KASAN_EXTRA_INFO is not set
> >
> >
> > 4. I loaded the test module:
> >
> > # depmod
> > # modprobe printk_ringbuffer_kunit_test
> >
> > I am not sure if it is caused by this patchset or
>
> Hmm, the KUnit test fails even after reverting this patchset.
I no longer [*] see the problem with the following change in the
original code:
diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index bc811de18316..ff93c4a079f7 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -409,7 +409,7 @@ static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
* at least the ID of the next block.
*/
size = to_blk_size(size);
- if (size > DATA_SIZE(data_ring) - sizeof(db->id))
+ if (size > (DATA_SIZE(data_ring) / 4))
return false;
return true;
[*] I did 10 runs of the KUnit test. The problem was always
reproducible before.
I hope that the race happens only when one record uses more than
half of the data ring. Or something like this. So that printk()
could not meet it in practice.
Sigh.
Best Regards,
Petr