On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
>
> Hello,
>
> for this change, we reported
> "[linux-next:master] [lib/test_vmalloc.c] 7fc85b92db: Mem-Info"
> in
> https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
>
> at that time, we made some tests with x86_64 config which runs well.
>
> now we noticed the commit is in mainline now.
> the config still has expected diff with parent:
>
> --- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config 2025-06-17 14:40:29.481052101 +0800
> +++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config 2025-06-17 14:41:18.448543738 +0800
> @@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
> CONFIG_TEST_MISC_MINOR=m
> # CONFIG_TEST_LKM is not set
> CONFIG_TEST_BITOPS=m
> -CONFIG_TEST_VMALLOC=m
> +CONFIG_TEST_VMALLOC=y
> # CONFIG_TEST_BPF is not set
> CONFIG_FIND_BIT_BENCHMARK=m
> # CONFIG_TEST_FIRMWARE is not set
>
>
> then we noticed similar random issue with x86_64 randconfig this time.
>
> 7a73348e5d4715b5 2d76e79315e403aab595d4c8830
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> :199 34% 67:200 dmesg.KASAN:null-ptr-deref_in_range[#-#]
> :199 34% 67:200 dmesg.Kernel_panic-not_syncing:Fatal_exception
> :199 34% 67:200 dmesg.Mem-Info
> :199 34% 67:200 dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
> :199 34% 67:200 dmesg.RIP:down_read_trylock
>
> we don't have enough knowledge to understand the relationship between code
> change and the random issues. just report what we observed in our tests FYI.
>
I think this is caused by a race between vmalloc_test_init and alloc_tag_init.

vmalloc_test actually depends on alloc_tag via alloc_tag_top_users, because
show_mem() invokes alloc_tag_top_users when a memory allocation fails.

With the following configuration:

CONFIG_TEST_VMALLOC=y
CONFIG_MEM_ALLOC_PROFILING=y
CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
CONFIG_MEM_ALLOC_PROFILING_DEBUG=y

If vmalloc_test_init starts before alloc_tag_init, show_mem() causes
a NULL dereference because alloc_tag_cttype has not been initialized yet.

I added some debug output to confirm this theory:
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index d48b80f3f007..9b8e7501010f 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -133,6 +133,8 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
 	struct codetag *ct;
 	struct codetag_bytes n;
 	unsigned int i, nr = 0;
+	pr_info("memory profiling alloc top %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
+	return 0;
 
 	if (can_sleep)
 		codetag_lock_module_list(alloc_tag_cttype, true);
@@ -831,6 +833,7 @@ static int __init alloc_tag_init(void)
 		shutdown_mem_profiling(true);
 		return PTR_ERR(alloc_tag_cttype);
 	}
+	pr_info("memory profiling ready %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
 
 	return 0;
 }

When booting the kernel, the log shows:

$ sudo dmesg -T | grep profiling
[Fri Jun 20 17:29:35 2025] memory profiling alloc top 1: 0 <--- alloc_tag_cttype == NULL
[Fri Jun 20 17:30:24 2025] memory profiling ready 1: ffff9b1641aa06c0

vmalloc_test_init should happen after alloc_tag_init if CONFIG_TEST_VMALLOC=y,
or show_mem() should check whether alloc_tag has finished initializing before calling
alloc_tag_top_users.

David
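
A minimal user-space sketch of the second option mentioned above (checking for initialization before use). It is not the fix from any patch in this thread and not kernel code: every `_sim` name is a hypothetical stand-in, with `cttype_sim` playing the role of alloc_tag_cttype and `top_users_sim()` the role of alloc_tag_top_users.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's codetag type object. */
struct codetag_type_sim {
	size_t ntags;
};

/* Stays NULL until the init function runs, like alloc_tag_cttype at boot. */
static struct codetag_type_sim *cttype_sim;
static struct codetag_type_sim cttype_storage_sim = { .ntags = 3 };

/* Stand-in for alloc_tag_init(): publishes the object. */
void alloc_tag_init_sim(void)
{
	cttype_sim = &cttype_storage_sim;
}

/*
 * Stand-in for alloc_tag_top_users(): returns how many entries it could
 * report. The early return is the guard: without it, the dereference
 * below would hit a NULL pointer when called before init.
 */
size_t top_users_sim(size_t count)
{
	if (!cttype_sim)
		return 0;

	return count < cttype_sim->ntags ? count : cttype_sim->ntags;
}

/* Replays the problematic order: the consumer runs first, init second. */
void demo_boot_order_sim(void)
{
	/* Before init the guard reports "no data" instead of crashing. */
	assert(top_users_sim(10) == 0);

	alloc_tag_init_sim();

	printf("entries after init: %zu\n", top_users_sim(10));
}
```

Calling demo_boot_order_sim() from a main() replays the sequence seen in the dmesg output above: the first query arrives while the pointer is still NULL, the second one after init has run.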

On Fri, Jun 20, 2025 at 3:03 AM David Wang <00107082@163.com> wrote:
> [...]
> vmalloc_test_init should happen after alloc_tag_init if CONFIG_TEST_VMALLOC=y,
> or show_mem() should check whether alloc_tag has finished initializing before calling
> alloc_tag_top_users

Thanks for reporting!
So, IIUC https://lore.kernel.org/all/20250620195305.1115151-1-harry.yoo@oracle.com/
will address this issue as well. Is that correct?

At 2025-06-23 06:50:44, "Suren Baghdasaryan" <surenb@google.com> wrote:
>[...]
>Thanks for reporting!
>So, IIUC https://lore.kernel.org/all/20250620195305.1115151-1-harry.yoo@oracle.com/
>will address this issue as well. Is that correct?

Yes, the panic can be fixed by that patch.

I still feel it would be better to delay vmalloc_test_init so that it happens after alloc_tag_init.
Or maybe we can promote alloc_tag_init to some earlier init stage? I remember reporting some allocations
not registered by memory profiling during boot:
https://lore.kernel.org/all/213ff7d2.7c6c.1945eb0c2ff.Coremail.00107082@163.com/

I will run some tests and update later.

David

On Mon, Jun 23, 2025 at 10:45:31AM +0800, David Wang wrote:
> [...]
> Yes, the panic can be fixed by that patch.
>
> I still feel it would be better to delay vmalloc_test_init so that it happens after alloc_tag_init.
>
We can, but then we would not notice the bug that is in question :)

At least, I think, we should exclude the tests which trigger warnings
when the test suite is run with the default configuration, i.e. run only the tests
which are not supposed to fail.

--
Uladzislau Rezki

At 2025-06-23 19:36:03, "Uladzislau Rezki" <urezki@gmail.com> wrote:
>On Mon, Jun 23, 2025 at 10:45:31AM +0800, David Wang wrote:
>> [...]
>> I still feel it would be better to delay vmalloc_test_init so that it happens after alloc_tag_init.
>>
>We can, but then we would not notice the bug that is in question :)

Yes, strangely lucky here~ :)

I was thinking: if some vmalloc tests fail, is alloc_tag_top_users helpful for debugging?
Considering this bug has already been caught, if alloc_tag_top_users is helpful for vmalloc test analysis,
maybe it is still reasonable to delay vmalloc_test_init? ☺︎

>
>At least, I think, we should exclude the tests which trigger warnings
>when the test suite is run with the default configuration, i.e. run only the tests
>which are not supposed to fail.
>
>--
>Uladzislau Rezki

At 2025-06-23 10:45:31, "David Wang" <00107082@163.com> wrote:
>[...]
>Or maybe we can promote alloc_tag_init to some earlier init stage? I remember reporting some allocations
>not registered by memory profiling during boot:
>https://lore.kernel.org/all/213ff7d2.7c6c.1945eb0c2ff.Coremail.00107082@163.com/
>
>I will run some tests and update later.

The memory allocations in sched_init_domains happen quite early, maybe at core_initcall, while
alloc_tag_init needs rootfs and so has to run after rootfs_initcall, so there is no reasonable place to promote it to.
But I think this explains why some allocation counters were missed during boot: the allocations happened before alloc_tag_init.

Thanks
David
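
A small user-space sketch of the ordering argument above: calls registered at an earlier initcall level run before calls at a later level, whatever order they were registered in. The level names mirror core_initcall and rootfs_initcall, but the enum values only encode relative order, and every `_sim` identifier and call name is hypothetical. Nothing here is kernel code, and it makes no claim about which level any real function uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CALLS_SIM 8

/* Relative order only; these are not the kernel's level numbers. */
enum level_sim { LEVEL_CORE_SIM, LEVEL_ROOTFS_SIM, LEVEL_DEVICE_SIM };

struct initcall_sim {
	enum level_sim level;
	const char *name;
};

/* Filled in execution order by run_initcalls_sim(). */
static const char *boot_log_sim[MAX_CALLS_SIM];
static size_t boot_log_len_sim;

/*
 * "Run" every registered call level by level. Within one level the
 * registration order is kept, so only the level decides who goes first.
 */
void run_initcalls_sim(const struct initcall_sim *calls, size_t n)
{
	assert(n <= MAX_CALLS_SIM);
	boot_log_len_sim = 0;

	for (int level = LEVEL_CORE_SIM; level <= LEVEL_DEVICE_SIM; level++)
		for (size_t i = 0; i < n; i++)
			if ((int)calls[i].level == level)
				boot_log_sim[boot_log_len_sim++] = calls[i].name;
}

/* Position of a call in the boot log, or -1 if it never ran. */
int boot_position_sim(const char *name)
{
	for (size_t i = 0; i < boot_log_len_sim; i++)
		if (strcmp(boot_log_sim[i], name) == 0)
			return (int)i;

	return -1;
}
```

With a core-level call and a later-level call registered in any order, boot_position_sim() of the core-level one is always smaller, which is the "allocation happened before init" situation described above.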

At 2025-06-23 11:16:15, "David Wang" <00107082@163.com> wrote:
>[...]
>But I think this explains why some allocation counters were missed during boot: the allocations happened before alloc_tag_init.

Sorry, I think I was wrong. The counters do not need alloc_tag_init.
Sorry for bothering, please ignore my mumbo jumbo.

David

On Sun, Jun 22, 2025 at 03:50:44PM -0700, Suren Baghdasaryan wrote:
> [...]
> Thanks for reporting!
> So, IIUC https://lore.kernel.org/all/20250620195305.1115151-1-harry.yoo@oracle.com/
> will address this issue as well. Is that correct?

Yes, I verified that it addresses this issue.

--
Cheers,
Harry / Hyeonggon