kmalloc's API family is critical for mm, with one nature that it will
round up the request size to a fixed one (mostly power of 2). Say
when user requests memory for '2^n + 1' bytes, actually 2^(n+1) bytes
could be allocated, so in worst case, there is around 50% memory
space waste.
The wastage is not a big issue for requests that get allocated/freed
quickly, but may cause problems with objects that have longer life
time.
We've met a kernel boot OOM panic (v5.10), and from the dumped slab
info:
[ 26.062145] kmalloc-2k 814056KB 814056KB
From debug we found there are huge number of 'struct iova_magazine',
whose size is 1032 bytes (1024 + 8), so each allocation will waste
1016 bytes. Though the issue was solved by giving the right (bigger)
size of RAM, it is still nice to optimize the size (either use a
kmalloc friendly size or create a dedicated slab for it).
And from lkml archive, there was another crash kernel OOM case [1]
back in 2019, which seems to be related with the similar slab waste
situation, as the log is similar:
[ 4.332648] iommu: Adding device 0000:20:02.0 to group 16
[ 4.338946] swapper/0 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=0
...
[ 4.857565] kmalloc-2048 59164KB 59164KB
The crash kernel only has 256M memory, and 59M is pretty big here.
(Note: the related code has been changed and optimised in recent
kernel [2], these logs are just picked to demo the problem, also
a patch changing its size to 1024 bytes has been merged)
So add an way to track each kmalloc's memory waste info, and
leverage the existing SLUB debug framework (specifically
SLUB_STORE_USER) to show its call stack of original allocation,
so that user can evaluate the waste situation, identify some hot
spots and optimize accordingly, for a better utilization of memory.
The waste info is integrated into existing interface:
'/sys/kernel/debug/slab/kmalloc-xx/alloc_traces', one example of
'kmalloc-4k' after boot is:
126 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] waste=233856/1856 age=280763/281414/282065 pid=1330 cpus=32 nodes=1
__kmem_cache_alloc_node+0x11f/0x4e0
__kmalloc_node+0x4e/0x140
ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe]
ixgbe_init_interrupt_scheme+0x2ae/0xc90 [ixgbe]
ixgbe_probe+0x165f/0x1d20 [ixgbe]
local_pci_probe+0x78/0xc0
work_for_cpu_fn+0x26/0x40
...
which means in 'kmalloc-4k' slab, there are 126 requests of
2240 bytes which got a 4KB space (wasting 1856 bytes each
and 233856 bytes in total), from ixgbe_alloc_q_vector().
And when system starts some real workload like multiple docker
instances, there could are more severe waste.
[1]. https://lkml.org/lkml/2019/8/12/266
[2]. https://lore.kernel.org/lkml/2920df89-9975-5785-f79b-257d3052dfaf@huawei.com/
[Thanks Hyeonggon for pointing out several bugs about sorting/format]
[Thanks Vlastimil for suggesting way to reduce memory usage of
orig_size and keep it only for kmalloc objects]
Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
---
Documentation/mm/slub.rst | 33 +++++---
include/linux/slab.h | 2 +
mm/slub.c | 156 ++++++++++++++++++++++++++++----------
3 files changed, 141 insertions(+), 50 deletions(-)
diff --git a/Documentation/mm/slub.rst b/Documentation/mm/slub.rst
index 43063ade737a..4e1578186b4f 100644
--- a/Documentation/mm/slub.rst
+++ b/Documentation/mm/slub.rst
@@ -400,21 +400,30 @@ information:
allocated objects. The output is sorted by frequency of each trace.
Information in the output:
- Number of objects, allocating function, minimal/average/maximal jiffies since alloc,
- pid range of the allocating processes, cpu mask of allocating cpus, and stack trace.
+ Number of objects, allocating function, possible memory wastage of
+ kmalloc objects(total/per-object), minimal/average/maximal jiffies
+ since alloc, pid range of the allocating processes, cpu mask of
+ allocating cpus, numa node mask of origins of memory, and stack trace.
Example:::
- 1085 populate_error_injection_list+0x97/0x110 age=166678/166680/166682 pid=1 cpus=1::
- __slab_alloc+0x6d/0x90
- kmem_cache_alloc_trace+0x2eb/0x300
- populate_error_injection_list+0x97/0x110
- init_error_injection+0x1b/0x71
- do_one_initcall+0x5f/0x2d0
- kernel_init_freeable+0x26f/0x2d7
- kernel_init+0xe/0x118
- ret_from_fork+0x22/0x30
-
+ 338 pci_alloc_dev+0x2c/0xa0 waste=521872/1544 age=290837/291891/293509 pid=1 cpus=106 nodes=0-1
+ __kmem_cache_alloc_node+0x11f/0x4e0
+ kmalloc_trace+0x26/0xa0
+ pci_alloc_dev+0x2c/0xa0
+ pci_scan_single_device+0xd2/0x150
+ pci_scan_slot+0xf7/0x2d0
+ pci_scan_child_bus_extend+0x4e/0x360
+ acpi_pci_root_create+0x32e/0x3b0
+ pci_acpi_scan_root+0x2b9/0x2d0
+ acpi_pci_root_add.cold.11+0x110/0xb0a
+ acpi_bus_attach+0x262/0x3f0
+ device_for_each_child+0xb7/0x110
+ acpi_dev_for_each_child+0x77/0xa0
+ acpi_bus_attach+0x108/0x3f0
+ device_for_each_child+0xb7/0x110
+ acpi_dev_for_each_child+0x77/0xa0
+ acpi_bus_attach+0x108/0x3f0
2. free_traces::
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 9b592e611cb1..6dc495f76644 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -29,6 +29,8 @@
#define SLAB_RED_ZONE ((slab_flags_t __force)0x00000400U)
/* DEBUG: Poison objects */
#define SLAB_POISON ((slab_flags_t __force)0x00000800U)
+/* Indicate a kmalloc slab */
+#define SLAB_KMALLOC ((slab_flags_t __force)0x00001000U)
/* Align objs on cache lines */
#define SLAB_HWCACHE_ALIGN ((slab_flags_t __force)0x00002000U)
/* Use GFP_DMA memory */
diff --git a/mm/slub.c b/mm/slub.c
index fe4fe0e72daf..c8ba16b3a4db 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -194,11 +194,24 @@ DEFINE_STATIC_KEY_FALSE(slub_debug_enabled);
#endif
#endif /* CONFIG_SLUB_DEBUG */
+/* Structure holding parameters for get_partial() call chain */
+struct partial_context {
+ struct slab **slab;
+ gfp_t flags;
+ unsigned int orig_size;
+};
+
static inline bool kmem_cache_debug(struct kmem_cache *s)
{
return kmem_cache_debug_flags(s, SLAB_DEBUG_FLAGS);
}
+static inline bool slub_debug_orig_size(struct kmem_cache *s)
+{
+ return (kmem_cache_debug_flags(s, SLAB_STORE_USER) &&
+ (s->flags & SLAB_KMALLOC));
+}
+
void *fixup_red_left(struct kmem_cache *s, void *p)
{
if (kmem_cache_debug_flags(s, SLAB_RED_ZONE))
@@ -785,6 +798,39 @@ static void print_slab_info(const struct slab *slab)
folio_flags(folio, 0));
}
+/*
+ * kmalloc caches has fixed sizes (mostly power of 2), and kmalloc() API
+ * family will round up the real request size to these fixed ones, so
+ * there could be an extra area than what is requested. Save the original
+ * request size in the meta data area, for better debug and sanity check.
+ */
+static inline void set_orig_size(struct kmem_cache *s,
+ void *object, unsigned int orig_size)
+{
+ void *p = kasan_reset_tag(object);
+
+ if (!slub_debug_orig_size(s))
+ return;
+
+ p += get_info_end(s);
+ p += sizeof(struct track) * 2;
+
+ *(unsigned int *)p = orig_size;
+}
+
+static unsigned int get_orig_size(struct kmem_cache *s, void *object)
+{
+ void *p = kasan_reset_tag(object);
+
+ if (!slub_debug_orig_size(s))
+ return s->object_size;
+
+ p += get_info_end(s);
+ p += sizeof(struct track) * 2;
+
+ return *(unsigned int *)p;
+}
+
static void slab_bug(struct kmem_cache *s, char *fmt, ...)
{
struct va_format vaf;
@@ -844,6 +890,9 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p)
if (s->flags & SLAB_STORE_USER)
off += 2 * sizeof(struct track);
+ if (slub_debug_orig_size(s))
+ off += sizeof(unsigned int);
+
off += kasan_metadata_size(s);
if (off != size_from_object(s))
@@ -977,7 +1026,8 @@ static int check_bytes_and_report(struct kmem_cache *s, struct slab *slab,
*
* A. Free pointer (if we cannot overwrite object on free)
* B. Tracking data for SLAB_STORE_USER
- * C. Padding to reach required alignment boundary or at minimum
+ * C. Original request size for kmalloc object (SLAB_STORE_USER enabled)
+ * D. Padding to reach required alignment boundary or at minimum
* one word if debugging is on to be able to detect writes
* before the word boundary.
*
@@ -995,10 +1045,14 @@ static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p)
{
unsigned long off = get_info_end(s); /* The end of info */
- if (s->flags & SLAB_STORE_USER)
+ if (s->flags & SLAB_STORE_USER) {
/* We also have user information there */
off += 2 * sizeof(struct track);
+ if (s->flags & SLAB_KMALLOC)
+ off += sizeof(unsigned int);
+ }
+
off += kasan_metadata_size(s);
if (size_from_object(s) == off)
@@ -1293,7 +1347,7 @@ static inline int alloc_consistency_checks(struct kmem_cache *s,
}
static noinline int alloc_debug_processing(struct kmem_cache *s,
- struct slab *slab, void *object)
+ struct slab *slab, void *object, int orig_size)
{
if (s->flags & SLAB_CONSISTENCY_CHECKS) {
if (!alloc_consistency_checks(s, slab, object))
@@ -1302,6 +1356,7 @@ static noinline int alloc_debug_processing(struct kmem_cache *s,
/* Success. Perform special debug activities for allocs */
trace(s, slab, object, 1);
+ set_orig_size(s, object, orig_size);
init_object(s, object, SLUB_RED_ACTIVE);
return 1;
@@ -1570,7 +1625,10 @@ static inline
void setup_slab_debug(struct kmem_cache *s, struct slab *slab, void *addr) {}
static inline int alloc_debug_processing(struct kmem_cache *s,
- struct slab *slab, void *object) { return 0; }
+ struct slab *slab, void *object, int orig_size) { return 0; }
+
+static inline void set_orig_size(struct kmem_cache *s,
+ void *object, unsigned int orig_size) {}
static inline void free_debug_processing(
struct kmem_cache *s, struct slab *slab,
@@ -1999,7 +2057,7 @@ static inline void remove_partial(struct kmem_cache_node *n,
* it to full list if it was the last free object.
*/
static void *alloc_single_from_partial(struct kmem_cache *s,
- struct kmem_cache_node *n, struct slab *slab)
+ struct kmem_cache_node *n, struct slab *slab, int orig_size)
{
void *object;
@@ -2009,7 +2067,7 @@ static void *alloc_single_from_partial(struct kmem_cache *s,
slab->freelist = get_freepointer(s, object);
slab->inuse++;
- if (!alloc_debug_processing(s, slab, object)) {
+ if (!alloc_debug_processing(s, slab, object, orig_size)) {
remove_partial(n, slab);
return NULL;
}
@@ -2028,7 +2086,7 @@ static void *alloc_single_from_partial(struct kmem_cache *s,
* and put the slab to the partial (or full) list.
*/
static void *alloc_single_from_new_slab(struct kmem_cache *s,
- struct slab *slab)
+ struct slab *slab, int orig_size)
{
int nid = slab_nid(slab);
struct kmem_cache_node *n = get_node(s, nid);
@@ -2040,7 +2098,7 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s,
slab->freelist = get_freepointer(s, object);
slab->inuse = 1;
- if (!alloc_debug_processing(s, slab, object))
+ if (!alloc_debug_processing(s, slab, object, orig_size))
/*
* It's not really expected that this would fail on a
* freshly allocated slab, but a concurrent memory
@@ -2118,7 +2176,7 @@ static inline bool pfmemalloc_match(struct slab *slab, gfp_t gfpflags);
* Try to allocate a partial slab from a specific node.
*/
static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
- struct slab **ret_slab, gfp_t gfpflags)
+ struct partial_context *pc)
{
struct slab *slab, *slab2;
void *object = NULL;
@@ -2138,11 +2196,12 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
list_for_each_entry_safe(slab, slab2, &n->partial, slab_list) {
void *t;
- if (!pfmemalloc_match(slab, gfpflags))
+ if (!pfmemalloc_match(slab, pc->flags))
continue;
if (kmem_cache_debug(s)) {
- object = alloc_single_from_partial(s, n, slab);
+ object = alloc_single_from_partial(s, n, slab,
+ pc->orig_size);
if (object)
break;
continue;
@@ -2153,7 +2212,7 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
break;
if (!object) {
- *ret_slab = slab;
+ *pc->slab = slab;
stat(s, ALLOC_FROM_PARTIAL);
object = t;
} else {
@@ -2177,14 +2236,13 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
/*
* Get a slab from somewhere. Search in increasing NUMA distances.
*/
-static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
- struct slab **ret_slab)
+static void *get_any_partial(struct kmem_cache *s, struct partial_context *pc)
{
#ifdef CONFIG_NUMA
struct zonelist *zonelist;
struct zoneref *z;
struct zone *zone;
- enum zone_type highest_zoneidx = gfp_zone(flags);
+ enum zone_type highest_zoneidx = gfp_zone(pc->flags);
void *object;
unsigned int cpuset_mems_cookie;
@@ -2212,15 +2270,15 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
do {
cpuset_mems_cookie = read_mems_allowed_begin();
- zonelist = node_zonelist(mempolicy_slab_node(), flags);
+ zonelist = node_zonelist(mempolicy_slab_node(), pc->flags);
for_each_zone_zonelist(zone, z, zonelist, highest_zoneidx) {
struct kmem_cache_node *n;
n = get_node(s, zone_to_nid(zone));
- if (n && cpuset_zone_allowed(zone, flags) &&
+ if (n && cpuset_zone_allowed(zone, pc->flags) &&
n->nr_partial > s->min_partial) {
- object = get_partial_node(s, n, ret_slab, flags);
+ object = get_partial_node(s, n, pc);
if (object) {
/*
* Don't check read_mems_allowed_retry()
@@ -2241,8 +2299,7 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
/*
* Get a partial slab, lock it and return it.
*/
-static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
- struct slab **ret_slab)
+static void *get_partial(struct kmem_cache *s, int node, struct partial_context *pc)
{
void *object;
int searchnode = node;
@@ -2250,11 +2307,11 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
if (node == NUMA_NO_NODE)
searchnode = numa_mem_id();
- object = get_partial_node(s, get_node(s, searchnode), ret_slab, flags);
+ object = get_partial_node(s, get_node(s, searchnode), pc);
if (object || node != NUMA_NO_NODE)
return object;
- return get_any_partial(s, flags, ret_slab);
+ return get_any_partial(s, pc);
}
#ifdef CONFIG_PREEMPTION
@@ -2974,11 +3031,12 @@ static inline void *get_freelist(struct kmem_cache *s, struct slab *slab)
* already disabled (which is the case for bulk allocation).
*/
static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
- unsigned long addr, struct kmem_cache_cpu *c)
+ unsigned long addr, struct kmem_cache_cpu *c, unsigned int orig_size)
{
void *freelist;
struct slab *slab;
unsigned long flags;
+ struct partial_context pc;
stat(s, ALLOC_SLOWPATH);
@@ -3092,7 +3150,10 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
new_objects:
- freelist = get_partial(s, gfpflags, node, &slab);
+ pc.flags = gfpflags;
+ pc.slab = &slab;
+ pc.orig_size = orig_size;
+ freelist = get_partial(s, node, &pc);
if (freelist)
goto check_new_slab;
@@ -3108,7 +3169,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
stat(s, ALLOC_SLAB);
if (kmem_cache_debug(s)) {
- freelist = alloc_single_from_new_slab(s, slab);
+ freelist = alloc_single_from_new_slab(s, slab, orig_size);
if (unlikely(!freelist))
goto new_objects;
@@ -3140,6 +3201,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
*/
if (s->flags & SLAB_STORE_USER)
set_track(s, freelist, TRACK_ALLOC, addr);
+
return freelist;
}
@@ -3182,7 +3244,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
* pointer.
*/
static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
- unsigned long addr, struct kmem_cache_cpu *c)
+ unsigned long addr, struct kmem_cache_cpu *c, unsigned int orig_size)
{
void *p;
@@ -3195,7 +3257,7 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
c = slub_get_cpu_ptr(s->cpu_slab);
#endif
- p = ___slab_alloc(s, gfpflags, node, addr, c);
+ p = ___slab_alloc(s, gfpflags, node, addr, c, orig_size);
#ifdef CONFIG_PREEMPT_COUNT
slub_put_cpu_ptr(s->cpu_slab);
#endif
@@ -3280,7 +3342,7 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s, struct list_l
if (!USE_LOCKLESS_FAST_PATH() ||
unlikely(!object || !slab || !node_match(slab, node))) {
- object = __slab_alloc(s, gfpflags, node, addr, c);
+ object = __slab_alloc(s, gfpflags, node, addr, c, orig_size);
} else {
void *next_object = get_freepointer_safe(s, object);
@@ -3747,7 +3809,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
* of re-populating per CPU c->freelist
*/
p[i] = ___slab_alloc(s, flags, NUMA_NO_NODE,
- _RET_IP_, c);
+ _RET_IP_, c, s->object_size);
if (unlikely(!p[i]))
goto error;
@@ -4150,12 +4212,17 @@ static int calculate_sizes(struct kmem_cache *s)
}
#ifdef CONFIG_SLUB_DEBUG
- if (flags & SLAB_STORE_USER)
+ if (flags & SLAB_STORE_USER) {
/*
* Need to store information about allocs and frees after
* the object.
*/
size += 2 * sizeof(struct track);
+
+ /* Save the original kmalloc request size */
+ if (flags & SLAB_KMALLOC)
+ size += sizeof(unsigned int);
+ }
#endif
kasan_cache_create(s, &size, &s->flags);
@@ -4770,7 +4837,7 @@ void __init kmem_cache_init(void)
/* Now we can use the kmem_cache to allocate kmalloc slabs */
setup_kmalloc_cache_index_table();
- create_kmalloc_caches(0);
+ create_kmalloc_caches(SLAB_KMALLOC);
/* Setup random freelists for each cache */
init_freelist_randomization();
@@ -4937,6 +5004,7 @@ struct location {
depot_stack_handle_t handle;
unsigned long count;
unsigned long addr;
+ unsigned long waste;
long long sum_time;
long min_time;
long max_time;
@@ -4983,13 +5051,15 @@ static int alloc_loc_track(struct loc_track *t, unsigned long max, gfp_t flags)
}
static int add_location(struct loc_track *t, struct kmem_cache *s,
- const struct track *track)
+ const struct track *track,
+ unsigned int orig_size)
{
long start, end, pos;
struct location *l;
- unsigned long caddr, chandle;
+ unsigned long caddr, chandle, cwaste;
unsigned long age = jiffies - track->when;
depot_stack_handle_t handle = 0;
+ unsigned int waste = s->object_size - orig_size;
#ifdef CONFIG_STACKDEPOT
handle = READ_ONCE(track->handle);
@@ -5007,11 +5077,13 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
if (pos == end)
break;
- caddr = t->loc[pos].addr;
- chandle = t->loc[pos].handle;
- if ((track->addr == caddr) && (handle == chandle)) {
+ l = &t->loc[pos];
+ caddr = l->addr;
+ chandle = l->handle;
+ cwaste = l->waste;
+ if ((track->addr == caddr) && (handle == chandle) &&
+ (waste == cwaste)) {
- l = &t->loc[pos];
l->count++;
if (track->when) {
l->sum_time += age;
@@ -5036,6 +5108,9 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
end = pos;
else if (track->addr == caddr && handle < chandle)
end = pos;
+ else if (track->addr == caddr && handle == chandle &&
+ waste < cwaste)
+ end = pos;
else
start = pos;
}
@@ -5059,6 +5134,7 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
l->min_pid = track->pid;
l->max_pid = track->pid;
l->handle = handle;
+ l->waste = waste;
cpumask_clear(to_cpumask(l->cpus));
cpumask_set_cpu(track->cpu, to_cpumask(l->cpus));
nodes_clear(l->nodes);
@@ -5077,7 +5153,7 @@ static void process_slab(struct loc_track *t, struct kmem_cache *s,
for_each_object(p, s, addr, slab->objects)
if (!test_bit(__obj_to_index(s, addr, p), obj_map))
- add_location(t, s, get_track(s, p, alloc));
+ add_location(t, s, get_track(s, p, alloc), get_orig_size(s, p));
}
#endif /* CONFIG_DEBUG_FS */
#endif /* CONFIG_SLUB_DEBUG */
@@ -5942,6 +6018,10 @@ static int slab_debugfs_show(struct seq_file *seq, void *v)
else
seq_puts(seq, "<not-available>");
+ if (l->waste)
+ seq_printf(seq, " waste=%lu/%lu",
+ l->count * l->waste, l->waste);
+
if (l->sum_time != l->min_time) {
seq_printf(seq, " age=%ld/%llu/%ld",
l->min_time, div_u64(l->sum_time, l->count),
--
2.34.1
On Tue, 13 Sep 2022, at 06:54, Feng Tang wrote:
> kmalloc's API family is critical for mm, with one nature that it will
> round up the request size to a fixed one (mostly power of 2). Say
> when user requests memory for '2^n + 1' bytes, actually 2^(n+1) bytes
> could be allocated, so in worst case, there is around 50% memory
> space waste.
I have a ralink mt7621 router running Openwrt, using the mips ZBOOT kernel, and appear to have bisected
a very-nearly-clean kernel v6.1rc-2 boot issue to this commit.
I have 3 commits atop 6.1-rc2: fix a ZBOOT compile error, use the Openwrt LZMA options,
and enable DEBUG_ZBOOT for my platform. I am compiling my kernel within the Openwrt build system.
No guarantees this is not due to something I am doing wrong, but any insight would be greatly appreciated.
On UART, No indication of the (once extracted) kernel booting:
transfer started ......................................... transfer ok, time=2.01s
setting up elf image... OK
jumping to kernel code
zimage at: 80BA4100 810D4720
Uncompressing Linux at load address 80001000
Copy device tree to address 80B96EE0
Now, booting the kernel...
Nothing follows
6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc") reverted, normal boot:
transfer started ......................................... transfer ok, time=2.01s
setting up elf image... OK
jumping to kernel code
zimage at: 80BA4100 810D47A4
Uncompressing Linux at load address 80001000
Copy device tree to address 80B96EE0
Now, booting the kernel...
[ 0.000000] Linux version 6.1.0-rc2 (john@john) (mipsel-openwrt-linux-musl-gcc (OpenWrt GCC 11.3.0 r19724+16-1521d5f453) 11.3.0, GNU ld (GNU Binutils) 2.37) #0 SMP Fri Oct 28 03:48:10 2022
[ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
[ 0.000000] printk: bootconsole [early0] enabled
[ 0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
[ 0.000000] MIPS: machine is MikroTik RouterBOARD 760iGS
[ 0.000000] Initrd not found or empty - disabling initrd
[ 0.000000] VPE topology {2,2} total 4
[ 0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[ 0.000000] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[ 0.000000] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[ 0.000000] Zone ranges:
[ 0.000000] Normal [mem 0x0000000000000000-0x000000000fffffff]
[ 0.000000] HighMem empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000000000-0x000000000fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000000fffffff]
[ 0.000000] percpu: Embedded 11 pages/cpu s16064 r8192 d20800 u45056
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 64960
[ 0.000000] Kernel command line: console=ttyS0,115200 rootfstype=squashfs,jffs2
[ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
[ 0.000000] Writing ErrCtl register=00019146
[ 0.000000] Readback ErrCtl register=00019146
[ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.000000] Memory: 246220K/262144K available (7455K kernel code, 628K rwdata, 1308K rodata, 3524K init, 245K bss, 15924K reserved, 0K cma-reserved, 0K highmem)
[ 0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.000000] rcu: Hierarchical RCU implementation.
boot continues as expected
possibly relevant config options:
grep -E '(SLUB|SLAB)' .config
# SLAB allocator options
# CONFIG_SLAB is not set
CONFIG_SLUB=y
CONFIG_SLAB_MERGE_DEFAULT=y
# CONFIG_SLAB_FREELIST_RANDOM is not set
# CONFIG_SLAB_FREELIST_HARDENED is not set
# CONFIG_SLUB_STATS is not set
CONFIG_SLUB_CPU_PARTIAL=y
# end of SLAB allocator options
# CONFIG_SLUB_DEBUG is not set
With this commit reverted: cpuinfo and meminfo
system type : MediaTek MT7621 ver:1 eco:3
machine : MikroTik RouterBOARD 760iGS
processor : 0
cpu model : MIPS 1004Kc V2.15
BogoMIPS : 586.13
wait instruction : yes
microsecond timers : yes
tlb_entries : 32
extra interrupt vector : yes
hardware watchpoint : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa : mips1 mips2 mips32r1 mips32r2
ASEs implemented : mips16 dsp mt
Options implemented : tlb 4kex 4k_cache prefetch mcheck ejtag llsc pindexed_dcache userlocal vint perf_cntr_intr_bit cdmm perf
shadow register sets : 1
kscratch registers : 0
package : 0
core : 0
VPE : 0
VCED exceptions : not available
VCEI exceptions : not available
processor : 1
cpu model : MIPS 1004Kc V2.15
BogoMIPS : 586.13
wait instruction : yes
microsecond timers : yes
tlb_entries : 32
extra interrupt vector : yes
hardware watchpoint : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa : mips1 mips2 mips32r1 mips32r2
ASEs implemented : mips16 dsp mt
Options implemented : tlb 4kex 4k_cache prefetch mcheck ejtag llsc pindexed_dcache userlocal vint perf_cntr_intr_bit cdmm perf
shadow register sets : 1
kscratch registers : 0
package : 0
core : 0
VPE : 1
VCED exceptions : not available
VCEI exceptions : not available
processor : 2
cpu model : MIPS 1004Kc V2.15
BogoMIPS : 586.13
wait instruction : yes
microsecond timers : yes
tlb_entries : 32
extra interrupt vector : yes
hardware watchpoint : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa : mips1 mips2 mips32r1 mips32r2
ASEs implemented : mips16 dsp mt
Options implemented : tlb 4kex 4k_cache prefetch mcheck ejtag llsc pindexed_dcache userlocal vint perf_cntr_intr_bit cdmm perf
shadow register sets : 1
kscratch registers : 0
package : 0
core : 1
VPE : 0
VCED exceptions : not available
VCEI exceptions : not available
processor : 3
cpu model : MIPS 1004Kc V2.15
BogoMIPS : 586.13
wait instruction : yes
microsecond timers : yes
tlb_entries : 32
extra interrupt vector : yes
hardware watchpoint : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa : mips1 mips2 mips32r1 mips32r2
ASEs implemented : mips16 dsp mt
Options implemented : tlb 4kex 4k_cache prefetch mcheck ejtag llsc pindexed_dcache userlocal vint perf_cntr_intr_bit cdmm perf
shadow register sets : 1
kscratch registers : 0
package : 0
core : 1
VPE : 1
VCED exceptions : not available
VCEI exceptions : not available
MemTotal: 249744 kB
MemFree: 211088 kB
MemAvailable: 187364 kB
Buffers: 0 kB
Cached: 8824 kB
SwapCached: 0 kB
Active: 1104 kB
Inactive: 8860 kB
Active(anon): 1104 kB
Inactive(anon): 8860 kB
Active(file): 0 kB
Inactive(file): 0 kB
Unevictable: 0 kB
Mlocked: 0 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 249744 kB
LowFree: 211088 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 1192 kB
Mapped: 2092 kB
Shmem: 8824 kB
KReclaimable: 1704 kB
Slab: 9372 kB
SReclaimable: 1704 kB
SUnreclaim: 7668 kB
KernelStack: 592 kB
PageTables: 264 kB
SecPageTables: 0 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 124872 kB
Committed_AS: 14676 kB
VmallocTotal: 1040376 kB
VmallocUsed: 2652 kB
VmallocChunk: 0 kB
Percpu: 272 kB
Cheers,
--
John Thomson
On 10/30/22 20:23, John Thomson wrote:
> On Tue, 13 Sep 2022, at 06:54, Feng Tang wrote:
>> kmalloc's API family is critical for mm, with one nature that it will
>> round up the request size to a fixed one (mostly power of 2). Say
>> when user requests memory for '2^n + 1' bytes, actually 2^(n+1) bytes
>> could be allocated, so in worst case, there is around 50% memory
>> space waste.
>
>
> I have a ralink mt7621 router running Openwrt, using the mips ZBOOT kernel, and appear to have bisected
> a very-nearly-clean kernel v6.1rc-2 boot issue to this commit.
> I have 3 commits atop 6.1-rc2: fix a ZBOOT compile error, use the Openwrt LZMA options,
> and enable DEBUG_ZBOOT for my platform. I am compiling my kernel within the Openwrt build system.
> No guarantees this is not due to something I am doing wrong, but any insight would be greatly appreciated.
>
>
> On UART, No indication of the (once extracted) kernel booting:
>
> transfer started ......................................... transfer ok, time=2.01s
> setting up elf image... OK
> jumping to kernel code
> zimage at: 80BA4100 810D4720
> Uncompressing Linux at load address 80001000
> Copy device tree to address 80B96EE0
> Now, booting the kernel...
It's weird that the commit would cause no output so early, SLUB code is
run only later.
> Nothing follows
>
> 6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc") reverted, normal boot:
> transfer started ......................................... transfer ok, time=2.01s
> setting up elf image... OK
> jumping to kernel code
> zimage at: 80BA4100 810D47A4
> Uncompressing Linux at load address 80001000
> Copy device tree to address 80B96EE0
> Now, booting the kernel...
>
> [ 0.000000] Linux version 6.1.0-rc2 (john@john) (mipsel-openwrt-linux-musl-gcc (OpenWrt GCC 11.3.0 r19724+16-1521d5f453) 11.3.0, GNU ld (GNU Binutils) 2.37) #0 SMP Fri Oct 28 03:48:10 2022
> [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
> [ 0.000000] printk: bootconsole [early0] enabled
> [ 0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
> [ 0.000000] MIPS: machine is MikroTik RouterBOARD 760iGS
> [ 0.000000] Initrd not found or empty - disabling initrd
> [ 0.000000] VPE topology {2,2} total 4
> [ 0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
> [ 0.000000] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
> [ 0.000000] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
> [ 0.000000] Zone ranges:
> [ 0.000000] Normal [mem 0x0000000000000000-0x000000000fffffff]
> [ 0.000000] HighMem empty
> [ 0.000000] Movable zone start for each node
> [ 0.000000] Early memory node ranges
> [ 0.000000] node 0: [mem 0x0000000000000000-0x000000000fffffff]
> [ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000000fffffff]
> [ 0.000000] percpu: Embedded 11 pages/cpu s16064 r8192 d20800 u45056
> [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 64960
> [ 0.000000] Kernel command line: console=ttyS0,115200 rootfstype=squashfs,jffs2
> [ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
> [ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
> [ 0.000000] Writing ErrCtl register=00019146
> [ 0.000000] Readback ErrCtl register=00019146
> [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> [ 0.000000] Memory: 246220K/262144K available (7455K kernel code, 628K rwdata, 1308K rodata, 3524K init, 245K bss, 15924K reserved, 0K cma-reserved, 0K highmem)
> [ 0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> [ 0.000000] rcu: Hierarchical RCU implementation.
>
>
> boot continues as expected
>
>
> possibly relevant config options:
> grep -E '(SLUB|SLAB)' .config
> # SLAB allocator options
> # CONFIG_SLAB is not set
> CONFIG_SLUB=y
> CONFIG_SLAB_MERGE_DEFAULT=y
> # CONFIG_SLAB_FREELIST_RANDOM is not set
> # CONFIG_SLAB_FREELIST_HARDENED is not set
> # CONFIG_SLUB_STATS is not set
> CONFIG_SLUB_CPU_PARTIAL=y
> # end of SLAB allocator options
> # CONFIG_SLUB_DEBUG is not set
Also not having CONFIG_SLUB_DEBUG enabled means most of the code the
patch/commit touches is not even active.
Could this be some miscompile or code layout change exposing some
different bug, hmm.
Is it any different if you do enable CONFIG_SLUB_DEBUG ?
Or change to CONFIG_SLAB? (that would be really weird if not)
>
> With this commit reverted: cpuinfo and meminfo
>
> system type : MediaTek MT7621 ver:1 eco:3
> machine : MikroTik RouterBOARD 760iGS
> processor : 0
> cpu model : MIPS 1004Kc V2.15
> BogoMIPS : 586.13
> wait instruction : yes
> microsecond timers : yes
> tlb_entries : 32
> extra interrupt vector : yes
> hardware watchpoint : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
> isa : mips1 mips2 mips32r1 mips32r2
> ASEs implemented : mips16 dsp mt
> Options implemented : tlb 4kex 4k_cache prefetch mcheck ejtag llsc pindexed_dcache userlocal vint perf_cntr_intr_bit cdmm perf
> shadow register sets : 1
> kscratch registers : 0
> package : 0
> core : 0
> VPE : 0
> VCED exceptions : not available
> VCEI exceptions : not available
>
> processor : 1
> cpu model : MIPS 1004Kc V2.15
> BogoMIPS : 586.13
> wait instruction : yes
> microsecond timers : yes
> tlb_entries : 32
> extra interrupt vector : yes
> hardware watchpoint : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
> isa : mips1 mips2 mips32r1 mips32r2
> ASEs implemented : mips16 dsp mt
> Options implemented : tlb 4kex 4k_cache prefetch mcheck ejtag llsc pindexed_dcache userlocal vint perf_cntr_intr_bit cdmm perf
> shadow register sets : 1
> kscratch registers : 0
> package : 0
> core : 0
> VPE : 1
> VCED exceptions : not available
> VCEI exceptions : not available
>
> processor : 2
> cpu model : MIPS 1004Kc V2.15
> BogoMIPS : 586.13
> wait instruction : yes
> microsecond timers : yes
> tlb_entries : 32
> extra interrupt vector : yes
> hardware watchpoint : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
> isa : mips1 mips2 mips32r1 mips32r2
> ASEs implemented : mips16 dsp mt
> Options implemented : tlb 4kex 4k_cache prefetch mcheck ejtag llsc pindexed_dcache userlocal vint perf_cntr_intr_bit cdmm perf
> shadow register sets : 1
> kscratch registers : 0
> package : 0
> core : 1
> VPE : 0
> VCED exceptions : not available
> VCEI exceptions : not available
>
> processor : 3
> cpu model : MIPS 1004Kc V2.15
> BogoMIPS : 586.13
> wait instruction : yes
> microsecond timers : yes
> tlb_entries : 32
> extra interrupt vector : yes
> hardware watchpoint : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
> isa : mips1 mips2 mips32r1 mips32r2
> ASEs implemented : mips16 dsp mt
> Options implemented : tlb 4kex 4k_cache prefetch mcheck ejtag llsc pindexed_dcache userlocal vint perf_cntr_intr_bit cdmm perf
> shadow register sets : 1
> kscratch registers : 0
> package : 0
> core : 1
> VPE : 1
> VCED exceptions : not available
> VCEI exceptions : not available
>
> MemTotal: 249744 kB
> MemFree: 211088 kB
> MemAvailable: 187364 kB
> Buffers: 0 kB
> Cached: 8824 kB
> SwapCached: 0 kB
> Active: 1104 kB
> Inactive: 8860 kB
> Active(anon): 1104 kB
> Inactive(anon): 8860 kB
> Active(file): 0 kB
> Inactive(file): 0 kB
> Unevictable: 0 kB
> Mlocked: 0 kB
> HighTotal: 0 kB
> HighFree: 0 kB
> LowTotal: 249744 kB
> LowFree: 211088 kB
> SwapTotal: 0 kB
> SwapFree: 0 kB
> Dirty: 0 kB
> Writeback: 0 kB
> AnonPages: 1192 kB
> Mapped: 2092 kB
> Shmem: 8824 kB
> KReclaimable: 1704 kB
> Slab: 9372 kB
> SReclaimable: 1704 kB
> SUnreclaim: 7668 kB
> KernelStack: 592 kB
> PageTables: 264 kB
> SecPageTables: 0 kB
> NFS_Unstable: 0 kB
> Bounce: 0 kB
> WritebackTmp: 0 kB
> CommitLimit: 124872 kB
> Committed_AS: 14676 kB
> VmallocTotal: 1040376 kB
> VmallocUsed: 2652 kB
> VmallocChunk: 0 kB
> Percpu: 272 kB
>
>
> Cheers,
>
Hi John,
Thanks for the bisecting and reporting!
On Mon, Oct 31, 2022 at 05:30:24AM +0800, Vlastimil Babka wrote:
> On 10/30/22 20:23, John Thomson wrote:
> > On Tue, 13 Sep 2022, at 06:54, Feng Tang wrote:
> >> kmalloc's API family is critical for mm, with one nature that it will
> >> round up the request size to a fixed one (mostly power of 2). Say
> >> when user requests memory for '2^n + 1' bytes, actually 2^(n+1) bytes
> >> could be allocated, so in worst case, there is around 50% memory
> >> space waste.
> >
> >
> > I have a ralink mt7621 router running Openwrt, using the mips ZBOOT kernel, and appear to have bisected
> > a very-nearly-clean kernel v6.1rc-2 boot issue to this commit.
> > I have 3 commits atop 6.1-rc2: fix a ZBOOT compile error, use the Openwrt LZMA options,
> > and enable DEBUG_ZBOOT for my platform. I am compiling my kernel within the Openwrt build system.
> > No guarantees this is not due to something I am doing wrong, but any insight would be greatly appreciated.
> >
> >
> > On UART, No indication of the (once extracted) kernel booting:
> >
> > transfer started ......................................... transfer ok, time=2.01s
> > setting up elf image... OK
> > jumping to kernel code
> > zimage at: 80BA4100 810D4720
> > Uncompressing Linux at load address 80001000
> > Copy device tree to address 80B96EE0
> > Now, booting the kernel...
>
> It's weird that the commit would cause no output so early, SLUB code is
> run only later.
I noticed your cmdline has console setting, could you enable the
earlyprintk in cmdline like "earlyprintk=ttyS0,115200" etc to see
if there is more message printed out.
Also I want to confirm this is a boot failure and not only a boot
message missing.
> > Nothing follows
> >
> > 6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc") reverted, normal boot:
> > transfer started ......................................... transfer ok, time=2.01s
> > setting up elf image... OK
> > jumping to kernel code
> > zimage at: 80BA4100 810D47A4
> > Uncompressing Linux at load address 80001000
> > Copy device tree to address 80B96EE0
> > Now, booting the kernel...
> >
> > [ 0.000000] Linux version 6.1.0-rc2 (john@john) (mipsel-openwrt-linux-musl-gcc (OpenWrt GCC 11.3.0 r19724+16-1521d5f453) 11.3.0, GNU ld (GNU Binutils) 2.37) #0 SMP Fri Oct 28 03:48:10 2022
> > [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
> > [ 0.000000] printk: bootconsole [early0] enabled
> > [ 0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
> > [ 0.000000] MIPS: machine is MikroTik RouterBOARD 760iGS
> > [ 0.000000] Initrd not found or empty - disabling initrd
> > [ 0.000000] VPE topology {2,2} total 4
> > [ 0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
> > [ 0.000000] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
> > [ 0.000000] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
> > [ 0.000000] Zone ranges:
> > [ 0.000000] Normal [mem 0x0000000000000000-0x000000000fffffff]
> > [ 0.000000] HighMem empty
> > [ 0.000000] Movable zone start for each node
> > [ 0.000000] Early memory node ranges
> > [ 0.000000] node 0: [mem 0x0000000000000000-0x000000000fffffff]
> > [ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000000fffffff]
> > [ 0.000000] percpu: Embedded 11 pages/cpu s16064 r8192 d20800 u45056
> > [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 64960
> > [ 0.000000] Kernel command line: console=ttyS0,115200 rootfstype=squashfs,jffs2
> > [ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
> > [ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
> > [ 0.000000] Writing ErrCtl register=00019146
> > [ 0.000000] Readback ErrCtl register=00019146
> > [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> > [ 0.000000] Memory: 246220K/262144K available (7455K kernel code, 628K rwdata, 1308K rodata, 3524K init, 245K bss, 15924K reserved, 0K cma-reserved, 0K highmem)
> > [ 0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> > [ 0.000000] rcu: Hierarchical RCU implementation.
> >
> >
> > boot continues as expected
> >
> >
> > possibly relevant config options:
> > grep -E '(SLUB|SLAB)' .config
> > # SLAB allocator options
> > # CONFIG_SLAB is not set
> > CONFIG_SLUB=y
> > CONFIG_SLAB_MERGE_DEFAULT=y
> > # CONFIG_SLAB_FREELIST_RANDOM is not set
> > # CONFIG_SLAB_FREELIST_HARDENED is not set
> > # CONFIG_SLUB_STATS is not set
> > CONFIG_SLUB_CPU_PARTIAL=y
> > # end of SLAB allocator options
> > # CONFIG_SLUB_DEBUG is not set
>
> Also not having CONFIG_SLUB_DEBUG enabled means most of the code the
> patch/commit touches is not even active.
> Could this be some miscompile or code layout change exposing some
> different bug, hmm.
> Is it any different if you do enable CONFIG_SLUB_DEBUG ?
> Or change to CONFIG_SLAB? (that would be really weird if not)
I haven't found any clue from the code either, and I compiled
kernel with the config above and tested booting on an Alder-lake
desktop and a QEMU, which boot fine.
Could you provide the full kernel config and demsg (in compressed
format if you think it's too big), so we can check more?
Thanks,
Feng
On Mon, 31 Oct 2022, at 02:36, Feng Tang wrote:
> Hi John,
>
> Thanks for the bisecting and reporting!
>
> On Mon, Oct 31, 2022 at 05:30:24AM +0800, Vlastimil Babka wrote:
>> On 10/30/22 20:23, John Thomson wrote:
>> > On Tue, 13 Sep 2022, at 06:54, Feng Tang wrote:
>> >> kmalloc's API family is critical for mm, with one nature that it will
>> >> round up the request size to a fixed one (mostly power of 2). Say
>> >> when user requests memory for '2^n + 1' bytes, actually 2^(n+1) bytes
>> >> could be allocated, so in worst case, there is around 50% memory
>> >> space waste.
>> >
>> >
>> > I have a ralink mt7621 router running Openwrt, using the mips ZBOOT kernel, and appear to have bisected
>> > a very-nearly-clean kernel v6.1rc-2 boot issue to this commit.
>> > I have 3 commits atop 6.1-rc2: fix a ZBOOT compile error, use the Openwrt LZMA options,
>> > and enable DEBUG_ZBOOT for my platform. I am compiling my kernel within the Openwrt build system.
>> > No guarantees this is not due to something I am doing wrong, but any insight would be greatly appreciated.
>> >
>> >
>> > On UART, No indication of the (once extracted) kernel booting:
>> >
>> > transfer started ......................................... transfer ok, time=2.01s
>> > setting up elf image... OK
>> > jumping to kernel code
>> > zimage at: 80BA4100 810D4720
>> > Uncompressing Linux at load address 80001000
>> > Copy device tree to address 80B96EE0
>> > Now, booting the kernel...
>>
>> It's weird that the commit would cause no output so early, SLUB code is
>> run only later.
>
> I noticed your cmdline has console setting, could you enable the
> earlyprintk in cmdline like "earlyprintk=ttyS0,115200" etc to see
> if there is more message printed out.
Still nothing from vmlinux with earlykprint on UART unless revert.
>
> Also I want to confirm this is a boot failure and not only a boot
> message missing.
Yes, boot failure.
Network comes up automatically on successful boot. Not happening when no kernel UART
>
>> > Nothing follows
>> >
>> > 6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc") reverted, normal boot:
>> > transfer started ......................................... transfer ok, time=2.01s
>> > setting up elf image... OK
>> > jumping to kernel code
>> > zimage at: 80BA4100 810D47A4
>> > Uncompressing Linux at load address 80001000
>> > Copy device tree to address 80B96EE0
>> > Now, booting the kernel...
>> >
>> > [ 0.000000] Linux version 6.1.0-rc2 (john@john) (mipsel-openwrt-linux-musl-gcc (OpenWrt GCC 11.3.0 r19724+16-1521d5f453) 11.3.0, GNU ld (GNU Binutils) 2.37) #0 SMP Fri Oct 28 03:48:10 2022
>> > [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
>> > [ 0.000000] printk: bootconsole [early0] enabled
>> > [ 0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
>> > [ 0.000000] MIPS: machine is MikroTik RouterBOARD 760iGS
>> > [ 0.000000] Initrd not found or empty - disabling initrd
>> > [ 0.000000] VPE topology {2,2} total 4
>> > [ 0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
>> > [ 0.000000] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
>> > [ 0.000000] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
>> > [ 0.000000] Zone ranges:
>> > [ 0.000000] Normal [mem 0x0000000000000000-0x000000000fffffff]
>> > [ 0.000000] HighMem empty
>> > [ 0.000000] Movable zone start for each node
>> > [ 0.000000] Early memory node ranges
>> > [ 0.000000] node 0: [mem 0x0000000000000000-0x000000000fffffff]
>> > [ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000000fffffff]
>> > [ 0.000000] percpu: Embedded 11 pages/cpu s16064 r8192 d20800 u45056
>> > [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 64960
>> > [ 0.000000] Kernel command line: console=ttyS0,115200 rootfstype=squashfs,jffs2
>> > [ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
>> > [ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
>> > [ 0.000000] Writing ErrCtl register=00019146
>> > [ 0.000000] Readback ErrCtl register=00019146
>> > [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
>> > [ 0.000000] Memory: 246220K/262144K available (7455K kernel code, 628K rwdata, 1308K rodata, 3524K init, 245K bss, 15924K reserved, 0K cma-reserved, 0K highmem)
>> > [ 0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
>> > [ 0.000000] rcu: Hierarchical RCU implementation.
>> >
>> >
>> > boot continues as expected
>> >
>> >
>> > possibly relevant config options:
>> > grep -E '(SLUB|SLAB)' .config
>> > # SLAB allocator options
>> > # CONFIG_SLAB is not set
>> > CONFIG_SLUB=y
>> > CONFIG_SLAB_MERGE_DEFAULT=y
>> > # CONFIG_SLAB_FREELIST_RANDOM is not set
>> > # CONFIG_SLAB_FREELIST_HARDENED is not set
>> > # CONFIG_SLUB_STATS is not set
>> > CONFIG_SLUB_CPU_PARTIAL=y
>> > # end of SLAB allocator options
>> > # CONFIG_SLUB_DEBUG is not set
>>
>> Also not having CONFIG_SLUB_DEBUG enabled means most of the code the
>> patch/commit touches is not even active.
>> Could this be some miscompile or code layout change exposing some
>> different bug, hmm.
Yes, it could be.
>> Is it any different if you do enable CONFIG_SLUB_DEBUG ?
No change
>> Or change to CONFIG_SLAB? (that would be really weird if not)
This boots fine
> I haven't found any clue from the code either, and I compiled
> kernel with the config above and tested booting on an Alder-lake
> desktop and a QEMU, which boot fine.
>
> Could you provide the full kernel config and demsg (in compressed
> format if you think it's too big), so we can check more?
Attached
> Thanks,
> Feng
vmlinux is bigger, and entry point is larger (0x8074081c vs 0x807407dc revert vs 0x8073fcbc),
so that may be it? Or not?
revert + SLUB_DEBUG + SLUB_DEBUG_ON is bigger still, but does successfully boot.
vmlinux entry point is 0x8074705c
transfer started ......................................... transfer ok, time=2.01s
setting up elf image... OK
jumping to kernel code
zimage at: 80BA4100 810D6FA0
Uncompressing Linux at load address 80001000
Copy device tree to address 80B9EEE0
Now, booting the kernel...
[ 0.000000] Linux version 6.1.0-rc2 (john@john) (mipsel-openwrt-linux-musl-gc
c (OpenWrt GCC 11.3.0 r19724+16-1521d5f453) 11.3.0, GNU ld (GNU Binutils) 2.37)
#0 SMP Fri Oct 28 03:48:10 2022
I will keep looking.
Thank you,
--
John Thomson(�/�X���D�< Gd� B@ ����>��z6�إ誑��.������W�.��y��������ؑ�� J\F�� *<\'�"'A�: DX�Hr�l�Q��&.��fa��I��|7=m���z��O�8 ��.����=`ƀ
�7���,�#��\��� &
JBTP"��L�j��$4��w�dҡ��x�0K�@��<M#1P8
<T@�E�EZE�t�D�(�cFR�J
�� ő���6 ��0x�ȼ ��d[|����8�i1p"Fbo�0Mt:�� ���
"�����@(�
�ă
���|�1� >8
�4�I��&.��E@x�́EtٷipA��G�M�C�8Q\p���q�Ǘ�����Ħ=0�q�I �'z$(�=&� "<N\�L����M�� )ń�Y��z%X�X[��|[ٸNS�e���F��%0ߖE&�Ef���u�a��6� �&>�Ƒb��8���$�l�˸̺���>�i��
h`&�E�G�M�,�(��D
�D
̲�
�S��H�"yn���I�#�oˠ2�eN�8��8�0P�4K��,��<Q��e�
f��,<4X*�� ��(
`������ ܰБe�|tp���P< D!�\��`�M�16O��L�i�i��; �iO=
�@��(!*�_�I$����%�
����9��$0�)��D�� ���Xj8�4a����=
����`�q��Ç(IN�$0�����`%����c�E��H�d�бhPAA���E�q��'6��8N���E�C�0ia$��"T@��
�0�m�Ė�,�u0A�a�#�@e�ˠ�4�")H��B��Gˠ�e���)1�p�
�m]6�e Y�_���|��H��8 d��"P�M8�2�@��'���P!"*xN{D��8
��'��N��M$)K���q @��AbRH �qpɺ �Q h���i:�`�� � ��;�(�� D�I@ �>� *P>@M�4�CS�= �n��8�Ӿ� <2Z\��(
��� T��h�
p�PAIP4�;nk�(Z�e�
Ⱥ��5�&�@�$|�"(.N��f� 1R�'J�AO�(��,���1� ~�a� �p*��@
�c�a�KP���8�$EF� �� ����:Nx�e
���y��Y"� ��.��!ZG��)��D�I$��@�MS�4�D��<��dr�D��&J�$D�f���Y�q�(ʂe�x �4p���
�h�� LX��(�e�B��h.�Q9 )�b * �|`)>_��1��O\h��i_�>/�hT�2Ob�t�� A��8T`�*ع����q���~jh�|A�"}�EXea�M/��Y�FY�G�M$���s@ς
��l��H1�(�#�����ó0������|]&A� ����<R�D�w��u��o��i�/q|�!Z�(�-{hp[�)����$E����O[�!���ƃL"T(F�e_�̃
>m�""$�6�a��� ��2/��c�" FI\���ۺL'%6͛@���&|h Ea@�!����N��4��>�����?��#�P���&2K�>��1��4�)d瀐�sl�>(����a�}��5��4��(�<M�<K�R�M$IL��%�MHL��,�8qy���0N�@
�@�0��@
�0��@��(Q\��Dq�$ �KIP\6�"����M8��m\7��t�9��x� �&>K�J|�Eʳ�Gy���]{J&Pb�$6����$���Ez�%B� �\�_]S�Z,s���������0�M�B ����$�Y��x�c�>o��
f6!�0 'FH���͢��F��~ڄ'I�#M�� �#8�e���M��Z�m���<
���D��I�h� Ii���(�6�0|X %B
"���i�2�"�T`�E�C����Y�'�����H 8 n�LLx"&$$H8�u��j��Qb���(��4�A�)Ƞ'�������yY`a%���A�� `�@���x�$ jG����
��3�C;h �j"0*x����`�@�
<,���B��&����A[�m�n|d�l�����������<kZ����${�o�"��A��� �G6��`� S هA&���^�0��߆i� Jl��| 6N��o�"D��59�`m��a]�YPa��ₗ=����(��� �7
�AY�G����,lcXPa�n�d� m�8�B��LA�yY�"9xT8���0\��Dx�}�cA�����p�2 �0T4M\��BY��Y��������`�<���8�σ
��i�7�`$�$p`,�M�QQ��.�<M"� �]W�Ħ}Z�x�y�A�D�������Dx�@�.{t�=��@/Ӻo�>8�aߖ�́I<�e�r|d��I`!m<H �uٗE���p[�m �ˋ�H��1�ax����E|�AEY�������c�"�!��e
呚*x6Jl�d�u���A4.��� ~m2Af�
|P )
b\�
�&��q��@E���)Fl��H�L���#{<2�H�
���$�d�e�d�O۬�!l�q�,@:�fn�,���<�m�`��pF2<!�Y�"�B�H�e�Lb�:�m�C���4���92��<L�`��6�a�/�@����4/�4<h�-@h���v�� ��}�I�y�tXD�v!��}���� �@��p�u/4(�i��6�$�v|l.~�ݧm�W��@�r��.�&� 4�u 4=
f�6��"P�o�p(E��o�
��E���(P��4Y�k�G���>(
�! f��8��qP!��i��Q��
]� ��<� 2�i�C�L����u�EY��F�Q`��ϡ��Qp�����<��M6m�W��^��0N�<��D�'�Y )vPI`b�sV��[&���!� �&�M� |݂-�p�̓hp�m�K�� .���������Ӽ.S L6�B�@ ,E �)�5�Q�iEQ ��u���C��� �� @�a�7��D��
�a"-L�8PT �)�� ���"{ d��-@�D��@_$AW)�۷#�͏~s=�qQ��o��+<��ѫZ���Z��=��0����w�}����.�ֹ��+��r����m��S��?�>��<��(��S�o������W�R���b�_INU�)�m���mL�2�}џ��Tnh��d)�K��HEP���b�*G0����%���L�E��З�~~�k���}��T�z1t�+����e�2{�e�;�ߵ�|͓ۅ�(�%���M/?��_�KWl=55�Y;H�+}{C�r��S]뫵��l�K*�P�}-��2�:n֊Z>��7��7X[�Y�Wp��5o���ws�-��OU����OW��"��w���~���Y��y|��O�;�>�m���
b�H`�O��Z����6亸w�Wξ�]Vu!��Ч��f�^v����S�m{���5[M��Uh=n��������W^����]O�.���Z��d�����V���oڨ��kƜj\�5%�e�%-�_��l���f,i�}��a�y��!!�a�S��K���:O���>Wί9��]q�W��]y�vQ�����ۻ���/l��l�Z��]R
�e�uY���OW�\Z��[+��$k�+��Է_�a��]2�C�z[>��n��{Le;}0�hGƠ�*۽��u�֙�i���s�3�ݿ��45(d��~O[;�g�_q��Y�Na.���߫�ކ��5~�\^ݧ����+�����kڧĖK/ee���nX�����,TZ���$
<ܻ��\��Nr�Ok��5Y���v����vb0���{�-������>۪O�3xÔU��
]�i�W�>��g\'yp���H`��rl��[��^�{����_ꛯ�p�ĽQ���綺��sV]Z=+��u�����\���b�k}��璡���y3Cӭ��,YZ��^o�`��Yf��mT��[{�Ք�uw�q]H�?e��������k����������Y2.ʫ�j���,�*���(���}�_b�@Y�5�q��UY��Bk���b�g_���%�\[��S�&y���c�$�������s���ݗ�מߣ0�<l���\1��/t�!� x�\w���]���d- l�"��V��ߚY��댩��ˍ�t�BCf��K�1�ua9�v�����
��xnx`�6 �7���~��j+���R��7[�5Q���z��7�<ŚS�f{)�ɩ[=���\��_g�k��gSa��%��Q�s���RQ��,K���ĨY���,l�?��O+�Vox��I��q�^^��'�mqe�h�{;Q�en�K�.�
˫��KK�i���s�����=����T��{E���禮�_��9}�*�מ�Ǹ`��Q<qR4�:��@Y�rJ�RߟI|\2(�ֵ����,,*����kť�R��BV�5��sy}=9D���ө֔��+����\��g�����
��rY{CϕWKϬ�^�{�������/��W,9���Z�J��O��iU�2��D��0Q�x���YW?�Ǘ]��L�+��\!5
�m��Եϵ_�^���\2��8�2[��_}�d���e�o���WTe�R�\}�y���)����m�.�u�����,���\�l$�Y鷦�WNz2�Sj�& r�C\�j�+�C2[<�c�
*�鸈.�J�{�{%^�گO�)����ba�����j9���v���__[+�~e�l�F�uQ�v9Q�^S����Z�~Y/<%���F��O.?Kks�-���J��TV��u���,1���sM��`��[���ǿ.x��ﵞ(whPN)��{�-_T�W�Z����J�z����ފY-_3���~�m����S�^��^�?u�xNy�9��Rj����گ=�C���o�疧�A��~���VRSN�[�P� W��WL5�
�s������J�}i^�|��7H����/q��Y�� ,��D� ,Xե�����vڻ��szY7��,v�ԷQi�L�d�
�-�KCb}���)k�O]ӯx��Z
k�4��?�5�+}���Sr�]��yݍ��ɯ���wxMn�����i���+��s��{���O����ϗ�S�kO�Ms`����K�a���'���
\�|]��Y��7��o*��|jO���|}jOY�� W�O�ړE�=UjO���U�C��{�ߪ�5-Ֆ����1��;X_gm����e��K�WL����곞^�}��q��Jͧ���[���s]�+go�J�:���b�;#8��2�@��l������5����6X�K^����X����5�ğ���g����V�4���ke�%'{k��)����k\��]����%��
���~����]�@�q��*��wm�{I�R��+��m�@}wh��uhH�p�m�m�k�U��BZ ,�+����?���2��Ӡ���z��z*���,x�������6[H@@h����I�_����k���e�X�����˟�_��4�_��͖V�����������ߥ��^���ޟ��s�K��w�酘��+wYey��0J�U��9��m ,�F�?k�y]�9P ��I��H�V�u��Bd-��9��p�DBP�Y�&* Pb�<�0"�@�Ih,�c��0�(MB�HR$1
-��D
��!5��h�L��0#=�%(̆-�"FZ�H��H��@\�,�"m�%�O�Gr��i"�Q"�q R��iIz����(1Q�0L��Hj��q%b�E�$fa(J�,� -��H��}��A�MK�� ,1�A:�=M�@ ����zY�sd
�*Kk�E��fT֓S*��C�0�r����Q(Z"GR�'Z�%!*�
����Hr"�y"&� CI���p�D4XD!J"<(�����D��DL��'2N=�*<@#9PB� ���<���C���ykvf�P3;��a�2q��2P����x�3`�սսŧ%'���sg � ?3�������@����H�#��,�$a|!'�_|'���Y����̞Xb����A����
��
�cC�fY��L%�i��z7A�m|,kQ��@�E�V^�����q�%�"D9J!
˯pl���t7|�Է�ٙ�-gyb�/E�Q"^����22�
��YcS]�$Q%�y9N#y�(�疢WǦV��f½�HN�8P�0L��,Sl� � �i�'6�a���}��!�ņ���0����t�,��x�^N��l*?���t����}u\�"S�+�WU��L}�������U�U%!����ԯ&J���F3cl4��Lʫ�_[j��;��p�L��Q�#ɹjH�!�ݟQQ��0 ������P� /�⊶���6>2��,�d��S::��[�]y��m�
� o(Nl�Oa߫l�5��v�H�̲'v�5��}�3�}�l�����m��i�G�y��`�"'����{[�7��^���)�������o̐�l|���e���ۭ<�Fqv��ll���������v�z���aw�
iw�0��v��7S])o�M|�7d��\�����]���z��Y������s��m]�1�X���SGP� #���E����*KUse{�.�_��p �
��E�-��.��K��m�s�՟��
k�������F��i0���8�N&��ܓIέ�+uu埋������ع�l�b��X�/�ý��5�ƋJ�ޒa}����s������K;�\����~�}�����g٨Y�F�T����)�e����G����R�#�\^
�!1T�%BظLi^7�_�<�Y�l�,-���,Ѽ��J��?�6/��ρu�ĬXV��f���a����H6K�Ŵ[_s�
�`dɀ��A�Nc�J&ym���ڶsLe��ѯ.�$�CN�zdY��<��㬾w��rm1n4.�d��A&X���0l����9���V��n-�������(]��J[z5�ۑ��+,��b-��C��@O51���6��h�i}͒W�Pb�������1J�;���q�����}���e��6�eYQT)qƨD_���A2�7n��i'�(j�F�)Ml<'E�1[*��,l��ڰ�絪,��ֻ�����֢���^ې�����2��u�]|i�F��.b�0�O���wF��[Y�W��'�6��s�ͽ�Y�Wܰ���)1��7j�v��Divܧ�U��w��ԟT��R\^-�n��m��"Mym��NU�_ˮ���^�#����Hbb�<�ṟ���@r2`=���45�~�
_�e`�e��e�Y�\-����kJ,�l�\-_��EV,5KP5��D:k�t˫�u/k��6�|e��hyo�t�|�9;Ƶ���HY^�����fi�\D�̺�"���?���O��N���q���ފ�����k���ʫ��EW�ŭ.��1�l��A�ʒQ)M�hT��F�Y"ː�e�|Q��k%s��V�.u��Y�|���{�i��n
���ZS�<-�<e]Y?�Vx6S�^^/���W�����Y^��c��;H�}���+�����-��F&�Z+�K��[R����=�=��5�u�}���u��h0��_{��.f�/�Gr2\����ٻ�okMͧ}��F��)�?��3I�Ⱦ ��}�X�|�|yݳ���>%�Xҫr9�\��b�)q����P��rK��l��[F�Z �4F�i��HL��Vk�E�N��ۧ��7���w�{��YNҚ�����-���?��֬[��%���o}�i����6
�mX��G�a^�Bc�y%B�Tb��nK����i4�1 ���Ď�k�k]����?�M���^ې^��g���Y��{y�����'v�0�ĖAњ��v�;�=(�r���I�ny��ִ^������\��/�ԧŔ�>Kks�0�覿�,���E�x�/_��2ӉS����l��W;���2��ҟ~J#���ҕ{J{�y�
֧O����.e�����V����2��*?��-��ѵo��xn� rJ��p�K�3����9�ۚY+��zZn;�2;<
W�K.5���
�'�&�W�J# $�@��X�q��2i��f��jj*�&&¦x�����Y|�J�ܨQ0˶�����:�ďlӰo���{yw�|�ݕ���z�y]H��ݨwc�I0\��?���K�Nq����b��f����r�[b���[j�\�O=��V��p�� ���,9��z�5�T�R�f*�Nק�,�0�u]�u�i�N�2s�,�yp���M|0����Tq�O}�Bo�:�İ˼c=y���O=��s���c��O��a�q� �Id��J�
���b.sֳ~kaL}C���O��ɧ�W�O�~��\h\���z�����7�5}ʠ �����4ˍ��R� D�q�V�_:4�%�r�ۘ��'�������Vk\ ^���̕������[��ff�-S�Ap8�0�&�y�v�h��Էs?��vn.��u/��N�_����_%�m������b�������d�Wkoy}���ދ��WLݫ�n��ίy�}�6�}S�[�N+��}ͻ��~��xu�]�����>�=�l��UT��r˸��,V|�7��$�K�V^�?/����3L��_�opۻ�s7�op��<�ʯ=���W�%�����E��[+����\���g���W���tRMY�[���}����~�&�,:�~�������;Z��yd&