|------|---------|-----------|-------|-----------|--------|---------|----------------|
From: Joel Granados <j.granados@samsung.com>
What?
Remove the loop stopping criteria check for ->procname == NULL, the
array size calculation based on sentinels and the superfluous sentinels
created within the sysctl infrastructure. None are needed as they are
now solely based on ctl_table ARRAY_SIZE. Finally, sentinels that have
been added in recent releases (not present for the original patchsets)
were removed.
Why?
By removing the sysctl sentinel elements we avoid kernel bloat as
ctl_table arrays get moved out of kernel/sysctl.c into their own
respective subsystems. This move was started long ago to avoid merge
conflicts; the sentinel removal bit is to avoid bloating the kernel by
one element as arrays moved out. It includes work in /arch [1], /dirver
[2], fs/ [3], kernel [4], net/ [5], mm/ [6], security/ [6], io_uring [6]
and other misc directories [6]. It will reduce the overall build time
size of the kernel and run time memory bloat by about ~64 bytes per
declared ctl_table array (more info here [0]).
Testing:
* Ran sysctl selftests (./tools/testing/selftests/sysctl/sysctl.sh)
* Ran this through 0-day with no errors or warnings
Savings in vmlinux:
A total of 64 bytes per sentinel is saved after removal. Here is the
aggregated savings for all the removal patchsets ([1,2,3,4,5,6]) for
the x86_64 arch (actual savings will depend on kernel conf):
|------|---------|-----------|-------|-----------|--------|---------|----------------|
|dir | arch[1] | driver[2] | fs[3] | kernel[4] | net[5] | misc[6] | Total(approx.) |
|------|---------|-----------|-------|-----------|--------|---------|----------------|
|Bytes | 192 | 2432 | 1920 | 1984 | 3976 | 963 | 11467 |
|------|---------|-----------|-------|-----------|--------|---------|----------------|
Savings in allocated memory:
The estimated savings during boot for config [3] are 6272 bytes. See
[7] for how to measure it.
Comments/feedback greatly appreciated
Best
Joel
[0] Links Related to the ctl_table sentinel removal:
* Good summaries from Luis:
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/
https://lore.kernel.org/all/ZMFizKFkVxUFtSqa@bombadil.infradead.org/
* Patches adjusting sysctl register calls:
https://lore.kernel.org/all/20230302204612.782387-1-mcgrof@kernel.org/
https://lore.kernel.org/all/20230302202826.776286-1-mcgrof@kernel.org/
* Discussions about expectations and approach
https://lore.kernel.org/all/20230321130908.6972-1-frank.li@vivo.com
https://lore.kernel.org/all/20220220060626.15885-1-tangmeng@uniontech.com
[1] https://lore.kernel.org/20231002-jag-sysctl_remove_empty_elem_arch-v3-0-606da2840a7a@samsung.com
[2] https://lore.kernel.org/20231002-jag-sysctl_remove_empty_elem_drivers-v2-0-02dd0d46f71e@samsung.com
[3] https://lore.kernel.org/20231121-jag-sysctl_remove_empty_elem_fs-v2-0-39eab723a034@samsung.com
[4] https://lore.kernel.org/20240328-jag-sysctl_remove_empty_elem_kernel-v3-0-285d273912fe@samsung.com
[5] https://lore.kernel.org/20240501-jag-sysctl_remset_net-v6-0-370b702b6b4a@samsung.com
[6] https://lore.kernel.org/20240328-jag-sysctl_remset_misc-v1-0-47c1463b3af2@samsung.com
[7]
To measure the in memory savings apply this on top of this patchset.
"
diff --git i/fs/proc/proc_sysctl.c w/fs/proc/proc_sysctl.c
index a6aeaa928dd2..6ca5341bcddf 100644
--- i/fs/proc/proc_sysctl.c
+++ w/fs/proc/proc_sysctl.c
@@ -963,6 +963,7 @@ static struct ctl_dir *new_dir(struct ctl_table_set *set,
table[0].procname = new_name;
table[0].mode = S_IFDIR|S_IRUGO|S_IXUGO;
init_header(&new->header, set->dir.header.root, set, node, table, 1);
+ printk("%ld sysctl saved mem kzalloc\n", sizeof(struct ctl_table));
return new;
}
@@ -1186,6 +1187,7 @@ static struct ctl_table_header *new_links(struct ctl_dir *dir, struct ctl_table_
link_name += len;
link++;
}
+ printk("%ld sysctl saved mem kzalloc\n", sizeof(struct ctl_table));
init_header(links, dir->header.root, dir->header.set, node, link_table,
head->ctl_table_size);
links->nreg = head->ctl_table_size;
"
and then run the following bash script in the kernel:
accum=0
for n in $(dmesg | grep kzalloc | awk '{print $3}') ; do
accum=$(calc "$accum + $n")
done
echo $accum
Signed-off-by: Joel Granados <j.granados@samsung.com>
---
Joel Granados (8):
locking: Remove superfluous sentinel element from kern_lockdep_table
mm profiling: Remove superfluous sentinel element from ctl_table
sysctl: Remove check for sentinel element in ctl_table arrays
sysctl: Replace nr_entries with ctl_table_size in new_links
sysctl: Remove superfluous empty allocations from sysctl internals
sysctl: Remove "child" sysctl code comments
sysctl: Remove ctl_table sentinel code comments
sysctl: Warn on an empty procname element
fs/proc/proc_sysctl.c | 50 +++++++++++++++++++++---------------------------
kernel/locking/lockdep.c | 1 -
lib/alloc_tag.c | 1 -
net/sysctl_net.c | 11 ++---------
4 files changed, 24 insertions(+), 39 deletions(-)
---
base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
change-id: 20240603-jag-sysctl_remset-4afb8c723003
Best regards,
--
Joel Granados <j.granados@samsung.com>
From: Joel Granados <j.granados@samsung.com>
This commit is part of a greater effort to remove all
empty elements at the end of the ctl_table arrays (sentinels) which will
reduce the overall build time size of the kernel and run time memory
bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)
Signed-off-by: Joel Granados <j.granados@samsung.com>
---
kernel/locking/lockdep.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 151bd3de5936..726b22ce7d0b 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -97,7 +97,6 @@ static struct ctl_table kern_lockdep_table[] = {
.proc_handler = proc_dointvec,
},
#endif /* CONFIG_LOCK_STAT */
- { }
};
static __init int kernel_lockdep_sysctls_init(void)
--
2.43.0
From: Joel Granados <j.granados@samsung.com>
This commit is part of a greater effort to remove all empty elements at
the end of the ctl_table arrays (sentinels) which will reduce the
overall build time size of the kernel and run time memory bloat by ~64
bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)
Removed sentinel from memory_allocation_profiling_sysctls
Signed-off-by: Joel Granados <j.granados@samsung.com>
---
lib/alloc_tag.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 11ed973ac359..7293cd54d1b1 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -238,7 +238,6 @@ static struct ctl_table memory_allocation_profiling_sysctls[] = {
#endif
.proc_handler = proc_do_static_key,
},
- { }
};
static int __init alloc_tag_init(void)
--
2.43.0
From: Joel Granados <j.granados@samsung.com>
Use ARRAY_SIZE exclusively by removing the check to ->procname in the
stopping criteria of the loops traversing ctl_table arrays. This commit
finalizes the removal of the sentinel elements at the end of ctl_table
arrays which reduces the build time size and run time memory bloat by
~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)
Remove the entry->procname evaluation from the for loop stopping
criteria in sysctl and sysctl_net.
Signed-off-by: Joel Granados <j.granados@samsung.com>
---
fs/proc/proc_sysctl.c | 2 +-
net/sysctl_net.c | 11 ++---------
2 files changed, 3 insertions(+), 10 deletions(-)
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index b1c2c0b82116..d4ba7ad9dbe0 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -21,7 +21,7 @@
#define list_for_each_table_entry(entry, header) \
entry = header->ctl_table; \
- for (size_t i = 0 ; i < header->ctl_table_size && entry->procname; ++i, entry++)
+ for (size_t i = 0 ; i < header->ctl_table_size; ++i, entry++)
static const struct dentry_operations proc_sys_dentry_operations;
static const struct file_operations proc_sys_file_operations;
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index f5017012a049..19e8048241ba 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -127,7 +127,7 @@ static void ensure_safe_net_sysctl(struct net *net, const char *path,
pr_debug("Registering net sysctl (net %p): %s\n", net, path);
ent = table;
- for (size_t i = 0; i < table_size && ent->procname; ent++, i++) {
+ for (size_t i = 0; i < table_size; ent++, i++) {
unsigned long addr;
const char *where;
@@ -165,17 +165,10 @@ struct ctl_table_header *register_net_sysctl_sz(struct net *net,
struct ctl_table *table,
size_t table_size)
{
- int count;
- struct ctl_table *entry;
-
if (!net_eq(net, &init_net))
ensure_safe_net_sysctl(net, path, table, table_size);
- entry = table;
- for (count = 0 ; count < table_size && entry->procname; entry++, count++)
- ;
-
- return __register_sysctl_table(&net->sysctls, path, table, count);
+ return __register_sysctl_table(&net->sysctls, path, table, table_size);
}
EXPORT_SYMBOL_GPL(register_net_sysctl_sz);
--
2.43.0
On Tue, Jun 04, 2024 at 08:29:21AM +0200, Joel Granados via B4 Relay wrote:
> From: Joel Granados <j.granados@samsung.com>
> --- a/net/sysctl_net.c
> +++ b/net/sysctl_net.c
> @@ -127,7 +127,7 @@ static void ensure_safe_net_sysctl(struct net *net, const char *path,
>
> pr_debug("Registering net sysctl (net %p): %s\n", net, path);
> ent = table;
> - for (size_t i = 0; i < table_size && ent->procname; ent++, i++) {
> + for (size_t i = 0; i < table_size; ent++, i++) {
> unsigned long addr;
> const char *where;
>
> @@ -165,17 +165,10 @@ struct ctl_table_header *register_net_sysctl_sz(struct net *net,
> struct ctl_table *table,
> size_t table_size)
> {
> - int count;
> - struct ctl_table *entry;
> -
> if (!net_eq(net, &init_net))
> ensure_safe_net_sysctl(net, path, table, table_size);
>
> - entry = table;
> - for (count = 0 ; count < table_size && entry->procname; entry++, count++)
> - ;
> -
> - return __register_sysctl_table(&net->sysctls, path, table, count);
> + return __register_sysctl_table(&net->sysctls, path, table, table_size);
> }
> EXPORT_SYMBOL_GPL(register_net_sysctl_sz);
>
Given that this is a very small network related patch, I'm queueing this
in through the sysctl-next branch. Please let me know if you would
rather take it through the networking workflow.
Best
--
Joel Granados
From: Joel Granados <j.granados@samsung.com>
The number of ctl_table entries (nr_entries) calculation was previously
based on the ctl_table_size and the sentinel element. Since the
sentinels have been removed, we remove the calculation and just use the
ctl_table_size from the ctl_table_header.
Signed-off-by: Joel Granados <j.granados@samsung.com>
---
fs/proc/proc_sysctl.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index d4ba7ad9dbe0..1babb54347a4 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -1154,18 +1154,16 @@ static struct ctl_table_header *new_links(struct ctl_dir *dir, struct ctl_table_
struct ctl_table_header *links;
struct ctl_node *node;
char *link_name;
- int nr_entries, name_bytes;
+ int name_bytes;
name_bytes = 0;
- nr_entries = 0;
list_for_each_table_entry(entry, head) {
- nr_entries++;
name_bytes += strlen(entry->procname) + 1;
}
links = kzalloc(sizeof(struct ctl_table_header) +
- sizeof(struct ctl_node)*nr_entries +
- sizeof(struct ctl_table)*(nr_entries + 1) +
+ sizeof(struct ctl_node)*head->ctl_table_size +
+ sizeof(struct ctl_table)*(head->ctl_table_size + 1) +
name_bytes,
GFP_KERNEL);
@@ -1173,8 +1171,8 @@ static struct ctl_table_header *new_links(struct ctl_dir *dir, struct ctl_table_
return NULL;
node = (struct ctl_node *)(links + 1);
- link_table = (struct ctl_table *)(node + nr_entries);
- link_name = (char *)&link_table[nr_entries + 1];
+ link_table = (struct ctl_table *)(node + head->ctl_table_size);
+ link_name = (char *)&link_table[head->ctl_table_size + 1];
link = link_table;
list_for_each_table_entry(entry, head) {
@@ -1188,7 +1186,7 @@ static struct ctl_table_header *new_links(struct ctl_dir *dir, struct ctl_table_
}
init_header(links, dir->header.root, dir->header.set, node, link_table,
head->ctl_table_size);
- links->nreg = nr_entries;
+ links->nreg = head->ctl_table_size;
return links;
}
--
2.43.0
From: Joel Granados <j.granados@samsung.com>
Now that the sentinels have been removed from ctl_table arrays, there is
no need to artificially append empty ctl_table elements at ctl_table
registration. Remove superfluous empty allocation from new_dir and
new_links.
Signed-off-by: Joel Granados <j.granados@samsung.com>
---
fs/proc/proc_sysctl.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 1babb54347a4..29d40f0ff3ff 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -951,14 +951,14 @@ static struct ctl_dir *new_dir(struct ctl_table_set *set,
char *new_name;
new = kzalloc(sizeof(*new) + sizeof(struct ctl_node) +
- sizeof(struct ctl_table)*2 + namelen + 1,
+ sizeof(struct ctl_table) + namelen + 1,
GFP_KERNEL);
if (!new)
return NULL;
node = (struct ctl_node *)(new + 1);
table = (struct ctl_table *)(node + 1);
- new_name = (char *)(table + 2);
+ new_name = (char *)(table + 1);
memcpy(new_name, name, namelen);
table[0].procname = new_name;
table[0].mode = S_IFDIR|S_IRUGO|S_IXUGO;
@@ -1163,7 +1163,7 @@ static struct ctl_table_header *new_links(struct ctl_dir *dir, struct ctl_table_
links = kzalloc(sizeof(struct ctl_table_header) +
sizeof(struct ctl_node)*head->ctl_table_size +
- sizeof(struct ctl_table)*(head->ctl_table_size + 1) +
+ sizeof(struct ctl_table)*head->ctl_table_size +
name_bytes,
GFP_KERNEL);
@@ -1172,7 +1172,7 @@ static struct ctl_table_header *new_links(struct ctl_dir *dir, struct ctl_table_
node = (struct ctl_node *)(links + 1);
link_table = (struct ctl_table *)(node + head->ctl_table_size);
- link_name = (char *)&link_table[head->ctl_table_size + 1];
+ link_name = (char *)(link_table + head->ctl_table_size);
link = link_table;
list_for_each_table_entry(entry, head) {
--
2.43.0
From: Joel Granados <j.granados@samsung.com>
Erase the code comments mentioning "child" that were forgotten when the
child element was removed in commit 2f2665c13af48 ("sysctl: replace
child with an enumeration").
Signed-off-by: Joel Granados <j.granados@samsung.com>
---
fs/proc/proc_sysctl.c | 26 ++++++++++----------------
1 file changed, 10 insertions(+), 16 deletions(-)
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 29d40f0ff3ff..2ef23d2c3496 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -1298,28 +1298,23 @@ static struct ctl_dir *sysctl_mkdir_p(struct ctl_dir *dir, const char *path)
* __register_sysctl_table - register a leaf sysctl table
* @set: Sysctl tree to register on
* @path: The path to the directory the sysctl table is in.
- * @table: the top-level table structure without any child. This table
- * should not be free'd after registration. So it should not be
- * used on stack. It can either be a global or dynamically allocated
- * by the caller and free'd later after sysctl unregistration.
+ *
+ * @table: the top-level table structure. This table should not be free'd
+ * after registration. So it should not be used on stack. It can either
+ * be a global or dynamically allocated by the caller and free'd later
+ * after sysctl unregistration.
* @table_size : The number of elements in table
*
* Register a sysctl table hierarchy. @table should be a filled in ctl_table
* array. A completely 0 filled entry terminates the table.
*
* The members of the &struct ctl_table structure are used as follows:
- *
* procname - the name of the sysctl file under /proc/sys. Set to %NULL to not
* enter a sysctl file
- *
- * data - a pointer to data for use by proc_handler
- *
- * maxlen - the maximum size in bytes of the data
- *
- * mode - the file permissions for the /proc/sys file
- *
- * child - must be %NULL.
- *
+ * data - a pointer to data for use by proc_handler
+ * maxlen - the maximum size in bytes of the data
+ * mode - the file permissions for the /proc/sys file
+ * type - Defines the target type (described in struct definition)
* proc_handler - the text handler routine (described below)
*
* extra1, extra2 - extra pointers usable by the proc handler routines
@@ -1327,8 +1322,7 @@ static struct ctl_dir *sysctl_mkdir_p(struct ctl_dir *dir, const char *path)
* [0] https://lkml.kernel.org/87zgpte9o4.fsf@email.froward.int.ebiederm.org
*
* Leaf nodes in the sysctl tree will be represented by a single file
- * under /proc; non-leaf nodes (where child is not NULL) are not allowed,
- * sysctl_check_table() verifies this.
+ * under /proc; non-leaf nodes are not allowed.
*
* There must be a proc_handler routine for any terminal nodes.
* Several default handlers are available to cover common cases -
--
2.43.0
From: Joel Granados <j.granados@samsung.com>
Remove the mention of a "zero terminated entry" from the
__register_sysctl_table function doc.
Signed-off-by: Joel Granados <j.granados@samsung.com>
---
fs/proc/proc_sysctl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 2ef23d2c3496..806700b70dea 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -1306,7 +1306,7 @@ static struct ctl_dir *sysctl_mkdir_p(struct ctl_dir *dir, const char *path)
* @table_size : The number of elements in table
*
* Register a sysctl table hierarchy. @table should be a filled in ctl_table
- * array. A completely 0 filled entry terminates the table.
+ * array.
*
* The members of the &struct ctl_table structure are used as follows:
* procname - the name of the sysctl file under /proc/sys. Set to %NULL to not
--
2.43.0
From: Joel Granados <j.granados@samsung.com>
Add a pr_err warning in case a ctl_table is registered with a sentinel
element containing a NULL procname.
Signed-off-by: Joel Granados <j.granados@samsung.com>
---
fs/proc/proc_sysctl.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 806700b70dea..f65098de5fcb 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -1119,6 +1119,8 @@ static int sysctl_check_table(const char *path, struct ctl_table_header *header)
struct ctl_table *entry;
int err = 0;
list_for_each_table_entry(entry, header) {
+ if (!entry->procname)
+ err |= sysctl_err(path, entry, "procname is null");
if ((entry->proc_handler == proc_dostring) ||
(entry->proc_handler == proc_dobool) ||
(entry->proc_handler == proc_dointvec) ||
--
2.43.0
On Tue, Jun 04, 2024 at 08:29:26AM +0200, Joel Granados via B4 Relay wrote:
> From: Joel Granados <j.granados@samsung.com>
>
> Add a pr_err warning in case a ctl_table is registered with a sentinel
> element containing a NULL procname.
>
> Signed-off-by: Joel Granados <j.granados@samsung.com>
> ---
> fs/proc/proc_sysctl.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> index 806700b70dea..f65098de5fcb 100644
> --- a/fs/proc/proc_sysctl.c
> +++ b/fs/proc/proc_sysctl.c
> @@ -1119,6 +1119,8 @@ static int sysctl_check_table(const char *path, struct ctl_table_header *header)
> struct ctl_table *entry;
> int err = 0;
> list_for_each_table_entry(entry, header) {
> + if (!entry->procname)
> + err |= sysctl_err(path, entry, "procname is null");
> if ((entry->proc_handler == proc_dostring) ||
> (entry->proc_handler == proc_dobool) ||
> (entry->proc_handler == proc_dointvec) ||
>
> --
> 2.43.0
>
>
To add to this check, I sent out a static analysis check to smatch in
such a way that a warning will be printed out if there is a ctl_table
element with a procname or prog_handler that are NULL. You can see it
here https://lore.kernel.org/all/20240614-master-v1-1-c652f5aa15fb@samsung.com/
Best
--
Joel Granados
© 2016 - 2026 Red Hat, Inc.