fs/sysfs/file.c | 4 ++++ 1 file changed, 4 insertions(+)
sysfs_kf_seq_show() defends against buggy show() callbacks that return
larger than PAGE_SIZE by clamping the value and printing a warning.
sysfs_kf_read(), the prealloc variant, has no such defense.
The only current in-tree user of __ATTR_PREALLOC is drivers/md/md.c,
whose show() callbacks are well-behaved, so this is hardening against
future drivers doing foolish things and out-of-tree code doing even more
foolish things.
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: NeilBrown <neil@brown.name>
Cc: Tejun Heo <tj@kernel.org>
Fixes: 2b75869bba67 ("sysfs/kernfs: allow attributes to request write buffer be pre-allocated.")
Assisted-by: gregkh_clanker_t1000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/sysfs/file.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 5709cede1d75..25b44fe171a3 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -120,6 +120,10 @@ static ssize_t sysfs_kf_read(struct kernfs_open_file *of, char *buf,
len = ops->show(kobj, of->kn->priv, buf);
if (len < 0)
return len;
+ if (len >= (ssize_t)PAGE_SIZE) {
+ printk("fill_read_buffer: %pS returned bad count\n", ops->show);
+ len = PAGE_SIZE - 1;
+ }
if (pos) {
if (len <= pos)
return 0;
--
2.54.0
On Wed, 20 May 2026 15:07:01 +0200
Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> sysfs_kf_seq_show() defends against buggy show() callbacks that return
> larger than PAGE_SIZE by clamping the value and printing a warning.
> sysfs_kf_read(), the prealloc variant, has no such defense.
>
> The only current in-tree user of __ATTR_PREALLOC is drivers/md/md.c,
> whose show() callbacks are well-behaved, so this is hardening against
> future drivers doing foolish things and out-of-tree code doing even more
> foolish things.
What is the rational for it using PREALLOC?
From the code it looks like it could set a buffer size, but it doesn't seem to.
Which means there is a kmalloc(PAGE_SIZE + 1) in there.
If it did set a size, the check below would be wrong.
From the look of the fragment below it is just passed the address of the
pre-allocated buffer.
That also means it can't use sysfs_emit() because that relies on a
page-aligned buffer.
Also I suspect that kmalloc() grabbing a buffer from the per-cpu freelist
is cheaper than the mutex required to lock access to the preallocted buffer.
It would be 'nice' to change the type of the buffer passed to the show()
functions and to sysfs_emit() to something other than 'char *'.
Even if the initial change is just a typdef for 'char *' so that
you can tell which functions can call sysfs_emit().
At the moment it is pretty hard to actually decide whether it is safe
to change the printf() to sysfs_emit().
If all the show() functions are expected to include a terminating '\n'
then the wrapper code could add one if missing.
That would save a lot of strcat(buf, '\n'); return strlen(buf); sequences.
I also think (bad for me at 11pm) that the buffer should be 4k not PAGE_SIZE.
-- David
>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: NeilBrown <neil@brown.name>
> Cc: Tejun Heo <tj@kernel.org>
> Fixes: 2b75869bba67 ("sysfs/kernfs: allow attributes to request write buffer be pre-allocated.")
> Assisted-by: gregkh_clanker_t1000
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
> fs/sysfs/file.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
> index 5709cede1d75..25b44fe171a3 100644
> --- a/fs/sysfs/file.c
> +++ b/fs/sysfs/file.c
> @@ -120,6 +120,10 @@ static ssize_t sysfs_kf_read(struct kernfs_open_file *of, char *buf,
> len = ops->show(kobj, of->kn->priv, buf);
> if (len < 0)
> return len;
> + if (len >= (ssize_t)PAGE_SIZE) {
> + printk("fill_read_buffer: %pS returned bad count\n", ops->show);
> + len = PAGE_SIZE - 1;
> + }
> if (pos) {
> if (len <= pos)
> return 0;
On Wed, May 20, 2026 at 11:11:58PM +0100, David Laight wrote: > On Wed, 20 May 2026 15:07:01 +0200 > Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > > sysfs_kf_seq_show() defends against buggy show() callbacks that return > > larger than PAGE_SIZE by clamping the value and printing a warning. > > sysfs_kf_read(), the prealloc variant, has no such defense. > > > > The only current in-tree user of __ATTR_PREALLOC is drivers/md/md.c, > > whose show() callbacks are well-behaved, so this is hardening against > > future drivers doing foolish things and out-of-tree code doing even more > > foolish things. > > What is the rational for it using PREALLOC? No idea, you might want to dig to find the commit that did this. thanks, greg k-h
On Thu, 21 May 2026 08:19:16 +0200
Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> On Wed, May 20, 2026 at 11:11:58PM +0100, David Laight wrote:
> > On Wed, 20 May 2026 15:07:01 +0200
> > Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> >
> > > sysfs_kf_seq_show() defends against buggy show() callbacks that return
> > > larger than PAGE_SIZE by clamping the value and printing a warning.
> > > sysfs_kf_read(), the prealloc variant, has no such defense.
> > >
> > > The only current in-tree user of __ATTR_PREALLOC is drivers/md/md.c,
> > > whose show() callbacks are well-behaved, so this is hardening against
> > > future drivers doing foolish things and out-of-tree code doing even more
> > > foolish things.
> >
> > What is the rational for it using PREALLOC?
>
> No idea, you might want to dig to find the commit that did this.
This one:
commit 750f199ee8b578062341e6ddfe36c59ac8ff2dcb
Author: NeilBrown <neil@brown.name>
Date: Tue Sep 30 08:53:05 2014 +1000
md: mark some attributes as pre-alloc
Since __ATTR_PREALLOC was introduced in v3.19-rc1~78^2~18
it can now be used by md.
This ensure that writing to these sysfs attributes will never
block due to a memory allocation.
Such blocking could become a deadlock if mdmon is trying to
reconfigure an array after a failure prior to re-enabling writes.
That might be better handled with a flag that changes the kmalloc()
to NOIO and falls back onto a global preallocated page in the
normal path.
It would certainly let a lot of code be deleted (always good).
The atomic_write_len would then just be a max_write_len
(and maybe renamed).
The only real use is the 'cgroup' code that wants to support writes
that are larger than 4k.
Currently, if atomic_write_len is zero writes are truncated to PAGE_SIZE,
if non-zero overlong writes are rejected.
AFAICT the file offset is never checked - so all writes are (sort of) appends.
I'm not at all sure that makes sense.
Possibly all overlong writes should be rejected.
-- David
>
> thanks,
>
> greg k-h
Hello, Two nits: - Buffer is atomic_write_len ?: PAGE_SIZE, so probably better to clamp to that than hardcode PAGE_SIZE. - pr_warn() instead of bare printk()? Thanks. -- tejun
On Wed, May 20, 2026 at 08:19:34AM -1000, Tejun Heo wrote: > Hello, > > Two nits: > > - Buffer is atomic_write_len ?: PAGE_SIZE, so probably better to clamp > to that than hardcode PAGE_SIZE. Where is that check at? And sysfs_kf_seq_show() doesn't check it this way, should that change? > - pr_warn() instead of bare printk()? This is copying the message in sysfs_kf_seq_show() that has the same exact format. I will send a follow-on patch to make both the same. thanks, greg k-h
Hello, On Thu, May 21, 2026 at 08:18:32AM +0200, Greg Kroah-Hartman wrote: > On Wed, May 20, 2026 at 08:19:34AM -1000, Tejun Heo wrote: > > Hello, > > > > Two nits: > > > > - Buffer is atomic_write_len ?: PAGE_SIZE, so probably better to clamp > > to that than hardcode PAGE_SIZE. > > Where is that check at? And sysfs_kf_seq_show() doesn't check it this > way, should that change? In kernfs_fop_open(), if ops->prealloc, we allocate of->prealloc_buf to the size of of->atomic_write_len ?: PAGE_SIZE. Maybe we should just update of->atomic_write_len. I think the intention was to allow >PAGE_SIZE atomic content. seq_file now does dynamic buffer resizing, so this may not be necessary. There's no clean interface to preemptly set the minimum buffer size right now but that probably is the better direction. Then, we can just always go through seq_file. Thanks. -- tejun
On Thu, 21 May 2026 06:18:09 -1000 Tejun Heo <tj@kernel.org> wrote: > Hello, > > On Thu, May 21, 2026 at 08:18:32AM +0200, Greg Kroah-Hartman wrote: > > On Wed, May 20, 2026 at 08:19:34AM -1000, Tejun Heo wrote: > > > Hello, > > > > > > Two nits: > > > > > > - Buffer is atomic_write_len ?: PAGE_SIZE, so probably better to clamp > > > to that than hardcode PAGE_SIZE. > > > > Where is that check at? And sysfs_kf_seq_show() doesn't check it this > > way, should that change? > > In kernfs_fop_open(), if ops->prealloc, we allocate of->prealloc_buf to the > size of of->atomic_write_len ?: PAGE_SIZE. Maybe we should just update > of->atomic_write_len. I think the intention was to allow >PAGE_SIZE atomic > content. seq_file now does dynamic buffer resizing, so this may not be > necessary. There's no clean interface to preemptly set the minimum buffer > size right now but that probably is the better direction. Then, we can just > always go through seq_file. Except that the buffer size is always PAGE_SIZE in the prealloc paths. A larger size is used by the cgroup code - but that is really for writes. There seems to have been a lot of rework to all this code around 3.13. I can't quite see what it looked like before. But I think the write code should reject writes longer than of->atomic_write_len ?: PAGE_SIZE because all writes are treated as having 'offset zero'. If op->atomic_write_len is non-zero overlong writes are rejected. If op->atomic_write_len is zero then writes are truncated and userspace is likely to do a second write for the remainder - which won't DTRT. The naming suggests that loop existed in the kernel at some point, but I can't find that version. The 'prealloc' code was added later and reused 'atomic_write_len' for the buffer size. But the only users leave it as default - allowing PAGE_SIZE requests which require a kmalloc(PAGE_SIZE + 1) (aka 2 pages) because of the trailing '\0' added to writes. The 'prealloc' code is all about trying to avoid a kmalloc() that might fail and cause a deadlock. But I suspect the daemon doing the system call wont actually get to that kmalloc() if memory to so short it fails to allocate the IO buffer. Even if that were a problem an option to make that path 'try harder' to kmalloc() the buffer would be better than preallocating pairs of pages for each of the six? cgroup 'files'. (It could even fall back to a single preallocated buffer for all users that deem it important.) Reading this code is hard because of the tangled mess between kernfs/file.c and sysfs/file.c. -- David > > Thanks. >
On Thu, 21 May 2026 08:18:32 +0200 Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > On Wed, May 20, 2026 at 08:19:34AM -1000, Tejun Heo wrote: > > Hello, > > > > Two nits: > > > > - Buffer is atomic_write_len ?: PAGE_SIZE, so probably better to clamp > > to that than hardcode PAGE_SIZE. > > Where is that check at? And sysfs_kf_seq_show() doesn't check it this > way, should that change? It would be better to always fix the 'show' buffer at PAGE_SIZE (or 4k). The only place that uses is different value is the cgroup code and I think it needs to support longer writes (100 + 6 * NR_CPUS). I've not looked at how it handles the matching reads - I presume they exist. If the data is binary, or it uses seq_printf() then it can just support a longer dynamically allocated buffer. -- David > > > - pr_warn() instead of bare printk()? > > This is copying the message in sysfs_kf_seq_show() that has the same > exact format. I will send a follow-on patch to make both the same. > > thanks, > > greg k-h >
On Wed May 20, 2026 at 3:07 PM CEST, Greg Kroah-Hartman wrote:
> sysfs_kf_seq_show() defends against buggy show() callbacks that return
> larger than PAGE_SIZE by clamping the value and printing a warning.
> sysfs_kf_read(), the prealloc variant, has no such defense.
>
> The only current in-tree user of __ATTR_PREALLOC is drivers/md/md.c,
> whose show() callbacks are well-behaved, so this is hardening against
> future drivers doing foolish things and out-of-tree code doing even more
> foolish things.
>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: NeilBrown <neil@brown.name>
> Cc: Tejun Heo <tj@kernel.org>
> Fixes: 2b75869bba67 ("sysfs/kernfs: allow attributes to request write buffer be pre-allocated.")
> Assisted-by: gregkh_clanker_t1000
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
On Wed, May 20, 2026 at 3:06 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> sysfs_kf_seq_show() defends against buggy show() callbacks that return
> larger than PAGE_SIZE by clamping the value and printing a warning.
> sysfs_kf_read(), the prealloc variant, has no such defense.
>
> The only current in-tree user of __ATTR_PREALLOC is drivers/md/md.c,
> whose show() callbacks are well-behaved, so this is hardening against
> future drivers doing foolish things and out-of-tree code doing even more
> foolish things.
>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: NeilBrown <neil@brown.name>
> Cc: Tejun Heo <tj@kernel.org>
> Fixes: 2b75869bba67 ("sysfs/kernfs: allow attributes to request write buffer be pre-allocated.")
> Assisted-by: gregkh_clanker_t1000
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
No issues found, so
Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
> ---
> fs/sysfs/file.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
> index 5709cede1d75..25b44fe171a3 100644
> --- a/fs/sysfs/file.c
> +++ b/fs/sysfs/file.c
> @@ -120,6 +120,10 @@ static ssize_t sysfs_kf_read(struct kernfs_open_file *of, char *buf,
> len = ops->show(kobj, of->kn->priv, buf);
> if (len < 0)
> return len;
> + if (len >= (ssize_t)PAGE_SIZE) {
> + printk("fill_read_buffer: %pS returned bad count\n", ops->show);
> + len = PAGE_SIZE - 1;
> + }
> if (pos) {
> if (len <= pos)
> return 0;
> --
> 2.54.0
>
© 2016 - 2026 Red Hat, Inc.