[PATCH 1/2] taskstats: set version in TGID exit notifications

Yiyang Chen posted 2 patches 3 days, 17 hours ago
Posted by Yiyang Chen 3 days, 17 hours ago
delay accounting started populating taskstats records with a valid
version field via fill_pid() and fill_tgid().

Later, commit ad4ecbcba728 ("[PATCH] delay accounting taskstats
interface send tgid once") changed the TGID exit path to send the
cached signal->stats aggregate directly instead of building the outgoing
record through fill_tgid(). Unlike fill_tgid(), fill_tgid_exit() only
accumulates accounting data and never initializes stats->version.

As a result, TGID exit notifications can reach userspace with
version == 0 even though PID exit notifications and
TASKSTATS_CMD_GET replies carry a valid taskstats version.

Set stats->version = TASKSTATS_VERSION after copying the cached TGID
aggregate into the outgoing netlink payload so all taskstats records are
self-describing again.

Fixes: ad4ecbcba728 ("[PATCH] delay accounting taskstats interface send tgid once")
Cc: stable@vger.kernel.org
Signed-off-by: Yiyang Chen <cyyzero16@gmail.com>
---
 kernel/taskstats.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index 0cd680ccc7e5..73bd6a6a7893 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -649,6 +649,7 @@ void taskstats_exit(struct task_struct *tsk, int group_dead)
 		goto err;
 
 	memcpy(stats, tsk->signal->stats, sizeof(*stats));
+	stats->version = TASKSTATS_VERSION;
 
 send:
 	send_cpu_listeners(rep_skb, listeners);
-- 
2.43.0
Re: [PATCH 1/2] taskstats: set version in TGID exit notifications
Posted by Andrew Morton 2 days, 14 hours ago
On Mon, 30 Mar 2026 03:00:40 +0800 Yiyang Chen <cyyzero16@gmail.com> wrote:

> delay accounting started populating taskstats records with a valid
> version field via fill_pid() and fill_tgid().
> 
> Later, commit ad4ecbcba728 ("[PATCH] delay accounting taskstats
> interface send tgid once") changed the TGID exit path to send the
> cached signal->stats aggregate directly instead of building the outgoing
> record through fill_tgid(). Unlike fill_tgid(), fill_tgid_exit() only
> accumulates accounting data and never initializes stats->version.
> 
> As a result, TGID exit notifications can reach userspace with
> version == 0 even though PID exit notifications and
> TASKSTATS_CMD_GET replies carry a valid taskstats version.
> 
> Set stats->version = TASKSTATS_VERSION after copying the cached TGID
> aggregate into the outgoing netlink payload so all taskstats records are
> self-describing again.
> 
> Fixes: ad4ecbcba728 ("[PATCH] delay accounting taskstats interface send tgid once")

Thanks, lol, 20 years ago.

Can you explain how others can trigger this?  Some combination of
steps which results in the bad output?

> Cc: stable@vger.kernel.org

Is there a chance of breaking existing userspace here?  Some existing
userspace code which is expecting 0 here and will get surprised by this
change?

> --- a/kernel/taskstats.c
> +++ b/kernel/taskstats.c
> @@ -649,6 +649,7 @@ void taskstats_exit(struct task_struct *tsk, int group_dead)
>  		goto err;
>  
>  	memcpy(stats, tsk->signal->stats, sizeof(*stats));
> +	stats->version = TASKSTATS_VERSION;
>  
>  send:
>  	send_cpu_listeners(rep_skb, listeners);
Re: [PATCH 1/2] taskstats: set version in TGID exit notifications
Posted by Yiyang Chen 1 day, 19 hours ago
On Tue, Mar 31, 2026 at 5:29 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Mon, 30 Mar 2026 03:00:40 +0800 Yiyang Chen <cyyzero16@gmail.com> wrote:
>
> > delay accounting started populating taskstats records with a valid
> > version field via fill_pid() and fill_tgid().
> >
> > Later, commit ad4ecbcba728 ("[PATCH] delay accounting taskstats
> > interface send tgid once") changed the TGID exit path to send the
> > cached signal->stats aggregate directly instead of building the outgoing
> > record through fill_tgid(). Unlike fill_tgid(), fill_tgid_exit() only
> > accumulates accounting data and never initializes stats->version.
> >
> > As a result, TGID exit notifications can reach userspace with
> > version == 0 even though PID exit notifications and
> > TASKSTATS_CMD_GET replies carry a valid taskstats version.
> >
> > Set stats->version = TASKSTATS_VERSION after copying the cached TGID
> > aggregate into the outgoing netlink payload so all taskstats records are
> > self-describing again.
> >
> > Fixes: ad4ecbcba728 ("[PATCH] delay accounting taskstats interface send tgid once")
>
> Thanks, lol, 20 years ago.
>
> Can you explain how others can trigger this?  Some combination of
> steps which results in the bad output?

Yes. This is easy to reproduce with `tools/accounting/getdelays.c`.

I have a small follow-up patch for that tool which:
1. increases the receive buffer/message size so the pid+tgid combined exit
notification is not dropped/truncated
2. prints `stats->version`.

With that patch, the reproducer is:

  Terminal 1:
    ./getdelays -d -v -l -m 0

  Terminal 2:
    taskset -c 0 python3 -c 'import threading,time; t=threading.Thread(target=time.sleep,args=(0.1,)); t.start(); t.join()'

That produces both PID and TGID exit notifications for the same process. The PID
exit record reports a valid taskstats version, while the TGID exit record reports
`version 0`.
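
For readers without a test machine, the failure mode can be modelled in plain userspace C. This is only a sketch: `struct demo_stats`, `DEMO_VERSION`, `demo_accumulate()` and `demo_build_exit_record()` are hypothetical stand-ins for `struct taskstats`, `TASKSTATS_VERSION`, `fill_tgid_exit()` and the copy in `taskstats_exit()`, and it assumes the cached aggregate starts out zeroed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for TASKSTATS_VERSION; the real value differs. */
#define DEMO_VERSION 1

/* Hypothetical, trimmed-down stand-in for struct taskstats. */
struct demo_stats {
	uint16_t version;
	uint64_t cpu_delay_total;
};

/*
 * Models fill_tgid_exit(): adds an exiting thread's data to the cached
 * per-process aggregate and never touches the version field.
 */
static void demo_accumulate(struct demo_stats *cached, uint64_t delay)
{
	cached->cpu_delay_total += delay;
}

/*
 * Models the TGID exit path: copy the cached aggregate into the outgoing
 * record. With apply_fix == 0 the version is whatever the cache held
 * (0 if it was zeroed); with apply_fix != 0 it is set after the copy,
 * as the patch does.
 */
static void demo_build_exit_record(struct demo_stats *out,
				   const struct demo_stats *cached,
				   int apply_fix)
{
	memcpy(out, cached, sizeof(*out));
	if (apply_fix)
		out->version = DEMO_VERSION;
}
```

Calling `demo_build_exit_record()` with `apply_fix` set to 0 leaves `version` at 0 while the accounting data is copied intact, which matches what the patched getdelays prints for the TGID record.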

>
> > Cc: stable@vger.kernel.org
>
> Is there a chance of breaking existing userspace here?  Some existing
> userspace code which is expecting 0 here and will get surprised by this
> change?

In practice, userspace uses `taskstats.version` to decide which fields are
present in `struct taskstats`, i.e. as a schema/version discriminator. A zero
version does not describe a valid taskstats layout, so it is hard to see how
userspace could use `0` as a meaningful or useful distinction here.

So I do not think fixing this in mainline should break sensible userspace. It
just restores consistency of the taskstats version semantics across
`TASKSTATS_CMD_GET`, PID exit notifications, and TGID exit notifications.
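
To make the "schema discriminator" point concrete, here is a rough sketch of the kind of check a consumer might perform. Everything in it is hypothetical: `struct demo_record`, `DEMO_EXTRA_FIELD_MIN_VERSION` and both helpers are illustrative names, and the threshold value is not taken from the real taskstats history.

```c
#include <stdint.h>

/* Hypothetical minimum version at which extra_field is considered valid. */
#define DEMO_EXTRA_FIELD_MIN_VERSION 9

/* Hypothetical record layout: one original field, one added later. */
struct demo_record {
	uint16_t version;
	uint64_t base_field;
	uint64_t extra_field;
};

/* Returns 1 if the record's version says extra_field is present. */
static int demo_has_extra_field(const struct demo_record *rec)
{
	return rec->version >= DEMO_EXTRA_FIELD_MIN_VERSION;
}

/*
 * Reads extra_field only when the version allows it; otherwise returns
 * the caller's fallback. A record with version == 0 always takes the
 * fallback path, even if the kernel actually filled the field in.
 */
static uint64_t demo_read_extra(const struct demo_record *rec,
				uint64_t fallback)
{
	return demo_has_extra_field(rec) ? rec->extra_field : fallback;
}
```

A consumer written this way would ignore valid data in a version 0 TGID exit record, and would start reading it once the version is set, rather than relying on 0 for anything.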

To be honest, I'm also not sure whether this should be backported to stable. But I
think mainline should still fix it.

Thanks,
Yiyang