kernel: fix acct.c first test openai codex

[PATCH] kernel: fix acct.c first test openai codex

Posted by Ozgur Kara 8 months, 4 weeks ago

From: Ozgur Karatas <ozgur@goosey.org>

Hello,

i want to try out the openai codex and it seemed like a logical
process so with rcu_read_lock() a protection is started but we dont
call rcu_read_unlock(); it has to be called to end rcu read lock.

I guess, this means rcu stays open forever and data structures are not
cleaned which causes performance degradation.

Regards

Ozgur

Reported-by: Ozgur Karatas <ozgur@goosey.org>
---
 kernel/acct.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/acct.c b/kernel/acct.c
index 6520baa13669..8a2b75063d16 100644
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -330,6 +330,7 @@ SYSCALL_DEFINE1(acct, const char __user *, name)
        } else {
                rcu_read_lock();
                pin_kill(task_active_pid_ns(current)->bacct);
+               rcu_read_unlock();
        }

        return error;
--
2.39.5

Re: [PATCH] kernel: fix acct.c first test openai codex

Posted by Al Viro 8 months, 4 weeks ago

On Fri, May 16, 2025 at 11:15:49PM +0000, Ozgur Kara wrote:
> From: Ozgur Karatas <ozgur@goosey.org>
> 
> Hello,
> 
> i want to try out the openai codex and it seemed like a logical
> process so with rcu_read_lock() a protection is started but we dont
> call rcu_read_unlock(); it has to be called to end rcu read lock.
> 
> I guess, this means rcu stays open forever and data structures are not
> cleaned which causes performance degradation.
> 
> Regards
> 
> Ozgur
> 
> Reported-by: Ozgur Karatas <ozgur@goosey.org>

... showing that one needs *something* that would be able to reason somewhere
in the loop.

Your "I guess, this means" is an excellent example - you've got something
(openai codex, visual examination, whatever) pointing you to unusual pattern -
rcu_read_lock() + call of function with no rcu_read_unlock() in sight.

Next step: hypothesis that unlock might be missing.

Next step: blindly send a patch adding an unlock on the strength of that
hypothesis?

Sorry, no.  It's not just that hypothesis is wrong - it is, but that's not
the real issue.  It's that instead of
	* asking how the hell could that work
	* trying to trigger that and checking whether the hypothesis is
corrent
	* looking around for similar places to see whether it looks intentional
(in which case it still might be a bug, but if it's inconsistent, the odds of
a bug are going up)
	* looking through the history (e.g. with git blame) to see if there
might be any explanations in commit message
	* looking through the function in question (git grep would immediately
point to fs/fs_pin.c, and the first glance would show rcu_read_unlock() on
all paths through it, with the total balance of lock/unlock being -1 on
each)
you've chosen to post a patch "fixing" the problem as a way to see whether
it's actually there.

Folks, programming tools can be very useful in directing to places that might
be worth checking.  The codebase is huge, so any heuristics useful in triage
are Good Things(tm); if instead of "bug found because somebody looked at
random line picked out of millions such and it turned out to be broken" you
have "bug found because such-and-such heuristic had picked that line as
odd-looking and it turned out to be broken", you win even if it lists
20 places and only one of them turns out to be actually broken.

But you really can't treat that as anything beyond a hint to bump that
particular place in the list of things to look through.  It might tell you
what looks unusual, but that's it.  Same as with any warning from any
source.

Asking around is fine; so's experimenting, so's reasoning, etc.
But you can't go from "tool gives a warning, I guess it translates into plain
language this way" to "modify that place so that the tool doesn't point to
that place anymore".  You really need to understand what's going on first.

Re: [PATCH] kernel: fix acct.c first test openai codex

Posted by Ozgur Kara 8 months, 4 weeks ago

Al Viro <viro@zeniv.linux.org.uk>, 17 May 2025 Cmt, 02:55 tarihinde şunu yazdı:
>
> On Fri, May 16, 2025 at 11:15:49PM +0000, Ozgur Kara wrote:
> > From: Ozgur Karatas <ozgur@goosey.org>
> >
> > Hello,
> >
> > i want to try out the openai codex and it seemed like a logical
> > process so with rcu_read_lock() a protection is started but we dont
> > call rcu_read_unlock(); it has to be called to end rcu read lock.
> >
> > I guess, this means rcu stays open forever and data structures are not
> > cleaned which causes performance degradation.
> >
> > Regards
> >
> > Ozgur
> >
> > Reported-by: Ozgur Karatas <ozgur@goosey.org>
>
> ... showing that one needs *something* that would be able to reason somewhere
> in the loop.
>
> Your "I guess, this means" is an excellent example - you've got something
> (openai codex, visual examination, whatever) pointing you to unusual pattern -
> rcu_read_lock() + call of function with no rcu_read_unlock() in sight.
>

Hello Al,

first thanks a lot for your guidance and insightful perspective and
your advice was incredibly valuable, very valuable.
And not just for this specific patch but also in helping us understand
how to approach bugs and testing in kernel world.

it really clarified how things that look like straightforward fixes
can actually be symptoms of deeper design decisions and why blindly
patching based on heuristics can lead us astray.
So your mentorship reminded us that understanding the why behind
kernel code is way more important than just code and that thoughtful
investigation is key in kernel development.

When examining every new technology (ai, openai codex) i think we need
an engineering perspective but you should know that i didnt send it
with a very short research, sorry.
and i didnt think about whether rcu should leave this open while doing
read copy and what would happen if it did but you said it very well,
so this was enough of a guide for me and i will definitely look into
it and try to understand it better by going to for example git blame
and grep.
I actually thought i understood this code and fix but that was it and
i just believed the patch was correct without any need for static
analysis or heuristic warnings.

Still we live with these, i mean ai or many other tools come into our
lives and this is also a learning phase for me, i need to learn more.
and i actually asked ai tool this again and this time it got a better answer:

"rcu locks are complex, it is not enough to just appear to be missing.
why does it appear to be missing? look at the entire code, how is the
lock/unlock balance provided?
look at past commits, why does it appear that there is no unlock there?
look at similar codes, is it intentional to act like this in that place?
experiments, tests, asking questions are a must.
in other words, do not ask “the tool warned, let’s patch it
immediately” but “why is there such a warning, is it really a
problem?”
"

this is a pattern i dont normally see elsewhere in kernel and i was
convinced that the switch was left open.
i learned that rcu_read_lock shouldnt be missing and it actually
improved my approach to kernel.

so thank you again.

Regards

Ozgur

> Next step: hypothesis that unlock might be missing.
>
> Next step: blindly send a patch adding an unlock on the strength of that
> hypothesis?
>
> Sorry, no.  It's not just that hypothesis is wrong - it is, but that's not
> the real issue.  It's that instead of
>         * asking how the hell could that work
>         * trying to trigger that and checking whether the hypothesis is
> corrent
>         * looking around for similar places to see whether it looks intentional
> (in which case it still might be a bug, but if it's inconsistent, the odds of
> a bug are going up)
>         * looking through the history (e.g. with git blame) to see if there
> might be any explanations in commit message
>         * looking through the function in question (git grep would immediately
> point to fs/fs_pin.c, and the first glance would show rcu_read_unlock() on
> all paths through it, with the total balance of lock/unlock being -1 on
> each)
> you've chosen to post a patch "fixing" the problem as a way to see whether
> it's actually there.
>
> Folks, programming tools can be very useful in directing to places that might
> be worth checking.  The codebase is huge, so any heuristics useful in triage
> are Good Things(tm); if instead of "bug found because somebody looked at
> random line picked out of millions such and it turned out to be broken" you
> have "bug found because such-and-such heuristic had picked that line as
> odd-looking and it turned out to be broken", you win even if it lists
> 20 places and only one of them turns out to be actually broken.
>
> But you really can't treat that as anything beyond a hint to bump that
> particular place in the list of things to look through.  It might tell you
> what looks unusual, but that's it.  Same as with any warning from any
> source.
>
> Asking around is fine; so's experimenting, so's reasoning, etc.
> But you can't go from "tool gives a warning, I guess it translates into plain
> language this way" to "modify that place so that the tool doesn't point to
> that place anymore".  You really need to understand what's going on first.
>
>