syzbot is reporting a data race between __tty_hangup() and __fput(), because
filp->f_op readers are not holding tty->files_lock.
Dmitry Vyukov mentioned that this race opens up the possibility of a NULL
pointer dereference, because tty_fops implements e.g. the splice_read
callback whereas hung_up_tty_fops does not.
           CPU0                                    CPU1
           ----                                    ----
do_splice_to() {
                                        __tty_hangup() {
  // f_op->splice_read was generic_file_splice_read
  if (unlikely(!in->f_op->splice_read))
    return warn_unsupported(in, "read");
                                          filp->f_op = &hung_up_tty_fops;
                                        }
  // f_op->splice_read is now NULL
  return in->f_op->splice_read(in, ppos, pipe, len, flags);
}
If we only care about the NULL pointer dereference, implementing the
missing callbacks in hung_up_tty_fops is fine. But if we also care about
KCSAN reports, we would need to wrap all filp->f_op reads that are
reachable via tty_fops callbacks in data_race(), as sketched below.
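For illustration, such wrapping would look roughly like this in the
splice path (a hypothetical sketch, not part of this patch):

	/* Hypothetical sketch: every f_op reader silences KCSAN itself. */
	if (unlikely(!data_race(in->f_op)->splice_read))
		return warn_unsupported(in, "read");
	return data_race(in->f_op)->splice_read(in, ppos, pipe, len, flags);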
Such wrapping is overkill as a fix for the tty_io code. Therefore, instead
of implementing the missing callbacks, stop updating filp->f_op and remove
hung_up_tty_fops. Then, the changes are limited to the tty_io code.
tty_open() is doing "filp->f_op = &tty_fops;".
__tty_hangup() is doing "filp->f_op = &hung_up_tty_fops;".
tty_hung_up_p() is doing "filp->f_op == &hung_up_tty_fops", and
most functions are checking tty_hung_up_p().
Since tty_open() allocates "struct tty_file_private" for each
"struct file", we can remember whether __tty_hangup() was called
by adding a flag to "struct tty_file_private".
Below are the details of where/what to change.
Regarding __tty_hangup(), replace "filp->f_op = &hung_up_tty_fops;" with
setting the above-mentioned flag.
Regarding tty_hung_up_p(), replace "filp->f_op == &hung_up_tty_fops" with
"filp->f_op == &tty_fops" and check the above-mentioned flag.
Regarding tty_open(), just remove "filp->f_op = &tty_fops;" because
"struct tty_file_private" was already released by tty_del_file() from
tty_release() when control reaches this line.
Regarding tty_{read,write,poll,ioctl,compat_ioctl}(), respectively embed
hung_up_tty_{read,write,poll,ioctl,compat_ioctl}() right before
tty_paranoia_check().
Regarding tty_fasync(), embed hung_up_tty_fasync() right before tty_lock()
in tty_fasync() rather than tty_paranoia_check() in __tty_fasync(), because
tty_hung_up_p() is checked right after tty_lock() returns.
Since tty_read() is called via file->f_op->read_iter() from
call_read_iter() from generic_file_splice_read(), no change is needed for
tty_fops->splice_read.
Since tty_write() is called via file->f_op->write_iter() from
call_write_iter() from do_iter_readv_writev() from do_iter_write() from
vfs_iter_write() from iter_file_splice_write(), no change is needed for
tty_fops->splice_write.
Since both tty_fops and hung_up_tty_fops point to the same callback for
llseek/release, no change is needed for tty_fops->{llseek,release}.
Finally, remove hung_up_tty_fops and mark callbacks used by
hung_up_tty_fops as "inline".
Reported-by: syzbot <syzbot+b7c3ba8cdc2f6cf83c21@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=b7c3ba8cdc2f6cf83c21
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Dmitry Vyukov <dvyukov@google.com>
---
Changes in v3:
Accepted some of the comments from Greg Kroah-Hartman
at https://lkml.kernel.org/r/2023053048-saved-undated-9adf@gregkh .
> That sounds like a bug in KCSAN, let's not add loads of infrastructure
> just because we have bad tools.
Not a bug in KCSAN. KCSAN is reporting that the current code has a race
window for a NULL pointer dereference. This patch closes that race window.
> Again, document it, and also perhaps, not use KCSAN? :)
data_race() is already explained. Not using KCSAN is not a solution.
> It happens on open() which yes, is not performance critical, but you are
> now requiring it where before this was not required. Which isn't always
> so obvious, right?
Updated to do explicit initialization.
> Which we can fix. So let's fix that and then not worry about these
> false-positives with KCSAN as it's obviously wrong. That would make for
> a much smaller and simpler and easier-to-maintain-over-time change.
I disagree. We will surely forget to add a corresponding dummy callback to
hung_up_tty_fops when a new callback is added to "struct file_operations".
Not changing filp->f_op once it has been published is the safer approach.
> How will you know this in 5 years when you see this new field?
> Documentation matters.
Added comment.
Changes in v2:
Mark callbacks used by hung_up_tty_fops as "inline" in order to fix
-Wunused-function warning when CONFIG_COMPAT=n, reported by
Nathan Chancellor <nathan@kernel.org> and Arnd Bergmann <arnd@kernel.org>.
drivers/tty/tty_io.c | 48 ++++++++++++++++++++++----------------------
include/linux/tty.h | 1 +
2 files changed, 25 insertions(+), 24 deletions(-)
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 407b0d87b7c1..88b0e3221195 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -187,6 +187,7 @@ int tty_alloc_file(struct file *file)
if (!priv)
return -ENOMEM;
+ priv->hung = false;
file->private_data = priv;
return 0;
@@ -420,35 +421,35 @@ struct tty_driver *tty_find_polling_driver(char *name, int *line)
EXPORT_SYMBOL_GPL(tty_find_polling_driver);
#endif
-static ssize_t hung_up_tty_read(struct kiocb *iocb, struct iov_iter *to)
+static inline ssize_t hung_up_tty_read(struct kiocb *iocb, struct iov_iter *to)
{
return 0;
}
-static ssize_t hung_up_tty_write(struct kiocb *iocb, struct iov_iter *from)
+static inline ssize_t hung_up_tty_write(struct kiocb *iocb, struct iov_iter *from)
{
return -EIO;
}
/* No kernel lock held - none needed ;) */
-static __poll_t hung_up_tty_poll(struct file *filp, poll_table *wait)
+static inline __poll_t hung_up_tty_poll(struct file *filp, poll_table *wait)
{
return EPOLLIN | EPOLLOUT | EPOLLERR | EPOLLHUP | EPOLLRDNORM | EPOLLWRNORM;
}
-static long hung_up_tty_ioctl(struct file *file, unsigned int cmd,
+static inline long hung_up_tty_ioctl(struct file *file, unsigned int cmd,
unsigned long arg)
{
return cmd == TIOCSPGRP ? -ENOTTY : -EIO;
}
-static long hung_up_tty_compat_ioctl(struct file *file,
+static inline long hung_up_tty_compat_ioctl(struct file *file,
unsigned int cmd, unsigned long arg)
{
return cmd == TIOCSPGRP ? -ENOTTY : -EIO;
}
-static int hung_up_tty_fasync(int fd, struct file *file, int on)
+static inline int hung_up_tty_fasync(int fd, struct file *file, int on)
{
return -ENOTTY;
}
@@ -490,17 +491,6 @@ static const struct file_operations console_fops = {
.fasync = tty_fasync,
};
-static const struct file_operations hung_up_tty_fops = {
- .llseek = no_llseek,
- .read_iter = hung_up_tty_read,
- .write_iter = hung_up_tty_write,
- .poll = hung_up_tty_poll,
- .unlocked_ioctl = hung_up_tty_ioctl,
- .compat_ioctl = hung_up_tty_compat_ioctl,
- .release = tty_release,
- .fasync = hung_up_tty_fasync,
-};
-
static DEFINE_SPINLOCK(redirect_lock);
static struct file *redirect;
@@ -618,7 +608,8 @@ static void __tty_hangup(struct tty_struct *tty, int exit_session)
continue;
closecount++;
__tty_fasync(-1, filp, 0); /* can't block */
- filp->f_op = &hung_up_tty_fops;
+ /* Accept race with tty_hung_up_p() test. */
+ data_race(priv->hung = true);
}
spin_unlock(&tty->files_lock);
@@ -742,7 +733,9 @@ void tty_vhangup_session(struct tty_struct *tty)
*/
int tty_hung_up_p(struct file *filp)
{
- return (filp && filp->f_op == &hung_up_tty_fops);
+ return filp && filp->f_op == &tty_fops &&
+ /* Accept race with __tty_hangup(). */
+ data_race(((struct tty_file_private *) filp->private_data)->hung);
}
EXPORT_SYMBOL(tty_hung_up_p);
@@ -921,6 +914,8 @@ static ssize_t tty_read(struct kiocb *iocb, struct iov_iter *to)
struct tty_ldisc *ld;
ssize_t ret;
+ if (tty_hung_up_p(file))
+ return hung_up_tty_read(iocb, to);
if (tty_paranoia_check(tty, inode, "tty_read"))
return -EIO;
if (!tty || tty_io_error(tty))
@@ -1080,6 +1075,8 @@ static ssize_t file_tty_write(struct file *file, struct kiocb *iocb, struct iov_
struct tty_ldisc *ld;
ssize_t ret;
+ if (tty_hung_up_p(file))
+ return hung_up_tty_write(iocb, from);
if (tty_paranoia_check(tty, file_inode(file), "tty_write"))
return -EIO;
if (!tty || !tty->ops->write || tty_io_error(tty))
@@ -2166,11 +2163,6 @@ static int tty_open(struct inode *inode, struct file *filp)
return retval;
schedule();
- /*
- * Need to reset f_op in case a hangup happened.
- */
- if (tty_hung_up_p(filp))
- filp->f_op = &tty_fops;
goto retry_open;
}
clear_bit(TTY_HUPPED, &tty->flags);
@@ -2204,6 +2196,8 @@ static __poll_t tty_poll(struct file *filp, poll_table *wait)
struct tty_ldisc *ld;
__poll_t ret = 0;
+ if (tty_hung_up_p(filp))
+ return hung_up_tty_poll(filp, wait);
if (tty_paranoia_check(tty, file_inode(filp), "tty_poll"))
return 0;
@@ -2256,6 +2250,8 @@ static int tty_fasync(int fd, struct file *filp, int on)
struct tty_struct *tty = file_tty(filp);
int retval = -ENOTTY;
+ if (tty_hung_up_p(filp))
+ return hung_up_tty_fasync(fd, filp, on);
tty_lock(tty);
if (!tty_hung_up_p(filp))
retval = __tty_fasync(fd, filp, on);
@@ -2684,6 +2680,8 @@ long tty_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
int retval;
struct tty_ldisc *ld;
+ if (tty_hung_up_p(file))
+ return hung_up_tty_ioctl(file, cmd, arg);
if (tty_paranoia_check(tty, file_inode(file), "tty_ioctl"))
return -EINVAL;
@@ -2969,6 +2967,8 @@ static long tty_compat_ioctl(struct file *file, unsigned int cmd,
return tty_ioctl(file, cmd, arg);
}
+ if (tty_hung_up_p(file))
+ return hung_up_tty_compat_ioctl(file, cmd, arg);
if (tty_paranoia_check(tty, file_inode(file), "tty_ioctl"))
return -EINVAL;
diff --git a/include/linux/tty.h b/include/linux/tty.h
index 2b2e6f0a54d6..4bf958efd6be 100644
--- a/include/linux/tty.h
+++ b/include/linux/tty.h
@@ -254,6 +254,7 @@ struct tty_file_private {
struct tty_struct *tty;
struct file *file;
struct list_head list;
+ bool hung; /* Whether __tty_hangup() was called or not. */
};
/**
--
2.18.4
On Fri, 26 Apr 2024 at 23:21, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> syzbot is reporting a data race between __tty_hangup() and __fput(), because
> filp->f_op readers are not holding tty->files_lock.
Hmm. I looked round, and we actually have another case of this:
snd_card_disconnect() also does
mfile->file->f_op = &snd_shutdown_f_ops;
and I don't think tty->files_lock (or, in the sound case,
&card->files_lock) is at all relevant, since the users of f_ops don't
use it or care.
That said, I really think we'd be better off just keeping the current
model, and have the "you get one or the other". For the two cases that
do this, do that f_op replacement with a WRITE_ONCE(), and just make
the rule be that you have to have all the same ops in both the
original and the shutdown version.
I do *not* think it's at all better to replace (in two different
places) the racy f_op thing with another racy 'hungup' flag.
The sound case is actually a bit more involved, since it tries to deal
with module counts. That looks potentially bogus. It does
fops_get(mfile->file->f_op);
after it has installed the snd_shutdown_f_ops, but in snd_open() it
has done the proper
replace_fops(file, new_fops);
which actually drops the module count for the old one. So the sound
case seems to possibly leak a module ref on disconnect. That's a
separate issue, though.
Linus
On 2024/04/28 4:02, Linus Torvalds wrote:
> On Fri, 26 Apr 2024 at 23:21, Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>
>> syzbot is reporting a data race between __tty_hangup() and __fput(), because
>> filp->f_op readers are not holding tty->files_lock.
>
> Hmm. I looked round, and we actually have another case of this:
> snd_card_disconnect() also does
>
> mfile->file->f_op = &snd_shutdown_f_ops;
OK. That one needs to be fixed as well.
>
> and I don't think tty->files_lock (or, in the sound case,
> &card->files_lock) is at all relevant, since the users of f_ops don't
> use it or care.
More precisely, the users of f_op can't access it. For example,
do_splice_read() cannot know that the "in" argument refers to a tty
device, and therefore will not know about tty->files_lock.
>
> That said, I really think we'd be better off just keeping the current
> model, and have the "you get one or the other". For the two cases that
> do this, do that f_op replacement with a WRITE_ONCE(), and just make
> the rule be that you have to have all the same ops in both the
> original and the shutdown version.
If we keep the current model, WRITE_ONCE() is not sufficient.
My understanding is that KCSAN's report like
https://elixir.bootlin.com/linux/v6.9-rc5/source/Documentation/dev-tools/kcsan.rst#L56
will remain unless we wrap all f_op readers using data_race() macro. That is,
we will need to define a wrapper like
static inline const struct file_operations *f_op(struct file *file)
{
/*
* Ignore race in order to silence KCSAN, for __tty_hangup() or
* snd_card_disconnect() might update f_op while file is in use.
*/
return data_race(file->f_op);
}
and do for example
- if (unlikely(!in->f_op->splice_read))
+ if (unlikely(!f_op(in)->splice_read))
for https://elixir.bootlin.com/linux/v6.9-rc5/source/fs/splice.c#L977 and
- return in->f_op->splice_read(in, ppos, pipe, len, flags);
+ return f_op(in)->splice_read(in, ppos, pipe, len, flags);
for https://elixir.bootlin.com/linux/v6.9-rc5/source/fs/splice.c#L985 .
Are VFS people happy with such a change? I guess that VFS people assume that
file->f_op does not get updated while the file is in use. Also, such data_race()
usage does not match any of the situations listed in
https://elixir.bootlin.com/linux/v6.9-rc5/source/tools/memory-model/Documentation/access-marking.txt#L58 .
>
> I do *not* think it's at all better to replace (in two different
> places) the racy f_op thing with another racy 'hungup' flag.
This approach allows VFS people to assume that file->f_op does not
get updated while the file is in use.
>
> The sound case is actually a bit more involved, since it tries to deal
> with module counts. That looks potentially bogus. It does
>
> fops_get(mfile->file->f_op);
>
> after it has installed the snd_shutdown_f_ops, but in snd_open() it
> has done the proper
>
> replace_fops(file, new_fops);
replace_fops() is intended to be used *ONLY* from ->open() instances.
>
> which actually drops the module count for the old one. So the sound
> case seems to possibly leak a module ref on disconnect. That's a
> separate issue, though.
>
> Linus
On Sun, 28 Apr 2024 at 03:20, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
>
> If we keep the current model, WRITE_ONCE() is not sufficient.
>
> My understanding is that KCSAN's report like
I find it obnoxious that these are NOT REAL PROBLEMS.
It's KCSAN that is broken and doesn't allow us to just tell it to
sanely ignore things.
I don't want to add stupid and pointless annotations for a broken tooling.
Can you instead just ask the KCSAN people to have some mode where we
can annotate a pointer as a "use one or the other", and just shut that
thing up that way?
Because no, we're not adding some idiotic "f_op()" wrapper just to
shut KCSAN up about a non-issue.
Linus
On Sun, 28 Apr 2024 at 20:50, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sun, 28 Apr 2024 at 03:20, Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
> >
> > If we keep the current model, WRITE_ONCE() is not sufficient.

FWIW, the original report here came from syzbot, which is configured
so that the WRITE_ONCE() is sufficient
(CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN=n,
CONFIG_KCSAN_IGNORE_ATOMICS=y ... long names, I know), because this
was an idiom we ran into in the past and just wanted to filter them
out (for better or worse).

That being said, the reader side still has a real problem, even if
hidden in some KCSAN configs. Up to you if the WRITE_ONCE() is
sufficient or not, at least on syzbot this case wouldn't resurface
(for now).

> > My understanding is that KCSAN's report like
>
> I find it obnoxious that these are NOT REAL PROBLEMS.
>
> It's KCSAN that is broken and doesn't allow us to just tell it to
> sanely ignore things.
>
> I don't want to add stupid and pointless annotations for a broken tooling.

KCSAN is the messenger in this case that our mental model vs. what our
language/memory model gives us is wrong: we already have our own
memory model (!= C11), but we still have data races, and have to deal
with the fallout. Data races (here: 2 plain unmarked accesses of the
pointers) still exist, and the compiler is still free to optimize them
(miscompile them according to our mental model). Assuming the data
race is not a problem assumes all compilers on all architectures won't
mess up the accesses.

This comes up over and over, and the problem hasn't gone away. Our
compilers still don't know about the kernel doing things outside the
scope of standard C - we can beat the compiler into submission with
lots of flags, but we know [1] compilers break our assumptions.

What's the long-term fix? I don't know, besides trying to teach
compilers more of what Linux wants.

[1] https://lpc.events/event/16/contributions/1174/attachments/1108/2121/Status%20Report%20-%20Broken%20Dependency%20Orderings%20in%20the%20Linux%20Kernel.pdf

> Can you instead just ask the KCSAN people to have some mode where we
> can annotate a pointer as a "use one or the other", and just shut that
> thing up that way?

Saying "use one or the other" pointer, implies atomicity of the load
and store. Telling KCSAN about our assumption of what we think the
compiler should do is the wrong way around - we should tell the
compiler. Which part of the code is telling the compiler we want
atomicity of the loaded pointers?

A WRITE_ONCE() / READ_ONCE() pair would do it here. What should we use
instead?

Thanks,
-- Marco
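For reference, the WRITE_ONCE()/READ_ONCE() pairing described above
would look roughly like this (a hypothetical sketch, not a posted patch):

	/* Writer side (__tty_hangup()): a marked store that the
	 * compiler may not tear or duplicate. */
	WRITE_ONCE(filp->f_op, &hung_up_tty_fops);

	/* Reader side (e.g. the splice path): take one marked snapshot
	 * and use only that snapshot afterwards. */
	const struct file_operations *fop = READ_ONCE(in->f_op);

	if (fop->splice_read)
		return fop->splice_read(in, ppos, pipe, len, flags);

With both sides marked, the reader sees either the old or the new
operations table, never a mix of the two.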
On Mon, 29 Apr 2024 at 06:56, Marco Elver <elver@google.com> wrote:
>
> A WRITE_ONCE() / READ_ONCE() pair would do it here. What should we use instead?
Why would we annotate a "any other code generation is insane" issues at all?
When we do chained pointer loads in
file->f_op->op()
and we say "I don't care what value I get for the middle one", I don't
see the value in annotating that at all.
There is no compiler that will sanely and validly do a pointer chain
load by *anything* but a load. And it doesn't matter to us if it then
spills and reloads, it will *STILL* be a load.
We're not talking about "extract different bits in separate
operations". We're talking about following one pointer that can point
to two separate static values.
Reality matters. A *lot* more than some "C standard" that we already
have ignored for decades because it's not strong enough.
Linus
On Mon, Apr 29, 2024 at 08:38:28AM -0700, Linus Torvalds wrote:
> On Mon, 29 Apr 2024 at 06:56, Marco Elver <elver@google.com> wrote:
> >
> > A WRITE_ONCE() / READ_ONCE() pair would do it here. What should we use instead?
>
> Why would we annotate a "any other code generation is insane" issues at all?
>
> When we do chained pointer loads in
>
> file->f_op->op()
>
> and we say "I don't care what value I get for the middle one", I don't
> see the value in annotating that at all.

In code that isn't being actively developed, sticks to known patterns
(as above), or hides the lockless accesses behind a good API, this can
make a lot of sense. And I certainly have talked to a few people who
feel that KCSAN is nothing but an irritant, and are not willing to make
any concessions whatsoever to it. In fact, many of them seem to wish
that it would disappear completely. Of course, that wish is their
privilege.

But in RCU, new patterns are often being created, and so I am quite
happy to give KCSAN additional information in order to help it help me.
I am also quite happy to run KCSAN with its most aggressive settings,
also to help it help me. In my experience, it is way easier having
KCSAN spot a data-race bug than to have to find it the hard way, but
perhaps I am just showing my age.

In addition, KCSAN does a tireless and thorough (if somewhat
simple-minded) code review of the full RCU code base, and can easily be
persuaded to do so each and every day, if desired. Just *you* try doing
that manually, whatever your age! ;-)

Plus, the documentation benefits are significant. "Wait, which of these
unmarked accesses is to a shared variable?" In the wee hours of the
morning while chasing a bug. Especially if the person doing the chasing
is an innocent bystander who is not already an expert on the code
currently being investigated. :-/

Oddly enough, the simplest concurrency designs also want a maximally
aggressive KCSAN. If you are using pure locking with absolutely no
lockless accesses, then any data race at all is a bug. Again, it is a
lot easier for KCSAN to tell you that you forgot to acquire the lock
than to find out the hard way.

> There is no compiler that will sanely and validly do a pointer chain
> load by *anything* but a load. And it doesn't matter to us if it then
> spills and reloads, it will *STILL* be a load.
>
> We're not talking about "extract different bits in separate
> operations". We're talking about following one pointer that can point
> to two separate static values.
>
> Reality matters. A *lot* more than some "C standard" that we already
> have ignored for decades because it's not strong enough.

Agreed, but it also appears that different developers and maintainers
in different parts of the kernel are looking for different things from
KCSAN.

To illustrate my personal concerns, I confess to being a bit disgusted
by those pontificating on software reliability, especially when they
compare it unfavorably to things like house construction. The
difference is of course that the average house is not under active
attack by nation states. In contrast, whether we like it or not, the
Linux kernel is under active attack by nation states, organized crime,
and who knows what all else. For RCU at least, I will take all the help
I can get, even if it requires me to do a little bit of work up front.

In short, I for one do greatly value KCSAN's help. Along with that of
a great many other tools, none of which are perfect, but all of which
are helpful.

Thanx, Paul
On Wed, 1 May 2024 at 11:46, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> In short, I for one do greatly value KCSAN's help. Along with that of
> a great many other tools, none of which are perfect, but all of which
> are helpful.
It's not that I don't value what KCSAN does, but I really think this
is a KCSAN issue.
I absolutely *detest* these crazy "randomly add data race annotations".
Could we instead annotate particular structure fields? I don't want to
mark things actually "volatile", because that then causes the compiler
to generate absolutely horrendous code. But some KCSAN equivalent of
"this field has data races, and we don't care" kind of annotation
would be lovely..
Linus
On Wed, May 01, 2024 at 11:56:26AM -0700, Linus Torvalds wrote:
> On Wed, 1 May 2024 at 11:46, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > In short, I for one do greatly value KCSAN's help. Along with that of
> > a great many other tools, none of which are perfect, but all of which
> > are helpful.
>
> It's not that I don't value what KCSAN does, but I really think this
> is a KCSAN issue.
>
> I absolutely *detest* these crazy "randomly add data race annotations".
>
> Could we instead annotate particular structure fields? I don't want to
> mark things actually "volatile", because that then causes the compiler
> to generate absolutely horrendous code. But some KCSAN equivalent of
> "this field has data races, and we don't care" kind of annotation
> would be lovely..

That would give the poor sleep-deprived innocent bystander some way
to figure out which fields were shared, so that does sound like a good
improvement!

I would naively expect that KCSAN's ability to handle volatile fields
would make this doable, but there is much that I do not know about
KCSAN internals. So I must defer to Marco on this one.

Thanx, Paul
On Wed, 1 May 2024 at 21:02, Paul E. McKenney <paulmck@kernel.org> wrote:
> On Wed, May 01, 2024 at 11:56:26AM -0700, Linus Torvalds wrote:
> > On Wed, 1 May 2024 at 11:46, Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > In short, I for one do greatly value KCSAN's help. Along with that of
> > > a great many other tools, none of which are perfect, but all of which
> > > are helpful.
> >
> > It's not that I don't value what KCSAN does, but I really think this
> > is a KCSAN issue.
> >
> > I absolutely *detest* these crazy "randomly add data race annotations".
> >
> > Could we instead annotate particular structure fields? I don't want to
> > mark things actually "volatile", because that then causes the compiler
> > to generate absolutely horrendous code. But some KCSAN equivalent of
> > "this field has data races, and we don't care" kind of annotation
> > would be lovely..
>
> That would give the poor sleep-deprived innocent bystander some way
> to figure out which fields were shared, so that does sound like a good
> improvement!
>
> I would naively expect that KCSAN's ability to handle volatile fields
> would make this doable, but there is much that I do not know about
> KCSAN internals. So I must defer to Marco on this one.

This is relatively trivial:

#ifdef __SANITIZE_THREAD__
#define __data_racy volatile
#endif

KCSAN will just ignore racy access to them (it will pretend they are
"marked"). In some cases it might cause the compiler to complain if
converting a volatile pointer to a non-volatile pointer, but I suspect
that the corresponding pointer should then similarly be marked as
__data_racy. The fact that without casting the attribute is "viral" is
probably WAI even in this case.

Do we want this kind of attribute?

Thanks,
-- Marco
On Wed, 1 May 2024 at 13:15, Marco Elver <elver@google.com> wrote:
>
> This is relatively trivial:
>
> #ifdef __SANITIZE_THREAD__
> #define __data_racy volatile
> #endif
I really wouldn't want to make a code generation difference, but I
guess when the sanitizer is on, the compiler generating crap code
isn't a huge deal.
> In some cases it might cause the compiler to complain if converting a
> volatile pointer to a non-volatile pointer
No. Note that it's not the *pointer* that is volatile, it's the
structure member.
So it would be something like
const struct file_operations * __data_racy f_op;
and only the load of f_op would be volatile - not the pointer itself.
Of course, if somebody then does "&file->f_op" to get a pointer to a
pointer, *that* would now be a volatile pointer, but I don't see
people doing that.
So I guess this might be a way forward. Anybody want to verify?
Now, the "hung_up_tty_fops" *do* need to be expanded to have hung up
ops for every op that is non-NULL in the normal tty ops. That was a
real bug. We'd also want to add a big comment to the tty fops to make
sure anybody who adds a new tty f_op member to make sure to populate
the hung up version too.
Linus
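A hypothetical sketch of that expansion (the helper name is invented
here, not taken from a posted patch):

	/* Every op that is non-NULL in tty_fops needs a hung-up
	 * counterpart so that no reader can ever see a NULL op. */
	static ssize_t hung_up_tty_splice_read(struct file *in, loff_t *ppos,
					       struct pipe_inode_info *pipe,
					       size_t len, unsigned int flags)
	{
		return 0;	/* like hung_up_tty_read(): nothing to read */
	}

	static const struct file_operations hung_up_tty_fops = {
		/* Keep in sync with tty_fops: every non-NULL op in
		 * tty_fops must have an entry here. */
		.llseek		= no_llseek,
		.read_iter	= hung_up_tty_read,
		.write_iter	= hung_up_tty_write,
		.splice_read	= hung_up_tty_splice_read,
		/* ... remaining ops as in the existing table ... */
	};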
On Wed, 1 May 2024 at 23:06, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, 1 May 2024 at 13:15, Marco Elver <elver@google.com> wrote:
> >
> > This is relatively trivial:
> >
> > #ifdef __SANITIZE_THREAD__
> > #define __data_racy volatile
> > #endif
>
> I really wouldn't want to make a code generation difference, but I
> guess when the sanitizer is on, the compiler generating crap code
> isn't a huge deal.
>
> > In some cases it might cause the compiler to complain if converting a
> > volatile pointer to a non-volatile pointer
>
> No. Note that it's not the *pointer* that is volatile, it's the
> structure member.
>
> So it would be something like
>
> const struct file_operations * __data_racy f_op;
>
> and only the load of f_op would be volatile - not the pointer itself.
>
> Of course, if somebody then does "&file->f_op" to get a pointer to a
> pointer, *that* would now be a volatile pointer, but I don't see
> people doing that.

This is the case I thought of. I still think everything is working as
intended then, since passing a pointer to a __data_racy variable
should be done with pointers to __data_racy (just like other type
qualifiers - the rules are by virtue of implementation equivalent to
volatile). Not a problem, just an observation.

> So I guess this might be a way forward. Anybody want to verify?

I sent a patch to add the type qualifier - in a simple test I added it
does what we want:
https://lore.kernel.org/all/20240502141242.2765090-1-elver@google.com/T/#u

I'll leave it to Tetsuo to amend the original patch if __data_racy
makes sense.

Thanks,
-- Marco

> Now, the "hung_up_tty_fops" *do* need to be expanded to have hung up
> ops for every op that is non-NULL in the normal tty ops. That was a
> real bug. We'd also want to add a big comment to the tty fops to make
> sure anybody who adds a new tty f_op member also populates the hung-up
> version.
>
> Linus
On 2024/05/02 23:14, Marco Elver wrote:
> I sent a patch to add the type qualifier - in a simple test I added it
> does what we want:
> https://lore.kernel.org/all/20240502141242.2765090-1-elver@google.com/T/#u
Want some updates to Documentation/process/volatile-considered-harmful.rst,
because __data_racy is for patches that add volatile variables? It says:

  Patches to remove volatile variables are generally welcome - as long as
  they come with a justification which shows that the concurrency issues have
  been properly thought through.
>
> I'll leave it to Tetsuo to amend the original patch if __data_racy makes sense.
OK if below change is acceptable.
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1012,7 +1012,7 @@ struct file {
struct file_ra_state f_ra;
struct path f_path;
struct inode *f_inode; /* cached value */
- const struct file_operations *f_op;
+ const __data_racy struct file_operations *f_op;
u64 f_version;
#ifdef CONFIG_SECURITY
Hmm, debugfs assumes that f_op does not change?
fs/debugfs/file.c: In function 'full_proxy_release':
fs/debugfs/file.c:357:45: warning: initialization discards 'volatile' qualifier from pointer target type [-Wdiscarded-qualifiers]
const struct file_operations *proxy_fops = filp->f_op;
^~~~
On Thu, 2 May 2024 at 09:42, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> OK if below change is acceptable.
>
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1012,7 +1012,7 @@ struct file {
> struct file_ra_state f_ra;
> struct path f_path;
> struct inode *f_inode; /* cached value */
> - const struct file_operations *f_op;
> + const __data_racy struct file_operations *f_op;
No, this is very wrong.
It's not the *pointer* that is __data_racy. It's the structure *fied*.
So that should be
const struct file_operations *__data_racy f_op;
which is very different.
> Hmm, debugfs assumes that f_op does not change?
>
> fs/debugfs/file.c: In function 'full_proxy_release':
> fs/debugfs/file.c:357:45: warning: initialization discards 'volatile' qualifier from pointer target type [-Wdiscarded-qualifiers]
> const struct file_operations *proxy_fops = filp->f_op;
> ^~~~
This error is a direct result of placing the __data_racy in the wrong place.
It's not that the _result_ of reading filp->f_op is racy. It's the
read of filp->f_op itself that is.
Yes, this is unusual. The *common* thing is to mark pointers as being
volatile. But this really is something entirely different from that.
Linus
On 2024/05/03 2:29, Linus Torvalds wrote:
>> Hmm, debugfs assumes that f_op does not change?
>>
>> fs/debugfs/file.c: In function 'full_proxy_release':
>> fs/debugfs/file.c:357:45: warning: initialization discards 'volatile' qualifier from pointer target type [-Wdiscarded-qualifiers]
>> const struct file_operations *proxy_fops = filp->f_op;
>> ^~~~
>
> This error is a direct result of placing the __data_racy in the wrong place.
Oops.
>
> It's not that the _result_ of reading filp->f_op is racy. It's the
> read of filp->f_op itself that is.
But it seems to me that the problem is that debugfs (or maybe any
filesystem that wraps f_op) caches filp->f_op into private data
struct debugfs_fsdata {
const struct file_operations *real_fops; // <= may not be up to date?
(...snip...)
}
and cannot update cached real_fops value when
filp->f_op = &hung_up_tty_fops;
or
mfile->file->f_op = &snd_shutdown_f_ops;
is called.
Do such filesystems need to be updated to cache "struct file *" rather
than "struct file_operations *"?
On Thu, 2 May 2024 at 16:54, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> debugfs (or maybe any filesystem that wraps f_op) caches filp->f_op into
> private data
That's just for debugfs uses. If you somehow try to do that on a tty,
you are quite welcome to keep both broken pieces.
Linus
On Thu, May 02, 2024 at 10:29:52AM -0700, Linus Torvalds wrote: > Yes, this is unusual. The *common* thing is to mark pointers as being > volatile. But this really is something entirely different from that. The common thing is to mark pointers are pointers to volatile; calling them "volatile pointers" is common and incorrect, and the only reason why that sloppy turn of phrase persists is that real "volatile pointers" are rare... Marco, struct foo volatile *p; declares p as a (non-volatile) pointer to volatile struct foo. struct foo * volatile p; declares p as volatile pointer to (non-volatile) struct foo. The former is a statement about the objects whose addresses might be stored in p; the latter is a statement about the object p itself. Replace volatile with const and it becomes easier to experiment with: char const *p; char s[] = "barf"; char * const q = s; ... p = "yuck"; - fine, p itself can be modified *p = 'a'; - error *p can not be modified, it's an l-value of type const char q = s + 1; - error, can't modify q *q = 'a'; - fine, *q is l-value of type char p = q; - fine, right-hand side of assignment loses the top qualifier, so q (const pointer to char as l-value) becomes a plain pointer to char, which can be converted to pointer to const char, and stored in p (l-value of type pointer to const char) strlen(q); - almost the same story, except that it's passing an argument rather than assignment; they act the same way. strcpy(q, "s"); - almost the same, except that here the type of argument is pointer to char rather than pointer to const char (strlen() promises not to modify the string passed to it, strcpy() obviously doesn't) strcpy(p, "s"); - error; pointer to char converts to a pointer to const char, but not the other way round. The situations where you want a const (or volatile) pointer (as opposed to pointer to const or volatile object) are rare, but this is exactly what you are asking for - you want to say that the value of 'f_op' member in any struct file can change at any time. That value is an address of some instance of struct file_operations and what you want to express is the property of f_op member itself, not that of the objects whose addresses might end up stored there. So having a driver do const struct file_operations *ops = file->f_op; is fine - it's basically "take the value of 'file'; it will be an address of some struct file instance. Fetch 'f_op' from that instance, without any assumptions of the stability of that member. Use whatever value you find there as initial value of 'ops'". That's fine, and since nobody is going to change 'ops' itself behind your back, you don't need any qualifiers on it. The type of 'ops' here is "(unqualified) pointer to const struct file_operations".
On Thu, 2 May 2024 at 20:14, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Thu, May 02, 2024 at 10:29:52AM -0700, Linus Torvalds wrote:
>
> > Yes, this is unusual. The *common* thing is to mark pointers as being
> > volatile. But this really is something entirely different from that.
>
> The common thing is to mark pointers as pointers to volatile;
> calling them "volatile pointers" is common and incorrect, and the only
> reason why that sloppy turn of phrase persists is that real "volatile
> pointers" are rare...
>
> Marco,

I think we agree on what we want. I misread the intention of Tetsuo in
[1], and provided incorrect feedback.

[1] https://lore.kernel.org/all/CANpmjNPtoKf1ysbKd=E8o753JT0DzBanzFBP234VBsazfufVAQ@mail.gmail.com/T/#u

> struct foo volatile *p;
> 	declares p as a (non-volatile) pointer to volatile struct foo.
> struct foo * volatile p;
> 	declares p as volatile pointer to (non-volatile) struct foo.
>
> The former is a statement about the objects whose addresses might
> be stored in p; the latter is a statement about the object p itself.
>
> Replace volatile with const and it becomes easier to experiment with:
> 	char const *p;
> 	char s[] = "barf";
> 	char * const q = s;
> 	...
> 	p = "yuck";	- fine, p itself can be modified
> 	*p = 'a';	- error, *p can not be modified, it's an l-value
> 			  of type const char
> 	q = s + 1;	- error, can't modify q
> 	*q = 'a';	- fine, *q is l-value of type char
> 	p = q;		- fine, right-hand side of assignment loses the top
> 			  qualifier, so q (const pointer to char as l-value)
> 			  becomes a plain pointer to char, which can be
> 			  converted to pointer to const char, and stored in
> 			  p (l-value of type pointer to const char)
> 	strlen(q);	- almost the same story, except that it's passing
> 			  an argument rather than assignment; they act the
> 			  same way.
> 	strcpy(q, "s");	- almost the same, except that here the type of
> 			  argument is pointer to char rather than pointer to
> 			  const char (strlen() promises not to modify the
> 			  string passed to it, strcpy() obviously doesn't)
> 	strcpy(p, "s");	- error; pointer to char converts to a pointer
> 			  to const char, but not the other way round.
>
> The situations where you want a const (or volatile) pointer (as opposed to
> pointer to const or volatile object) are rare, but this is exactly what
> you are asking for - you want to say that the value of 'f_op' member
> in any struct file can change at any time. That value is an address of
> some instance of struct file_operations and what you want to express is
> the property of f_op member itself, not that of the objects whose addresses
> might end up stored there.
>
> So having a driver do
> 	const struct file_operations *ops = file->f_op;
> is fine - it's basically "take the value of 'file'; it will be an address
> of some struct file instance. Fetch 'f_op' from that instance, without
> any assumptions of the stability of that member. Use whatever value
> you find there as initial value of 'ops'".
>
> That's fine, and since nobody is going to change 'ops' itself behind your
> back, you don't need any qualifiers on it. The type of 'ops' here is
> "(unqualified) pointer to const struct file_operations".

Is this feedback for the __data_racy attribute patch [2], or a comment
for the patch "tty: tty_io: remove hung_up_tty_fops"? [ With the former
I can help, with the latter Tetsuo can help. ]

[2] https://lore.kernel.org/all/20240502141242.2765090-1-elver@google.com/

The __data_racy attribute should behave like any other type qualifier
(be it const, volatile), and what you point out above applies equally,
no doubt about it. But I think it's important to treat it as a
completely distinct type qualifier - volatile is an implementation
detail (in non-KCSAN kernels it's a no-op).

Thanks,
-- Marco
On Thu, 2 May 2024 at 18:42, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2024/05/02 23:14, Marco Elver wrote:
> > I sent a patch to add the type qualifier - in a simple test I added it
> > does what we want:
> > https://lore.kernel.org/all/20240502141242.2765090-1-elver@google.com/T/#u
>
> Want some updates to Documentation/process/volatile-considered-harmful.rst,
> because __data_racy is for patches that add volatile variables?
This has nothing to do with volatile. It's merely an implementation
artifact that in CONFIG_KCSAN builds __data_racy translates to
"volatile": the compiler will emit special instrumentation for
volatile accesses so that KCSAN thinks they are "marked". However,
volatile is and has been an implementation detail of certain
primitives like READ_ONCE()/WRITE_ONCE(), although as a developer
using this interface we should not be concerned with the fact that
there's volatile underneath. In a perfect world the compiler would
give us a better "tool" than volatile, but we have to make do with the
tools we have at our disposal today.
> Patches to remove volatile variables are generally welcome - as long as
> they come with a justification which shows that the concurrency issues have
> been properly thought through.
My suggestion is to forget about "volatile" and simply pretend it's
data_race() but as a type qualifier, like the bit of documentation I
added to Documentation/dev-tools/kcsan.rst in the patch.
> >
> > I'll leave it to Tetsuo to amend the original patch if __data_racy makes sense.
>
> OK if below change is acceptable.
>
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1012,7 +1012,7 @@ struct file {
> struct file_ra_state f_ra;
> struct path f_path;
> struct inode *f_inode; /* cached value */
> - const struct file_operations *f_op;
> + const __data_racy struct file_operations *f_op;
>
> u64 f_version;
> #ifdef CONFIG_SECURITY
>
> Hmm, debugfs assumes that f_op does not change?
>
> fs/debugfs/file.c: In function 'full_proxy_release':
> fs/debugfs/file.c:357:45: warning: initialization discards 'volatile' qualifier from pointer target type [-Wdiscarded-qualifiers]
> const struct file_operations *proxy_fops = filp->f_op;
> ^~~~
Exactly as I pointed out elsewhere: pointers to __data_racy fields now
have to become __data_racy as well:
const struct file_operations __data_racy *proxy_fops = filp->f_op;
should be what you want there. The type system is in fact helping us
here as intended. :-)
On Wed, 1 May 2024 at 14:06, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So it would be something like
>
> const struct file_operations * __data_racy f_op;
>
> and only the load of f_op would be volatile - not the pointer itself.
Note that in reality, we'd actually prefer the compiler to treat that
"__data_racy" as volatile in the sense of "don't reload this value",
but at the same time be the opposite of volatile in the sense that
using one read multiple times is actually a good idea.
IOW, the problem is rematerialization ("read the value more than once
when there is just one access in the source"), not strictly a "read
the value separately each time it is accessed".
We've actually had that before: it's not that we want each access to
force a read from memory, we want to avoid a TOCTOU race.
Many of our "READ_ONCE()" uses are of that kind, and using "volatile"
sadly generates horrible code, but is the only way to tell the
compiler to not ever rematerialize the value by loading it _twice_.
I'd love to see an extension where "const volatile" basically means
exactly that: the volatile tells the compiler that it can't
rematerialize by doing the load multiple times, but the "const" would
say that if the compiler sees two or more accesses, it can still CSE
them.
Oh well. Thankfully it's not a hugely common code generation problem.
It comes up every once in a while, and I think the last time this
worry came up, I think we had gcc people tell us that they don't
actually ever rematerialize loads from memory.
Of course, that was an implementation issue, not a guarantee.
Linus
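A classic illustration of that rematerialization hazard (hypothetical
example, not kernel code):

	/* With a plain load, the compiler may reload 'shared' for each
	 * use, so the bounds check and the array index can observe two
	 * different values (a TOCTOU race). */
	int idx = shared;
	if (idx >= 0 && idx < ARRAY_SIZE(arr))
		arr[idx] = 1;

	/* READ_ONCE() forbids splitting the one source-level load into
	 * two, at the cost of also forbidding CSE of repeated loads. */
	int idx2 = READ_ONCE(shared);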
On Wed, May 01, 2024 at 02:20:35PM -0700, Linus Torvalds wrote:
> On Wed, 1 May 2024 at 14:06, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > So it would be something like
> >
> > const struct file_operations * __data_racy f_op;
> >
> > and only the load of f_op would be volatile - not the pointer itself.
>
> Note that in reality, we'd actually prefer the compiler to treat that
> "__data_racy" as volatile in the sense of "don't reload this value",
> but at the same time be the opposite of volatile in the sense that
> using one read multiple times is actually a good idea.
>
> IOW, the problem is rematerialization ("read the value more than once
> when there is just one access in the source"), not strictly a "read
> the value separately each time it is accessed".
>
> We've actually had that before: it's not that we want each access to
> force a read from memory, we want to avoid a TOCTOU race.
>
> Many of our "READ_ONCE()" uses are of that kind, and using "volatile"
> sadly generates horrible code, but is the only way to tell the
> compiler to not ever rematerialize the value by loading it _twice_.
>
> I'd love to see an extension where "const volatile" basically means
> exactly that: the volatile tells the compiler that it can't
> rematerialize by doing the load multiple times, but the "const" would
> say that if the compiler sees two or more accesses, it can still CSE
> them.
No promises, other than that if we don't ask, they won't say "yes".
Let me see what can be done.
Thanx, Paul
> Oh well. Thankfully it's not a hugely common code generation problem.
> It comes up every once in a while, and I think the last time this
> worry came up, I think we had gcc people tell us that they don't
> actually ever rematerialize loads from memory.
>
> Of course, that was an implementation issue, not a guarantee.
>
> Linus
On Wed, May 01, 2024 at 02:49:17PM -0700, Paul E. McKenney wrote:
> On Wed, May 01, 2024 at 02:20:35PM -0700, Linus Torvalds wrote:
> > On Wed, 1 May 2024 at 14:06, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:

[ . . . ]

> > I'd love to see an extension where "const volatile" basically means
> > exactly that: the volatile tells the compiler that it can't
> > rematerialize by doing the load multiple times, but the "const" would
> > say that if the compiler sees two or more accesses, it can still CSE
> > them.

Except that "const volatile" already means "you cannot write to it,
and reads will not be fused". :-/

> No promises, other than that if we don't ask, they won't say "yes".
>
> Let me see what can be done.

From a semantics viewpoint __atomic_load_n(&x, __ATOMIC_RELAXED) would
work for loading from x. The compilers that I tried currently do not
fuse loads, but they are allowed to do so.

Or is there something I am missing that would make this not work?
Aside from compilers not yet optimizing this case.

Thanx, Paul
On Wed, May 01, 2024 at 03:32:34PM -0700, Paul E. McKenney wrote:
> On Wed, May 01, 2024 at 02:49:17PM -0700, Paul E. McKenney wrote:
> > On Wed, May 01, 2024 at 02:20:35PM -0700, Linus Torvalds wrote:
> > > On Wed, 1 May 2024 at 14:06, Linus Torvalds
> > > <torvalds@linux-foundation.org> wrote:
>
> [ . . . ]
>
> > > I'd love to see an extension where "const volatile" basically means
> > > exactly that: the volatile tells the compiler that it can't
> > > rematerialize by doing the load multiple times, but the "const" would
> > > say that if the compiler sees two or more accesses, it can still CSE
> > > them.
>
> Except that "const volatile" already means "you cannot write to it,
> and reads will not be fused". :-/
>
> > No promises, other than that if we don't ask, they won't say "yes".
> >
> > Let me see what can be done.
>
> From a semantics viewpoint __atomic_load_n(&x, __ATOMIC_RELAXED) would
> work for loading from x. The compilers that I tried currently do not
> fuse loads, but they are allowed to do so.

Yeah, I wonder the same, from what I read, "const volatile" seems to
be just a (non-volatile) relaxed atomic load.

Regards,
Boqun

> Or is there something I am missing that would make this not work?
> Aside from compilers not yet optimizing this case.
>
> Thanx, Paul
On Thu, May 02, 2024 at 09:37:17AM -0700, Boqun Feng wrote:
> On Wed, May 01, 2024 at 03:32:34PM -0700, Paul E. McKenney wrote:
> > On Wed, May 01, 2024 at 02:49:17PM -0700, Paul E. McKenney wrote:
> > > On Wed, May 01, 2024 at 02:20:35PM -0700, Linus Torvalds wrote:
> > > > On Wed, 1 May 2024 at 14:06, Linus Torvalds
> > > > <torvalds@linux-foundation.org> wrote:
> >
> > [ . . . ]
> >
> > > > I'd love to see an extension where "const volatile" basically means
> > > > exactly that: the volatile tells the compiler that it can't
> > > > rematerialize by doing the load multiple times, but the "const" would
> > > > say that if the compiler sees two or more accesses, it can still CSE
> > > > them.
> >
> > Except that "const volatile" already means "you cannot write to it,
> > and reads will not be fused". :-/
> >
> > > No promises, other than that if we don't ask, they won't say "yes".
> > >
> > > Let me see what can be done.
> >
> > From a semantics viewpoint __atomic_load_n(&x, __ATOMIC_RELAXED) would
> > work for loading from x. The compilers that I tried currently do not
> > fuse loads, but they are allowed to do so.
>
> Yeah, I wonder the same, from what I read, "const volatile" seems to
> be just a (non-volatile) relaxed atomic load.
Hmmm... Maybe something like this very lightly tested patch?
(I did not immediately see a use case for WRITE_ONCE_MERGEABLE(),
but that is likely a failure of imagination on my part.)
------------------------------------------------------------------------
diff --git a/include/asm-generic/rwonce.h b/include/asm-generic/rwonce.h
index 8d0a6280e9824..55e87a8aec56f 100644
--- a/include/asm-generic/rwonce.h
+++ b/include/asm-generic/rwonce.h
@@ -79,6 +79,15 @@ unsigned long __read_once_word_nocheck(const void *addr)
(typeof(x))__read_once_word_nocheck(&(x)); \
})
+/*
+ * Use READ_ONCE_MERGEABLE() and WRITE_ONCE_MERGEABLE() when you need to
+ * avoid duplicating or tearing a load or store, respectively, but when
+ * it is OK to merge nearby loads and stores. It must also be OK for a
+ * later nearby load to take its value directly from a prior store.
+ */
+#define READ_ONCE_MERGEABLE(x) __atomic_load_n(&x, __ATOMIC_RELAXED)
+#define WRITE_ONCE_MERGEABLE(x, val) __atomic_store_n(&x, val, __ATOMIC_RELAXED)
+
static __no_kasan_or_inline
unsigned long read_word_at_a_time(const void *addr)
{
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d5507ac1bbf19..b37c0dbde8cde 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -459,7 +459,7 @@ static void adjust_jiffies_till_sched_qs(void)
return;
}
/* Otherwise, set to third fqs scan, but bound below on large system. */
- j = READ_ONCE(jiffies_till_first_fqs) +
+ j = READ_ONCE_MERGEABLE(jiffies_till_first_fqs) +
2 * READ_ONCE(jiffies_till_next_fqs);
if (j < HZ / 10 + nr_cpu_ids / RCU_JIFFIES_FQS_DIV)
j = HZ / 10 + nr_cpu_ids / RCU_JIFFIES_FQS_DIV;
On Fri, 3 May 2024 at 16:59, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> Hmmm... Maybe something like this very lightly tested patch?
I'm a bit nervous about using the built-in atomics, when it's not
clear what the compiler will do on various architectures.
Gcc documentation talks about __atomic_is_lock_free(), which makes me
think that on various architectures it might end up doing some "fall
back to helper functions" cases (possibly for odd architectures).
IOW: I don't think the patch is wrong, but I do think we need to
verify that all compilers we support generate the obvious code for
this, and we don't have some "fall back to function calls".
Linus
On Fri, May 03, 2024 at 05:14:22PM -0700, Linus Torvalds wrote:
> On Fri, 3 May 2024 at 16:59, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > Hmmm... Maybe something like this very lightly tested patch?
>
> I'm a bit nervous about using the built-in atomics, when it's not
> clear what the compiler will do on various architectures.
>
> Gcc documentation talks about __atomic_is_lock_free(), which makes me
> think that on various architectures it might end up doing some "fall
> back to helper functions" cases (possibly for odd architectures).
Right now, both GCC and Clang complain if you give __atomic_load_n()
something other than a pointer or a sufficiently small scalar on x86.
Let's see, starting with READ_ONCE()...
ARM7-a Clang complains about even single bytes (ARM7-a GCC is
fine).
ARMv8 works like x86 for both GCC and Clang.
AVR GCC and M68K Clang generate calls to helper functions, so they
need to implement {READ,WRITE}_ONCE_MERGEABLE() in terms of the
current {READ,WRITE}_ONCE() macros.
M68K GCC works like x86, but generates a call to a helper function
for an 8-byte load. Which means that the 8-byte case needs to
generate a build error.
Hexagon Clang works like x86.
Loongarch GCC works like x86. Ditto S390, sh, and xtensa GCC.
MIPS Clang works like x86, but throws a build error for long long,
which might be OK given 32-bit. MIPS GCC handles long long also.
MIPS64 and MIPS EL GCC and Clang work like x86, as do both compilers
for POWERPC and POWERPC LE. And for RISC-V 32 and 64 bit.
I based these on this godbolt: https://godbolt.org/z/rrKnnE8nb
The #ifs on lines select the 8-byte and structure case, respectively,
and you can pick your compiler. I just used the latest versions
of each compiler for each architecture, so there might well be
a few more surprises.
> IOW: I don't think the patch is wrong, but I do think we need to
> verify that all compilers we support generate the obvious code for
> this, and we don't have some "fall back to function calls".
You are right, this is going to need some arch-specific code for a few
of the architectures. Hey, I was hoping!!!
The compilers do not currently optimize these things, but things appear
to me to be heading in that direction.
Thanx, Paul
On Fri, 3 May 2024 at 22:08, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> You are right, this is going to need some arch-specific code for a few
> of the architectures. Hey, I was hoping!!!
>
> The compilers do not currently optimize these things, but things appear
> to me to be heading in that direction.
Ok, so it sounds like right now it makes no sense - presumably
__atomic_load_n() doesn't actually generate better code than
READ_ONCE() does as-is, and we have the issue with having to make it
per-architecture anyway.
But maybe in a couple of years we can revisit this when / if it
actually generates better code and is more widely applicable.
Linus
On Sat, May 04, 2024 at 10:50:29AM -0700, Linus Torvalds wrote:
> On Fri, 3 May 2024 at 22:08, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > You are right, this is going to need some arch-specific code for a few
> > of the architectures. Hey, I was hoping!!!
> >
> > The compilers do not currently optimize these things, but things appear
> > to me to be heading in that direction.
>
> Ok, so it sounds like right now it makes no sense - presumably
> __atomic_load_n() doesn't actually generate better code than
> READ_ONCE() does as-is, and we have the issue with having to make it
> per-architecture anyway.
>
> But maybe in a couple of years we can revisit this when / if it
> actually generates better code and is more widely applicable.

Completely agreed.

Here are my current thoughts for possible optimizations of non-volatile
memory_order_relaxed atomics:

o	Loads from the same variable that can legitimately be
	reordered to be adjacent to one another can be fused into
	a single load.

o	Stores to the same variable that can legitimately be
	reordered to be adjacent to one another can be replaced
	by the last store in the series.

o	Loads and stores may not be invented.

o	The only way that a computation based on the value from a
	given load can instead use some other load is if the two
	loads are fused into a single load.

Anything that I am missing?

Thanx, Paul
On Sat, 4 May 2024 at 11:18, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> Here are my current thoughts for possible optimizations of non-volatile
> memory_order_relaxed atomics:
>
> o Loads from the same variable that can legitimately be
> reordered to be adjacent to one another can be fused
> into a single load.
Let's call this "Rule 1"
I think you can extend this to also "can be forwarded from a previous store".
I also think you're too strict in saying "fused into a single load".
Let me show an example below.
> o Stores to the same variable that can legitimately be
> reordered to be adjacent to one another can be replaced
> by the last store in the series.
Rule 2.
Ack, although again, I think you're being a bit too strict in your
language, and the rule can be relaxed.
> o Loads and stores may not be invented.
Rule 3.
I think you can and should relax this. You can invent loads all you want.
> o The only way that a computation based on the value from
> a given load can instead use some other load is if the
> two loads are fused into a single load.
Rule 4.
I'm not convinced that makes sense, and I don't think it's true as written.
I think I understand what you are trying to say, but I think you're
saying it in a way that only confuses a compiler person.
In particular, the case I do not think is true is very much the
"spill" case: if you have code like this:
a = expression involving '__atomic_load_n(xyz, RELAXED)'
then it's perfectly fine to spill the result of that load and reload
the value. So the "computation based on the value" *is* actually based
on "some other load" (the reload).
I really *REALLY* think you need to explain the semantics in concrete
terms that a compiler writer would understand and agree with.
So to explain your rules to an actual compiler person (and relax the
semantics a bit) I would rewrite your rules as:
Rule 1: a strictly dominating load can be replaced by the value of a
preceding load or store
Rule 2: a strictly dominating store can remove preceding stores
Rule 3: stores cannot be done speculatively (put another way: a
subsequent dominating store can only *remove* a store entirely, it
can't turn the store into one with speculative data)
Rule 4: loads cannot be rematerialized (ie a load can be *combined*
as per Rule 1, but a load cannot be *split* into two loads)
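
[Ed.: the examples below cover Rules 1 and 2; to make Rule 3 concrete,
a sketch of the speculative store it forbids:

        if (a)
                __atomic_store_n(ptr, 1, RELAXED);

must not become the branchless

        tmp = __atomic_load_n(ptr, RELAXED);
        __atomic_store_n(ptr, a ? 1 : tmp, RELAXED);

because when 'a' is false, other CPUs can now observe a store that the
source never performed.]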
Anyway, let's get to the examples of *why* I think your language was
bad and your rules were too strict.
Let's start with your Rule 3, where you said:
- Loads and stores may not be invented
and while I think this should be very very true for stores, I think
inventing loads is not only valid, but a good idea. Example:
if (a)
        b = __atomic_load_n(ptr) + 1;
can perfectly validly just be written as
tmp = __atomic_load_n(ptr);
if (a)
        b = tmp+1;
which in turn may allow other optimizations (ie depending on how 'b'
is used, the conditional may go away entirely, and you just end up
with 'b = tmp+!!a').
There's nothing wrong with extra loads that aren't used.
And to make that more explicit, let's look at Rule 1:
Example of Rule 1 (together with the above case):
if (a)
        b = __atomic_load_n(ptr) + 1;
else
        b = __atomic_load_n(ptr) + 2;
c = __atomic_load_n(ptr) + 3;
then that can perfectly validly re-write this all as
tmp = __atomic_load_n(ptr);
b = a ? tmp+1 : tmp+2;
c = tmp + 3;
because my version of Rule 1 allows the dominating load used for 'c'
to be replaced by the value of a preceding load that was used for 'a'
and 'b'.
And to give an example of Rule 2, where you said "reordered to be
adjacent", I'm saying that all that matters is being strictly
dominant, so
if (a)
        __atomic_store_n(ptr,1);
else
        __atomic_store_n(ptr,2);
__atomic_store_n(ptr,3);
can perfectly validly be combined into just
__atomic_store_n(ptr,3);
because the third store completely dominates the two others, even if
in the intermediate form they are not necessarily ever "adjacent".
(Your "adjacency" model might still be valid in how you could turn
first of the first stores to be a fall-through, then remove it, and
then turn the other to be a fall-through and then remove it, so maybe
your language isn't _tecnically_ wrong, But I think the whole
"dominating store" is how a compiler writer would think about it).
Anyway, the above are all normal optimizations that compilers
*already* do, and the only real issue is that with memory ordering,
the "dominance" thing will need to take into account the ordering
requirements of other memory operations with stricter memory ordering
in between. So, for example, if you have
__atomic_store_n(ptr,1, RELAXED);
__atomic_load_n(otherptr,2, ACQUIRE);
__atomic_store_n(ptr,2, RELAXED);
then the second store doesn't dominate the first store, because
there's a stricter memory ordering instruction in between.
But I think the *important* rule is that "Rule 4", which is what
actually gives us the true "atomicity" rule:
Loads cannot be rematerialized
IOW, when we do a
a = __atomic_load_n(ptr);
and use the value of 'a' at any point later, it can *never* have
turned into a re-load of that ptr value. Yes, the load might have been
combined with an earlier one, and the load may have been moved up
earlier (by the "you can invent loads and then combine them with the
later real one" rule).
So even atomic loads can be combined, they can be moved to outside of
loops etc. BUT THEY CANNOT BE RE-DONE.
I think that's really the fundamentally different rule, and the one
that makes us use 'volatile' in many cases.
So we have a "READ_ONCE()" macro, but in many cases it's really
"READ_AT_MOST_ONCE()". It's usually not that you can't optimize the
read away - you can - but you absolutely must not turn it into two
reads.
Turning a read into two reads is what makes those TOCTOU issues problematic.
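
[Ed.: a sketch of the TOCTOU hazard, with hypothetical names:

        len = __atomic_load_n(&shared->len, RELAXED);
        if (len <= sizeof(buf))
                memcpy(buf, shared->data, len);

If the compiler rematerializes 'len' as a second load from shared->len,
the bounds check and the memcpy() can use different values, and the
check no longer protects anything.]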
Honestly, I'd love to just have that "Rule 4" be there for *all*
variable accesses. Not just __atomic ones. If we had a compiler flag
like "-fno-TUCTOU", I'd enable it.
Linus
On Sat, May 04, 2024 at 12:11:10PM -0700, Linus Torvalds wrote:
> On Sat, 4 May 2024 at 11:18, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > Here are my current thoughts for possible optimizations of non-volatile
> > memory_order_relaxed atomics:
> >
> > o Loads from the same variable that can legitimately be
> > reordered to be adjacent to one another can be fused
> > into a single load.
>
> Let's call this "Rule 1"
>
> I think you can extend this to also "can be forwarded from a previous store".
Agreed, with constraints based on intervening synchronization.
> I also think you're too strict in saying "fused into a single load".
> Let me show an example below.
I certainly did intend for any errors to be in the direction of being
too strict.
> > o Stores to the same variable that can legitimately be
> > reordered to be adjacent to one another can be replaced
> > by the last store in the series.
>
> Rule 2.
>
> Ack, although again, I think you're being a bit too strict in your
> language, and the rule can be relaxed.
>
> > o Loads and stores may not be invented.
>
> Rule 3.
>
> I think you can and should relax this. You can invent loads all you want.
I might be misunderstanding you, but given my understanding, I disagree.
Consider this example:
x = __atomic_load(&a, RELAXED);
r0 = x * x + 2 * x + 1;
It would not be good for a load to be invented as follows:
x = __atomic_load(&a, RELAXED);
invented = __atomic_load(&a, RELAXED);
r0 = x * x + 2 * invented + 1;
In the first example, we know that r0 is a perfect square, at least
assuming that x is small enough to avoid wrapping. In the second
example, x might not be equal to the value from the invented load,
and r0 might not be a perfect square.
I believe that we really need the compiler to keep basic arithmetic
working.
That said, I agree that this disallows directly applying current
CSE optimizations, which might make some people sad. But we do need
code to work regardless.
Again, it is quite possible that I am misunderstanding you here.
> > o The only way that a computation based on the value from
> > a given load can instead use some other load is if the
> > two loads are fused into a single load.
>
> Rule 4.
>
> I'm not convinced that makes sense, and I don't think it's true as written.
>
> I think I understand what you are trying to say, but I think you're
> saying it in a way that only confuses a compiler person.
>
> In particular, the case I do not think is true is very much the
> "spill" case: if you have code like this:
>
> a = expression involving '__atomic_load_n(xyz, RELAXED)'
>
> then it's perfectly fine to spill the result of that load and reload
> the value. So the "computation based on the value" *is* actually based
> on "some other load" (the reload).
As in the result is stored to a compiler temporary and then reloaded
from that temporary? Agreed, that would be just fine. In contrast,
spilling and reloading from xyz would not be good at all.
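
[Ed.: a sketch of the distinction, with hypothetical names:

        v = __atomic_load_n(xyz, RELAXED);
        slot = v;                            /* OK: spill the value */
        use(slot);                           /* reload from the spill slot */
        use(__atomic_load_n(xyz, RELAXED));  /* NOT a valid reload of v */

The spill slot always holds the value the computation started with; a
fresh load from xyz may not.]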
> I really *REALLY* think you need to explain the semantics in concrete
> terms that a compiler writer would understand and agree with.
Experience would indicate that I should not dispute that sentence. ;-)
> So to explain your rules to an actual compiler person (and relax the
> semantics a bit) I would rewrite your rules as:
>
> Rule 1: a strictly dominating load can be replaced by the value of a
> preceding load or store
>
> Rule 2: a strictly dominating store can remove preceding stores
>
> Rule 3: stores cannot be done speculatively (put another way: a
> subsequent dominating store can only *remove* a store entirely, it
> can't turn the store into one with speculative data)
>
> Rule 4: loads cannot be rematerialized (ie a load can be *combined*
> as per Rule 1, but a load cannot be *split* into two loads)
I still believe that synchronization operations need a look-in, and
I am not sure what is being dominated in your Rules 1 and 2 (all
subsequent execution?), but let's proceed.
> Anyway, let's get to the examples of *why* I think your language was
> bad and your rules were too strict.
>
> Let's start with your Rule 3, where you said:
>
> - Loads and stores may not be invented
>
> and while I think this should be very very true for stores, I think
> inventing loads is not only valid, but a good idea. Example:
>
> if (a)
>         b = __atomic_load_n(ptr) + 1;
>
> can perfectly validly just be written as
>
> tmp = __atomic_load_n(ptr);
> if (a)
>         b = tmp+1;
>
> which in turn may allow other optimizations (ie depending on how 'b'
> is used, the conditional may go away entirely, and you just end up
> with 'b = tmp+!!a').
>
> There's nothing wrong with extra loads that aren't used.
From a functional viewpoint, if the value isn't used, then agreed,
inventing the load is harmless. But there are some code sequences where
I really wouldn't want the extra cache miss.
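
[Ed.: a sketch of the cache-miss concern, with hypothetical names:

        if (unlikely(tracing_enabled))
                trace(__atomic_load_n(&stats_counter, RELAXED));

Hoisting the load above the branch would touch the stats_counter cache
line on every execution, even though the common case never needs it.]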
> And to make that more explicit, let's look at Rule 1:
>
> Example of Rule 1 (together with the above case):
>
> if (a)
>         b = __atomic_load_n(ptr) + 1;
> else
>         b = __atomic_load_n(ptr) + 2;
> c = __atomic_load_n(ptr) + 3;
>
> then that can perfectly validly re-write this all as
>
> tmp = __atomic_load_n(ptr);
> b = a ? tmp+1 : tmp+2;
> c = tmp + 3;
>
> because my version of Rule 1 allows the dominating load used for 'c'
> to be replaced by the value of a preceding load that was used for 'a'
> and 'b'.
OK, I thought that nodes early in the control-flow graph dominated
nodes that are later in that graph, but I am not a compiler expert.
In any case, I agree with this transformation. This is making three
loads into one load, and there is no intervening synchronization to gum
up the works.
> And to give an example of Rule 2, where you said "reordered to be
> adjacent", I'm saying that all that matters is being strictly
> dominant, so
>
> if (a)
>         __atomic_store_n(ptr,1);
> else
>         __atomic_store_n(ptr,2);
> __atomic_store_n(ptr,3);
>
> can perfectly validly be combined into just
>
> __atomic_store_n(ptr,3);
>
> because the third store completely dominates the two others, even if
> in the intermediate form they are not necessarily ever "adjacent".
I agree with this transformation as well. But suppose that the code
also contained an smp_mb() right after that "if" statement. Given that,
it is not hard to construct a larger example in which dropping the first
two stores would be problematic.
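
[Ed.: one such larger example, sketched with hypothetical names:

        if (a)
                __atomic_store_n(ptr, 1, RELAXED);
        else
                __atomic_store_n(ptr, 2, RELAXED);
        smp_mb();
        __atomic_store_n(&flag, 1, RELAXED);
        __atomic_store_n(ptr, 3, RELAXED);

A reader that observes flag == 1 (with its own read-side ordering) is
guaranteed by the smp_mb() to see one of the stores to ptr rather than
ptr's old value; dropping the first two stores would break that, since
nothing orders the remaining store to ptr against the store to flag.]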
> (Your "adjacency" model might still be valid in how you could turn
> first of the first stores to be a fall-through, then remove it, and
> then turn the other to be a fall-through and then remove it, so maybe
> your language isn't _tecnically_ wrong, But I think the whole
> "dominating store" is how a compiler writer would think about it).
I was thinking in terms of first transforming the code as follows:
if (a) {
        __atomic_store_n(ptr,1);
        __atomic_store_n(ptr,3);
} else {
        __atomic_store_n(ptr,2);
        __atomic_store_n(ptr,3);
}
(And no, I would not expect a real compiler to do this!)
Then it is clearly OK to further transform into the following:
if (a) {
        __atomic_store_n(ptr,3);
} else {
        __atomic_store_n(ptr,3);
}
At which point both branches of the "if" statement are doing the
same thing, so:
__atomic_store_n(ptr,3);
On to your next email!
Thanx, Paul
On Sat, 4 May 2024 at 12:11, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Anyway, the above are all normal optimizations that compilers
> *already* do, and the only real issue is that with memory ordering,
> the "dominance" thing will need to take into account the ordering
> requirements of other memory operations with stricter memory ordering
> in between. So, for example, if you have
>
> __atomic_store_n(ptr,1, RELAXED);
> __atomic_load_n(otherptr,2, ACQUIRE);
> __atomic_store_n(ptr,2, RELAXED);
>
> then the second store doesn't dominate the first store, because
> there's a stricter memory ordering instruction in between.
Actually, that was a bad example, because in that pattern, the second
store does actually dominate (the second store can not move up beyond
the ACQUIRE, but the first store can indeed move down after it, so
dominance analysis does actually allow the second store to strictly
dominate the first one).
So the ACQUIRE would need to be SEQ for my example to be valid.
Of course, usually the barrier that stops domination is something
entirely different. Even without an actual conditional (which is
almost certainly the most common reason why dominance goes away), it
might be a function call (which could do any arbitrary memory ordering
operations - there was a clang bug in this very case) or something
like an explicit full memory barrier.
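
[Ed.: a sketch of the function-call case:

        __atomic_store_n(ptr, 1, RELAXED);
        do_something();         /* opaque call: may contain any barrier */
        __atomic_store_n(ptr, 2, RELAXED);

Unless the compiler can see the whole body of do_something(), it must
assume the call can order or observe the first store, so the second
store cannot eliminate it.]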
Anyway, take that email as a "written in the MUA on the fly". There
might be other thinkos in there. But I think the big picture was
right.
Linus
On Sat, May 04, 2024 at 12:25:12PM -0700, Linus Torvalds wrote:
> On Sat, 4 May 2024 at 12:11, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Anyway, the above are all normal optimizations that compilers
> > *already* do, and the only real issue is that with memory ordering,
> > the "dominance" thing will need to take into account the ordering
> > requirements of other memory operations with stricter memory ordering
> > in between. So, for example, if you have
> >
> > __atomic_store_n(ptr,1, RELAXED);
> > __atomic_load_n(otherptr,2, ACQUIRE);
> > __atomic_store_n(ptr,2, RELAXED);
I am assuming that I should ignore the "2," in that load.
> > then the second store doesn't dominate the first store, because
> > there's a stricter memory ordering instruction in between.
>
> Actually, that was a bad example, because in that pattern, the second
> store does actually dominate (the second store can not move up beyond
> the ACQUIRE, but the first store can indeed move down after it, so
> dominance analysis does actually allow the second store to strictly
> dominate the first one).
Agreed, and the stores can be combined as a result of the fact that
the two stores can be reordered to be adjacent to one another.
> So the ACQUIRE would need to be SEQ for my example to be valid.
And here the C and C++ memory models get very strange due to mixing
memory_order_seq_cst and non-memory_order_seq_cst.
But if there was a Linux-kernel smp_mb() immediately after that first
store, then the compiler would not be allowed to combine the two stores.
Though that is true regardless because of the smp_mb()'s barrier().
> Of course, usually the barrier that stops domination is something
> entirely different. Even without an actual conditional (which is
> almost certainly the most common reason why dominance goes away), it
> might be a function call (which could do any arbitrary memory ordering
> operations - there was a clang bug in this very case) or something
> like an explicit full memory barrier.
If there was something like the following, then the two stores could
not be combined, from what I can see.
from what I can see.
__atomic_store_n(ptr,1, RELAXED);
__atomic_load_n(otherptr, ACQUIRE);
__atomic_store_n(otherptr, 4, RELEASE);
__atomic_store_n(ptr,2, RELAXED);
I freely confess that I don't know how to map that into dominance
relations, which means that I have no idea what this example means
in terms of your rules.
> Anyway, take that email as a "written in the MUA on the fly". There
> might be other thinkos in there. But I think the big picture was
> right.
If things go as they usually do, there will be quite a few litmus tests
between here and something credible. ;-)
Thank you for the tutorial on dominance in CFGs!
Thanx, Paul