Underflow of f_count needs to be more carefully detected than it
currently is. The results of get_file() should be checked, but the
first step is detection. Redefine f_count from atomic_long_t to
refcount_long_t.
Signed-off-by: Kees Cook <keescook@chromium.org>
---
Cc: Christian Brauner <brauner@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org
---
fs/file.c | 4 ++--
fs/file_table.c | 6 +++---
include/linux/fs.h | 6 +++---
3 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/fs/file.c b/fs/file.c
index 3b683b9101d8..570424dd634b 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -865,7 +865,7 @@ static struct file *__get_file_rcu(struct file __rcu **f)
if (!file)
return NULL;
- if (unlikely(!atomic_long_inc_not_zero(&file->f_count)))
+ if (unlikely(!refcount_long_inc_not_zero(&file->f_count)))
return ERR_PTR(-EAGAIN);
file_reloaded = rcu_dereference_raw(*f);
@@ -987,7 +987,7 @@ static inline struct file *__fget_files_rcu(struct files_struct *files,
* barrier. We only really need an 'acquire' one to
* protect the loads below, but we don't have that.
*/
- if (unlikely(!atomic_long_inc_not_zero(&file->f_count)))
+ if (unlikely(!refcount_long_inc_not_zero(&file->f_count)))
continue;
/*
diff --git a/fs/file_table.c b/fs/file_table.c
index 4f03beed4737..f29e7b94bca1 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -167,7 +167,7 @@ static int init_file(struct file *f, int flags, const struct cred *cred)
* fget-rcu pattern users need to be able to handle spurious
* refcount bumps we should reinitialize the reused file first.
*/
- atomic_long_set(&f->f_count, 1);
+ refcount_long_set(&f->f_count, 1);
return 0;
}
@@ -470,7 +470,7 @@ static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
void fput(struct file *file)
{
- if (atomic_long_dec_and_test(&file->f_count)) {
+ if (refcount_long_dec_and_test(&file->f_count)) {
struct task_struct *task = current;
if (unlikely(!(file->f_mode & (FMODE_BACKING | FMODE_OPENED)))) {
@@ -503,7 +503,7 @@ void fput(struct file *file)
*/
void __fput_sync(struct file *file)
{
- if (atomic_long_dec_and_test(&file->f_count))
+ if (refcount_long_dec_and_test(&file->f_count))
__fput(file);
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 210bbbfe9b83..b8f6cce7c39d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1001,7 +1001,7 @@ struct file {
*/
spinlock_t f_lock;
fmode_t f_mode;
- atomic_long_t f_count;
+ refcount_long_t f_count;
struct mutex f_pos_lock;
loff_t f_pos;
unsigned int f_flags;
@@ -1038,7 +1038,7 @@ struct file_handle {
static inline struct file *get_file(struct file *f)
{
- if (unlikely(!atomic_long_inc_not_zero(&f->f_count)))
+ if (unlikely(!refcount_long_inc_not_zero(&f->f_count)))
return NULL;
return f;
}
@@ -1046,7 +1046,7 @@ static inline struct file *get_file(struct file *f)
struct file *get_file_rcu(struct file __rcu **f);
struct file *get_file_active(struct file **f);
-#define file_count(x) atomic_long_read(&(x)->f_count)
+#define file_count(x) refcount_long_read(&(x)->f_count)
#define MAX_NON_LFS ((1UL<<31) - 1)
--
2.34.1
On Thu, May 02, 2024 at 03:33:40PM -0700, Kees Cook wrote:
> Underflow of f_count needs to be more carefully detected than it
> currently is. The results of get_file() should be checked, but the
> first step is detection. Redefine f_count from atomic_long_t to
> refcount_long_t.

It is used on fairly hot paths. What's more, it's not
at all obvious what the hell would right semantics be.

NAKed-by: Al Viro <viro@zeniv.linux.org.uk>
On Thu, May 02, 2024 at 11:42:50PM +0100, Al Viro wrote:
> On Thu, May 02, 2024 at 03:33:40PM -0700, Kees Cook wrote:
> > Underflow of f_count needs to be more carefully detected than it
> > currently is. The results of get_file() should be checked, but the
> > first step is detection. Redefine f_count from atomic_long_t to
> > refcount_long_t.
>
> It is used on fairly hot paths. What's more, it's not
> at all obvious what the hell would right semantics be.

I think we've put performance concerns between refcount_t and atomic_t
to rest long ago. If there is a real workload where it's a problem,
let's find it! :)

As for semantics, what do you mean? Detecting dec-below-zero means we
catch underflow, and detected inc-from-zero means we catch resurrection
attempts. In both cases we avoid double-free, but we have already lost
to a potential dangling reference to a freed struct file. But just
letting f_count go bad seems dangerous.

-- 
Kees Cook
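The two detections Kees describes (dec-below-zero as underflow, inc-from-zero as resurrection) can be illustrated with a minimal userspace model built on C11 atomics. This is not the kernel's refcount_t implementation. struct ref, ref_inc(), ref_dec_and_test() and REF_SATURATED are hypothetical names, and the model uses C11's default sequentially consistent ordering instead of whatever ordering the kernel helpers provide. On a bad transition it reports the event and parks the counter far below zero. In practice, later gets and puts then cannot produce another 1 -> 0 "last reference" transition and free the object a second time.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical saturation value: park a corrupted counter far from 0. */
#define REF_SATURATED (LONG_MIN / 2)

struct ref {
	atomic_long count;
};

static int ref_warnings; /* stand-in for a WARN() splat */

/* Report a bad transition and saturate the counter. */
static void ref_bad(struct ref *r, const char *what)
{
	ref_warnings++;
	fprintf(stderr, "refcount: %s\n", what);
	atomic_store(&r->count, REF_SATURATED);
}

static void ref_init(struct ref *r, long n)
{
	assert(n > 0);
	atomic_init(&r->count, n);
}

/* Increment: 0 -> 1 is a resurrection attempt; a negative old value
 * (or LONG_MAX about to wrap) means the counter is already bad. */
static void ref_inc(struct ref *r)
{
	long old = atomic_fetch_add(&r->count, 1);

	if (old == 0)
		ref_bad(r, "increment from 0 (resurrection)");
	else if (old < 0 || old == LONG_MAX)
		ref_bad(r, "increment of saturated/overflowing counter");
}

/* Decrement: returns true only for the 1 -> 0 (last reference) case. */
static bool ref_dec_and_test(struct ref *r)
{
	long old = atomic_fetch_sub(&r->count, 1);

	if (old == 1)
		return true;
	if (old <= 0)
		ref_bad(r, "decrement below 0 (underflow)");
	return false;
}
```

With a plain atomic_long_t, the extra put in the underflow case would silently leave the count at -1. Here it is reported and the counter is saturated.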
On Thu, May 02, 2024 at 03:52:21PM -0700, Kees Cook wrote:
> As for semantics, what do you mean? Detecting dec-below-zero means we
> catch underflow, and detected inc-from-zero means we catch resurrection
> attempts. In both cases we avoid double-free, but we have already lost
> to a potential dangling reference to a freed struct file. But just
> letting f_count go bad seems dangerous.

Detected inc-from-zero can also mean an RCU lookup detecting a descriptor
in the middle of getting closed. And it's more subtle than that, actually,
thanks to SLAB_TYPESAFE_BY_RCU for struct file.
On Fri, May 03, 2024 at 12:12:28AM +0100, Al Viro wrote:
> On Thu, May 02, 2024 at 03:52:21PM -0700, Kees Cook wrote:
>
> > As for semantics, what do you mean? Detecting dec-below-zero means we
> > catch underflow, and detected inc-from-zero means we catch resurrection
> > attempts. In both cases we avoid double-free, but we have already lost
> > to a potential dangling reference to a freed struct file. But just
> > letting f_count go bad seems dangerous.
>
> Detected inc-from-zero can also mean an RCU lookup detecting a descriptor
> in the middle of getting closed. And it's more subtle than that, actually,
> thanks to SLAB_TYPESAFE_BY_RCU for struct file.

But isn't that already handled by __get_file_rcu()? i.e. shouldn't it be
impossible for a simple get_file() to ever see a 0 f_count under normal
conditions?

-- 
Kees Cook
On Thu, May 02, 2024 at 04:21:13PM -0700, Kees Cook wrote:
> On Fri, May 03, 2024 at 12:12:28AM +0100, Al Viro wrote:
> > On Thu, May 02, 2024 at 03:52:21PM -0700, Kees Cook wrote:
> >
> > > As for semantics, what do you mean? Detecting dec-below-zero means we
> > > catch underflow, and detected inc-from-zero means we catch resurrection
> > > attempts. In both cases we avoid double-free, but we have already lost
> > > to a potential dangling reference to a freed struct file. But just
> > > letting f_count go bad seems dangerous.
> >
> > Detected inc-from-zero can also mean an RCU lookup detecting a descriptor
> > in the middle of getting closed. And it's more subtle than that, actually,
> > thanks to SLAB_TYPESAFE_BY_RCU for struct file.
>
> But isn't that already handled by __get_file_rcu()? i.e. shouldn't it be
> impossible for a simple get_file() to ever see a 0 f_count under normal
> conditions?

For get_file() it is impossible. The comment about semantics had been
about the sane ways to recover if such crap gets detected.

__get_file_rcu() is a separate story - consider the comment in there:
* atomic_long_inc_not_zero() above provided a full memory
* barrier when we acquired a reference.
*
* This is paired with the write barrier from assigning to the
* __rcu protected file pointer so that if that pointer still
* matches the current file, we know we have successfully
* acquired a reference to the right file.

and IIRC, refcount_t is weaker wrt barriers.
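The pattern Al describes for __get_file_rcu() treats a failed inc-from-zero as an expected outcome (the file is being closed), not a bug. Correctness then hinges on re-checking the pointer after the increment. One attempt of that pattern can be sketched with C11 atomics. struct obj, lookup_once() and the LOOKUP_* results are hypothetical. The seq_cst compare-and-exchange is this sketch's stand-in for the "full memory barrier" the quoted comment relies on. The sketch says nothing about whether refcount_t's ordering would be sufficient in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file; only the counter matters here. */
struct obj {
	atomic_long count;
};

enum lookup_res { LOOKUP_NONE, LOOKUP_OK, LOOKUP_RETRY };

/* inc_not_zero as a compare-and-exchange loop; seq_cst on success
 * stands in for the full barrier the kernel comment relies on. */
static int obj_inc_not_zero(struct obj *o)
{
	long old = atomic_load_explicit(&o->count, memory_order_relaxed);

	do {
		if (old == 0)
			return 0;
	} while (!atomic_compare_exchange_weak_explicit(&o->count, &old, old + 1,
							memory_order_seq_cst,
							memory_order_relaxed));
	return 1;
}

static void obj_put(struct obj *o)
{
	/* A real put would free the object on the 1 -> 0 transition. */
	atomic_fetch_sub(&o->count, 1);
}

/* One attempt: pin the object, then re-read the slot to confirm the
 * reference landed on the object the slot still points at. With
 * SLAB_TYPESAFE_BY_RCU the memory may have been freed and reused for
 * a different file between the two loads. */
static enum lookup_res lookup_once(struct obj *_Atomic *slot, struct obj **out)
{
	assert(slot != NULL && out != NULL);

	struct obj *o = atomic_load(slot);
	if (!o)
		return LOOKUP_NONE;
	if (!obj_inc_not_zero(o))
		return LOOKUP_RETRY;	/* being closed: caller retries */
	if (atomic_load(slot) != o) {
		obj_put(o);		/* wrong object: drop stray reference */
		return LOOKUP_RETRY;
	}
	*out = o;
	return LOOKUP_OK;
}
```

A caller would loop while the result is LOOKUP_RETRY, much as get_file_rcu() retries __get_file_rcu() when it returns ERR_PTR(-EAGAIN).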
On Fri, May 03, 2024 at 12:41:52AM +0100, Al Viro wrote:
> On Thu, May 02, 2024 at 04:21:13PM -0700, Kees Cook wrote:
> > On Fri, May 03, 2024 at 12:12:28AM +0100, Al Viro wrote:
> > > On Thu, May 02, 2024 at 03:52:21PM -0700, Kees Cook wrote:
> > >
> > > > As for semantics, what do you mean? Detecting dec-below-zero means we
> > > > catch underflow, and detected inc-from-zero means we catch resurrection
> > > > attempts. In both cases we avoid double-free, but we have already lost
> > > > to a potential dangling reference to a freed struct file. But just
> > > > letting f_count go bad seems dangerous.
> > >
> > > Detected inc-from-zero can also mean an RCU lookup detecting a descriptor
> > > in the middle of getting closed. And it's more subtle than that, actually,
> > > thanks to SLAB_TYPESAFE_BY_RCU for struct file.
> >
> > But isn't that already handled by __get_file_rcu()? i.e. shouldn't it be
> > impossible for a simple get_file() to ever see a 0 f_count under normal
> > conditions?
>
> For get_file() it is impossible. The comment about semantics had been
> about the sane ways to recover if such crap gets detected.
>
> __get_file_rcu() is a separate story - consider the comment in there:
> * atomic_long_inc_not_zero() above provided a full memory
> * barrier when we acquired a reference.
> *
> * This is paired with the write barrier from assigning to the
> * __rcu protected file pointer so that if that pointer still
> * matches the current file, we know we have successfully
> * acquired a reference to the right file.
>
> and IIRC, refcount_t is weaker wrt barriers.

I think that was also fixed for refcount_t. I'll need to go dig out the
commit...

But anyway, there needs to be a general "oops I hit 0"-aware form of
get_file(), and it seems like it should just be get_file() itself...

-- 
Kees Cook
On Thu, May 02, 2024 at 05:10:18PM -0700, Kees Cook wrote:

> But anyway, there needs to be a general "oops I hit 0"-aware form of
> get_file(), and it seems like it should just be get_file() itself...

... which brings back the question of what's the sane damage mitigation
for that. Adding arseloads of never-exercised failure exits is generally
a bad idea - it's asking for bitrot and making the thing harder to review
in future.
On Fri, May 03, 2024 at 01:14:45AM +0100, Al Viro wrote:
> On Thu, May 02, 2024 at 05:10:18PM -0700, Kees Cook wrote:
>
> > But anyway, there needs to be a general "oops I hit 0"-aware form of
> > get_file(), and it seems like it should just be get_file() itself...
>
> ... which brings back the question of what's the sane damage mitigation
> for that. Adding arseloads of never-exercised failure exits is generally
> a bad idea - it's asking for bitrot and making the thing harder to review
> in future.

Linus seems to prefer best-effort error recovery to sprinkling BUG()s
around. But if that's really the solution, then how about get_file()
switching to use inc_not_zero and BUG on 0?

-- 
Kees Cook
On Thu, May 02, 2024 at 05:41:23PM -0700, Kees Cook wrote:
> On Fri, May 03, 2024 at 01:14:45AM +0100, Al Viro wrote:
> > On Thu, May 02, 2024 at 05:10:18PM -0700, Kees Cook wrote:
> >
> > > But anyway, there needs to be a general "oops I hit 0"-aware form of
> > > get_file(), and it seems like it should just be get_file() itself...
> >
> > ... which brings back the question of what's the sane damage mitigation
> > for that. Adding arseloads of never-exercised failure exits is generally
> > a bad idea - it's asking for bitrot and making the thing harder to review
> > in future.
>
> Linus seems to prefer best-effort error recovery to sprinkling BUG()s
> around. But if that's really the solution, then how about get_file()
> switching to use inc_not_zero and BUG on 0?

Making get_file() return an error is not an option. For all current
callers that's pointless churn for a condition that's not supposed to
happen at all.

Additionally, iirc *_inc_not_zero() variants are implemented with
try_cmpxchg() which scales poorly under contention for a condition
that's not supposed to happen.
On Fri, May 03, 2024 at 11:37:25AM +0200, Christian Brauner wrote:
> On Thu, May 02, 2024 at 05:41:23PM -0700, Kees Cook wrote:
> > On Fri, May 03, 2024 at 01:14:45AM +0100, Al Viro wrote:
> > > On Thu, May 02, 2024 at 05:10:18PM -0700, Kees Cook wrote:
> > >
> > > > But anyway, there needs to be a general "oops I hit 0"-aware form of
> > > > get_file(), and it seems like it should just be get_file() itself...
> > >
> > > ... which brings back the question of what's the sane damage mitigation
> > > for that. Adding arseloads of never-exercised failure exits is generally
> > > a bad idea - it's asking for bitrot and making the thing harder to review
> > > in future.
> >
> > Linus seems to prefer best-effort error recovery to sprinkling BUG()s
> > around. But if that's really the solution, then how about get_file()
> > switching to use inc_not_zero and BUG on 0?
>
> Making get_file() return an error is not an option. For all current
> callers that's pointless churn for a condition that's not supposed to
> happen at all.
>
> Additionally, iirc *_inc_not_zero() variants are implemented with
> try_cmpxchg() which scales poorly under contention for a condition
> that's not supposed to happen.

	unsigned long old = atomic_long_fetch_inc_relaxed(&f->f_count);
	WARN_ON(!old);

Or somesuch might be an option?
On Fri, May 03, 2024 at 12:36:14PM +0200, Peter Zijlstra wrote:
> On Fri, May 03, 2024 at 11:37:25AM +0200, Christian Brauner wrote:
> > On Thu, May 02, 2024 at 05:41:23PM -0700, Kees Cook wrote:
> > > On Fri, May 03, 2024 at 01:14:45AM +0100, Al Viro wrote:
> > > > On Thu, May 02, 2024 at 05:10:18PM -0700, Kees Cook wrote:
> > > >
> > > > > But anyway, there needs to be a general "oops I hit 0"-aware form of
> > > > > get_file(), and it seems like it should just be get_file() itself...
> > > >
> > > > ... which brings back the question of what's the sane damage mitigation
> > > > for that. Adding arseloads of never-exercised failure exits is generally
> > > > a bad idea - it's asking for bitrot and making the thing harder to review
> > > > in future.
> > >
> > > Linus seems to prefer best-effort error recovery to sprinkling BUG()s
> > > around. But if that's really the solution, then how about get_file()
> > > switching to use inc_not_zero and BUG on 0?
> >
> > Making get_file() return an error is not an option. For all current
> > callers that's pointless churn for a condition that's not supposed to
> > happen at all.
> >
> > Additionally, iirc *_inc_not_zero() variants are implemented with
> > try_cmpxchg() which scales poorly under contention for a condition
> > that's not supposed to happen.
>
> 	unsigned long old = atomic_long_fetch_inc_relaxed(&f->f_count);
> 	WARN_ON(!old);
>
> Or somesuch might be an option?

Yeah, I'd be fine with that. WARN_ON() (or WARN_ON_ONCE() even?) and
then people can do their panic_on_warn stuff to get the BUG_ON()
behavior if they want to.
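Peter's suggestion, with Christian's WARN_ON_ONCE() refinement, amounts to a single relaxed fetch-and-add followed by a one-shot warning. It can be modelled in userspace roughly as follows. struct file_model, get_file_model(), warn_on_once() and panic_on_warn_model are hypothetical stand-ins for struct file, get_file(), WARN_ON_ONCE() and the panic_on_warn setting. This is a sketch under those assumptions, not a proposed patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct file. */
struct file_model {
	atomic_long f_count;
};

/* Hypothetical knobs modelling panic_on_warn and the warning log. */
static bool panic_on_warn_model;
static int warnings_printed;

/* Like WARN_ON_ONCE(): returns cond every time, reports only once. */
static bool warn_on_once(bool cond, const char *msg)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		warnings_printed++;
		fprintf(stderr, "WARNING: %s\n", msg);
		if (panic_on_warn_model)
			abort(); /* the opt-in, BUG_ON()-like outcome */
	}
	return cond;
}

/* One unconditional fetch-and-add: no cmpxchg retry loop, and the
 * caller always gets the file back, so no new failure exit. */
static struct file_model *get_file_model(struct file_model *f)
{
	assert(f != NULL);

	long old = atomic_fetch_add_explicit(&f->f_count, 1,
					     memory_order_relaxed);
	warn_on_once(old == 0, "get_file() on a file with f_count == 0");
	return f;
}
```

Unlike an inc_not_zero()-style helper, this never retries under contention and gives callers nothing new to check. The trade-off is that detection is report-only: the counter is still bumped from 0, and anyone who wants the BUG_ON() behaviour has to opt in through the panic setting.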