Hi,
Changes from RFC v3:
- Removed the warning "fixes" patches, as they could hide potential
bugs (Christian Brauner);
- Added "cred-specific" macros (Christian Brauner); on my side, I
added a few '_' to the guard names to signify that the newly
introduced helper macros are preferred;
- Changed a few guard() uses to scoped_guard() to fix a clang (17.0.6)
compilation error about a 'goto' bypassing variable initialization;
Link to RFC v3:
https://lore.kernel.org/r/20240216051640.197378-1-vinicius.gomes@intel.com/
Changes from RFC v2:
- Added separate patches for the warnings for the discarded const
when using the cleanup macros: one for DEFINE_GUARD() and one for
DEFINE_LOCK_GUARD_1() (I am uncertain if it's better to squash them
together);
- Reordered the series so the backing file patch is the first user of
the introduced helpers (Amir Goldstein);
- Changed the definition of the cleanup "class" from a GUARD to a
LOCK_GUARD_1, which defines an implicit container that allows us
to remove some variable declarations used to store the overridden
credentials (Amir Goldstein);
- Replaced most uses of scoped_guard() with guard() to reduce code
churn; for the remaining ones I wasn't sure whether I was changing
the behavior: either they were nested (overrides "inside"
overrides) or something calls current_cred() (Amir Goldstein).
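The guard()/scoped_guard() distinction matters most for nested overrides: guard() keeps the override until the end of the enclosing scope, while scoped_guard() limits it to the block that follows. Below is a minimal userspace sketch, assuming GCC or Clang (for the `cleanup` attribute). `SCOPED_CRED_GUARD`, `cred_guard_exit` and the global `current_cred_ptr` are hypothetical names, and the macro only loosely imitates the single-iteration `for` loop that the kernel's scoped_guard() is built from. It shows that an inner override is reverted as soon as its block ends, restoring the outer one, so nested overrides unwind in LIFO order.

```c
#include <assert.h>
#include <stddef.h>

struct cred {
	int uid;
};

/* Stand-in for the current task's credentials (sketch only). */
static const struct cred *current_cred_ptr;

static const struct cred *override_creds_light(const struct cred *new)
{
	const struct cred *old = current_cred_ptr;

	current_cred_ptr = new;
	return old;
}

static void revert_creds_light(const struct cred *old)
{
	current_cred_ptr = old;
}

/* Called automatically when the guard variable goes out of scope. */
static void cred_guard_exit(const struct cred **saved)
{
	revert_creds_light(*saved);
}

/*
 * One-iteration for loop: the guard variable lives only as long as the
 * loop, so the revert runs as soon as the attached block finishes.
 */
#define SCOPED_CRED_GUARD(new_cred)                                     \
	for (const struct cred *scope_saved                                 \
		     __attribute__((cleanup(cred_guard_exit))) =                \
			     override_creds_light(new_cred),                    \
	     *scope_done = NULL;                                            \
	     !scope_done; scope_done = (void *)1)

/* Nested overrides: returns 1 if the outer creds are back after the inner block. */
static int nested_ok(const struct cred *outer, const struct cred *inner)
{
	int ok = 0;

	SCOPED_CRED_GUARD(outer) {
		assert(current_cred_ptr == outer);
		SCOPED_CRED_GUARD(inner) {
			assert(current_cred_ptr == inner);
		}
		/* Inner override already reverted: back to the outer one. */
		ok = (current_cred_ptr == outer);
	}
	return ok;
}
```

With plain guard() semantics the inner override would instead last until the end of the enclosing function, which is why converting nested sites needs the careful audit discussed later in the thread.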
New questions:
- The backing file callbacks are now called with the "light"
overridden credentials, so they are somewhat restricted in what they
can do with their credentials. Is this acceptable in general?
- In ovl_rename() I had to call the "light" overrides manually: using
either the guard() macro or the non-light version causes the
workload to crash the kernel. I still have to investigate why this
is happening. Hints are appreciated.
Link to the RFC v2:
https://lore.kernel.org/r/20240125235723.39507-1-vinicius.gomes@intel.com/
Original cover letter (lightly edited):
It was noticed that some workloads suffer from contention when
incrementing/decrementing the ->usage counter in their credentials;
those refcount operations are associated with overriding/reverting the
current task's credentials (the linked thread adds more context).
In some specialized cases (overlayfs is one of them), the credentials
in question have a longer lifetime than the override/revert "critical
section". In the overlayfs case, the credentials are created when the
fs is mounted and destroyed when it's unmounted. With such long-lived
credentials, the usage counter doesn't need to be
incremented/decremented.
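The refcount traffic described above, and what skipping it looks like, can be modelled in userspace. This is a minimal single-threaded sketch under stated assumptions: a plain `long` stands in for the atomic `->usage` counter, and a global pointer stands in for the task's current credentials. The function names mirror the kernel helpers discussed in this thread, but the bodies are simplified illustrations, not the kernel implementation. The "heavy" pair takes and drops a reference on every override. The "light" pair only swaps pointers and relies on something else (here, a mount that owns its credentials) to keep them alive.

```c
#include <assert.h>

/*
 * Userspace model (assumption, not kernel code): a "cred" is only its
 * reference count, and a global pointer plays the role of the current
 * task's credentials. In the kernel, ->usage is an atomic counter.
 */
struct cred {
	long usage;
};

static const struct cred *current_cred_ptr;

static const struct cred *get_cred(const struct cred *c)
{
	((struct cred *)c)->usage++;      /* kernel: atomic increment */
	return c;
}

static void put_cred(const struct cred *c)
{
	assert(c->usage > 0);             /* catch refcount underflow */
	((struct cred *)c)->usage--;      /* kernel: atomic decrement, free at 0 */
}

/* "Heavy" pair: pins the overriding creds for the whole section. */
static const struct cred *override_creds(const struct cred *new)
{
	const struct cred *old = current_cred_ptr;

	current_cred_ptr = get_cred(new);
	return old;
}

static void revert_creds(const struct cred *old)
{
	const struct cred *override = current_cred_ptr;

	current_cred_ptr = old;
	put_cred(override);
}

/*
 * "Light" pair: pointer swaps only. Safe only when 'new' is guaranteed
 * to outlive the section, e.g. creds created at mount, freed at umount.
 */
static const struct cred *override_creds_light(const struct cred *new)
{
	const struct cred *old = current_cred_ptr;

	current_cred_ptr = new;
	return old;
}

static void revert_creds_light(const struct cred *old)
{
	current_cred_ptr = old;
}
```

An override/revert cycle with the heavy pair moves `mount_creds.usage` up and back down. The light pair leaves it untouched. That skipped write to a shared counter is the saving when the same long-lived creds are overridden on every operation from many CPUs.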
Add a lighter version of credentials override/revert to be used in
these specialized cases. To make sure that the override/revert calls
are paired, add a cleanup guard macro. This was suggested here:
https://lore.kernel.org/all/20231219-marken-pochen-26d888fb9bb9@brauner/
With a small number of tweaks:
- Used inline functions instead of macros;
- A small change to store the credentials into the passed argument,
the guard is now defined as (note the added '_T ='):
DEFINE_GUARD(cred, const struct cred *, _T = override_creds_light(_T),
revert_creds_light(_T));
- Allow "const" arguments to be used with this kind of guard;
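The `_T = override_creds_light(_T)` trick can be approximated in userspace with the GCC/Clang `cleanup` attribute, which is also what the kernel's cleanup.h guards are built on. This is a rough sketch, not the kernel macro. `CRED_GUARD`, `cred_guard_exit`, `uid_inside` and the global `current_cred_ptr` are made-up names. Because the variable name is fixed, only one such guard can appear per scope, whereas the kernel generates unique names. The guard variable is initialized with the value returned by `override_creds_light()`, that is, the *old* credentials. That gives the destructor exactly what it needs to revert.

```c
#include <assert.h>

struct cred {
	int uid;
};

/* Stand-in for the current task's credentials (sketch only). */
static const struct cred *current_cred_ptr;

static const struct cred *override_creds_light(const struct cred *new)
{
	const struct cred *old = current_cred_ptr;

	current_cred_ptr = new;
	return old;
}

static void revert_creds_light(const struct cred *old)
{
	current_cred_ptr = old;
}

/* Run automatically when the guard variable goes out of scope. */
static void cred_guard_exit(const struct cred **saved)
{
	revert_creds_light(*saved);
}

/*
 * Rough analogue of
 *   DEFINE_GUARD(cred, const struct cred *, _T = override_creds_light(_T),
 *                revert_creds_light(_T));
 *   guard(cred)(new_cred);
 * The variable ends up holding the old creds returned by the override,
 * which is what the destructor reverts to.
 */
#define CRED_GUARD(new_cred)                                         \
	const struct cred *cred_guard_saved                              \
		__attribute__((cleanup(cred_guard_exit))) =                  \
			override_creds_light(new_cred)

/* Example user: the override covers the rest of this function body. */
static int uid_inside(const struct cred *mount_creds)
{
	CRED_GUARD(mount_creds);

	assert(current_cred_ptr == mount_creds);
	return current_cred_ptr->uid;   /* evaluated before the revert runs */
}
```

When `uid_inside()` returns, `current_cred_ptr` points at the caller's credentials again, with no explicit revert call on the return path.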
Some comments:
- If patches 1/5 and 2/5 (adding the cast) are not a good idea, the
alternative I can see is using some kind of container for the
credentials;
- The only user of the backing file ops is overlayfs, so these
changes make sense there, but may not make sense in the most general
case;
Some numbers from 'perf c2c', before this series:
(edited to fit)
#
# ----- HITM ----- Shared
# Num RmtHitm LclHitm Symbol Object Source:Line Node
# ..... ....... ....... .......................... ................ .................. ....
#
-------------------------
0 412 1028
-------------------------
41.50% 42.22% [k] revert_creds [kernel.vmlinux] atomic64_64.h:39 0 1
15.05% 10.60% [k] override_creds [kernel.vmlinux] atomic64_64.h:25 0 1
0.73% 0.58% [k] init_file [kernel.vmlinux] atomic64_64.h:25 0 1
0.24% 0.10% [k] revert_creds [kernel.vmlinux] cred.h:266 0 1
32.28% 37.16% [k] generic_permission [kernel.vmlinux] mnt_idmapping.h:81 0 1
9.47% 8.75% [k] generic_permission [kernel.vmlinux] mnt_idmapping.h:81 0 1
0.49% 0.58% [k] inode_owner_or_capable [kernel.vmlinux] mnt_idmapping.h:81 0 1
0.24% 0.00% [k] generic_permission [kernel.vmlinux] namei.c:354 0
-------------------------
1 50 103
-------------------------
100.00% 100.00% [k] update_cfs_group [kernel.vmlinux] atomic64_64.h:15 0 1
-------------------------
2 50 98
-------------------------
96.00% 96.94% [k] update_cfs_group [kernel.vmlinux] atomic64_64.h:15 0 1
2.00% 1.02% [k] update_load_avg [kernel.vmlinux] atomic64_64.h:25 0 1
0.00% 2.04% [k] update_load_avg [kernel.vmlinux] fair.c:4118 0
2.00% 0.00% [k] update_cfs_group [kernel.vmlinux] fair.c:3932 0 1
After this series:
#
# ----- HITM ----- Shared
# Num RmtHitm LclHitm Symbol Object Source:Line Node
# ..... ....... ....... .................... ................ ................ ....
#
-------------------------
0 54 88
-------------------------
100.00% 100.00% [k] update_cfs_group [kernel.vmlinux] atomic64_64.h:15 0 1
-------------------------
1 48 83
-------------------------
97.92% 97.59% [k] update_cfs_group [kernel.vmlinux] atomic64_64.h:15 0 1
2.08% 1.20% [k] update_load_avg [kernel.vmlinux] atomic64_64.h:25 0 1
0.00% 1.20% [k] update_load_avg [kernel.vmlinux] fair.c:4118 0 1
-------------------------
2 28 44
-------------------------
85.71% 79.55% [k] generic_permission [kernel.vmlinux] mnt_idmapping.h:81 0 1
14.29% 20.45% [k] generic_permission [kernel.vmlinux] mnt_idmapping.h:81 0 1
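The contention on override_creds()/revert_creds() is practically gone:
those entries no longer show up, and the remaining top lines are
update_cfs_group()/update_load_avg() and generic_permission().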
Link: https://lore.kernel.org/all/20231018074553.41333-1-hu1.chen@intel.com/
Vinicius Costa Gomes (3):
cred: Add a light version of override/revert_creds()
fs: Optimize credentials reference count for backing file ops
overlayfs: Optimize credentials usage
fs/backing-file.c | 27 +++++-------------
fs/overlayfs/copy_up.c | 4 +--
fs/overlayfs/dir.c | 22 +++++++--------
fs/overlayfs/file.c | 63 +++++++++++++++++-------------------------
fs/overlayfs/inode.c | 60 +++++++++++++++-------------------------
fs/overlayfs/namei.c | 22 ++++-----------
fs/overlayfs/readdir.c | 16 +++--------
fs/overlayfs/util.c | 25 ++++++++---------
fs/overlayfs/xattrs.c | 33 +++++++++-------------
include/linux/cred.h | 25 +++++++++++++++++
kernel/cred.c | 6 ++--
11 files changed, 127 insertions(+), 176 deletions(-)
--
2.44.0
On Tue, Apr 02, 2024 at 07:18:05PM -0700, Vinicius Costa Gomes wrote:
> Hi,
> [...]
> New questions:
> - The backing file callbacks are now called with the "light"
> overriden credentials, so they are kind of restricted in what they
> can do with their credentials, is this acceptable in general?

Until we grow additional users, I think yes. Just needs to be
documented.

> - in ovl_rename() I had to manually call the "light" the overrides,
> both using the guard() macro or using the non-light version causes
> the workload to crash the kernel. I still have to investigate why
> this is happening. Hints are appreciated.

Do you have a reproducer? Do you have a splat from dmesg?
Christian Brauner <brauner@kernel.org> writes:

> On Tue, Apr 02, 2024 at 07:18:05PM -0700, Vinicius Costa Gomes wrote:
>> [...]
>> New questions:
>> - The backing file callbacks are now called with the "light"
>> overriden credentials, so they are kind of restricted in what they
>> can do with their credentials, is this acceptable in general?
>
> Until we grow additional users, I think yes. Just needs to be
> documented.
>

Will add some documentation for it, then.

>> - in ovl_rename() I had to manually call the "light" the overrides,
>> both using the guard() macro or using the non-light version causes
>> the workload to crash the kernel. I still have to investigate why
>> this is happening. Hints are appreciated.
>
> Do you have a reproducer? Do you have a splat from dmesg?

Just to be sure: with this version of the series the crash doesn't
happen. It was only happening when I was using the guard() macro
everywhere.

I just looked at my crash collection and couldn't find the splats. From
what I remember, I lost connection to the machine and wasn't able to
retrieve the splat.

I believe the crash and the clang 17 compilation error point to the
same problem: in ovl_rename(), some 'goto' skips the declaration of the
(implicit) variable that the guard() macro generates, and it ends up
doing a revert_creds_light() on garbage memory when ovl_rename()
returns.

(If you want, I can go back to "guard() everywhere" and try a bit
harder to get a splat.)

Does that make sense?

Cheers,
--
Vinicius
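The failure mode described above can be illustrated with the same `cleanup` attribute in userspace. This is a sketch under assumptions: `do_op()`, `cred_guard_exit()` and the global `current_cred_ptr` are hypothetical, and the problematic shape appears only in a comment because clang rejects it. The working version keeps the override in its own block and puts every `goto` target outside that block. Jumping *over* the whole block, or *out* of it, is fine because the cleanup runs on the way out. Jumping *into* the variable's scope past its initialization is not.

```c
#include <assert.h>
#include <errno.h>

struct cred {
	int uid;
};

/* Stand-in for the current task's credentials (sketch only). */
static const struct cred *current_cred_ptr;

static const struct cred *override_creds_light(const struct cred *new)
{
	const struct cred *old = current_cred_ptr;

	current_cred_ptr = new;
	return old;
}

static void revert_creds_light(const struct cred *old)
{
	current_cred_ptr = old;
}

static void cred_guard_exit(const struct cred **saved)
{
	revert_creds_light(*saved);
}

/*
 * Problematic shape (kernel-style pseudocode, not compiled here): the
 * label sits inside the scope of the guard's hidden variable, so the
 * early goto skips its initialization and the destructor would later
 * run on an uninitialized value. clang refuses to compile this.
 *
 *	if (bad_input)
 *		goto out;
 *	guard(cred)(mount_creds);
 *	...
 * out:
 *	return err;
 */

/* Workable shape: the override lives in its own block, labels outside it. */
static int do_op(const struct cred *mount_creds, int bad_input)
{
	int err = 0;

	if (bad_input) {
		err = -EINVAL;
		goto out;                 /* jumps over the whole block */
	}

	{
		const struct cred *saved __attribute__((cleanup(cred_guard_exit))) =
			override_creds_light(mount_creds);

		assert(current_cred_ptr == mount_creds);
		if (current_cred_ptr->uid != 0) {
			err = -EPERM;
			goto out;             /* leaves the block; revert runs first */
		}
	}
out:
	return err;
}
```

With `bad_input` set, `do_op()` never enters the block, so there is nothing to revert. When the uid check fails, the `goto` leaves the block, and the cleanup restores the previous credentials before `out:` is reached.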
On Wed, Apr 24, 2024 at 12:15:25PM -0700, Vinicius Costa Gomes wrote:
> Christian Brauner <brauner@kernel.org> writes:
> [...]
> I believe the crash and clang 17 compilation error point to the same
> problem, that in ovl_rename() some 'goto' skips the declaration of the
> (implicit) variable that the guard() macro generates. And it ends up
> doing a revert_creds_light() on garbage memory when ovl_rename()
> returns.

If this is a compiler bug, this warrants at least a comment in the
commit message, because right now people will be wondering why that
place doesn't use a guard. Ideally, though, we can just use guards
everywhere and report this as a bug against clang, I think.

>
> (if you want I can try and go back to "guard() everywhere" and try a bit
> harder to get a splat)
>
> Does that make sense?

Yes.
Christian Brauner <brauner@kernel.org> writes:

> On Wed, Apr 24, 2024 at 12:15:25PM -0700, Vinicius Costa Gomes wrote:
>> [...]
>> I believe the crash and clang 17 compilation error point to the same
>> problem, that in ovl_rename() some 'goto' skips the declaration of the
>> (implicit) variable that the guard() macro generates. And it ends up
>> doing a revert_creds_light() on garbage memory when ovl_rename()
>> returns.
>
> If this is a compiler bug this warrants at least a comment in the commit
> message because right now people will be wondering why that place
> doesn't use a guard. Ideally we can just use guards everywhere though
> and report this as a bug against clang, I think.
>

I see this as a bug/missing feature in gcc (at least in the version I
was using), as clang (correctly) refuses to compile the buggy code (I
agree with the error).

But I will add a comment to the code explaining why guard() cannot be
used in that case.

Cheers,
--
Vinicius
On Thu, Apr 25, 2024 at 10:12:34AM -0700, Vinicius Costa Gomes wrote:
> Christian Brauner <brauner@kernel.org> writes:
>
> > On Wed, Apr 24, 2024 at 12:15:25PM -0700, Vinicius Costa Gomes wrote:
> >> I believe the crash and clang 17 compilation error point to the same
> >> problem, that in ovl_rename() some 'goto' skips the declaration of the
> >> (implicit) variable that the guard() macro generates. And it ends up
> >> doing a revert_creds_light() on garbage memory when ovl_rename()
> >> returns.
> >
> > If this is a compiler bug this warrants at least a comment in the commit
> > message because right now people will be wondering why that place
> > doesn't use a guard. Ideally we can just use guards everywhere though
> > and report this as a bug against clang, I think.
> >
>
> I am seeing this like a bug/mising feature in gcc (at least in the
> version I was using), as clang (correctly) refuses to compile the buggy
> code (I agree with the error).

Indeed, your description of the issue and the fact that clang refuses to
compile the problematic code make me think that
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91951 is the relevant GCC
issue.

As an aside, just in case it comes up in the future, there is a
potential issue in clang's scope checking where it would attempt to
validate all labels in a function as potential destinations of
'asm goto()' instances in that same function, rather than just the
labels that the 'asm goto()' could jump to. This can lead to false
positive errors about jumping past the initialization of a variable
declared with cleanup.

https://github.com/ClangBuiltLinux/linux/issues/1886
https://github.com/ClangBuiltLinux/linux/issues/2003

Cheers,
Nathan
On Wed, 3 Apr 2024 at 04:18, Vinicius Costa Gomes
<vinicius.gomes@intel.com> wrote:

> - in ovl_rename() I had to manually call the "light" the overrides,
> both using the guard() macro or using the non-light version causes
> the workload to crash the kernel. I still have to investigate why
> this is happening. Hints are appreciated.

Don't know. Well, there's nesting (in ovl_nlink_end()), but I don't see
why that should be an issue.

I see why Amir suggested moving away from scoped guards, but that also
introduces the possibility of subtle bugs if we don't audit every one
of those sites carefully...

Maybe the patchset should be restructured to first do the
override_creds_light() conversion without guards, and then move over
to guards. Or the other way round; I don't have a preference. But
mixing these two independent changes doesn't sound like a great idea
in any case.

Thanks,
Miklos
Miklos Szeredi <miklos@szeredi.hu> writes:

> On Wed, 3 Apr 2024 at 04:18, Vinicius Costa Gomes
> <vinicius.gomes@intel.com> wrote:
>
>> - in ovl_rename() I had to manually call the "light" the overrides,
>> both using the guard() macro or using the non-light version causes
>> the workload to crash the kernel. I still have to investigate why
>> this is happening. Hints are appreciated.
>
> Don't know. Well, there's nesting (in ovl_nlink_end()) but I don't
> see why that should be an issue.
>
> I see why Amir suggested moving away from scoped guards, but that also
> introduces the possibility of subtle bugs if we don't audit every one
> of those sites carefully...
>
> Maybe patchset should be restructured to first do the
> override_creds_light() conversion without guards, and then move over
> to guards. Or the other way round, I don't have a preference. But
> mixing these two independent changes doesn't sound like a great idea
> in any case.

Sounds good. Here's what I am thinking:

patch 1: introduce *_creds_light()
patch 2: move backing-file.c to *_creds_light()
patch 3: move overlayfs to *_creds_light()
patch 4: introduce the guard helpers
patch 5: move backing-file.c to the guard helpers
patch 6: move overlayfs to the guard helpers

(and yeah, the subjects of the patches will be better than these ;-)

Is this what you had in mind?

Cheers,
--
Vinicius