fs/backing-file.c | 20 ++++++++++---------- fs/overlayfs/copy_up.c | 2 +- fs/overlayfs/dir.c | 17 +++++++++++------ fs/overlayfs/file.c | 14 +++++++------- fs/overlayfs/inode.c | 20 ++++++++++---------- fs/overlayfs/namei.c | 10 +++++----- fs/overlayfs/overlayfs.h | 1 + fs/overlayfs/readdir.c | 8 ++++---- fs/overlayfs/util.c | 11 ++++++++--- fs/overlayfs/xattrs.c | 9 ++++----- include/linux/cred.h | 18 ++++++++++++++++++ kernel/cred.c | 6 +++--- 12 files changed, 82 insertions(+), 54 deletions(-)
Hi, Changes from v3: - Another reorganization of the series: separate the pure mechanical changes into their own (Amir Goldstein) The series now reads: Patch 1: Introduce the _light() version of the override/revert cred operations; Patch 2: Convert backing-file.c to use those; Patch 3: Mechanical change to introduce the ovl_revert_creds() helper; Patch 4: Make the ovl_{override,convert}_creds() use the _light() creds helpers, and fix the reference counting issue that would happen; Changes from v2: - Removed the "convert to guard()/scoped_guard()" patches (Miklos Szeredi); - In the overlayfs code, convert all users of override_creds()/revert_creds() to the _light() versions by: 1. making ovl_override_creds() use override_creds_light(); 2. introduce ovl_revert_creds() which calls revert_creds_light(); 3. convert revert_creds() to ovl_revert_creds() (Amir Goldstein); - Fix an potential reference counting issue, as the lifetime expectations of the mounter credentials are different (Christian Brauner); The series is now much simpler: Patch 1: Introduce the _light() version of the override/revert cred operations; Patch 2: Convert backing-file.c to use those; Patch 3: Do the conversion to use the _light() version internally; Patch 4: Fix a potential refcounting issue Changes from v1: - Re-organized the series to be easier to follow, more details below (Miklos Szeredi and Amir Goldstein); The series now reads as follows: Patch 1: Introduce the _light() version of the override/revert cred operations; Patch 2: Convert backing-file.c to use those; Patch 3: Introduce the overlayfs specific _light() helper; Patch 4: Document the cases that the helper cannot be used (critical section may change the cred->usage counter); Patch 5: Convert the "rest" of overlayfs to the _light() helpers (mostly mechanical); Patch 6: Introduce the GUARD() helpers; Patch 7: Convert backing-file.c to the GUARD() helpers; Patch 8-15: Convert each overlayfs/ file to use the GUARD() helpers, also explain the cases in which the scoped_guard() helper is used. Note that a 'goto' jump that crosses the guard() should fail to compile, gcc has a bug that fails to detect the error[1]. Patch 16: Remove the helper introduced in Patch 3 to close the series, as it is no longer used, everything was converted to use the safer/shorter GUARD() helpers. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91951 This bug was also noticed here: https://lore.kernel.org/all/20240730050927.GC5334@ZenIV/ Link to v1: https://lore.kernel.org/r/20240403021808.309900-1-vinicius.gomes@intel.com/ Changes from RFC v3: - Removed the warning "fixes" patches, as they could hide potencial bugs (Christian Brauner); - Added "cred-specific" macros (Christian Brauner), from my side, added a few '_' to the guards to signify that the newly introduced helper macros are preferred. - Changed a few guard() to scoped_guard() to fix the clang (17.0.6) compilation error about 'goto' bypassing variable initialization; Link to RFC v3: https://lore.kernel.org/r/20240216051640.197378-1-vinicius.gomes@intel.com/ Changes from RFC v2: - Added separate patches for the warnings for the discarded const when using the cleanup macros: one for DEFINE_GUARD() and one for DEFINE_LOCK_GUARD_1() (I am uncertain if it's better to squash them together); - Reordered the series so the backing file patch is the first user of the introduced helpers (Amir Goldstein); - Change the definition of the cleanup "class" from a GUARD to a LOCK_GUARD_1, which defines an implicit container, that allows us to remove some variable declarations to store the overriden credentials (Amir Goldstein); - Replaced most of the uses of scoped_guard() with guard(), to reduce the code churn, the remaining ones I wasn't sure if I was changing the behavior: either they were nested (overrides "inside" overrides) or something calls current_cred() (Amir Goldstein). New questions: - The backing file callbacks are now called with the "light" overriden credentials, so they are kind of restricted in what they can do with their credentials, is this acceptable in general? - in ovl_rename() I had to manually call the "light" the overrides, both using the guard() macro or using the non-light version causes the workload to crash the kernel. I still have to investigate why this is happening. Hints are appreciated. Link to the RFC v2: https://lore.kernel.org/r/20240125235723.39507-1-vinicius.gomes@intel.com/ Original cover letter (lightly edited): It was noticed that some workloads suffer from contention on increasing/decrementing the ->usage counter in their credentials, those refcount operations are associated with overriding/reverting the current task credentials. (the linked thread adds more context) In some specialized cases, overlayfs is one of them, the credentials in question have a longer lifetime than the override/revert "critical section". In the overlayfs case, the credentials are created when the fs is mounted and destroyed when it's unmounted. In this case of long lived credentials, the usage counter doesn't need to be incremented/decremented. Add a lighter version of credentials override/revert to be used in these specialized cases. To make sure that the override/revert calls are paired, add a cleanup guard macro. This was suggested here: https://lore.kernel.org/all/20231219-marken-pochen-26d888fb9bb9@brauner/ With a small number of tweaks: - Used inline functions instead of macros; - A small change to store the credentials into the passed argument, the guard is now defined as (note the added '_T ='): DEFINE_GUARD(cred, const struct cred *, _T = override_creds_light(_T), revert_creds_light(_T)); - Allow "const" arguments to be used with these kind of guards; Some comments: - If patch 1/5 and 2/5 are not a good idea (adding the cast), the alternative I can see is using some kind of container for the credentials; - The only user for the backing file ops is overlayfs, so these changes make sense, but may not make sense in the most general case; For the numbers, some from 'perf c2c', before this series: (edited to fit) # # ----- HITM ----- Shared # Num RmtHitm LclHitm Symbol Object Source:Line Node # ..... ....... ....... .......................... ................ .................. .... # ------------------------- 0 412 1028 ------------------------- 41.50% 42.22% [k] revert_creds [kernel.vmlinux] atomic64_64.h:39 0 1 15.05% 10.60% [k] override_creds [kernel.vmlinux] atomic64_64.h:25 0 1 0.73% 0.58% [k] init_file [kernel.vmlinux] atomic64_64.h:25 0 1 0.24% 0.10% [k] revert_creds [kernel.vmlinux] cred.h:266 0 1 32.28% 37.16% [k] generic_permission [kernel.vmlinux] mnt_idmapping.h:81 0 1 9.47% 8.75% [k] generic_permission [kernel.vmlinux] mnt_idmapping.h:81 0 1 0.49% 0.58% [k] inode_owner_or_capable [kernel.vmlinux] mnt_idmapping.h:81 0 1 0.24% 0.00% [k] generic_permission [kernel.vmlinux] namei.c:354 0 ------------------------- 1 50 103 ------------------------- 100.00% 100.00% [k] update_cfs_group [kernel.vmlinux] atomic64_64.h:15 0 1 ------------------------- 2 50 98 ------------------------- 96.00% 96.94% [k] update_cfs_group [kernel.vmlinux] atomic64_64.h:15 0 1 2.00% 1.02% [k] update_load_avg [kernel.vmlinux] atomic64_64.h:25 0 1 0.00% 2.04% [k] update_load_avg [kernel.vmlinux] fair.c:4118 0 2.00% 0.00% [k] update_cfs_group [kernel.vmlinux] fair.c:3932 0 1 after this series: # # ----- HITM ----- Shared # Num RmtHitm LclHitm Symbol Object Source:Line Node # ..... ....... ....... .................... ................ ................ .... # ------------------------- 0 54 88 ------------------------- 100.00% 100.00% [k] update_cfs_group [kernel.vmlinux] atomic64_64.h:15 0 1 ------------------------- 1 48 83 ------------------------- 97.92% 97.59% [k] update_cfs_group [kernel.vmlinux] atomic64_64.h:15 0 1 2.08% 1.20% [k] update_load_avg [kernel.vmlinux] atomic64_64.h:25 0 1 0.00% 1.20% [k] update_load_avg [kernel.vmlinux] fair.c:4118 0 1 ------------------------- 2 28 44 ------------------------- 85.71% 79.55% [k] generic_permission [kernel.vmlinux] mnt_idmapping.h:81 0 1 14.29% 20.45% [k] generic_permission [kernel.vmlinux] mnt_idmapping.h:81 0 1 The contention is practically gone. Link: https://lore.kernel.org/all/20231018074553.41333-1-hu1.chen@intel.com/ Vinicius Costa Gomes (4): cred: Add a light version of override/revert_creds() fs/backing-file: Convert to revert/override_creds_light() ovl: use wrapper ovl_revert_creds() ovl: Optimize override/revert creds fs/backing-file.c | 20 ++++++++++---------- fs/overlayfs/copy_up.c | 2 +- fs/overlayfs/dir.c | 17 +++++++++++------ fs/overlayfs/file.c | 14 +++++++------- fs/overlayfs/inode.c | 20 ++++++++++---------- fs/overlayfs/namei.c | 10 +++++----- fs/overlayfs/overlayfs.h | 1 + fs/overlayfs/readdir.c | 8 ++++---- fs/overlayfs/util.c | 11 ++++++++--- fs/overlayfs/xattrs.c | 9 ++++----- include/linux/cred.h | 18 ++++++++++++++++++ kernel/cred.c | 6 +++--- 12 files changed, 82 insertions(+), 54 deletions(-) -- 2.47.0
On Thu, Nov 7, 2024 at 1:57 AM Vinicius Costa Gomes <vinicius.gomes@intel.com> wrote: > > Hi, > > Changes from v3: > - Another reorganization of the series: separate the pure mechanical > changes into their own (Amir Goldstein) > > The series now reads: > > Patch 1: Introduce the _light() version of the override/revert cred operations; > Patch 2: Convert backing-file.c to use those; > Patch 3: Mechanical change to introduce the ovl_revert_creds() helper; > Patch 4: Make the ovl_{override,convert}_creds() use the _light() > creds helpers, and fix the reference counting issue that would happen; > For the record, this series depends on backing_file API cleanup patch by Miklos: https://lore.kernel.org/linux-fsdevel/20241021103340.260731-1-mszeredi@redhat.com/ > Changes from v2: > - Removed the "convert to guard()/scoped_guard()" patches (Miklos Szeredi); > - In the overlayfs code, convert all users of override_creds()/revert_creds() to the _light() versions by: > 1. making ovl_override_creds() use override_creds_light(); > 2. introduce ovl_revert_creds() which calls revert_creds_light(); > 3. convert revert_creds() to ovl_revert_creds() > (Amir Goldstein); > - Fix an potential reference counting issue, as the lifetime > expectations of the mounter credentials are different (Christian > Brauner); > I pushed these patches to: https://github.com/amir73il/linux/commits/ovl_creds rebased overlayfs-next on top of them and tested. Christian, Since this work is mostly based on your suggestions, I thought that you might want to author and handle this PR? Would you like to take the patches from ovl_creds (including the backing_file API cleanup) to a stable branch in your tree for me to base overlayfs-next on? Or would you rather I include them in the overlayfs PR for v6.13 myself? Thanks, Amir. P.S. some of the info below is relevant for the PR message and some of it is completely stale... > The series is now much simpler: > > Patch 1: Introduce the _light() version of the override/revert cred operations; > Patch 2: Convert backing-file.c to use those; > Patch 3: Do the conversion to use the _light() version internally; > Patch 4: Fix a potential refcounting issue > > Changes from v1: > - Re-organized the series to be easier to follow, more details below > (Miklos Szeredi and Amir Goldstein); > > The series now reads as follows: > > Patch 1: Introduce the _light() version of the override/revert cred operations; > Patch 2: Convert backing-file.c to use those; > Patch 3: Introduce the overlayfs specific _light() helper; > Patch 4: Document the cases that the helper cannot be used (critical > section may change the cred->usage counter); > Patch 5: Convert the "rest" of overlayfs to the _light() helpers (mostly mechanical); > Patch 6: Introduce the GUARD() helpers; > Patch 7: Convert backing-file.c to the GUARD() helpers; > Patch 8-15: Convert each overlayfs/ file to use the GUARD() helpers, > also explain the cases in which the scoped_guard() helper is > used. Note that a 'goto' jump that crosses the guard() should > fail to compile, gcc has a bug that fails to detect the > error[1]. > Patch 16: Remove the helper introduced in Patch 3 to close the series, > as it is no longer used, everything was converted to use the > safer/shorter GUARD() helpers. this info is stale and confusing in the context of this series. > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91951 > > This bug was also noticed here: > > https://lore.kernel.org/all/20240730050927.GC5334@ZenIV/ > > Link to v1: > > https://lore.kernel.org/r/20240403021808.309900-1-vinicius.gomes@intel.com/ > > Changes from RFC v3: > - Removed the warning "fixes" patches, as they could hide potencial > bugs (Christian Brauner); > - Added "cred-specific" macros (Christian Brauner), from my side, > added a few '_' to the guards to signify that the newly introduced > helper macros are preferred. > - Changed a few guard() to scoped_guard() to fix the clang (17.0.6) > compilation error about 'goto' bypassing variable initialization; > > Link to RFC v3: > > https://lore.kernel.org/r/20240216051640.197378-1-vinicius.gomes@intel.com/ > > Changes from RFC v2: > - Added separate patches for the warnings for the discarded const > when using the cleanup macros: one for DEFINE_GUARD() and one for > DEFINE_LOCK_GUARD_1() (I am uncertain if it's better to squash them > together); > - Reordered the series so the backing file patch is the first user of > the introduced helpers (Amir Goldstein); > - Change the definition of the cleanup "class" from a GUARD to a > LOCK_GUARD_1, which defines an implicit container, that allows us > to remove some variable declarations to store the overriden > credentials (Amir Goldstein); > - Replaced most of the uses of scoped_guard() with guard(), to reduce > the code churn, the remaining ones I wasn't sure if I was changing > the behavior: either they were nested (overrides "inside" > overrides) or something calls current_cred() (Amir Goldstein). > > New questions: > - The backing file callbacks are now called with the "light" > overriden credentials, so they are kind of restricted in what they > can do with their credentials, is this acceptable in general? > - in ovl_rename() I had to manually call the "light" the overrides, > both using the guard() macro or using the non-light version causes > the workload to crash the kernel. I still have to investigate why > this is happening. Hints are appreciated. > > Link to the RFC v2: > > https://lore.kernel.org/r/20240125235723.39507-1-vinicius.gomes@intel.com/ > > Original cover letter (lightly edited): > > It was noticed that some workloads suffer from contention on > increasing/decrementing the ->usage counter in their credentials, > those refcount operations are associated with overriding/reverting the > current task credentials. (the linked thread adds more context) > > In some specialized cases, overlayfs is one of them, the credentials > in question have a longer lifetime than the override/revert "critical > section". In the overlayfs case, the credentials are created when the > fs is mounted and destroyed when it's unmounted. In this case of long > lived credentials, the usage counter doesn't need to be > incremented/decremented. > > Add a lighter version of credentials override/revert to be used in > these specialized cases. To make sure that the override/revert calls > are paired, add a cleanup guard macro. This was suggested here: > > https://lore.kernel.org/all/20231219-marken-pochen-26d888fb9bb9@brauner/ > > With a small number of tweaks: > - Used inline functions instead of macros; > - A small change to store the credentials into the passed argument, > the guard is now defined as (note the added '_T ='): > > DEFINE_GUARD(cred, const struct cred *, _T = override_creds_light(_T), > revert_creds_light(_T)); > > - Allow "const" arguments to be used with these kind of guards; > > Some comments: > - If patch 1/5 and 2/5 are not a good idea (adding the cast), the > alternative I can see is using some kind of container for the > credentials; > - The only user for the backing file ops is overlayfs, so these > changes make sense, but may not make sense in the most general > case; > > For the numbers, some from 'perf c2c', before this series: > (edited to fit) > > # > # ----- HITM ----- Shared > # Num RmtHitm LclHitm Symbol Object Source:Line Node > # ..... ....... ....... .......................... ................ .................. .... > # > ------------------------- > 0 412 1028 > ------------------------- > 41.50% 42.22% [k] revert_creds [kernel.vmlinux] atomic64_64.h:39 0 1 > 15.05% 10.60% [k] override_creds [kernel.vmlinux] atomic64_64.h:25 0 1 > 0.73% 0.58% [k] init_file [kernel.vmlinux] atomic64_64.h:25 0 1 > 0.24% 0.10% [k] revert_creds [kernel.vmlinux] cred.h:266 0 1 > 32.28% 37.16% [k] generic_permission [kernel.vmlinux] mnt_idmapping.h:81 0 1 > 9.47% 8.75% [k] generic_permission [kernel.vmlinux] mnt_idmapping.h:81 0 1 > 0.49% 0.58% [k] inode_owner_or_capable [kernel.vmlinux] mnt_idmapping.h:81 0 1 > 0.24% 0.00% [k] generic_permission [kernel.vmlinux] namei.c:354 0 > > ------------------------- > 1 50 103 > ------------------------- > 100.00% 100.00% [k] update_cfs_group [kernel.vmlinux] atomic64_64.h:15 0 1 > > ------------------------- > 2 50 98 > ------------------------- > 96.00% 96.94% [k] update_cfs_group [kernel.vmlinux] atomic64_64.h:15 0 1 > 2.00% 1.02% [k] update_load_avg [kernel.vmlinux] atomic64_64.h:25 0 1 > 0.00% 2.04% [k] update_load_avg [kernel.vmlinux] fair.c:4118 0 > 2.00% 0.00% [k] update_cfs_group [kernel.vmlinux] fair.c:3932 0 1 > > after this series: > > # > # ----- HITM ----- Shared > # Num RmtHitm LclHitm Symbol Object Source:Line Node > # ..... ....... ....... .................... ................ ................ .... > # > ------------------------- > 0 54 88 > ------------------------- > 100.00% 100.00% [k] update_cfs_group [kernel.vmlinux] atomic64_64.h:15 0 1 > > ------------------------- > 1 48 83 > ------------------------- > 97.92% 97.59% [k] update_cfs_group [kernel.vmlinux] atomic64_64.h:15 0 1 > 2.08% 1.20% [k] update_load_avg [kernel.vmlinux] atomic64_64.h:25 0 1 > 0.00% 1.20% [k] update_load_avg [kernel.vmlinux] fair.c:4118 0 1 > > ------------------------- > 2 28 44 > ------------------------- > 85.71% 79.55% [k] generic_permission [kernel.vmlinux] mnt_idmapping.h:81 0 1 > 14.29% 20.45% [k] generic_permission [kernel.vmlinux] mnt_idmapping.h:81 0 1 > > The contention is practically gone. > > Link: https://lore.kernel.org/all/20231018074553.41333-1-hu1.chen@intel.com/ > > Vinicius Costa Gomes (4): > cred: Add a light version of override/revert_creds() > fs/backing-file: Convert to revert/override_creds_light() > ovl: use wrapper ovl_revert_creds() > ovl: Optimize override/revert creds > > fs/backing-file.c | 20 ++++++++++---------- > fs/overlayfs/copy_up.c | 2 +- > fs/overlayfs/dir.c | 17 +++++++++++------ > fs/overlayfs/file.c | 14 +++++++------- > fs/overlayfs/inode.c | 20 ++++++++++---------- > fs/overlayfs/namei.c | 10 +++++----- > fs/overlayfs/overlayfs.h | 1 + > fs/overlayfs/readdir.c | 8 ++++---- > fs/overlayfs/util.c | 11 ++++++++--- > fs/overlayfs/xattrs.c | 9 ++++----- > include/linux/cred.h | 18 ++++++++++++++++++ > kernel/cred.c | 6 +++--- > 12 files changed, 82 insertions(+), 54 deletions(-) > > -- > 2.47.0 >
On Thu, Nov 7, 2024 at 11:19 AM Amir Goldstein <amir73il@gmail.com> wrote: > > On Thu, Nov 7, 2024 at 1:57 AM Vinicius Costa Gomes > <vinicius.gomes@intel.com> wrote: > > > > Hi, > > > > Changes from v3: > > - Another reorganization of the series: separate the pure mechanical > > changes into their own (Amir Goldstein) > > > > The series now reads: > > > > Patch 1: Introduce the _light() version of the override/revert cred operations; > > Patch 2: Convert backing-file.c to use those; > > Patch 3: Mechanical change to introduce the ovl_revert_creds() helper; > > Patch 4: Make the ovl_{override,convert}_creds() use the _light() > > creds helpers, and fix the reference counting issue that would happen; > > > > For the record, this series depends on backing_file API cleanup patch by Miklos: > https://lore.kernel.org/linux-fsdevel/20241021103340.260731-1-mszeredi@redhat.com/ > > > Changes from v2: > > - Removed the "convert to guard()/scoped_guard()" patches (Miklos Szeredi); > > - In the overlayfs code, convert all users of override_creds()/revert_creds() to the _light() versions by: > > 1. making ovl_override_creds() use override_creds_light(); > > 2. introduce ovl_revert_creds() which calls revert_creds_light(); > > 3. convert revert_creds() to ovl_revert_creds() > > (Amir Goldstein); > > - Fix an potential reference counting issue, as the lifetime > > expectations of the mounter credentials are different (Christian > > Brauner); > > > > I pushed these patches to: > https://github.com/amir73il/linux/commits/ovl_creds > > rebased overlayfs-next on top of them and tested. > > Christian, > > Since this work is mostly based on your suggestions, > I thought that you might want to author and handle this PR? > > Would you like to take the patches from ovl_creds (including the backing_file > API cleanup) to a stable branch in your tree for me to base overlayfs-next on? > Or would you rather I include them in the overlayfs PR for v6.13 myself? > For now, there patches are queued in overlayfs-next. Thanks Vinicius, for being patient and working through all the different revisions! Amir.
© 2016 - 2024 Red Hat, Inc.