This patch set introduces a new scheme of shrinker unregistration. It
allows splitting the unregistration into two parts: fast and slow. This
makes it possible to hide the slow part from the user, so the
user-visible unregistration becomes fast.

This fixes the -88.8% regression of stress-ng.ramfs.ops_per_sec noticed
by the kernel test robot:

https://lore.kernel.org/lkml/202305230837.db2c233f-yujie.liu@intel.com/
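
Roughly, the idea can be sketched as below. This is a simplified
illustration only: the function, lock and SRCU names are placeholders
and do not necessarily match the exact API added by the patches.

/*
 * Simplified sketch of the two-part unregistration; names below are
 * placeholders, not necessarily the exact API of the patches.
 *
 * Fast part, called on the user-visible path (e.g. umount): unlink the
 * shrinker so that no new shrink pass can find it.  No grace-period
 * wait happens here.
 */
void unregister_shrinker_delayed_initiate(struct shrinker *shrinker)
{
	mutex_lock(&shrinker_mutex);		/* placeholder list lock name */
	list_del_rcu(&shrinker->list);
	mutex_unlock(&shrinker_mutex);
}

/*
 * Slow part, called later from a context the user does not wait on
 * (e.g. deferred superblock destruction): wait for in-flight SRCU
 * readers in do_shrink_slab(), then free per-shrinker data.
 */
void unregister_shrinker_delayed_finalize(struct shrinker *shrinker)
{
	synchronize_srcu(&shrinker_srcu);	/* the expensive grace period */
	kfree(shrinker->nr_deferred);
	shrinker->nr_deferred = NULL;
}

The existing unregister_shrinker() can stay as a wrapper that calls both
parts back to back, while paths that care about unregistration latency
call only the fast part and let already-deferred teardown work run the
slow part.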
---
Kirill Tkhai (2):
      mm: Split unregister_shrinker() in fast and slow part
      fs: Use delayed shrinker unregistration

Qi Zheng (1):
      mm: vmscan: move shrinker_debugfs_remove() before synchronize_srcu()
fs/super.c | 3 ++-
include/linux/shrinker.h | 4 ++++
mm/vmscan.c | 39 +++++++++++++++++++++++++++++++--------
3 files changed, 37 insertions(+), 9 deletions(-)
--
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
On Mon, Jun 05, 2023 at 10:02:46PM +0300, Kirill Tkhai wrote:
> This patch set introduces a new scheme of shrinker unregistration. It
> allows splitting the unregistration into two parts: fast and slow. This
> makes it possible to hide the slow part from the user, so the
> user-visible unregistration becomes fast.
>
> This fixes the -88.8% regression of stress-ng.ramfs.ops_per_sec noticed
> by the kernel test robot:
>
> https://lore.kernel.org/lkml/202305230837.db2c233f-yujie.liu@intel.com/
>
> ---
>
> Kirill Tkhai (2):
>       mm: Split unregister_shrinker() in fast and slow part
>       fs: Use delayed shrinker unregistration

Did you test any filesystem other than ramfs?

Filesystems more complex than ramfs have internal shrinkers, and so
they will still be running the slow synchronize_srcu() - potentially
multiple times! - in every unmount. Both XFS and ext4 have 3 internal
shrinker instances per mount, so they will still call synchronize_srcu()
at least 3 times per unmount after this change.

What about any other subsystem that runs a shrinker - do they have
context-dependent shrinker instances that get frequently created and
destroyed? They'll need the same treatment.

Seriously, part of changing shrinker infrastructure is doing an audit of
all the shrinker instances to determine how the change will impact those
shrinkers, and whether the same structural changes are needed in those
implementations.

I don't see any of this being done - this looks like a "slap a bandaid
over the visible symptom" patch set without any deeper investigation of
the scope of the issue having been gained.

Along with all shrinkers now running under an SRCU critical region and
requiring a machine-wide synchronisation point for every
unregister_shrinker() call made, the ability to repeatedly abort global
shrinker passes via external SRCU expediting, and now an intricate
locking and state dance in do_shrink_slab() vs unregister_shrinker(), I
can't say I'm particularly liking any of this, regardless of the
benefits it supposedly provides.

-Dave.

-- 
Dave Chinner
david@fromorbit.com
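
To make the per-mount cost described above concrete, consider a purely
hypothetical filesystem teardown path; the type, fields and shrinker
names below are invented for illustration. With unregister_shrinker()
waiting on SRCU internally, each call blocks on a full grace period
unless it, too, is converted to the delayed form:

static void example_fs_put_super(struct super_block *sb)
{
	struct example_fs_info *fsi = sb->s_fs_info;	/* hypothetical per-mount info */

	/* Each unregister_shrinker() call below waits for a full SRCU
	 * grace period under the new scheme. */
	unregister_shrinker(&fsi->inode_cache_shrinker);	/* grace period #1 */
	unregister_shrinker(&fsi->metadata_shrinker);		/* grace period #2 */
	unregister_shrinker(&fsi->background_reclaim_shrinker);/* grace period #3 */

	kfree(fsi);
}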
On 06.06.2023 01:32, Dave Chinner wrote:
> On Mon, Jun 05, 2023 at 10:02:46PM +0300, Kirill Tkhai wrote:
>> This patch set introduces a new scheme of shrinker unregistration. It
>> allows splitting the unregistration into two parts: fast and slow. This
>> makes it possible to hide the slow part from the user, so the
>> user-visible unregistration becomes fast.
>>
>> This fixes the -88.8% regression of stress-ng.ramfs.ops_per_sec noticed
>> by the kernel test robot:
>>
>> https://lore.kernel.org/lkml/202305230837.db2c233f-yujie.liu@intel.com/
>>
>> ---
>>
>> Kirill Tkhai (2):
>>       mm: Split unregister_shrinker() in fast and slow part
>>       fs: Use delayed shrinker unregistration
>
> Did you test any filesystem other than ramfs?
>
> Filesystems more complex than ramfs have internal shrinkers, and so
> they will still be running the slow synchronize_srcu() - potentially
> multiple times! - in every unmount. Both XFS and ext4 have 3 internal
> shrinker instances per mount, so they will still call synchronize_srcu()
> at least 3 times per unmount after this change.
>
> What about any other subsystem that runs a shrinker - do they have
> context-dependent shrinker instances that get frequently created and
> destroyed? They'll need the same treatment.

Of course, all of the shrinkers should be fixed. This patch set just
aims to describe the idea more widely, because I'm not sure most people
read replies to kernel test robot reports.

This is my suggestion for the way to go. Qi is probably the right person
to ask whether we're going to extend this and to maintain f95bdb700bc6
in the tree.

There is not much time. Unfortunately, the kernel test robot reported
this quite late.

> Seriously, part of changing shrinker infrastructure is doing an audit of
> all the shrinker instances to determine how the change will impact those
> shrinkers, and whether the same structural changes are needed in those
> implementations.
>
> I don't see any of this being done - this looks like a "slap a bandaid
> over the visible symptom" patch set without any deeper investigation of
> the scope of the issue having been gained.
>
> Along with all shrinkers now running under an SRCU critical region and
> requiring a machine-wide synchronisation point for every
> unregister_shrinker() call made, the ability to repeatedly abort global
> shrinker passes via external SRCU expediting, and now an intricate
> locking and state dance in do_shrink_slab() vs unregister_shrinker(), I
> can't say I'm particularly liking any of this, regardless of the
> benefits it supposedly provides.
>
> -Dave.
On Wed, Jun 07, 2023 at 12:06:03AM +0300, Kirill Tkhai wrote:
> On 06.06.2023 01:32, Dave Chinner wrote:
> > On Mon, Jun 05, 2023 at 10:02:46PM +0300, Kirill Tkhai wrote:
> >> This patch set introduces a new scheme of shrinker unregistration. It
> >> allows splitting the unregistration into two parts: fast and slow. This
> >> makes it possible to hide the slow part from the user, so the
> >> user-visible unregistration becomes fast.
> >>
> >> This fixes the -88.8% regression of stress-ng.ramfs.ops_per_sec noticed
> >> by the kernel test robot:
> >>
> >> https://lore.kernel.org/lkml/202305230837.db2c233f-yujie.liu@intel.com/
> >>
> >> ---
> >>
> >> Kirill Tkhai (2):
> >>       mm: Split unregister_shrinker() in fast and slow part
> >>       fs: Use delayed shrinker unregistration
> >
> > Did you test any filesystem other than ramfs?
> >
> > Filesystems more complex than ramfs have internal shrinkers, and so
> > they will still be running the slow synchronize_srcu() - potentially
> > multiple times! - in every unmount. Both XFS and ext4 have 3 internal
> > shrinker instances per mount, so they will still call synchronize_srcu()
> > at least 3 times per unmount after this change.
> >
> > What about any other subsystem that runs a shrinker - do they have
> > context-dependent shrinker instances that get frequently created and
> > destroyed? They'll need the same treatment.
>
> Of course, all of the shrinkers should be fixed. This patch set just
> aims to describe the idea more widely, because I'm not sure most people
> read replies to kernel test robot reports.
>
> This is my suggestion for the way to go. Qi is probably the right person
> to ask whether we're going to extend this and to maintain f95bdb700bc6
> in the tree.
>
> There is not much time. Unfortunately, the kernel test robot reported
> this quite late.

And that's why it should be reverted rather than rushing to try to fix
it.

I'm kind of tired of finding out about mm reclaim regressions only when
I see patches making naive and/or broken changes to subsystem shrinker
implementations without any real clue about what they are doing. If the
people and subsystems that maintain shrinker implementations were cc'd
on the changes to the shrinker infrastructure, this would all have been
resolved before merging occurred....

Lockless shrinker lists need a heap of supporting changes to be done
first so that they aren't reliant on synchronize_srcu() *at all*. If the
code had been properly designed in the first place (i.e. dynamic
shrinker structures freed via call_rcu()), we wouldn't be rushing to fix
weird regressions right now.

Can we please revert this and start again with a properly thought out
and reviewed design?

-Dave.

-- 
Dave Chinner
david@fromorbit.com
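
For reference, the call_rcu()-based direction mentioned above could look
roughly like the sketch below; the names are hypothetical, and this is
not code from any posted patch. The unregistering task never blocks on a
grace period because the freeing is deferred to an RCU callback:

static void shrinker_free_rcu_cb(struct rcu_head *head)
{
	/* Assumes struct shrinker grows an rcu_head member ('rcu'),
	 * which it does not have today. */
	struct shrinker *shrinker = container_of(head, struct shrinker, rcu);

	kfree(shrinker->nr_deferred);
	kfree(shrinker);
}

/* Hypothetical unregister path: unlink and return immediately, leaving
 * the freeing to an RCU callback once all readers are done.  If the
 * list walk were protected by SRCU rather than plain RCU, call_srcu()
 * would be used instead of call_rcu(). */
void unregister_shrinker_async(struct shrinker *shrinker)
{
	mutex_lock(&shrinker_mutex);		/* placeholder list lock name */
	list_del_rcu(&shrinker->list);
	mutex_unlock(&shrinker_mutex);

	call_rcu(&shrinker->rcu, shrinker_free_rcu_cb);
}

This assumes the shrinker structure is dynamically allocated and owned
by the shrinker core rather than embedded in a larger object, which is
exactly the kind of supporting change referred to above.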
On 2023/6/7 06:02, Dave Chinner wrote:
> On Wed, Jun 07, 2023 at 12:06:03AM +0300, Kirill Tkhai wrote:
>> On 06.06.2023 01:32, Dave Chinner wrote:
>>> On Mon, Jun 05, 2023 at 10:02:46PM +0300, Kirill Tkhai wrote:
>>>> This patch set introduces a new scheme of shrinker unregistration. It
>>>> allows splitting the unregistration into two parts: fast and slow. This
>>>> makes it possible to hide the slow part from the user, so the
>>>> user-visible unregistration becomes fast.
>>>>
>>>> This fixes the -88.8% regression of stress-ng.ramfs.ops_per_sec noticed
>>>> by the kernel test robot:
>>>>
>>>> https://lore.kernel.org/lkml/202305230837.db2c233f-yujie.liu@intel.com/
>>>>
>>>> ---
>>>>
>>>> Kirill Tkhai (2):
>>>>       mm: Split unregister_shrinker() in fast and slow part
>>>>       fs: Use delayed shrinker unregistration
>>>
>>> Did you test any filesystem other than ramfs?
>>>
>>> Filesystems more complex than ramfs have internal shrinkers, and so
>>> they will still be running the slow synchronize_srcu() - potentially
>>> multiple times! - in every unmount. Both XFS and ext4 have 3 internal
>>> shrinker instances per mount, so they will still call synchronize_srcu()
>>> at least 3 times per unmount after this change.
>>>
>>> What about any other subsystem that runs a shrinker - do they have
>>> context-dependent shrinker instances that get frequently created and
>>> destroyed? They'll need the same treatment.
>>
>> Of course, all of the shrinkers should be fixed. This patch set just
>> aims to describe the idea more widely, because I'm not sure most people
>> read replies to kernel test robot reports.

Thank you, Kirill.

>>
>> This is my suggestion for the way to go. Qi is probably the right person
>> to ask whether we're going to extend this and to maintain f95bdb700bc6
>> in the tree.
>>
>> There is not much time. Unfortunately, the kernel test robot reported
>> this quite late.
>
> And that's why it should be reverted rather than rushing to try to fix
> it.
>
> I'm kind of tired of finding out about mm reclaim regressions only when
> I see patches making naive and/or broken changes to subsystem shrinker
> implementations without any real clue about what they are doing. If the
> people and subsystems that maintain shrinker implementations were cc'd
> on the changes to the shrinker infrastructure, this would all have been
> resolved before merging occurred....
>
> Lockless shrinker lists need a heap of supporting changes to be done
> first so that they aren't reliant on synchronize_srcu() *at all*. If the
> code had been properly designed in the first place (i.e. dynamic
> shrinker structures freed via call_rcu()), we wouldn't be rushing to fix
> weird regressions right now.
>
> Can we please revert this and start again with a properly thought out
> and reviewed design?

I have no strong opinion on whether to revert this; I will follow the
final decision of the community.

From my personal point of view, I think it is worth sacrificing the
speed of unregistration alone compared to the benefits it brings
(lockless shrink, etc). Of course, it would be better if there were a
more complete solution. If you have a better idea, it might be best to
post the code first for discussion. Otherwise, I am afraid that if we
just revert it, the problem of shrinker_rwsem will continue for many
years.

And hi Dave, I know you're mad that I didn't cc you on the original
patch. Sorry again. How about splitting the shrinker-related code into
separate files? Then we can add a MAINTAINERS entry for it and add
linux-fsdevel@vger.kernel.org to that entry, so that people will not
miss cc'ing the fs folks in the future.

Qi.

>
> -Dave.
On Wed, Jun 07, 2023 at 10:51:35AM +0800, Qi Zheng wrote:
> From my personal point of view, I think it is worth sacrificing the
> speed of unregistration alone compared to the benefits it brings
> (lockless shrink, etc).

Nobody is questioning whether this is a worthwhile improvement. The
lockless shrinker instance iteration is definitely a good direction to
move in. The problem is that the -process- that has been followed has
led to a very sub-optimal result.

> Of course, it would be better if there were a more complete solution.
> If you have a better idea, it might be best to post the code first for
> discussion. Otherwise, I am afraid that if we just revert it, the
> problem of shrinker_rwsem will continue for many years.

No, a revert doesn't mean we don't want the change; a revert means the
way the change was attempted has caused unexpected problems. We need to
go back to the drawing board and work out a better way to do this.

> And hi Dave, I know you're mad that I didn't cc you on the original
> patch.

No, I'm not mad at you. If I'm annoyed at anyone, it's the senior mm
developers and maintainers that I'm annoyed at - not informing relevant
parties about modifications to shrinker infrastructure or
implementations has led to regressions escaping out to user systems
multiple times in the past. Yet here we are again....

> Sorry again. How about splitting the shrinker-related code into
> separate files? Then we can add a MAINTAINERS entry for it and add
> linux-fsdevel@vger.kernel.org to that entry, so that people will not
> miss cc'ing the fs folks in the future.

I don't think that fixes the problem, because the scope is much wider
than fs-devel: look at all the different subsystems that have a
shrinker. The whole kernel development process has always worked by the
rule that when we're changing common infrastructure, all the subsystems
using that infrastructure need to be cc'd on the changes to the
infrastructure they are using. Just cc'ing -fsdevel isn't enough; we've
also got shrinkers in graphics driver infrastructure, *RCU*, virtio, DM,
bcache and various other subsystems. And I'm betting most of them don't
know that significant changes have been made to how the shrinkers
work....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com