Hello!

This is v2. It is based on Linux 6.13-rc2. The first version is
here:

https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/

The difference between v1 and v2 is that the preparation is done in
the original place, and after that there is one final move.

Uladzislau Rezki (Sony) (5):
  rcu/kvfree: Initialize kvfree_rcu() separately
  rcu/kvfree: Move some functions under CONFIG_TINY_RCU
  rcu/kvfree: Adjust names passed into trace functions
  rcu/kvfree: Adjust a shrinker name
  mm/slab: Move kvfree_rcu() into SLAB

 include/linux/slab.h |   1 +
 init/main.c          |   1 +
 kernel/rcu/tree.c    | 876 ------------------------------------------
 mm/slab_common.c     | 880 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 882 insertions(+), 876 deletions(-)

--
2.39.5
On 2024-12-13 3:02 AM, Uladzislau Rezki (Sony) wrote:
> Hello!
>
> This is v2. It is based on the Linux 6.13-rc2. The first version is
> here:
>
> https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/
>
> The difference between v1 and v2 is that, the preparation process is
> done in original place instead and after that there is one final move.
>
> Uladzislau Rezki (Sony) (5):
>   rcu/kvfree: Initialize kvfree_rcu() separately
>   rcu/kvfree: Move some functions under CONFIG_TINY_RCU
>   rcu/kvfree: Adjust names passed into trace functions
>   rcu/kvfree: Adjust a shrinker name
>   mm/slab: Move kvfree_rcu() into SLAB
>
>  include/linux/slab.h |   1 +
>  init/main.c          |   1 +
>  kernel/rcu/tree.c    | 876 ------------------------------------------
>  mm/slab_common.c     | 880 +++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 882 insertions(+), 876 deletions(-)

Sorry for the late reply, but better late than never...

FWIW,

Acked-by: Hyeonggon Yoo <hyeonggon.yoo@sk.com>
Tested-by: Hyeonggon Yoo <hyeonggon.yoo@sk.com>

Thanks for all the efforts!

By the way, any future plans on how to take advantage of internal slab
state?

--
Hyeonggon
On 1/6/25 08:21, Hyeonggon Yoo wrote:
> On 2024-12-13 3:02 AM, Uladzislau Rezki (Sony) wrote:
>> Hello!
>>
>> This is v2. It is based on the Linux 6.13-rc2. The first version is
>> here:
>>
>> https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/
>>
>> The difference between v1 and v2 is that, the preparation process is
>> done in original place instead and after that there is one final move.
>>
>> Uladzislau Rezki (Sony) (5):
>>   rcu/kvfree: Initialize kvfree_rcu() separately
>>   rcu/kvfree: Move some functions under CONFIG_TINY_RCU
>>   rcu/kvfree: Adjust names passed into trace functions
>>   rcu/kvfree: Adjust a shrinker name
>>   mm/slab: Move kvfree_rcu() into SLAB
>>
>>  include/linux/slab.h |   1 +
>>  init/main.c          |   1 +
>>  kernel/rcu/tree.c    | 876 ------------------------------------------
>>  mm/slab_common.c     | 880 +++++++++++++++++++++++++++++++++++++++++++
>>  4 files changed, 882 insertions(+), 876 deletions(-)
>
> Sorry for the late reply, but better late than never...
>
> FWIW,
>
> Acked-by: Hyeonggon Yoo <hyeonggon.yoo@sk.com>
> Tested-by: Hyeonggon Yoo <hyeonggon.yoo@sk.com>

Thanks, applied.

> Thanks for all the efforts!
>
> By the way, any future plans how to take advantage of internal slab
> state?

One way is the sheaves effort, which in its last RFC also replaced the
way kfree_rcu() is handled for caches with sheaves enabled. Perhaps
even without those we could consider e.g. doing the kfree_rcu()
batching by grouping the pending frees per kmem cache, which could
lead to more efficient flushing later.

> --
> Hyeonggon
On 12/12/24 19:02, Uladzislau Rezki (Sony) wrote:
> Hello!
>
> This is v2. It is based on the Linux 6.13-rc2. The first version is
> here:
>
> https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/
>
> The difference between v1 and v2 is that, the preparation process is
> done in original place instead and after that there is one final move.

Looks good, will include in slab/for-next.

I think patch 5 should add more explanation to the commit message - the
subthread started by Christoph could provide content :) Can you
summarize so I can amend the commit log?

Also how about a followup patch moving the rcu-tiny implementation of
kvfree_call_rcu()?

We might also consider moving the kfree_rcu*() entry points from
rcupdate.h to slab.h, what do you think, is it a more logical place for
them? There's some risk that files that include rcupdate.h and not
slab.h would break, so that will need some build testing...

Thanks,
Vlastimil

> Uladzislau Rezki (Sony) (5):
>   rcu/kvfree: Initialize kvfree_rcu() separately
>   rcu/kvfree: Move some functions under CONFIG_TINY_RCU
>   rcu/kvfree: Adjust names passed into trace functions
>   rcu/kvfree: Adjust a shrinker name
>   mm/slab: Move kvfree_rcu() into SLAB
>
>  include/linux/slab.h |   1 +
>  init/main.c          |   1 +
>  kernel/rcu/tree.c    | 876 ------------------------------------------
>  mm/slab_common.c     | 880 +++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 882 insertions(+), 876 deletions(-)
On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
> On 12/12/24 19:02, Uladzislau Rezki (Sony) wrote:
> > Hello!
> >
> > This is v2. It is based on the Linux 6.13-rc2. The first version is
> > here:
> >
> > https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/
> >
> > The difference between v1 and v2 is that, the preparation process is
> > done in original place instead and after that there is one final move.
>
> Looks good, will include in slab/for-next
>
> I think patch 5 should add more explanation to the commit message - the
> subthread started by Christoph could provide content :) Can you summarize so
> I can amend the commit log?
>
<snip>
mm/slab: Move kvfree_rcu() into SLAB

Move the kvfree_rcu() functionality to the slab_common.c file.

The reason for making kvfree_rcu() part of SLAB is that there is a
clear trend toward, and need for, closer integration. One recent
example is the creation of a barrier function for SLAB caches.

Another reason is to avoid having several implementations of RCU
machinery for reclaiming objects after a grace period. As a future
step, it can be integrated more easily with SLAB internals.
<snip>

--
Uladzislau Rezki
On 12/16/24 14:07, Uladzislau Rezki wrote:
> On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
>> On 12/12/24 19:02, Uladzislau Rezki (Sony) wrote:
>> > Hello!
>> >
>> > This is v2. It is based on the Linux 6.13-rc2. The first version is
>> > here:
>> >
>> > https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/
>> >
>> > The difference between v1 and v2 is that, the preparation process is
>> > done in original place instead and after that there is one final move.
>>
>> Looks good, will include in slab/for-next
>>
>> I think patch 5 should add more explanation to the commit message - the
>> subthread started by Christoph could provide content :) Can you summarize so
>> I can amend the commit log?
>>
> <snip>
> mm/slab: Move kvfree_rcu() into SLAB
>
> Move kvfree_rcu() functionality to the slab_common.c file.
>
> The reason of being kvfree_rcu() functionality as part of SLAB is
> that, there is a clear trend and need of closer integration. One
> of the recent example is creating a barrier function for SLAB caches.
>
> Another reason is to prevent of having several implementations of
> RCU machinery for reclaiming objects after a GP. As future steps,
> it can be more integrated(easier) with SLAB internals.
> <snip>

Thanks, amended.

> --
> Uladzislau Rezki
On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
> On 12/12/24 19:02, Uladzislau Rezki (Sony) wrote:
> > Hello!
> >
> > This is v2. It is based on the Linux 6.13-rc2. The first version is
> > here:
> >
> > https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/
> >
> > The difference between v1 and v2 is that, the preparation process is
> > done in original place instead and after that there is one final move.
>
> Looks good, will include in slab/for-next
>
> I think patch 5 should add more explanation to the commit message - the
> subthread started by Christoph could provide content :) Can you summarize so
> I can amend the commit log?
>
I will :)

> Also how about a followup patch moving the rcu-tiny implementation of
> kvfree_call_rcu()?
>
As Paul already noted, it would make sense. Or just remove the tiny
implementation.

> We might also consider moving the kfree_rcu*() entry points from rcupdate.h
> to slab.h, what do you think, is it a more logical place for them? There's
> some risk that files that include rcupdate.h and not slab.h would break, so
> that will need some build testing...
>
I agree. I have not moved them in this series because it requires more
testing due to a possible build break. I can work on this further, so it
is not an issue.

Thank you for taking this!

--
Uladzislau Rezki
On 12/16/24 12:03, Uladzislau Rezki wrote:
> On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
>> On 12/12/24 19:02, Uladzislau Rezki (Sony) wrote:
>> > Hello!
>> >
>> > This is v2. It is based on the Linux 6.13-rc2. The first version is
>> > here:
>> >
>> > https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/
>> >
>> > The difference between v1 and v2 is that, the preparation process is
>> > done in original place instead and after that there is one final move.
>>
>> Looks good, will include in slab/for-next
>>
>> I think patch 5 should add more explanation to the commit message - the
>> subthread started by Christoph could provide content :) Can you summarize so
>> I can amend the commit log?
>>
> I will :)
>
>> Also how about a followup patch moving the rcu-tiny implementation of
>> kvfree_call_rcu()?
>>
> As, Paul already noted, it would make sense. Or just remove a tiny
> implementation.

AFAICS tiny rcu is for !SMP systems. Do they benefit from the "full"
implementation with all the batching etc. or would that be unnecessary
overhead?

>> We might also consider moving the kfree_rcu*() entry points from rcupdate.h
>> to slab.h, what do you think, is it a more logical place for them? There's
>> some risk that files that include rcupdate.h and not slab.h would break, so
>> that will need some build testing...
>>
> I agree. I have not moved them in this series, because it requires more
> testing due to a build break. I can work on this further, so it is not
> an issue.
>
> Thank you for taking this!
>
> --
> Uladzislau Rezki
On Mon, Dec 16, 2024 at 03:20:44PM +0100, Vlastimil Babka wrote:
> On 12/16/24 12:03, Uladzislau Rezki wrote:
> > On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
> >> On 12/12/24 19:02, Uladzislau Rezki (Sony) wrote:
> >> > Hello!
> >> >
> >> > This is v2. It is based on the Linux 6.13-rc2. The first version is
> >> > here:
> >> >
> >> > https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/
> >> >
> >> > The difference between v1 and v2 is that, the preparation process is
> >> > done in original place instead and after that there is one final move.
> >>
> >> Looks good, will include in slab/for-next
> >>
> >> I think patch 5 should add more explanation to the commit message - the
> >> subthread started by Christoph could provide content :) Can you summarize so
> >> I can amend the commit log?
> >>
> > I will :)
> >
> >> Also how about a followup patch moving the rcu-tiny implementation of
> >> kvfree_call_rcu()?
> >>
> > As, Paul already noted, it would make sense. Or just remove a tiny
> > implementation.
>
> AFAICS tiny rcu is for !SMP systems. Do they benefit from the "full"
> implementation with all the batching etc or would that be unnecessary
> overhead?
>
Yes, it is for really small systems with a low amount of memory. I see
only one overhead: it is about driving objects in pages. For a small
system it can be critical, because we allocate.

On the other hand, for a tiny variant we can modify the normal variant
by bypassing the batching logic, thus not consuming memory (for the
Tiny case), i.e. merge it into the normal kvfree_rcu() path. After
that we would not depend on the CONFIG_TINY_RCU option. Probably we
also need to perform some adaptation of the regular kvfree_rcu() for a
single-CPU system.

--
Uladzislau Rezki
On 12/16/24 16:41, Uladzislau Rezki wrote:
> On Mon, Dec 16, 2024 at 03:20:44PM +0100, Vlastimil Babka wrote:
>> On 12/16/24 12:03, Uladzislau Rezki wrote:
>> > On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
>> >> On 12/12/24 19:02, Uladzislau Rezki (Sony) wrote:
>> >> > Hello!
>> >> >
>> >> > This is v2. It is based on the Linux 6.13-rc2. The first version is
>> >> > here:
>> >> >
>> >> > https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/
>> >> >
>> >> > The difference between v1 and v2 is that, the preparation process is
>> >> > done in original place instead and after that there is one final move.
>> >>
>> >> Looks good, will include in slab/for-next
>> >>
>> >> I think patch 5 should add more explanation to the commit message - the
>> >> subthread started by Christoph could provide content :) Can you summarize so
>> >> I can amend the commit log?
>> >>
>> > I will :)
>> >
>> >> Also how about a followup patch moving the rcu-tiny implementation of
>> >> kvfree_call_rcu()?
>> >>
>> > As, Paul already noted, it would make sense. Or just remove a tiny
>> > implementation.
>>
>> AFAICS tiny rcu is for !SMP systems. Do they benefit from the "full"
>> implementation with all the batching etc or would that be unnecessary overhead?
>>
> Yes, it is for a really small systems with low amount of memory. I see
> only one overhead it is about driving objects in pages. For a small
> system it can be critical because we allocate.
>
> From the other hand, for a tiny variant we can modify the normal variant
> by bypassing batching logic, thus do not consume memory(for Tiny case)
> i.e. merge it to a normal kvfree_rcu() path.

Maybe we could change it to use CONFIG_SLUB_TINY, as that has a similar
use case (less memory usage on a low-memory system, as a tradeoff for
worse performance).

> After that we do not depend on CONFIG_RCU_TINY option. Probably we need
> also to perform some adaptation of regular kvfree_rcu() for a single CPU
> system.
>
> --
> Uladzislau Rezki
On Mon, Dec 16, 2024 at 04:44:41PM +0100, Vlastimil Babka wrote:
> On 12/16/24 16:41, Uladzislau Rezki wrote:
> > On Mon, Dec 16, 2024 at 03:20:44PM +0100, Vlastimil Babka wrote:
> >> On 12/16/24 12:03, Uladzislau Rezki wrote:
> >> > On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
> >> >> On 12/12/24 19:02, Uladzislau Rezki (Sony) wrote:
> >> >> > Hello!
> >> >> >
> >> >> > This is v2. It is based on the Linux 6.13-rc2. The first version is
> >> >> > here:
> >> >> >
> >> >> > https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/
> >> >> >
> >> >> > The difference between v1 and v2 is that, the preparation process is
> >> >> > done in original place instead and after that there is one final move.
> >> >>
> >> >> Looks good, will include in slab/for-next
> >> >>
> >> >> I think patch 5 should add more explanation to the commit message - the
> >> >> subthread started by Christoph could provide content :) Can you summarize so
> >> >> I can amend the commit log?
> >> >>
> >> > I will :)
> >> >
> >> >> Also how about a followup patch moving the rcu-tiny implementation of
> >> >> kvfree_call_rcu()?
> >> >>
> >> > As, Paul already noted, it would make sense. Or just remove a tiny
> >> > implementation.
> >>
> >> AFAICS tiny rcu is for !SMP systems. Do they benefit from the "full"
> >> implementation with all the batching etc or would that be unnecessary overhead?
> >>
> > Yes, it is for a really small systems with low amount of memory. I see
> > only one overhead it is about driving objects in pages. For a small
> > system it can be critical because we allocate.
> >
> > From the other hand, for a tiny variant we can modify the normal variant
> > by bypassing batching logic, thus do not consume memory(for Tiny case)
> > i.e. merge it to a normal kvfree_rcu() path.
>
> Maybe we could change it to use CONFIG_SLUB_TINY as that has similar use
> case (less memory usage on low memory system, tradeoff for worse performance).
>
Yep, I also was thinking about that without saying it :)

--
Uladzislau Rezki
On Mon, Dec 16, 2024 at 04:55:06PM +0100, Uladzislau Rezki wrote:
> On Mon, Dec 16, 2024 at 04:44:41PM +0100, Vlastimil Babka wrote:
> > On 12/16/24 16:41, Uladzislau Rezki wrote:
> > > On Mon, Dec 16, 2024 at 03:20:44PM +0100, Vlastimil Babka wrote:
> > >> On 12/16/24 12:03, Uladzislau Rezki wrote:
> > >> > On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
> > >> >> On 12/12/24 19:02, Uladzislau Rezki (Sony) wrote:
> > >> >> > Hello!
> > >> >> >
> > >> >> > This is v2. It is based on the Linux 6.13-rc2. The first version is
> > >> >> > here:
> > >> >> >
> > >> >> > https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/
> > >> >> >
> > >> >> > The difference between v1 and v2 is that, the preparation process is
> > >> >> > done in original place instead and after that there is one final move.
> > >> >>
> > >> >> Looks good, will include in slab/for-next
> > >> >>
> > >> >> I think patch 5 should add more explanation to the commit message - the
> > >> >> subthread started by Christoph could provide content :) Can you summarize so
> > >> >> I can amend the commit log?
> > >> >>
> > >> > I will :)
> > >> >
> > >> >> Also how about a followup patch moving the rcu-tiny implementation of
> > >> >> kvfree_call_rcu()?
> > >> >>
> > >> > As, Paul already noted, it would make sense. Or just remove a tiny
> > >> > implementation.
> > >>
> > >> AFAICS tiny rcu is for !SMP systems. Do they benefit from the "full"
> > >> implementation with all the batching etc or would that be unnecessary overhead?
> > >>
> > > Yes, it is for a really small systems with low amount of memory. I see
> > > only one overhead it is about driving objects in pages. For a small
> > > system it can be critical because we allocate.
> > >
> > > From the other hand, for a tiny variant we can modify the normal variant
> > > by bypassing batching logic, thus do not consume memory(for Tiny case)
> > > i.e. merge it to a normal kvfree_rcu() path.
> >
> > Maybe we could change it to use CONFIG_SLUB_TINY as that has similar use
> > case (less memory usage on low memory system, tradeoff for worse performance).
> >
> Yep, i also was thinking about that without saying it :)

Works for me as well!

							Thanx, Paul
On 12/16/24 17:46, Paul E. McKenney wrote:
> On Mon, Dec 16, 2024 at 04:55:06PM +0100, Uladzislau Rezki wrote:
>> On Mon, Dec 16, 2024 at 04:44:41PM +0100, Vlastimil Babka wrote:
>> > On 12/16/24 16:41, Uladzislau Rezki wrote:
>> > > On Mon, Dec 16, 2024 at 03:20:44PM +0100, Vlastimil Babka wrote:
>> > >> On 12/16/24 12:03, Uladzislau Rezki wrote:
>> > >> > On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
>> > >> >
>> > >> >> Also how about a followup patch moving the rcu-tiny implementation of
>> > >> >> kvfree_call_rcu()?
>> > >> >>
>> > >> > As, Paul already noted, it would make sense. Or just remove a tiny
>> > >> > implementation.
>> > >>
>> > >> AFAICS tiny rcu is for !SMP systems. Do they benefit from the "full"
>> > >> implementation with all the batching etc or would that be unnecessary overhead?
>> > >>
>> > > Yes, it is for a really small systems with low amount of memory. I see
>> > > only one overhead it is about driving objects in pages. For a small
>> > > system it can be critical because we allocate.
>> > >
>> > > From the other hand, for a tiny variant we can modify the normal variant
>> > > by bypassing batching logic, thus do not consume memory(for Tiny case)
>> > > i.e. merge it to a normal kvfree_rcu() path.
>> >
>> > Maybe we could change it to use CONFIG_SLUB_TINY as that has similar use
>> > case (less memory usage on low memory system, tradeoff for worse performance).
>> >
>> Yep, i also was thinking about that without saying it :)
>
> Works for me as well!

Hi, so I tried looking at this. First I just moved the code to slab, as
seen in the top-most commit here [1]. Hope the non-inlined
__kvfree_call_rcu() is not a show-stopper here.

Then I wanted to switch the #ifdefs from CONFIG_TINY_RCU to
CONFIG_SLUB_TINY to control whether we use the full-blown batching
implementation or the simple call_rcu() implementation, and realized
it's not straightforward and reveals there are still some subtle
dependencies of kvfree_rcu() on RCU internals :)

Problem 1: !CONFIG_SLUB_TINY with CONFIG_TINY_RCU

AFAICS the batching implementation includes
kfree_rcu_scheduler_running(), which is called from
rcu_set_runtime_mode() but only on TREE_RCU. Perhaps there are other
facilities the batching implementation needs that only exist in the
TREE_RCU implementation.

Possible solution: the batching implementation depends on both
!CONFIG_SLUB_TINY and !CONFIG_TINY_RCU. I think it makes sense, as both
!SMP systems and small-memory systems are fine with the simple
implementation.

Problem 2: CONFIG_TREE_RCU with CONFIG_SLUB_TINY

AFAICS I can't just make the simple implementation do call_rcu() on
CONFIG_TREE_RCU, because call_rcu() no longer knows how to handle the
fake callback (__is_kvfree_rcu_offset()) - I see how rcu_reclaim_tiny()
does that, but no such equivalent exists in TREE_RCU. Am I right?

Possible solution: teach TREE_RCU callback invocation to handle
__is_kvfree_rcu_offset() again, perhaps hide that branch behind #ifdef
CONFIG_SLUB_TINY to avoid overhead if the batching implementation is
used.
Downside: we visibly demonstrate how kvfree_rcu() is not purely a slab
thing, but RCU still has to special-case it.

Possible solution 2: instead of the special offset handling, SLUB
provides a callback function, which will determine the pointer to the
object from a pointer into the middle of it, without knowing the
rcu_head offset.
Downside: this will have some overhead, but SLUB_TINY is not meant to be
performant anyway, so we might not care.
Upside: we can remove __is_kvfree_rcu_offset() from TINY_RCU as well.

Thoughts?

[1] https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-tiny-kfree_rcu
On Mon, Jan 20, 2025 at 11:06:13PM +0100, Vlastimil Babka wrote:
> On 12/16/24 17:46, Paul E. McKenney wrote:
> > On Mon, Dec 16, 2024 at 04:55:06PM +0100, Uladzislau Rezki wrote:
> >> On Mon, Dec 16, 2024 at 04:44:41PM +0100, Vlastimil Babka wrote:
> >> > On 12/16/24 16:41, Uladzislau Rezki wrote:
> >> > > On Mon, Dec 16, 2024 at 03:20:44PM +0100, Vlastimil Babka wrote:
> >> > >> On 12/16/24 12:03, Uladzislau Rezki wrote:
> >> > >> > On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
> >> > >> >
> >> > >> >> Also how about a followup patch moving the rcu-tiny implementation of
> >> > >> >> kvfree_call_rcu()?
> >> > >> >>
> >> > >> > As, Paul already noted, it would make sense. Or just remove a tiny
> >> > >> > implementation.
> >> > >>
> >> > >> AFAICS tiny rcu is for !SMP systems. Do they benefit from the "full"
> >> > >> implementation with all the batching etc or would that be unnecessary overhead?
> >> > >>
> >> > > Yes, it is for a really small systems with low amount of memory. I see
> >> > > only one overhead it is about driving objects in pages. For a small
> >> > > system it can be critical because we allocate.
> >> > >
> >> > > From the other hand, for a tiny variant we can modify the normal variant
> >> > > by bypassing batching logic, thus do not consume memory(for Tiny case)
> >> > > i.e. merge it to a normal kvfree_rcu() path.
> >> >
> >> > Maybe we could change it to use CONFIG_SLUB_TINY as that has similar use
> >> > case (less memory usage on low memory system, tradeoff for worse performance).
> >> >
> >> Yep, i also was thinking about that without saying it :)
> >
> > Works for me as well!
>
> Hi, so I tried looking at this. First I just moved the code to slab as seen
> in the top-most commit here [1]. Hope the non-inlined __kvfree_call_rcu() is
> not a show-stopper here.
>
> Then I wanted to switch the #ifdefs from CONFIG_TINY_RCU to CONFIG_SLUB_TINY
> to control whether we use the full blown batching implementation or the
> simple call_rcu() implementation, and realized it's not straightforward and
> reveals there are still some subtle dependencies of kvfree_rcu() on RCU
> internals :)
>
> Problem 1: !CONFIG_SLUB_TINY with CONFIG_TINY_RCU
>
> AFAICS the batching implementation includes kfree_rcu_scheduler_running()
> which is called from rcu_set_runtime_mode() but only on TREE_RCU. Perhaps
> there are other facilities the batching implementation needs that only
> exist in the TREE_RCU implementation.
>
> Possible solution: batching implementation depends on both !CONFIG_SLUB_TINY
> and !CONFIG_TINY_RCU. I think it makes sense as both !SMP systems and small
> memory systems are fine with the simple implementation.
>
> Problem 2: CONFIG_TREE_RCU with CONFIG_SLUB_TINY
>
> AFAICS I can't just make the simple implementation do call_rcu() on
> CONFIG_TREE_RCU, because call_rcu() no longer knows how to handle the fake
> callback (__is_kvfree_rcu_offset()) - I see how rcu_reclaim_tiny() does that
> but no such equivalent exists in TREE_RCU. Am I right?
>
> Possible solution: teach TREE_RCU callback invocation to handle
> __is_kvfree_rcu_offset() again, perhaps hide that branch behind #ifdef
> CONFIG_SLUB_TINY to avoid overhead if the batching implementation is used.
> Downside: we visibly demonstrate how kvfree_rcu() is not purely a slab thing
> but RCU has to special case it still.
>
> Possible solution 2: instead of the special offset handling, SLUB provides a
> callback function, which will determine pointer to the object from the
> pointer to a middle of it without knowing the rcu_head offset.
> Downside: this will have some overhead, but SLUB_TINY is not meant to be
> performant anyway so we might not care.
> Upside: we can remove __is_kvfree_rcu_offset() from TINY_RCU as well
>
> Thoughts?
>
For call_rcu(), and to be able to reclaim over it, we need to patch
tree.c (please note TINY already works):
<snip>
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b1f883fcd918..ab24229dfa73 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2559,13 +2559,19 @@ static void rcu_do_batch(struct rcu_data *rdp)
debug_rcu_head_unqueue(rhp);
rcu_lock_acquire(&rcu_callback_map);
- trace_rcu_invoke_callback(rcu_state.name, rhp);
f = rhp->func;
- debug_rcu_head_callback(rhp);
- WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
- f(rhp);
+ if (__is_kvfree_rcu_offset((unsigned long) f)) {
+ trace_rcu_invoke_kvfree_callback("", rhp, (unsigned long) f);
+ kvfree((void *) rhp - (unsigned long) f);
+ } else {
+ trace_rcu_invoke_callback(rcu_state.name, rhp);
+ debug_rcu_head_callback(rhp);
+ WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
+ f(rhp);
+ }
rcu_lock_release(&rcu_callback_map);
/*
<snip>
Mixing up CONFIG_SLUB_TINY with CONFIG_TINY_RCU in slab_common.c
should be avoided, i.e. if we can, we should eliminate the dependency
on TREE_RCU or TINY_RCU in slab, as much as possible.
So, it requires a closer look for sure :)
--
Uladzislau Rezki
On 1/21/25 2:33 PM, Uladzislau Rezki wrote:
> On Mon, Jan 20, 2025 at 11:06:13PM +0100, Vlastimil Babka wrote:
>> On 12/16/24 17:46, Paul E. McKenney wrote:
>>> On Mon, Dec 16, 2024 at 04:55:06PM +0100, Uladzislau Rezki wrote:
>>>> On Mon, Dec 16, 2024 at 04:44:41PM +0100, Vlastimil Babka wrote:
>>>>> On 12/16/24 16:41, Uladzislau Rezki wrote:
>>>>>> On Mon, Dec 16, 2024 at 03:20:44PM +0100, Vlastimil Babka wrote:
>>>>>>> On 12/16/24 12:03, Uladzislau Rezki wrote:
>>>>>>>> On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
>>>>>>>>
>>>>>>>>> Also how about a followup patch moving the rcu-tiny implementation of
>>>>>>>>> kvfree_call_rcu()?
>>>>>>>>>
>>>>>>>> As, Paul already noted, it would make sense. Or just remove a tiny
>>>>>>>> implementation.
>>>>>>>
>>>>>>> AFAICS tiny rcu is for !SMP systems. Do they benefit from the "full"
>>>>>>> implementation with all the batching etc or would that be unnecessary overhead?
>>>>>>>
>>>>>> Yes, it is for a really small systems with low amount of memory. I see
>>>>>> only one overhead it is about driving objects in pages. For a small
>>>>>> system it can be critical because we allocate.
>>>>>>
>>>>>> From the other hand, for a tiny variant we can modify the normal variant
>>>>>> by bypassing batching logic, thus do not consume memory(for Tiny case)
>>>>>> i.e. merge it to a normal kvfree_rcu() path.
>>>>>
>>>>> Maybe we could change it to use CONFIG_SLUB_TINY as that has similar use
>>>>> case (less memory usage on low memory system, tradeoff for worse performance).
>>>>>
>>>> Yep, i also was thinking about that without saying it :)
>>>
>>> Works for me as well!
>>
>> Hi, so I tried looking at this. First I just moved the code to slab as seen
>> in the top-most commit here [1]. Hope the non-inlined __kvfree_call_rcu() is
>> not a show-stopper here.
>>
>> Then I wanted to switch the #ifdefs from CONFIG_TINY_RCU to CONFIG_SLUB_TINY
>> to control whether we use the full blown batching implementation or the
>> simple call_rcu() implementation, and realized it's not straightforward and
>> reveals there are still some subtle dependencies of kvfree_rcu() on RCU
>> internals :)
>>
>> Problem 1: !CONFIG_SLUB_TINY with CONFIG_TINY_RCU
>>
>> AFAICS the batching implementation includes kfree_rcu_scheduler_running()
>> which is called from rcu_set_runtime_mode() but only on TREE_RCU. Perhaps
>> there are other facilities the batching implementation needs that only
>> exist in the TREE_RCU implementation.
>>
>> Possible solution: batching implementation depends on both !CONFIG_SLUB_TINY
>> and !CONFIG_TINY_RCU. I think it makes sense as both !SMP systems and small
>> memory systems are fine with the simple implementation.
>>
>> Problem 2: CONFIG_TREE_RCU with CONFIG_SLUB_TINY
>>
>> AFAICS I can't just make the simple implementation do call_rcu() on
>> CONFIG_TREE_RCU, because call_rcu() no longer knows how to handle the fake
>> callback (__is_kvfree_rcu_offset()) - I see how rcu_reclaim_tiny() does that
>> but no such equivalent exists in TREE_RCU. Am I right?
>>
>> Possible solution: teach TREE_RCU callback invocation to handle
>> __is_kvfree_rcu_offset() again, perhaps hide that branch behind #ifdef
>> CONFIG_SLUB_TINY to avoid overhead if the batching implementation is used.
>> Downside: we visibly demonstrate how kvfree_rcu() is not purely a slab thing
>> but RCU has to special case it still.
>>
>> Possible solution 2: instead of the special offset handling, SLUB provides a
>> callback function, which will determine pointer to the object from the
>> pointer to a middle of it without knowing the rcu_head offset.
>> Downside: this will have some overhead, but SLUB_TINY is not meant to be
>> performant anyway so we might not care.
>> Upside: we can remove __is_kvfree_rcu_offset() from TINY_RCU as well
>>
>> Thoughts?
>>
> For the call_rcu() and to be able to reclaim over it we need to patch the
> tree.c(please note TINY already works):
>
> <snip>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b1f883fcd918..ab24229dfa73 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2559,13 +2559,19 @@ static void rcu_do_batch(struct rcu_data *rdp)
>  		debug_rcu_head_unqueue(rhp);
>
>  		rcu_lock_acquire(&rcu_callback_map);
> -		trace_rcu_invoke_callback(rcu_state.name, rhp);
>
>  		f = rhp->func;
> -		debug_rcu_head_callback(rhp);
> -		WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
> -		f(rhp);
>
> +		if (__is_kvfree_rcu_offset((unsigned long) f)) {
> +			trace_rcu_invoke_kvfree_callback("", rhp, (unsigned long) f);
> +			kvfree((void *) rhp - (unsigned long) f);
> +		} else {
> +			trace_rcu_invoke_callback(rcu_state.name, rhp);
> +			debug_rcu_head_callback(rhp);
> +			WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
> +			f(rhp);
> +		}
>  		rcu_lock_release(&rcu_callback_map);
Right so that's the first Possible solution, but without the #ifdef. So
there's an overhead of checking __is_kvfree_rcu_offset() even if the
batching is done in slab and this function is never called with an offset.
After coming up with Possible solution 2, I've started liking the idea
more as RCU could then forget about the __is_kvfree_rcu_offset()
"callbacks" completely, and the performant case of TREE_RCU + batching
would be unaffected.
I'm speculating that if CONFIG_SLOB had not existed in the past,
__is_kvfree_rcu_offset() would never have existed in the first place?
SLAB and SLUB can both determine the start of an object from a pointer
into the middle of it, while SLOB couldn't.
> /*
> <snip>
>
> Mixing up CONFIG_SLUB_TINY with CONFIG_TINY_RCU in the slab_common.c
> should be avoided, i.e. if we can, we should eliminate a dependency on
> TREE_RCU or TINY_RCU in a slab. As much as possible.
>
> So, it requires a more closer look for sure :)
That requires solving Problem 1 above, but the question is whether it's
worth the trouble. Systems running TINY_RCU are unlikely to benefit from
the batching?
But sure there's also possibility to hide these dependencies in KConfig,
so the slab code would only consider a single (for example) #ifdef
CONFIG_KVFREE_RCU_BATCHING that would be set automatically depending on
TREE_RCU and !SLUB_TINY.
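The single-symbol idea could look roughly like the fragment below. The symbol name and exact placement are only illustrative, not an actual patch; the point is that slab code would then test only this one symbol:

```kconfig
# mm/Kconfig (sketch): the TREE_RCU / SLUB_TINY coupling stays out of
# mm/slab_common.c, which only checks CONFIG_KVFREE_RCU_BATCHING.
config KVFREE_RCU_BATCHING
	def_bool y
	depends on !SLUB_TINY && !TINY_RCU
```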
> --
> Uladzislau Rezki
On Tue, Jan 21, 2025 at 02:49:13PM +0100, Vlastimil Babka wrote:
> [...]
> Right so that's the first Possible solution, but without the #ifdef. So
> there's an overhead of checking __is_kvfree_rcu_offset() even if the
> batching is done in slab and this function is never called with an offset.
>
Or is it fulfilling a missing functionality? TREE is broken in that
sense, whereas TINY handles it without any issues.
It can be used for the SLUB_TINY option: just call_rcu() instead of the
batching layer. And yes, kvfree_rcu_barrier() switches to rcu_barrier().
>
> After coming up with Possible solution 2, I've started liking the idea
> more as RCU could then forget about the __is_kvfree_rcu_offset()
> "callbacks" completely, and the performant case of TREE_RCU + batching
> would be unaffected.
>
I doubt it is a performance issue :)
>
> I'm speculating that if CONFIG_SLOB had not existed in the past,
> __is_kvfree_rcu_offset() would never have existed in the first place?
> SLAB and SLUB can both determine the start of an object from a pointer
> into the middle of it, while SLOB couldn't.
>
We just needed to reclaim over RCU. So, I do not know. Paul probably
knows more than me :)
> > /*
> > <snip>
> >
> > Mixing up CONFIG_SLUB_TINY with CONFIG_TINY_RCU in the slab_common.c
> > should be avoided, i.e. if we can, we should eliminate a dependency on
> > TREE_RCU or TINY_RCU in a slab. As much as possible.
> >
> > So, it requires a more closer look for sure :)
>
> That requires solving Problem 1 above, but the question is whether it's
> worth the trouble. Systems running TINY_RCU are unlikely to benefit from
> the batching?
>
> But sure there's also possibility to hide these dependencies in KConfig,
> so the slab code would only consider a single (for example) #ifdef
> CONFIG_KVFREE_RCU_BATCHING that would be set automatically depending on
> TREE_RCU and !SLUB_TINY.
>
It is for small systems. We can use TINY or !SMP. We covered this AFAIR
that a single CPU system should not go with batching:
#ifdef SLUB_TINY || !SMP || TINY_RCU
or:
config TINY_RCU
bool
default y if !PREEMPT_RCU && !SMP
+ select SLUB_TINY
Paul, more input?
--
Uladzislau Rezki
On Tue, Jan 21, 2025 at 03:14:16PM +0100, Uladzislau Rezki wrote:
> [...]
> Or is it fulfilling a missing functionality? TREE is broken in that
> sense, whereas TINY handles it without any issues.
>
> It can be used for the SLUB_TINY option: just call_rcu() instead of the
> batching layer. And yes, kvfree_rcu_barrier() switches to rcu_barrier().
Would this make sense?
if (IS_ENABLED(CONFIG_TINY_RCU) && __is_kvfree_rcu_offset((unsigned long) f)) {
Just to be repetitive, other alternatives include:
1. Take advantage of SLOB being no longer with us.
2. Get rid of Tiny RCU's special casing of kfree_rcu(), and then
eliminate the above "if" statement in favor of its "else" clause.
3. Make Tiny RCU implement a trivial version of kfree_rcu() that
passes a list through RCU.
I don't have strong feelings, and am happy to defer to your guys'
decision.
> > After coming up with Possible solution 2, I've started liking the idea
> > more as RCU could then forget about the __is_kvfree_rcu_offset()
> > "callbacks" completely, and the performant case of TREE_RCU + batching
> > would be unaffected.
> >
> I doubt it is a performance issue :)
Me neither, especially with IS_ENABLED().
> > I'm speculating that if CONFIG_SLOB had not existed in the past,
> > __is_kvfree_rcu_offset() would never have existed in the first place?
> > SLAB and SLUB can both determine the start of an object from a pointer
> > into the middle of it, while SLOB couldn't.
> >
> We just needed to reclaim over RCU. So, I do not know. Paul probably
> knows more than me :)
In the absence of SLOB, yes, I would hope that I would have thought of
determining the start of the object from a pointer to the middle of it.
Or that someone would have pointed that out during review. But I honestly
do not remember. ;-)
> [...]
> It is for small systems. We can use TINY or !SMP. We covered this AFAIR
> that a single CPU system should not go with batching:
>
> #ifdef SLUB_TINY || !SMP || TINY_RCU
>
> or:
>
> config TINY_RCU
> bool
> default y if !PREEMPT_RCU && !SMP
> + select SLUB_TINY
>
>
> Paul, more input?
I will say that Tiny RCU used to get much more focus from its users
10-15 years ago than it does now. So one approach is to implement
the simplest option, and add any needed complexity back in when and if
people complain. ;-)
Thanx, Paul
On Tue, Jan 21, 2025 at 3:32 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> [...]
> Would this make sense?
>
> if (IS_ENABLED(CONFIG_TINY_RCU) && __is_kvfree_rcu_offset((unsigned long) f)) {
>
> Just to be repetitive, other alternatives include:
>
> 1. Take advantage of SLOB being no longer with us.
>
> 2. Get rid of Tiny RCU's special casing of kfree_rcu(), and then
> eliminate the above "if" statement in favor of its "else" clause.
>
> 3. Make Tiny RCU implement a trivial version of kfree_rcu() that
> passes a list through RCU.
>
> I don't have strong feelings, and am happy to defer to your guys'
> decision.
If I may chime in with an opinion, I think the cleanest approach would
be to not special-case the func pointer and instead provide a callback
from the SLAB layer which does the kfree. Then get rid of
__is_kvfree_rcu_offset() and its usage from Tiny. Granted, there is
the overhead of function calling, but I highly doubt that it is going
to be a bottleneck, considering that the __is_kvfree_rcu_offset() path
is a kfree slow-path. I feel in the long run, this will also be more
maintainable.
Or is there a reason other than the theoretical function call overhead
why this may not work?
thanks,
- Joel
On 1/22/25 16:04, Joel Fernandes wrote:
> On Tue, Jan 21, 2025 at 3:32 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>> [...]
>
> If I may chime in with an opinion, I think the cleanest approach would
> be to not special-case the func pointer and instead provide a callback
> from the SLAB layer which does the kfree. Then get rid of
Right.
> __is_kvfree_rcu_offset() and its usage from Tiny. Granted, there is
> the overhead of function calling, but I highly doubt that it is going
> to be a bottleneck, considering that the __is_kvfree_rcu_offset() path
> is a kfree slow-path. I feel in the long run, this will also be more
> maintainable.
>
> Or is there a reason other than the theoretical function call overhead
> why this may not work?
My concern was about the overhead of calculating the pointer to the object
starting address, but it's just some arithmetics, so it should be
negligible. So I'm prototyping this approach now. Thanks all.
> thanks,
>
> - Joel
On Wed, Jan 22, 2025 at 05:43:06PM +0100, Vlastimil Babka wrote:
> [...]
> My concern was about the overhead of calculating the pointer to the object
> starting address, but it's just some arithmetics, so it should be
> negligible. So I'm prototyping this approach now. Thanks all.
>
You are welcome :)
--
Uladzislau Rezki
On Wed, Jan 22, 2025 at 11:43 AM Vlastimil Babka <vbabka@suse.cz> wrote:
[...]
> > __is_kvfree_rcu_offset() and its usage from Tiny. Granted, there is
> > the overhead of function calling, but I highly doubt that it is going
> > to be a bottleneck, considering that the __is_kvfree_rcu_offset() path
> > is a kfree slow-path. I feel in the long run, this will also be more
> > maintainable.
> >
> > Or is there a reason other than the theoretical function call overhead
> > why this may not work?
>
> My concern was about the overhead of calculating the pointer to the object
> starting address, but it's just some arithmetics, so it should be
> negligible. So I'm prototyping this approach now. Thanks all.

Ah, that's a valid point. Looking forward to reviewing the patch and
hope it works out!

thanks,

- Joel
On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
> On 12/12/24 19:02, Uladzislau Rezki (Sony) wrote:
> > [...]
> > The difference between v1 and v2 is that, the preparation process is
> > done in original place instead and after that there is one final move.
>
> Looks good, will include in slab/for-next
>
> I think patch 5 should add more explanation to the commit message - the
> subthread started by Christoph could provide content :) Can you summarize so
> I can amend the commit log?
>
> Also how about a followup patch moving the rcu-tiny implementation of
> kvfree_call_rcu()?
>
> We might also consider moving the kfree_rcu*() entry points from rcupdate.h
> to slab.h, what do you think, is it a more logical place for them? There's
> some risk that files that include rcupdate.h and not slab.h would break, so
> that will need some build testing...

Moving the RCU Tiny implementation (or maybe even just retiring it in
favor of the RCU Tree implementation) and moving the entry points make
sense to me!

							Thanx, Paul

> Thanks,
> Vlastimil
> [...]
On Thu, 12 Dec 2024, Uladzislau Rezki (Sony) wrote:

> This is v2. It is based on the Linux 6.13-rc2. The first version is
> here:

I do not see any use of internal slab interfaces by this code. It seems
to be using rcu internals though. So it would best be placed with the
rcu code.
On Thu, Dec 12, 2024 at 10:30:28AM -0800, Christoph Lameter (Ampere) wrote:
> On Thu, 12 Dec 2024, Uladzislau Rezki (Sony) wrote:
>
> > This is v2. It is based on the Linux 6.13-rc2. The first version is
> > here:
>
> I do not see any use of internal slab interfaces by this code. It seems to
> be using rcu internals though. So it would best be placed with the rcu
> code.

That is indeed the current state. The point of moving it is to later
take advantage of internal slab state.

							Thanx, Paul
On Thu, Dec 12, 2024 at 11:10:47AM -0800, Paul E. McKenney wrote:
> On Thu, Dec 12, 2024 at 10:30:28AM -0800, Christoph Lameter (Ampere) wrote:
> [...]
>
> That is indeed the current state. The point of moving it is to later
> take advantage of internal slab state.

And, in fact we already have some integrations. For example a barrier
has been added for slab caches.

--
Uladzislau Rezki
On Thu, Dec 12, 2024 at 10:30:28AM -0800, Christoph Lameter (Ampere) wrote:
> On Thu, 12 Dec 2024, Uladzislau Rezki (Sony) wrote:
>
> > This is v2. It is based on the Linux 6.13-rc2. The first version is
> > here:
>
> I do not see any use of internal slab interfaces by this code. It seems to
> be using rcu internals though. So it would best be placed with the rcu
> code.

I think, later on there will be integration. This is a step forward to
place it under mm where it should be.

--
Uladzislau Rezki