RE: [RFC 00/14] Dynamic Kernel Stacks

Posted by David Laight 1 year, 10 months ago
...
> - exit_to_user_mode(): Unmap the extra three pages and return them to
> the per-CPU cache. This function is called late in the kernel exit
> path.

Why bother?
The number of tasks running in user mode is limited to the number
of CPUs. So the most you save is a few pages per CPU.
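
As a rough illustration of that bound, here is a minimal sketch. It assumes 4 KiB pages and the three extra stack pages per task from the quoted description; the function name and constants are made up for the example and are not from the patch series.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for illustration only: 4 KiB pages, and the three extra
 * stack pages per task mentioned in the quoted patch description. */
#define PAGE_SIZE_BYTES   4096u
#define EXTRA_STACK_PAGES 3u

/* Upper bound on the bytes saved by unmapping at exit to user mode:
 * at most one task per CPU can be running in user mode at a time. */
static size_t max_exit_savings_bytes(size_t nr_cpus)
{
	assert(nr_cpus > 0);
	return nr_cpus * EXTRA_STACK_PAGES * PAGE_SIZE_BYTES;
}
```

Under these assumptions a 64-CPU machine saves at most 64 * 3 * 4096 = 786432 bytes (768 KiB) this way.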

Plausibly a context switch from an interrupt (eg timer tick)
could suspend a task without saving anything on its kernel stack.
But how common is that in reality?
In a well behaved system most user threads will be sleeping on
some event - so with an active kernel stack.

I can also imagine that something like sys_epoll() actually
sleeps with not (that much) stack allocated.
But the calls into all the drivers to check the status
could easily go into another page.
You really wouldn't want to keep allocating and deallocating
physical pages (which I'm sure has TLB flushing costs)
all the time for those processes.

Perhaps a 'garbage collection' activity that reclaims stack
pages from processes that have been asleep 'for a while' or
haven't used a lot of stack recently (if hw 'page accessed'
bit can be used) might make more sense.
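
A user-space sketch of what one such reclaim pass might look like. Everything here is hypothetical: the struct, its fields, and the policy (asleep for at least a threshold, or extra pages not touched since the previous pass) are a model of the idea, not kernel code. The model also assumes the pass can tell how many extra pages still hold live frames, and never reclaims those.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task bookkeeping; not a kernel structure. */
struct task_model {
	bool     sleeping;        /* task is blocked on some event */
	uint64_t asleep_since_ms; /* time at which the task went to sleep */
	bool     extra_accessed;  /* models the hw 'page accessed' bit */
	unsigned mapped_extra;    /* extra stack pages currently mapped */
	unsigned live_extra;      /* extra pages that hold live frames */
};

/* One reclaim pass. A task is "cold" if it has slept for at least
 * min_sleep_ms or has not touched its extra pages since the last pass.
 * Cold tasks lose every mapped extra page that holds no live frames.
 * The accessed flag is cleared so the next pass sees fresh activity.
 * Returns the number of pages reclaimed. */
static unsigned stack_gc_pass(struct task_model *tasks, size_t n,
			      uint64_t now_ms, uint64_t min_sleep_ms)
{
	unsigned reclaimed = 0;

	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct task_model *t = &tasks[i];
		bool slept_long = t->sleeping &&
				  now_ms - t->asleep_since_ms >= min_sleep_ms;
		bool cold = slept_long || !t->extra_accessed;

		assert(t->live_extra <= t->mapped_extra);
		if (cold) {
			reclaimed += t->mapped_extra - t->live_extra;
			t->mapped_extra = t->live_extra;
		}
		t->extra_accessed = false;
	}
	return reclaimed;
}
```

Running the pass from a periodic worker rather than on every kernel exit is the point of the suggestion: the unmap (and any TLB flush) cost is then paid once per cold task instead of once per syscall.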

Have you done any instrumentation to see which system calls
are actually using more than (say) 8k of stack?
And how often the user threads that make those calls do so?
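
One way to get that kind of number is a high-water-mark scan: fill the stack with a known pattern and later count how much of it has been overwritten. The sketch below models this in user space on a plain byte buffer. The poison value and function names are assumptions for the example, and a scan like this can undercount if the stack happens to store the poison byte at its deepest point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_POISON 0xA5u /* arbitrary fill byte for this sketch */

/* Fill the whole (modelled) stack with the poison pattern. */
static void stack_model_init(uint8_t *stack, size_t size)
{
	memset(stack, STACK_POISON, size);
}

/* High-water mark in bytes for a downward-growing stack: scan up from
 * the lowest address until the first overwritten byte; everything from
 * there to the top of the stack counts as used. */
static size_t stack_high_water(const uint8_t *stack, size_t size)
{
	size_t untouched = 0;

	assert(stack != NULL);
	while (untouched < size && stack[untouched] == STACK_POISON)
		untouched++;
	return size - untouched;
}

/* True if the deepest use so far exceeded the given limit in bytes. */
static bool stack_used_more_than(const uint8_t *stack, size_t size,
				 size_t limit)
{
	return stack_high_water(stack, size) > limit;
}
```

Checking `stack_used_more_than(stack, size, 8192)` at syscall exit, keyed by syscall number, would give a per-syscall count of how often the 8k mark is crossed.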

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Re: [RFC 00/14] Dynamic Kernel Stacks
Posted by Pasha Tatashin 1 year, 10 months ago
On Mon, Mar 18, 2024 at 11:39 AM David Laight <David.Laight@aculab.com> wrote:
>
> ...
> > - exit_to_user_mode(): Unmap the extra three pages and return them to
> > the per-CPU cache. This function is called late in the kernel exit
> > path.
>
> Why bother?
> The number of tasks running in user mode is limited to the number
> of CPUs. So the most you save is a few pages per CPU.
>
> Plausibly a context switch from an interrupt (eg timer tick)
> could suspend a task without saving anything on its kernel stack.
> But how common is that in reality?
> In a well behaved system most user threads will be sleeping on
> some event - so with an active kernel stack.
>
> I can also imagine that something like sys_epoll() actually
> sleeps with not (that much) stack allocated.
> But the calls into all the drivers to check the status
> could easily go into another page.
> You really wouldn't want to keep allocating and deallocating
> physical pages (which I'm sure has TLB flushing costs)
> all the time for those processes.
>
> Perhaps a 'garbage collection' activity that reclaims stack
> pages from processes that have been asleep 'for a while' or
> haven't used a lot of stack recently (if hw 'page accessed'
> bit can be used) might make more sense.
>
> Have you done any instrumentation to see which system calls
> are actually using more than (say) 8k of stack?
> And how often the user threads that make those calls do so?

None of our syscalls, AFAIK.

Pasha

Re: [RFC 00/14] Dynamic Kernel Stacks
Posted by Pasha Tatashin 1 year, 10 months ago
> > Perhaps a 'garbage collection' activity that reclaims stack
> > pages from processes that have been asleep 'for a while' or
> > haven't used a lot of stack recently (if hw 'page accessed'
> > bit can be used) might make more sense.

Interesting approach. We could take Andy's original suggestion of
using the accessed bit to find, at context switch, the stack pages
that were never used and unmap them, and as an extra optimization
have a "garbage collector" that unmaps stack pages in long-sleeping,
rarely used threads.  I will think about this.
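
To make the context-switch half of that idea concrete, here is a small user-space model. It is only a sketch under assumptions: a stack of one always-mapped base page plus three extra pages, and a per-page accessed flag standing in for the hardware bit. The names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STACK_PAGES 4 /* assumed: one base page plus three extra pages */

/* Hypothetical per-page state; models the PTE present/accessed bits. */
struct stack_page_model {
	bool mapped;
	bool accessed;
};

/* At switch-out, unmap every extra page that is mapped but whose
 * accessed bit was never set. Such a page cannot hold live data.
 * Page 0 always stays mapped. Returns the number of pages handed back. */
static unsigned unmap_unused_on_switch(struct stack_page_model pages[STACK_PAGES])
{
	unsigned freed = 0;

	assert(pages[0].mapped);
	for (size_t i = 1; i < STACK_PAGES; i++) {
		if (pages[i].mapped && !pages[i].accessed) {
			pages[i].mapped = false;
			freed++;
		}
	}
	return freed;
}
```

Because the accessed bit is never cleared in this model, a page is only unmapped if it was never written since being mapped, so no stack contents are lost. The "garbage collector" part would need extra care for pages that were used earlier but are no longer live.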

Thanks,
Pasha