RE: [RFC 00/14] Dynamic Kernel Stacks

Posted by David Laight 1 year, 10 months ago
From: Pasha Tatashin
> Sent: 18 March 2024 15:31
> 
> On Mon, Mar 18, 2024 at 11:19 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Mon, Mar 18, 2024 at 11:09:47AM -0400, Pasha Tatashin wrote:
> > > The TLB load is going to be exactly the same as today, we already use
> > > small pages for VMA mapped stacks. We won't need to have extra
> > > flushing either, the mappings are in the kernel space, and once pages
> > > are removed from the page table, no one is going to access that VA
> > > space until that thread enters the kernel again. We will need to
> > > invalidate the VA range only when the pages are mapped, and only on
> > > the local cpu.
> >
> > No; we can pass pointers to our kernel stack to other threads.  The
> > obvious one is a mutex; we put a mutex_waiter on our own stack and
> > add its list_head to the mutex's waiter list.  I'm sure you can
> > think of many other places we do this (eg wait queues, poll(), select(),
> > etc).
> 
> Hm, it means that the thread is sleeping in kernel space with its
> stack pages mapped and invalidated on the local CPU, but access from
> a remote CPU to those stack pages would be problematic.
> 
> I think we still won't need an IPI, but VA-range invalidation is
> actually needed on unmaps, and should happen during context switch,
> i.e. every time we go off-cpu. Therefore, what Brian/Andy have
> suggested makes more sense than the kernel enter/exit paths.

I think you'll need to broadcast an invalidate.
Consider:
CPU A: task allocates extra pages and adds something to some list.
CPU B: accesses that data and maybe modifies it.
	Some page-table walk sets up the TLB entry.
CPU A: task detects the modify, removes the item from the list,
	collapses back the stack and sleeps.
	Stack pages freed.
CPU A: task wakes up (on the same cpu for simplicity).
	Goes down a deep stack and puts an item on a list.
	Different physical pages are allocated.
CPU B: accesses the associated KVA.
	It had better not have a cached TLB entry.

Doesn't that need an IPI?

Freeing the pages is much harder than allocating them.

	David

Re: [RFC 00/14] Dynamic Kernel Stacks
Posted by Pasha Tatashin 1 year, 10 months ago
> I think you'll need to broadcast an invalidate.
> Consider:
> CPU A: task allocates extra pages and adds something to some list.
> CPU B: accesses that data and maybe modifies it.
>         Some page-table walk sets up the TLB entry.
> CPU A: task detects the modify, removes the item from the list,
>         collapses back the stack and sleeps.
>         Stack pages freed.
> CPU A: task wakes up (on the same cpu for simplicity).
>         Goes down a deep stack and puts an item on a list.
>         Different physical pages are allocated.
> CPU B: accesses the associated KVA.
>         It had better not have a cached TLB entry.
>
> Doesn't that need an IPI?

Yes, this is annoying. If we share a stack with another CPU, then get
a new stack, and share it again with another CPU, we get in trouble.
Yet an IPI on every context switch would kill performance :-\

I wonder if there is a way to optimize this scenario, e.g. by doing the
IPI invalidation only when the stack has actually been shared?

Pasha