TL;DR: it runs better than it looks, and I am looking for ideas on how to make it look better
---8<---
The following program will fail with ENOSPC sooner or later, because
the way ipc namespaces currently get freed results in only one
ipc namespace being freed per RCU grace period.
#define _GNU_SOURCE
#include <errno.h>
#include <error.h>
#include <sched.h>
#include <stdlib.h>

int main(void)
{
	int i;
	for (i = 0; i < 1000000; i++) {
		if (unshare(CLONE_NEWIPC) < 0)
			error(EXIT_FAILURE, errno, "unshare");
	}
}
There are various ways to solve this issue; they all come down to
batching the RCU synchronization, so multiple ipc namespaces can
be freed in the same RCU grace period.
Unfortunately there seems to be a tradeoff: either temporarily
allocate things on the stack at the cost of slightly uglier code,
or add a struct rcu_work to struct vfsmount.
I am not entirely happy with the way this code looks, and am hoping
for suggestions on how to improve it.
However, I am quite happy with how this code runs. Between batching
the kern_unmount RCU synchronization, moving to the expedited RCU
grace period in kern_unmount_array, and grabbing things off the
llist that were added after the work item started, freeing ipc
namespaces is 2-3 orders of magnitude faster than before, and is
able to keep up with the test case above.
Signed-off-by: Rik van Riel <riel@surriel.com>
---
ipc/namespace.c | 31 +++++++++++++++++++++++--------
1 file changed, 23 insertions(+), 8 deletions(-)
diff --git a/ipc/namespace.c b/ipc/namespace.c
index e1fcaedba4fa..ba33015f1a23 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -127,10 +127,6 @@ void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids,
 
 static void free_ipc_ns(struct ipc_namespace *ns)
 {
-	/* mq_put_mnt() waits for a grace period as kern_unmount()
-	 * uses synchronize_rcu().
-	 */
-	mq_put_mnt(ns);
 	sem_exit_ns(ns);
 	msg_exit_ns(ns);
 	shm_exit_ns(ns);
@@ -144,14 +140,33 @@ static void free_ipc_ns(struct ipc_namespace *ns)
 	kfree(ns);
 }
 
+#define FREE_IPC_BATCH 64
 static LLIST_HEAD(free_ipc_list);
 static void free_ipc(struct work_struct *unused)
 {
-	struct llist_node *node = llist_del_all(&free_ipc_list);
-	struct ipc_namespace *n, *t;
+	struct ipc_namespace *ipc_nses[FREE_IPC_BATCH];
+	struct vfsmount *mounts[FREE_IPC_BATCH];
+	int i, j;
+
+next_batch:
+	i = 0;
+	for (i = 0; !llist_empty(&free_ipc_list) && i < FREE_IPC_BATCH; i++) {
+		struct llist_node *node = llist_del_first(&free_ipc_list);
+		struct ipc_namespace *n = llist_entry(node,
+						      struct ipc_namespace,
+						      mnt_llist);
+		ipc_nses[i] = n;
+		mounts[i] = n->mq_mnt;
+	}
+
+	/* Consolidate the RCU synchronization across the whole batch. */
+	kern_unmount_array(mounts, i);
+
+	for (j = 0; j < i; j++)
+		free_ipc_ns(ipc_nses[j]);
 
-	llist_for_each_entry_safe(n, t, node, mnt_llist)
-		free_ipc_ns(n);
+	if (i == FREE_IPC_BATCH)
+		goto next_batch;
 }
 
 /*
--
2.24.1
On Mon, Aug 15, 2022 at 05:26:20PM -0400, Rik van Riel wrote:
> TL;DR: it runs better than it looks, and I am looking for ideas on
> how to make it look better
>
> Unfortunately there seems to be a tradeoff between temporarily
> allocating things on the stack, and having slightly uglier code,
> or adding a struct rcu_work to the struct vfsmount.
>
> I am not entirely happy with the way this code looks, and hoping
> for suggestions on how to improve it.
>
> However, I am quite happy with how this code runs. Between batching
> the kern_unmount RCU synchronization, moving to the expedited RCU
> grace period in kern_unmount_array, and grabbing things off the
> llist that were added after the work item started, freeing ipc
> namespaces is 2-3 orders of magnitude faster than before, and
> able to keep up with the test case above.

IMO you are going in the wrong direction with that; it's a long story,
and I have a partial writeup on it, but I won't have it ready for
posting until the end of the week. Put it that way - there's
a possibility of reorganizing the way mount refcounts work,
eliminating this synchronize_rcu(). The RCU delay still has to happen
in some form, but we get smarter ways to wait for it.

Anyway, it's about 15 pages of writeup right now, and it's going to be
a major massage, part for the coming cycle, part for the next one.
I'll post it to fsdevel later this week, and will make sure to Cc you.
On Mon, 2022-08-15 at 22:40 +0100, Al Viro wrote:
> On Mon, Aug 15, 2022 at 05:26:20PM -0400, Rik van Riel wrote:
> > TL;DR: it runs better than it looks, and I am looking for ideas on
> > how to make it look better
> >
> > Unfortunately there seems to be a tradeoff between temporarily
> > allocating things on the stack, and having slightly uglier code,
> > or adding a struct rcu_work to the struct vfsmount.
> >
> > I am not entirely happy with the way this code looks, and hoping
> > for suggestions on how to improve it.
>
> IMO you are going in wrong direction with that; it's a long story,
> and I've partial writeup on that, but I won't have it ready for
> posting until the end of the week. Put it that way - there's
> a possibility of reorganizing the way mount refcounts work,
> eliminating this synchronize_rcu(). RCU delay still has to happen
> in some form, but we get smarter ways to wait for it.

I'm more than happy to abandon this approach. I looked at this a
bunch and could not find a nice way to do it the way the VFS currently
works, so I am looking forward to getting this done in a better way!

-- 
All Rights Reversed.