From: Waiman Long
To: Paul Moore, Eric Paris, Christian Brauner, Al Viro, Jan Kara
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, audit@vger.kernel.org, Richard Guy Briggs, Ricardo Robaina, Waiman Long
Subject: [PATCH v4 1/2] fs: Add a pool of extra fs->pwd references to fs_struct
Date: Sat, 28 Feb 2026 13:27:56 -0500
Message-ID: <20260228182757.90528-2-longman@redhat.com>
In-Reply-To: <20260228182757.90528-1-longman@redhat.com>

When the audit subsystem is enabled, it can make a lot of get_fs_pwd() calls to get references to fs->pwd and then release those references with path_put() later. That may cause a lot of spinlock contention on a single pwd's dentry lock because of the constant changes to the reference count when there are many processes in the same working directory actively doing open/close system calls. This can cause a noticeable performance regression compared with the case where the audit subsystem is turned off, especially on systems with a lot of CPUs, which are becoming more common these days.

A simple and elegant solution to avoid this kind of performance regression is to add a common pool of extra fs->pwd references inside the fs_struct. When a caller needs a pwd reference, it can borrow one from the pool, if available, to avoid an explicit path_get(). When it is time to release the reference, it can put it back into the common pool without doing a path_put(), as long as fs->pwd has not been changed in the meantime. We still need to acquire the fs's spinlock, but the fs_struct locks are more distributed, and it is less common to have many tasks sharing a single fs_struct.

A new set of get_fs_pwd_pool()/put_fs_pwd_pool() APIs is introduced with this patch to enable other subsystems to acquire and release a pwd reference from the common pool without doing unnecessary path_get()/path_put() calls.

Besides fs/fs_struct.c, the copy_mnt_ns() function of fs/namespace.c is also modified to properly handle the extra pwd references, if available.
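To make the bookkeeping concrete, here is a minimal user-space model of the pool. It is a sketch under stated assumptions, not kernel code: every name in it (struct obj, struct fs_model, obj_get(), obj_put(), model_get_pwd_pool(), model_put_pwd_pool(), model_set_pwd()) is hypothetical. An atomic counter stands in for the dentry reference count that path_get()/path_put() modify, and a pthread mutex stands in for the spinlock taken by read_seqlock_excl()/write_seqlock(). The invariant it models is that the fs_struct holds 1 + pwd_refs references on its pwd, so changing pwd has to drop all of them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the pwd path; refcount models the dentry count. */
struct obj {
	atomic_long refcount;
	atomic_long slow_ops;	/* real get/put operations on this object */
};

/* Hypothetical stand-in for struct fs_struct. */
struct fs_model {
	pthread_mutex_t lock;	/* models the spinlock inside fs->seq */
	struct obj *pwd;	/* holds one base reference */
	int pwd_refs;		/* extra references parked in the pool */
};

/* Models path_get(): touches the shared, contended counter. */
static void obj_get(struct obj *o)
{
	atomic_fetch_add(&o->refcount, 1);
	atomic_fetch_add(&o->slow_ops, 1);
}

/* Models path_put(). */
static void obj_put(struct obj *o)
{
	long old = atomic_fetch_sub(&o->refcount, 1);

	assert(old > 0);
	atomic_fetch_add(&o->slow_ops, 1);
}

/* Models get_fs_pwd_pool(): borrow a pooled reference if one exists. */
static struct obj *model_get_pwd_pool(struct fs_model *fs)
{
	struct obj *pwd;

	pthread_mutex_lock(&fs->lock);
	pwd = fs->pwd;
	if (fs->pwd_refs)
		fs->pwd_refs--;		/* no shared counter update */
	else
		obj_get(pwd);		/* pool empty: take a real reference */
	pthread_mutex_unlock(&fs->lock);
	return pwd;
}

/* Models put_fs_pwd_pool(): park the reference if pwd is unchanged. */
static void model_put_pwd_pool(struct fs_model *fs, struct obj *pwd)
{
	int pooled = 0;

	pthread_mutex_lock(&fs->lock);
	if (fs->pwd == pwd) {	/* the kernel compares dentry and mnt */
		fs->pwd_refs++;
		pooled = 1;
	}
	pthread_mutex_unlock(&fs->lock);
	if (!pooled)
		obj_put(pwd);		/* pwd changed: drop it for real */
}

/* Models set_fs_pwd(): drop the old base reference plus every pooled one. */
static void model_set_pwd(struct fs_model *fs, struct obj *newpwd)
{
	struct obj *old;
	int count;

	obj_get(newpwd);
	pthread_mutex_lock(&fs->lock);
	old = fs->pwd;
	fs->pwd = newpwd;
	count = fs->pwd_refs + 1;
	fs->pwd_refs = 0;
	pthread_mutex_unlock(&fs->lock);
	if (old)
		while (count--)
			obj_put(old);
}
```

In a get/put loop, only the first borrow touches the shared counter; later borrows just move a reference between the pool and the caller. A reference returned after pwd has changed no longer matches the pooled path, so it is dropped normally.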
Signed-off-by: Waiman Long
Reviewed-by: Richard Guy Briggs
Reviewed-by: Waiman Long
---
 fs/fs_struct.c            | 26 +++++++++++++++++++++-----
 fs/namespace.c            |  8 ++++++++
 include/linux/fs_struct.h | 28 +++++++++++++++++++++++++++-
 3 files changed, 56 insertions(+), 6 deletions(-)

diff --git a/fs/fs_struct.c b/fs/fs_struct.c
index 394875d06fd6..43af98e0a10c 100644
--- a/fs/fs_struct.c
+++ b/fs/fs_struct.c
@@ -33,15 +33,19 @@ void set_fs_root(struct fs_struct *fs, const struct path *path)
 void set_fs_pwd(struct fs_struct *fs, const struct path *path)
 {
 	struct path old_pwd;
+	int count;
 
 	path_get(path);
 	write_seqlock(&fs->seq);
 	old_pwd = fs->pwd;
 	fs->pwd = *path;
+	count = fs->pwd_refs + 1;
+	fs->pwd_refs = 0;
 	write_sequnlock(&fs->seq);
 
 	if (old_pwd.dentry)
-		path_put(&old_pwd);
+		while (count--)
+			path_put(&old_pwd);
 }
 
 static inline int replace_path(struct path *p, const struct path *old, const struct path *new)
@@ -63,10 +67,15 @@ void chroot_fs_refs(const struct path *old_root, const struct path *new_root)
 		task_lock(p);
 		fs = p->fs;
 		if (fs) {
-			int hits = 0;
+			int hits;
+
 			write_seqlock(&fs->seq);
+			hits = replace_path(&fs->pwd, old_root, new_root);
+			if (hits && fs->pwd_refs) {
+				count += fs->pwd_refs;
+				fs->pwd_refs = 0;
+			}
 			hits += replace_path(&fs->root, old_root, new_root);
-			hits += replace_path(&fs->pwd, old_root, new_root);
 			while (hits--) {
 				count++;
 				path_get(new_root);
@@ -82,8 +91,11 @@ void chroot_fs_refs(const struct path *old_root, const struct path *new_root)
 
 void free_fs_struct(struct fs_struct *fs)
 {
+	int count = fs->pwd_refs + 1;
+
 	path_put(&fs->root);
-	path_put(&fs->pwd);
+	while (count--)
+		path_put(&fs->pwd);
 	kmem_cache_free(fs_cachep, fs);
 }
 
@@ -111,6 +123,7 @@ struct fs_struct *copy_fs_struct(struct fs_struct *old)
 	if (fs) {
 		fs->users = 1;
 		fs->in_exec = 0;
+		fs->pwd_refs = 0;
 		seqlock_init(&fs->seq);
 		fs->umask = old->umask;
 
@@ -118,7 +131,10 @@ struct fs_struct *copy_fs_struct(struct fs_struct *old)
 		fs->root = old->root;
 		path_get(&fs->root);
 		fs->pwd = old->pwd;
-		path_get(&fs->pwd);
+		if (old->pwd_refs)
+			old->pwd_refs--;
+		else
+			path_get(&fs->pwd);
 		read_sequnlock_excl(&old->seq);
 	}
 	return fs;
diff --git a/fs/namespace.c b/fs/namespace.c
index 854f4fc66469..96d41f00add6 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4272,6 +4272,14 @@ struct mnt_namespace *copy_mnt_ns(u64 flags, struct mnt_namespace *ns,
 	 * as belonging to new namespace. We have already acquired a private
 	 * fs_struct, so tsk->fs->lock is not needed.
 	 */
+	if (new_fs)
+		WARN_ON_ONCE(new_fs->users != 1);
+
+	/* Release the extra pwd references of new_fs, if present. */
+	while (new_fs && new_fs->pwd_refs) {
+		path_put(&new_fs->pwd);
+		new_fs->pwd_refs--;
+	}
 	p = old;
 	q = new;
 	while (p) {
diff --git a/include/linux/fs_struct.h b/include/linux/fs_struct.h
index 0070764b790a..f8cf3b280398 100644
--- a/include/linux/fs_struct.h
+++ b/include/linux/fs_struct.h
@@ -8,10 +8,11 @@
 #include
 
 struct fs_struct {
-	int users;
 	seqlock_t seq;
+	int users;
 	int umask;
 	int in_exec;
+	int pwd_refs;	/* A pool of extra pwd references */
 	struct path root, pwd;
 } __randomize_layout;
 
@@ -40,6 +41,31 @@ static inline void get_fs_pwd(struct fs_struct *fs, struct path *pwd)
 	read_sequnlock_excl(&fs->seq);
 }
 
+/* Acquire a pwd reference from the pwd_refs pool, if available */
+static inline void get_fs_pwd_pool(struct fs_struct *fs, struct path *pwd)
+{
+	read_seqlock_excl(&fs->seq);
+	*pwd = fs->pwd;
+	if (fs->pwd_refs)
+		fs->pwd_refs--;
+	else
+		path_get(pwd);
+	read_sequnlock_excl(&fs->seq);
+}
+
+/* Release a pwd reference back to the pwd_refs pool, if appropriate */
+static inline void put_fs_pwd_pool(struct fs_struct *fs, struct path *pwd)
+{
+	read_seqlock_excl(&fs->seq);
+	if ((fs->pwd.dentry == pwd->dentry) && (fs->pwd.mnt == pwd->mnt)) {
+		fs->pwd_refs++;
+		pwd = NULL;
+	}
+	read_sequnlock_excl(&fs->seq);
+	if (pwd)
+		path_put(pwd);
+}
+
 extern bool current_chrooted(void);
 
 static inline int current_umask(void)
-- 
2.53.0

From: Waiman Long
To: Paul Moore, Eric Paris, Christian Brauner, Al Viro, Jan Kara
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, audit@vger.kernel.org, Richard Guy Briggs, Ricardo Robaina, Waiman Long
Subject: [PATCH v4 2/2] audit: Use the new {get,put}_fs_pwd_pool() APIs to get/put pwd references
Date: Sat, 28 Feb 2026 13:27:57 -0500
Message-ID: <20260228182757.90528-3-longman@redhat.com>
In-Reply-To: <20260228182757.90528-1-longman@redhat.com>

When the audit subsystem is enabled, it can make a lot of get_fs_pwd() calls to get references to fs->pwd and then release those references with path_put() later. That may cause a lot of spinlock contention on a single pwd's dentry lock because of the constant changes to the reference count when there are many processes in the same working directory actively doing open/close system calls. This can cause a noticeable performance regression compared with the case where the audit subsystem is turned off, especially on systems with a lot of CPUs, which are becoming more common these days.

To avoid this kind of performance regression, use the new get_fs_pwd_pool() and put_fs_pwd_pool() APIs to acquire and release a fs->pwd reference. This should greatly reduce the number of path_get() and path_put() calls that are needed.

After installing a test kernel with auditing enabled and counters added to track the get_fs_pwd_pool() and put_fs_pwd_pool() calls on a 2-socket 96-core test system, and running a parallel kernel build, the counter values for this particular test run were as shown below.

  fs_get_path=307,903
  fs_get_pool=56,583,192
  fs_put_path=6,209
  fs_put_pool=56,885,147

Of the roughly 57M calls to get_fs_pwd_pool() and put_fs_pwd_pool(), the vast majority just update the pwd_refs counter. Less than 1% of those calls require an actual path_get() or path_put() call. The difference between fs_get_path and fs_put_path represents the extra pwd references that were still stored in various active task->fs's when the counter values were retrieved.

It can be seen that the number of path_get() and path_put() calls is reduced by quite a lot.
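The "less than 1%" figure can be re-derived from the counter values. The sketch below assumes, as the text implies, that fs_get_path/fs_put_path count the calls that fell back to path_get()/path_put() and that fs_get_pool/fs_put_pool count the calls served by the pool; slow_fraction() and report() are hypothetical helpers, not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Counter values from the parallel kernel build run quoted above. */
static const long fs_get_path = 307903;
static const long fs_get_pool = 56583192;
static const long fs_put_path = 6209;
static const long fs_put_pool = 56885147;

/* Share of calls that fell back to a real path_get()/path_put(). */
static double slow_fraction(long slow, long pooled)
{
	assert(slow + pooled > 0);
	return (double)slow / (double)(slow + pooled);
}

static void report(void)
{
	printf("get_fs_pwd_pool(): %.2f%% needed path_get()\n",
	       100.0 * slow_fraction(fs_get_path, fs_get_pool));
	printf("put_fs_pwd_pool(): %.3f%% needed path_put()\n",
	       100.0 * slow_fraction(fs_put_path, fs_put_pool));
	printf("fs_get_path - fs_put_path = %ld\n", fs_get_path - fs_put_path);
}
```

Under that reading, about 0.54% of the get calls and about 0.01% of the put calls touched the dentry reference count.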
Signed-off-by: Waiman Long
Acked-by: Paul Moore
Reviewed-by: Richard Guy Briggs
Reviewed-by: Waiman Long
---
 kernel/auditsc.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index f6af6a8f68c4..26ba61eabfb0 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -931,6 +931,9 @@ static inline void audit_free_names(struct audit_context *context)
 {
 	struct audit_names *n, *next;
 
+	if (!context->name_count)
+		return;	/* audit_alloc_name() has not been called */
+
 	list_for_each_entry_safe(n, next, &context->names_list, list) {
 		list_del(&n->list);
 		if (n->name)
@@ -939,7 +942,7 @@ static inline void audit_free_names(struct audit_context *context)
 			kfree(n);
 	}
 	context->name_count = 0;
-	path_put(&context->pwd);
+	put_fs_pwd_pool(current->fs, &context->pwd);
 	context->pwd.dentry = NULL;
 	context->pwd.mnt = NULL;
 }
@@ -2165,7 +2168,7 @@ static struct audit_names *audit_alloc_name(struct audit_context *context,
 
 	context->name_count++;
 	if (!context->pwd.dentry)
-		get_fs_pwd(current->fs, &context->pwd);
+		get_fs_pwd_pool(current->fs, &context->pwd);
 	return aname;
 }
 
-- 
2.53.0