From nobody Tue Apr 7 16:29:57 2026
From: Mickaël Salaün
To: Christian Brauner, Günther Noack, Paul Moore, "Serge E. Hallyn"
Cc: Mickaël Salaün, Justin Suess, Lennart Poettering, Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang, kernel-team@cloudflare.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org
Subject: [RFC PATCH v1 01/11] security: add LSM blob and hooks for namespaces
Date: Thu, 12 Mar 2026 11:04:34 +0100
Message-ID: <20260312100444.2609563-2-mic@digikod.net>
In-Reply-To: <20260312100444.2609563-1-mic@digikod.net>
References: <20260312100444.2609563-1-mic@digikod.net>

From: Christian Brauner

All namespace types now share the same ns_common infrastructure. Extend
this to include a security blob so LSMs can start managing namespaces
uniformly without having to add one-off hooks or security fields to
every individual namespace type.

Add a ns_security pointer to ns_common and the corresponding lbs_ns
blob size to lsm_blob_sizes. Allocation and freeing hooks are called
from the common __ns_common_init() and __ns_common_free() paths so
every namespace type gets covered in one go.
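The allocate/free pairing described above can be modelled in userspace roughly as follows. This is only a sketch, not the kernel code: the demo_ names, the 16-byte blob size, and the demo_hook_result knob are assumptions made for illustration, and calloc()/free() stand in for the kernel's blob allocator and kfree().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the relevant part of ns_common. */
struct demo_ns {
	void *ns_security; /* composite blob shared by all modules */
};

/* Stands in for blob_sizes.lbs_ns; the value is arbitrary. */
static size_t demo_blob_size = 16;

/* What the modelled namespace_alloc hook returns (0 means success). */
static int demo_hook_result;

/* Release the blob; safe to call when nothing was allocated. */
static void demo_namespace_free(struct demo_ns *ns)
{
	if (!ns->ns_security)
		return;

	free(ns->ns_security);
	ns->ns_security = NULL;
}

/* Allocate the blob, then run the hook; undo the allocation on failure. */
static int demo_namespace_alloc(struct demo_ns *ns)
{
	ns->ns_security = calloc(1, demo_blob_size);
	if (!ns->ns_security)
		return -ENOMEM;

	if (demo_hook_result) {
		demo_namespace_free(ns);
		return demo_hook_result;
	}

	return 0;
}
```

The point of the model is the failure path: when the hook rejects the namespace, the blob is released before the error is returned, so the caller never sees a half-initialized ns_security pointer.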
All information about the namespace type and the appropriate casting
helpers to get at the containing namespace are available via ns_common
making it straightforward for LSMs to differentiate when they need to.

A namespace_install hook is called from validate_ns() during setns(2)
giving LSMs a chance to enforce policy on namespace transitions.

Individual namespace types can still have their own specialized security
hooks when needed. This is just the common baseline that makes it easy
to track and manage namespaces from the security side without requiring
every namespace type to reinvent the wheel.

Cc: Günther Noack
Cc: Paul Moore
Cc: Serge E. Hallyn
Signed-off-by: Christian Brauner
Link: https://lore.kernel.org/r/20260216-work-security-namespace-v1-1-075c28758e1f@kernel.org
---
 include/linux/lsm_hook_defs.h      |  3 ++
 include/linux/lsm_hooks.h          |  1 +
 include/linux/ns/ns_common_types.h |  3 ++
 include/linux/security.h           | 20 ++++++++
 kernel/nscommon.c                  | 12 +++++
 kernel/nsproxy.c                   |  8 +++-
 security/lsm_init.c                |  2 +
 security/security.c                | 76 ++++++++++++++++++++++++++++++
 8 files changed, 124 insertions(+), 1 deletion(-)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 8c42b4bde09c..fefd3aa6d8f4 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -260,6 +260,9 @@ LSM_HOOK(int, -ENOSYS, task_prctl, int option, unsigned long arg2,
 LSM_HOOK(void, LSM_RET_VOID, task_to_inode, struct task_struct *p,
 	 struct inode *inode)
 LSM_HOOK(int, 0, userns_create, const struct cred *cred)
+LSM_HOOK(int, 0, namespace_alloc, struct ns_common *ns)
+LSM_HOOK(void, LSM_RET_VOID, namespace_free, struct ns_common *ns)
+LSM_HOOK(int, 0, namespace_install, const struct nsset *nsset, struct ns_common *ns)
 LSM_HOOK(int, 0, ipc_permission, struct kern_ipc_perm *ipcp, short flag)
 LSM_HOOK(void, LSM_RET_VOID, ipc_getlsmprop, struct kern_ipc_perm *ipcp,
 	 struct lsm_prop *prop)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index d48bf0ad26f4..3e7afe76e86c 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -111,6 +111,7 @@ struct lsm_blob_sizes {
 	unsigned int lbs_ipc;
 	unsigned int lbs_key;
 	unsigned int lbs_msg_msg;
+	unsigned int lbs_ns;
 	unsigned int lbs_perf_event;
 	unsigned int lbs_task;
 	unsigned int lbs_xattr_count; /* num xattr slots in new_xattrs array */
diff --git a/include/linux/ns/ns_common_types.h b/include/linux/ns/ns_common_types.h
index 0014fbc1c626..170288e2e895 100644
--- a/include/linux/ns/ns_common_types.h
+++ b/include/linux/ns/ns_common_types.h
@@ -115,6 +115,9 @@ struct ns_common {
 	struct dentry *stashed;
 	const struct proc_ns_operations *ops;
 	unsigned int inum;
+#ifdef CONFIG_SECURITY
+	void *ns_security;
+#endif
 	union {
 		struct ns_tree;
 		struct rcu_head ns_rcu;
diff --git a/include/linux/security.h b/include/linux/security.h
index 83a646d72f6f..611b9098367d 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -67,6 +67,7 @@ enum fs_value_type;
 struct watch;
 struct watch_notification;
 struct lsm_ctx;
+struct nsset;
 
 /* Default (no) options for the capable function */
 #define CAP_OPT_NONE 0x0
@@ -80,6 +81,7 @@ struct lsm_ctx;
 
 struct ctl_table;
 struct audit_krule;
+struct ns_common;
 struct user_namespace;
 struct timezone;
 
@@ -533,6 +535,9 @@ int security_task_prctl(int option, unsigned long arg2, unsigned long arg3,
 			unsigned long arg4, unsigned long arg5);
 void security_task_to_inode(struct task_struct *p, struct inode *inode);
 int security_create_user_ns(const struct cred *cred);
+int security_namespace_alloc(struct ns_common *ns);
+void security_namespace_free(struct ns_common *ns);
+int security_namespace_install(const struct nsset *nsset, struct ns_common *ns);
 int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag);
 void security_ipc_getlsmprop(struct kern_ipc_perm *ipcp, struct lsm_prop *prop);
 int security_msg_msg_alloc(struct msg_msg *msg);
@@ -1407,6 +1412,21 @@ static inline int security_create_user_ns(const struct cred *cred)
 	return 0;
 }
 
+static inline int security_namespace_alloc(struct ns_common *ns)
+{
+	return 0;
+}
+
+static inline void security_namespace_free(struct ns_common *ns)
+{
+}
+
+static inline int security_namespace_install(const struct nsset *nsset,
+					     struct ns_common *ns)
+{
+	return 0;
+}
+
 static inline int security_ipc_permission(struct kern_ipc_perm *ipcp,
 					  short flag)
 {
diff --git a/kernel/nscommon.c b/kernel/nscommon.c
index bdc3c86231d3..de774e374f9d 100644
--- a/kernel/nscommon.c
+++ b/kernel/nscommon.c
@@ -4,6 +4,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -59,6 +60,9 @@ int __ns_common_init(struct ns_common *ns, u32 ns_type, const struct proc_ns_ope
 
 	refcount_set(&ns->__ns_ref, 1);
 	ns->stashed = NULL;
+#ifdef CONFIG_SECURITY
+	ns->ns_security = NULL;
+#endif
 	ns->ops = ops;
 	ns->ns_id = 0;
 	ns->ns_type = ns_type;
@@ -77,6 +81,13 @@ int __ns_common_init(struct ns_common *ns, u32 ns_type, const struct proc_ns_ope
 	ret = proc_alloc_inum(&ns->inum);
 	if (ret)
 		return ret;
+
+	ret = security_namespace_alloc(ns);
+	if (ret) {
+		proc_free_inum(ns->inum);
+		return ret;
+	}
+
 	/*
 	 * Tree ref starts at 0. It's incremented when namespace enters
 	 * active use (installed in nsproxy) and decremented when all
@@ -91,6 +102,7 @@ int __ns_common_init(struct ns_common *ns, u32 ns_type, const struct proc_ns_ope
 
 void __ns_common_free(struct ns_common *ns)
 {
+	security_namespace_free(ns);
 	proc_free_inum(ns->inum);
 }
 
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 259c4b4f1eeb..f0b30d1907e7 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -379,7 +379,13 @@ static int prepare_nsset(unsigned flags, struct nsset *nsset)
 
 static inline int validate_ns(struct nsset *nsset, struct ns_common *ns)
 {
-	return ns->ops->install(nsset, ns);
+	int ret;
+
+	ret = ns->ops->install(nsset, ns);
+	if (ret)
+		return ret;
+
+	return security_namespace_install(nsset, ns);
 }
 
 /*
diff --git a/security/lsm_init.c b/security/lsm_init.c
index 573e2a7250c4..637c2d65e131 100644
--- a/security/lsm_init.c
+++ b/security/lsm_init.c
@@ -301,6 +301,7 @@ static void __init lsm_prepare(struct lsm_info *lsm)
 	lsm_blob_size_update(&blobs->lbs_ipc, &blob_sizes.lbs_ipc);
 	lsm_blob_size_update(&blobs->lbs_key, &blob_sizes.lbs_key);
 	lsm_blob_size_update(&blobs->lbs_msg_msg, &blob_sizes.lbs_msg_msg);
+	lsm_blob_size_update(&blobs->lbs_ns, &blob_sizes.lbs_ns);
 	lsm_blob_size_update(&blobs->lbs_perf_event,
 			     &blob_sizes.lbs_perf_event);
 	lsm_blob_size_update(&blobs->lbs_sock, &blob_sizes.lbs_sock);
@@ -446,6 +447,7 @@ int __init security_init(void)
 		lsm_pr("blob(ipc) size %d\n", blob_sizes.lbs_ipc);
 		lsm_pr("blob(key) size %d\n", blob_sizes.lbs_key);
 		lsm_pr("blob(msg_msg)_size %d\n", blob_sizes.lbs_msg_msg);
+		lsm_pr("blob(ns) size %d\n", blob_sizes.lbs_ns);
 		lsm_pr("blob(sock) size %d\n", blob_sizes.lbs_sock);
 		lsm_pr("blob(superblock) size %d\n", blob_sizes.lbs_superblock);
 		lsm_pr("blob(perf_event) size %d\n", blob_sizes.lbs_perf_event);
diff --git a/security/security.c b/security/security.c
index 67af9228c4e9..dcf073cac848 100644
--- a/security/security.c
+++ b/security/security.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -355,6 +356,19 @@ static int lsm_superblock_alloc(struct super_block *sb)
 			      GFP_KERNEL);
 }
 
+/**
+ * lsm_ns_alloc - allocate a composite namespace blob
+ * @ns: the namespace that needs a blob
+ *
+ * Allocate the namespace blob for all the modules
+ *
+ * Returns 0, or -ENOMEM if memory can't be allocated.
+ */
+static int lsm_ns_alloc(struct ns_common *ns)
+{
+	return lsm_blob_alloc(&ns->ns_security, blob_sizes.lbs_ns, GFP_KERNEL);
+}
+
 /**
  * lsm_fill_user_ctx - Fill a user space lsm_ctx structure
  * @uctx: a userspace LSM context to be filled
@@ -3255,6 +3269,68 @@ int security_create_user_ns(const struct cred *cred)
 	return call_int_hook(userns_create, cred);
 }
 
+/**
+ * security_namespace_alloc() - Allocate LSM security data for a namespace
+ * @ns: the namespace being allocated
+ *
+ * Allocate and attach security data to the namespace. The namespace type
+ * is available via ns->ns_type, and the owning user namespace (if any)
+ * via ns->ops->owner(ns).
+ *
+ * Return: Returns 0 if successful, otherwise < 0 error code.
+ */
+int security_namespace_alloc(struct ns_common *ns)
+{
+	int rc;
+
+	rc = lsm_ns_alloc(ns);
+	if (unlikely(rc))
+		return rc;
+
+	rc = call_int_hook(namespace_alloc, ns);
+	if (unlikely(rc))
+		security_namespace_free(ns);
+
+	return rc;
+}
+
+/**
+ * security_namespace_free() - Release LSM security data from a namespace
+ * @ns: the namespace being freed
+ *
+ * Release security data attached to the namespace. Called before the
+ * namespace structure is freed.
+ *
+ * Note: The namespace may be freed via kfree_rcu(). LSMs must use
+ * RCU-safe freeing for any data that might be accessed by concurrent
+ * RCU readers.
+ */
+void security_namespace_free(struct ns_common *ns)
+{
+	if (!ns->ns_security)
+		return;
+
+	call_void_hook(namespace_free, ns);
+
+	kfree(ns->ns_security);
+	ns->ns_security = NULL;
+}
+
+/**
+ * security_namespace_install() - Check permission to install a namespace
+ * @nsset: the target nsset being configured
+ * @ns: the namespace being installed
+ *
+ * Check permission before allowing a namespace to be installed into the
+ * process's set of namespaces via setns(2).
+ *
+ * Return: Returns 0 if permission is granted, otherwise < 0 error code.
+ */
+int security_namespace_install(const struct nsset *nsset, struct ns_common *ns)
+{
+	return call_int_hook(namespace_install, nsset, ns);
+}
+
 /**
  * security_ipc_permission() - Check if sysv ipc access is allowed
  * @ipcp: ipc permission structure
-- 
2.53.0

From nobody Tue Apr 7 16:29:57 2026
From: Mickaël Salaün
To: Christian Brauner, Günther Noack, Paul Moore, "Serge E. Hallyn"
Cc: Mickaël Salaün, Justin Suess, Lennart Poettering, Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang, kernel-team@cloudflare.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org
Subject: [RFC PATCH v1 02/11] security: Add LSM_AUDIT_DATA_NS for namespace audit records
Date: Thu, 12 Mar 2026 11:04:35 +0100
Message-ID: <20260312100444.2609563-3-mic@digikod.net>
In-Reply-To: <20260312100444.2609563-1-mic@digikod.net>
References: <20260312100444.2609563-1-mic@digikod.net>

Add a new LSM audit data type LSM_AUDIT_DATA_NS that logs namespace
information in audit records. Two fields are provided, matching the
field names of struct ns_common:

- ns_type: the CLONE_NEW* flag identifying the namespace type, logged in
  hexadecimal.
- inum: the proc inode number identifying a specific namespace instance.
  Namespace inode numbers are allocated by proc_alloc_inum() via
  ida_alloc_max() bounded to UINT_MAX, so the value always fits in
  32 bits.

A new audit data type is needed because no existing LSM_AUDIT_DATA_*
type carries namespace information. The closest alternatives (e.g.
LSM_AUDIT_DATA_TASK or LSM_AUDIT_DATA_NONE with custom strings) would
either lose the namespace type or require ad-hoc formatting that
bypasses the structured audit data union.
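To make the resulting record fragment concrete, here is a small userspace sketch that reuses the format string from this patch's audit_log_format() call. The demo_ names are hypothetical, snprintf() merely stands in for the audit buffer, and the sample values (a CLONE_NEWPID-style flag and an arbitrary inode number) are illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the two fields carried by the new audit type. */
struct demo_ns_audit {
	unsigned int ns_type; /* CLONE_NEW* flag, logged in hexadecimal */
	unsigned int inum;    /* proc inode number, fits in 32 bits */
};

/*
 * Format the namespace fields the way the patch's audit_log_format()
 * call does. Returns the number of characters that would be written,
 * as snprintf() does.
 */
static int demo_format_ns(char *buf, size_t len,
			  const struct demo_ns_audit *ns)
{
	return snprintf(buf, len, " namespace_type=0x%x namespace_inum=%u",
			ns->ns_type, ns->inum);
}
```

With ns_type set to 0x20000000 and inum set to 4026531836, the buffer would hold " namespace_type=0x20000000 namespace_inum=4026531836".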
Cc: Christian Brauner
Cc: Günther Noack
Cc: Paul Moore
Signed-off-by: Mickaël Salaün
Reviewed-by: Christian Brauner
---
 include/linux/lsm_audit.h | 5 +++++
 security/lsm_audit.c      | 4 ++++
 2 files changed, 9 insertions(+)

diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h
index 382c56a97bba..6e20a56b8c22 100644
--- a/include/linux/lsm_audit.h
+++ b/include/linux/lsm_audit.h
@@ -78,6 +78,7 @@ struct common_audit_data {
 #define LSM_AUDIT_DATA_NOTIFICATION 16
 #define LSM_AUDIT_DATA_ANONINODE 17
 #define LSM_AUDIT_DATA_NLMSGTYPE 18
+#define LSM_AUDIT_DATA_NS 19
 	union {
 		struct path path;
 		struct dentry *dentry;
@@ -100,6 +101,10 @@ struct common_audit_data {
 		int reason;
 		const char *anonclass;
 		u16 nlmsg_type;
+		struct {
+			u32 ns_type;
+			unsigned int inum;
+		} ns;
 	} u;
 	/* this union contains LSM specific data */
 	union {
diff --git a/security/lsm_audit.c b/security/lsm_audit.c
index 7d623b00495c..7f71a77c1c12 100644
--- a/security/lsm_audit.c
+++ b/security/lsm_audit.c
@@ -403,6 +403,10 @@ void audit_log_lsm_data(struct audit_buffer *ab,
 	case LSM_AUDIT_DATA_NLMSGTYPE:
 		audit_log_format(ab, " nl-msgtype=%hu", a->u.nlmsg_type);
 		break;
+	case LSM_AUDIT_DATA_NS:
+		audit_log_format(ab, " namespace_type=0x%x namespace_inum=%u",
+				 a->u.ns.ns_type, a->u.ns.inum);
+		break;
 	} /* switch (a->type) */
 }
 
-- 
2.53.0

From nobody Tue Apr 7 16:29:57 2026
From: Mickaël Salaün
To: Christian Brauner, Günther Noack, Paul Moore, "Serge E. Hallyn"
Cc: Mickaël Salaün, Justin Suess, Lennart Poettering, Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang, kernel-team@cloudflare.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org
Subject: [RFC PATCH v1 03/11] nsproxy: Add FOR_EACH_NS_TYPE() X-macro and CLONE_NS_ALL
Date: Thu, 12 Mar 2026 11:04:36 +0100
Message-ID: <20260312100444.2609563-4-mic@digikod.net>
In-Reply-To: <20260312100444.2609563-1-mic@digikod.net>
References: <20260312100444.2609563-1-mic@digikod.net>

Introduce the FOR_EACH_NS_TYPE(X) macro as the single source of truth
for the set of (struct type, CLONE_NEW* flag) pairs that define Linux
namespace types. Currently, the list of CLONE_NEW* flags is duplicated
inline in multiple call sites and would need another copy in each new
consumer. This makes it easy to miss one when a new namespace type is
added.

Derive two things from the X-macro:

- CLONE_NS_ALL: Bitmask of all known CLONE_NEW* flags, usable as a
  validity mask or iteration bound.
- ns_common_type(): Rewritten to use the X-macro via a leading-comma
  _Generic pattern, so the struct-to-flag mapping stays in sync with the
  flag set automatically.

Replace the inline flag enumerations in copy_namespaces(),
unshare_nsproxy_namespaces(), check_setns_flags(), and ksys_unshare()
with CLONE_NS_ALL.

When a new namespace type is added, only FOR_EACH_NS_TYPE needs to be
updated; CLONE_NS_ALL, ns_common_type(), and all the call sites pick up
the change automatically.
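The two derivations can be tried out in plain C11 with a reduced list. The sketch below is a standalone illustration, not the kernel header: the demo_ structs, the DEMO_ macro names, and the choice of only three entries are assumptions, and the flag values are stand-ins defined locally.

```c
#include <assert.h>

/* Hypothetical stand-ins for three namespace structs. */
struct demo_mnt_ns { int unused; };
struct demo_pid_ns { int unused; };
struct demo_uts_ns { int unused; };

/* Locally defined stand-ins for the corresponding CLONE_NEW* flags. */
#define DEMO_NEWNS  0x00020000u
#define DEMO_NEWPID 0x20000000u
#define DEMO_NEWUTS 0x04000000u

/* Single source of truth: one (struct name, flag) pair per type. */
#define DEMO_FOR_EACH_NS_TYPE(X)    \
	X(demo_mnt_ns, DEMO_NEWNS)  \
	X(demo_pid_ns, DEMO_NEWPID) \
	X(demo_uts_ns, DEMO_NEWUTS)

/* Each entry expands to "| (flag)", so the list ORs onto a leading 0. */
#define DEMO_FLAG_OR(struct_name, flag) | (flag)
#define DEMO_NS_ALL (0u DEMO_FOR_EACH_NS_TYPE(DEMO_FLAG_OR))

/* Each entry expands to ", struct foo *: (flag)" (leading comma). */
#define DEMO_TYPE_ASSOC(struct_name, flag) , struct struct_name *: (flag)
#define demo_ns_type(ns) \
	_Generic((ns)DEMO_FOR_EACH_NS_TYPE(DEMO_TYPE_ASSOC))

/* Nonzero when flags is non-empty and contains only known bits. */
static int demo_flags_valid(unsigned int flags)
{
	return flags != 0 && (flags & ~DEMO_NS_ALL) == 0;
}
```

Adding a fourth X(...) line to DEMO_FOR_EACH_NS_TYPE would extend the mask, the _Generic mapping, and demo_flags_valid() at once, which is the property the commit message relies on. demo_flags_valid() follows the same shape as the check_setns_flags() test in this patch.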
Cc: Christian Brauner
Cc: Günther Noack
Signed-off-by: Mickaël Salaün
Reviewed-by: Christian Brauner
---
 include/linux/ns/ns_common_types.h | 44 +++++++++++++++++++++++-------
 kernel/fork.c                      |  7 ++---
 kernel/nsproxy.c                   | 13 +++------
 3 files changed, 41 insertions(+), 23 deletions(-)

diff --git a/include/linux/ns/ns_common_types.h b/include/linux/ns/ns_common_types.h
index 170288e2e895..5cfe0ce3c881 100644
--- a/include/linux/ns/ns_common_types.h
+++ b/include/linux/ns/ns_common_types.h
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 
 struct cgroup_namespace;
 struct dentry;
@@ -187,15 +188,38 @@ struct ns_common {
 		struct user_namespace *: (IS_ENABLED(CONFIG_USER_NS) ? &userns_operations : NULL), \
 		struct uts_namespace *: (IS_ENABLED(CONFIG_UTS_NS) ? &utsns_operations : NULL))
 
-#define ns_common_type(__ns) \
-	_Generic((__ns), \
-		struct cgroup_namespace *: CLONE_NEWCGROUP, \
-		struct ipc_namespace *: CLONE_NEWIPC, \
-		struct mnt_namespace *: CLONE_NEWNS, \
-		struct net *: CLONE_NEWNET, \
-		struct pid_namespace *: CLONE_NEWPID, \
-		struct time_namespace *: CLONE_NEWTIME, \
-		struct user_namespace *: CLONE_NEWUSER, \
-		struct uts_namespace *: CLONE_NEWUTS)
+/*
+ * FOR_EACH_NS_TYPE - Canonical list of namespace types
+ *
+ * Enumerates all (struct type, CLONE_NEW* flag) pairs. This is the
+ * single source of truth used to derive ns_common_type() and
+ * CLONE_NS_ALL. When adding a new namespace type, add a single entry
+ * here; all consumers update automatically.
+ *
+ * @X: Callback macro taking (struct_name, clone_flag) as arguments.
+ */
+#define FOR_EACH_NS_TYPE(X) \
+	X(cgroup_namespace, CLONE_NEWCGROUP) \
+	X(ipc_namespace, CLONE_NEWIPC) \
+	X(mnt_namespace, CLONE_NEWNS) \
+	X(net, CLONE_NEWNET) \
+	X(pid_namespace, CLONE_NEWPID) \
+	X(time_namespace, CLONE_NEWTIME) \
+	X(user_namespace, CLONE_NEWUSER) \
+	X(uts_namespace, CLONE_NEWUTS)
+
+/* Bitmask of all known CLONE_NEW* flags. */
+#define _NS_TYPE_FLAG_OR(struct_name, flag) | (flag)
+#define CLONE_NS_ALL (0 FOR_EACH_NS_TYPE(_NS_TYPE_FLAG_OR))
+
+/*
+ * ns_common_type - Map a namespace struct pointer to its CLONE_NEW* flag
+ *
+ * Uses a leading-comma pattern so the FOR_EACH_NS_TYPE expansion
+ * produces ", struct foo *: FLAG" entries without a trailing comma.
+ */
+#define _NS_TYPE_ASSOC(struct_name, flag) , struct struct_name *: (flag)
+
+#define ns_common_type(__ns) _Generic((__ns)FOR_EACH_NS_TYPE(_NS_TYPE_ASSOC))
 
 #endif /* _LINUX_NS_COMMON_TYPES_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 65113a304518..767559acd060 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -46,6 +46,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -3046,11 +3047,9 @@ void __init proc_caches_init(void)
  */
 static int check_unshare_flags(unsigned long unshare_flags)
 {
-	if (unshare_flags & ~(CLONE_THREAD|CLONE_FS|CLONE_NEWNS|CLONE_SIGHAND|
+	if (unshare_flags & ~(CLONE_THREAD|CLONE_FS|CLONE_SIGHAND|
 				CLONE_VM|CLONE_FILES|CLONE_SYSVSEM|
-				CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWNET|
-				CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWCGROUP|
-				CLONE_NEWTIME))
+				CLONE_NS_ALL))
 		return -EINVAL;
 	/*
 	 * Not implemented, but pretend it works if there is nothing
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index f0b30d1907e7..7181886331c8 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -170,9 +171,7 @@ int copy_namespaces(u64 flags, struct task_struct *tsk)
 	struct user_namespace *user_ns = task_cred_xxx(tsk, user_ns);
 	struct nsproxy *new_ns;
 
-	if (likely(!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
-			      CLONE_NEWPID | CLONE_NEWNET |
-			      CLONE_NEWCGROUP | CLONE_NEWTIME)))) {
+	if (likely(!(flags & (CLONE_NS_ALL & ~CLONE_NEWUSER)))) {
 		if ((flags & CLONE_VM) ||
 		    likely(old_ns->time_ns_for_children == old_ns->time_ns)) {
 			get_nsproxy(old_ns);
@@ -214,9 +213,7 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
 	struct user_namespace *user_ns;
 	int err = 0;
 
-	if (!(unshare_flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
-			       CLONE_NEWNET | CLONE_NEWPID | CLONE_NEWCGROUP |
-			       CLONE_NEWTIME)))
+	if (!(unshare_flags & (CLONE_NS_ALL & ~CLONE_NEWUSER)))
 		return 0;
 
 	user_ns = new_cred ? new_cred->user_ns : current_user_ns();
@@ -292,9 +289,7 @@ int exec_task_namespaces(void)
 
 static int check_setns_flags(unsigned long flags)
 {
-	if (!flags || (flags & ~(CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
-				 CLONE_NEWNET | CLONE_NEWTIME | CLONE_NEWUSER |
-				 CLONE_NEWPID | CLONE_NEWCGROUP)))
+	if (!flags || (flags & ~CLONE_NS_ALL))
 		return -EINVAL;
 
 #ifndef CONFIG_USER_NS
-- 
2.53.0

From nobody Tue Apr 7 16:29:57 2026
From: Mickaël Salaün
To: Christian Brauner, Günther Noack, Paul Moore, "Serge E. Hallyn"
Cc: Mickaël Salaün, Justin Suess, Lennart Poettering, Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang, kernel-team@cloudflare.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org
Subject: [RFC PATCH v1 04/11] landlock: Wrap per-layer access masks in struct layer_rights
Date: Thu, 12 Mar 2026 11:04:37 +0100
Message-ID: <20260312100444.2609563-5-mic@digikod.net>
In-Reply-To: <20260312100444.2609563-1-mic@digikod.net>
References: <20260312100444.2609563-1-mic@digikod.net>

The per-layer FAM in struct landlock_ruleset currently stores struct
access_masks directly, but upcoming permission features (capability and
namespace restrictions) need additional per-layer data beyond the
handled-access bitfields.

Introduce struct layer_rights as a wrapper around struct access_masks
and rename the FAM from access_masks[] to layers[]. This makes room for
future per-layer fields (e.g. allowed bitmasks) without modifying struct
access_masks itself, which is also used as a lightweight parameter type
for functions that only need the handled-access bitfields.

No functional change.
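The shape of the change can be sketched in userspace as below. Everything here is an assumption for illustration: the demo_ names, the bitfield widths, and the use of calloc() in place of the kernel's flexible-array allocation helper. It only shows how a wrapper struct becomes the element type of a flexible array member while the inner bitfield struct stays usable on its own.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical handled-access bitfields; widths are arbitrary. */
struct demo_access_masks {
	unsigned int fs : 16;
	unsigned int net : 2;
	unsigned int scope : 2;
};

/* Per-layer wrapper: room for more fields without touching the masks. */
struct demo_layer_rights {
	struct demo_access_masks handled;
};

/* Ruleset whose flexible array member holds one wrapper per layer. */
struct demo_ruleset {
	unsigned int num_layers;
	struct demo_layer_rights layers[];
};

/* Allocate a zeroed ruleset with num_layers trailing elements. */
static struct demo_ruleset *demo_create_ruleset(unsigned int num_layers)
{
	struct demo_ruleset *rs;

	rs = calloc(1, sizeof(*rs) + num_layers * sizeof(rs->layers[0]));
	if (rs)
		rs->num_layers = num_layers;
	return rs;
}

/* Callers that only need the bitfields still get the lightweight type. */
static struct demo_access_masks
demo_get_handled(const struct demo_ruleset *rs, unsigned int level)
{
	return rs->layers[level].handled;
}
```

Call sites change from indexing the array directly to going through the .handled member, which mirrors the domain->layers[layer_level].handled access in the cred.h hunk of this patch.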
Cc: G=C3=BCnther Noack Signed-off-by: Micka=C3=ABl Sala=C3=BCn --- security/landlock/access.h | 29 ++++++++++++++++++++++------- security/landlock/cred.h | 2 +- security/landlock/ruleset.c | 12 ++++++------ security/landlock/ruleset.h | 28 +++++++++++++++------------- security/landlock/syscalls.c | 2 +- 5 files changed, 45 insertions(+), 28 deletions(-) diff --git a/security/landlock/access.h b/security/landlock/access.h index 42c95747d7bd..b3e147771a0e 100644 --- a/security/landlock/access.h +++ b/security/landlock/access.h @@ -19,7 +19,7 @@ =20 /* * All access rights that are denied by default whether they are handled o= r not - * by a ruleset/layer. This must be ORed with all ruleset->access_masks[] + * by a ruleset/layer. This must be ORed with all ruleset->layers[] * entries when we need to get the absolute handled access masks, see * landlock_upgrade_handled_access_masks(). */ @@ -45,7 +45,7 @@ static_assert(BITS_PER_TYPE(access_mask_t) >=3D LANDLOCK_= NUM_SCOPE); /* Makes sure for_each_set_bit() and for_each_clear_bit() calls are OK. */ static_assert(sizeof(unsigned long) >=3D sizeof(access_mask_t)); =20 -/* Ruleset access masks. */ +/* Handled access masks (bitfields only). */ struct access_masks { access_mask_t fs : LANDLOCK_NUM_ACCESS_FS; access_mask_t net : LANDLOCK_NUM_ACCESS_NET; @@ -61,6 +61,21 @@ union access_masks_all { static_assert(sizeof(typeof_member(union access_masks_all, masks)) =3D=3D sizeof(typeof_member(union access_masks_all, all))); =20 +/** + * struct layer_rights - Per-layer access configuration + * + * Wraps the handled-access bitfields together with any additional per-lay= er + * data (e.g. allowed bitmasks added by future patches). This is the elem= ent + * type of the &struct landlock_ruleset.layers FAM. + */ +struct layer_rights { + /** + * @handled: Bitmask of access rights handled (i.e. restricted) by + * this layer. 
+ */ + struct access_masks handled; +}; + /** * struct layer_access_masks - A boolean matrix of layers and access rights * @@ -100,17 +115,17 @@ static_assert(BITS_PER_TYPE(deny_masks_t) >=3D static_assert(HWEIGHT(LANDLOCK_MAX_NUM_LAYERS) =3D=3D 1); =20 /* Upgrades with all initially denied by default access rights. */ -static inline struct access_masks -landlock_upgrade_handled_access_masks(struct access_masks access_masks) +static inline struct layer_rights +landlock_upgrade_handled_access_masks(struct layer_rights layer_rights) { /* * All access rights that are denied by default whether they are * explicitly handled or not. */ - if (access_masks.fs) - access_masks.fs |=3D _LANDLOCK_ACCESS_FS_INITIALLY_DENIED; + if (layer_rights.handled.fs) + layer_rights.handled.fs |=3D _LANDLOCK_ACCESS_FS_INITIALLY_DENIED; =20 - return access_masks; + return layer_rights; } =20 /* Checks the subset relation between access masks. */ diff --git a/security/landlock/cred.h b/security/landlock/cred.h index f287c56b5fd4..3e2a7e88710e 100644 --- a/security/landlock/cred.h +++ b/security/landlock/cred.h @@ -139,7 +139,7 @@ landlock_get_applicable_subject(const struct cred *cons= t cred, for (layer_level =3D domain->num_layers - 1; layer_level >=3D 0; layer_level--) { union access_masks_all layer =3D { - .masks =3D domain->access_masks[layer_level], + .masks =3D domain->layers[layer_level].handled, }; =20 if (layer.all & masks_all.all) { diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c index 181df7736bb9..a7f8be37ec31 100644 --- a/security/landlock/ruleset.c +++ b/security/landlock/ruleset.c @@ -32,7 +32,7 @@ static struct landlock_ruleset *create_ruleset(const u32 = num_layers) { struct landlock_ruleset *new_ruleset; =20 - new_ruleset =3D kzalloc_flex(*new_ruleset, access_masks, num_layers, + new_ruleset =3D kzalloc_flex(*new_ruleset, layers, num_layers, GFP_KERNEL_ACCOUNT); if (!new_ruleset) return ERR_PTR(-ENOMEM); @@ -48,7 +48,7 @@ static struct 
landlock_ruleset *create_ruleset(const u32 = num_layers) /* * hierarchy =3D NULL * num_rules =3D 0 - * access_masks[] =3D 0 + * layers[] =3D 0 */ return new_ruleset; } @@ -381,8 +381,8 @@ static int merge_ruleset(struct landlock_ruleset *const= dst, err =3D -EINVAL; goto out_unlock; } - dst->access_masks[dst->num_layers - 1] =3D - landlock_upgrade_handled_access_masks(src->access_masks[0]); + dst->layers[dst->num_layers - 1] =3D + landlock_upgrade_handled_access_masks(src->layers[0]); =20 /* Merges the @src inode tree. */ err =3D merge_tree(dst, src, LANDLOCK_KEY_INODE); @@ -464,8 +464,8 @@ static int inherit_ruleset(struct landlock_ruleset *con= st parent, goto out_unlock; } /* Copies the parent layer stack and leaves a space for the new layer. */ - memcpy(child->access_masks, parent->access_masks, - flex_array_size(parent, access_masks, parent->num_layers)); + memcpy(child->layers, parent->layers, + flex_array_size(parent, layers, parent->num_layers)); =20 if (WARN_ON_ONCE(!parent->hierarchy)) { err =3D -EINVAL; diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h index 889f4b30301a..900c47eb0216 100644 --- a/security/landlock/ruleset.h +++ b/security/landlock/ruleset.h @@ -146,7 +146,7 @@ struct landlock_ruleset { * section. This is only used by * landlock_put_ruleset_deferred() when @usage reaches zero. * The fields @lock, @usage, @num_rules, @num_layers and - * @access_masks are then unused. + * @layers are then unused. */ struct work_struct work_free; struct { @@ -173,9 +173,10 @@ struct landlock_ruleset { */ u32 num_layers; /** - * @access_masks: Contains the subset of filesystem and - * network actions that are restricted by a ruleset. - * A domain saves all layers of merged rulesets in a + * @layers: Per-layer access configuration, including + * handled access masks and allowed permission + * bitmasks. A domain saves all layers of merged + * rulesets in a * stack (FAM), starting from the first layer to the * last one. 
These layers are used when merging * rulesets, for user space backward compatibility @@ -184,7 +185,7 @@ struct landlock_ruleset { * layers are set once and never changed for the * lifetime of the ruleset. */ - struct access_masks access_masks[]; + struct layer_rights layers[] __counted_by(num_layers); }; }; }; @@ -224,7 +225,8 @@ static inline void landlock_get_ruleset(struct landlock= _ruleset *const ruleset) * * @domain: Landlock ruleset (used as a domain) * - * Return: An access_masks result of the OR of all the domain's access mas= ks. + * Return: An access_masks result of the OR of all the domain's handled ac= cess + * masks. */ static inline struct access_masks landlock_union_access_masks(const struct landlock_ruleset *const domain) @@ -234,7 +236,7 @@ landlock_union_access_masks(const struct landlock_rules= et *const domain) =20 for (layer_level =3D 0; layer_level < domain->num_layers; layer_level++) { union access_masks_all layer =3D { - .masks =3D domain->access_masks[layer_level], + .masks =3D domain->layers[layer_level].handled, }; =20 matches.all |=3D layer.all; @@ -252,7 +254,7 @@ landlock_add_fs_access_mask(struct landlock_ruleset *co= nst ruleset, =20 /* Should already be checked in sys_landlock_create_ruleset(). */ WARN_ON_ONCE(fs_access_mask !=3D fs_mask); - ruleset->access_masks[layer_level].fs |=3D fs_mask; + ruleset->layers[layer_level].handled.fs |=3D fs_mask; } =20 static inline void @@ -264,7 +266,7 @@ landlock_add_net_access_mask(struct landlock_ruleset *c= onst ruleset, =20 /* Should already be checked in sys_landlock_create_ruleset(). */ WARN_ON_ONCE(net_access_mask !=3D net_mask); - ruleset->access_masks[layer_level].net |=3D net_mask; + ruleset->layers[layer_level].handled.net |=3D net_mask; } =20 static inline void @@ -275,7 +277,7 @@ landlock_add_scope_mask(struct landlock_ruleset *const = ruleset, =20 /* Should already be checked in sys_landlock_create_ruleset(). 
*/ WARN_ON_ONCE(scope_mask !=3D mask); - ruleset->access_masks[layer_level].scope |=3D mask; + ruleset->layers[layer_level].handled.scope |=3D mask; } =20 static inline access_mask_t @@ -283,7 +285,7 @@ landlock_get_fs_access_mask(const struct landlock_rules= et *const ruleset, const u16 layer_level) { /* Handles all initially denied by default access rights. */ - return ruleset->access_masks[layer_level].fs | + return ruleset->layers[layer_level].handled.fs | _LANDLOCK_ACCESS_FS_INITIALLY_DENIED; } =20 @@ -291,14 +293,14 @@ static inline access_mask_t landlock_get_net_access_mask(const struct landlock_ruleset *const ruleset, const u16 layer_level) { - return ruleset->access_masks[layer_level].net; + return ruleset->layers[layer_level].handled.net; } =20 static inline access_mask_t landlock_get_scope_mask(const struct landlock_ruleset *const ruleset, const u16 layer_level) { - return ruleset->access_masks[layer_level].scope; + return ruleset->layers[layer_level].handled.scope; } =20 bool landlock_unmask_layers(const struct landlock_rule *const rule, diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c index 3b33839b80c7..2aa7b50d875f 100644 --- a/security/landlock/syscalls.c +++ b/security/landlock/syscalls.c @@ -341,7 +341,7 @@ static int add_rule_path_beneath(struct landlock_rulese= t *const ruleset, return -ENOMSG; =20 /* Checks that allowed_access matches the @ruleset constraints. 
*/ - mask =3D ruleset->access_masks[0].fs; + mask =3D ruleset->layers[0].handled.fs; if ((path_beneath_attr.allowed_access | mask) !=3D mask) return -EINVAL; =20 --=20 2.53.0 From nobody Tue Apr 7 16:29:57 2026 Received: from smtp-8fac.mail.infomaniak.ch (smtp-8fac.mail.infomaniak.ch [83.166.143.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6BD453BED45 for ; Thu, 12 Mar 2026 10:14:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=83.166.143.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773310493; cv=none; b=SA0jD7JRtzJd38V4kRj4pjRaXIeC7kXc1aRYb8+Cxnb95Rgh1dCvAUMqcSSHiwnQ+tHG0DCFMP6bJSIjyJFzpG9DWlBJ8gStxHnPyNoYtiJy9jt8aeXBQF2YB9FMIudJ8UY0BCkg6DKM6axwaeI3G9BhKr+fOKXIJq5J99bdi3g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773310493; c=relaxed/simple; bh=iPHpY7xpQU6rvatYVMu+RN9uWis71tIZQEIaBPlLASI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=SgS3Fw6JwtT5MnxTvjnAWq2J5BVxI7mKyE2jqgpPEw4x0T0Ln+Np7MzYaAXPefd9G9V5/gBlQVWKg4IklHZMLGNTffXDYD4qdxKFXOKjdAU8Qnhz93R8pwhl/Kv9D+OPZ7l3ZodysWRw8ebPrt73iFsSxYH0EnW3a8U2fEXv610= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=digikod.net; spf=pass smtp.mailfrom=digikod.net; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b=dkg7vTVu; arc=none smtp.client-ip=83.166.143.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=digikod.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=digikod.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b="dkg7vTVu" Received: from smtp-3-0000.mail.infomaniak.ch (unknown [IPv6:2001:1600:4:17::246b]) by 
smtp-3-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4fWjsl0m09zBdr; Thu, 12 Mar 2026 11:05:15 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=digikod.net; s=20191114; t=1773309915; bh=T+yooCcrnZgZ869TwrUV7zKt7D6IpnUIVe2Nv7/vaDs=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=dkg7vTVumi3qB6t2exKA1TSqpXAPpqLD+kREOsC3oysz7KSwVX5i95TQr00/4e5wV JULY4QMXyNXtk9tCdUF0+b5P7lF2EDmp1W6PGVOMWB2/Pq6mjPcNOeGTjSgiJl99UA 7ULUwgw6/ztk+AiK7f6RTzdFf+sO5E3OzWKsUjS0= Received: from unknown by smtp-3-0000.mail.infomaniak.ch (Postfix) with ESMTPA id 4fWjsk3tYvzsfX; Thu, 12 Mar 2026 11:05:14 +0100 (CET) From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= To: Christian Brauner , =?UTF-8?q?G=C3=BCnther=20Noack?= , Paul Moore , "Serge E . Hallyn" Cc: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= , Justin Suess , Lennart Poettering , Mikhail Ivanov , Nicolas Bouchinet , Shervin Oloumi , Tingmao Wang , kernel-team@cloudflare.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org Subject: [RFC PATCH v1 05/11] landlock: Enforce namespace entry restrictions Date: Thu, 12 Mar 2026 11:04:38 +0100 Message-ID: <20260312100444.2609563-6-mic@digikod.net> In-Reply-To: <20260312100444.2609563-1-mic@digikod.net> References: <20260312100444.2609563-1-mic@digikod.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Infomaniak-Routing: alpha Add Landlock enforcement for namespace entry via the LSM namespace_alloc and namespace_install hooks. This lets a sandboxed process restrict which namespace types it can acquire, using LANDLOCK_PERM_NAMESPACE_ENTER and per-type rules. Introduce the handled_perm field in struct landlock_ruleset_attr for permission categories that control broad operations enforced at single kernel chokepoints, achieving complete deny-by-default coverage. 
Each LANDLOCK_PERM_* flag names a gateway operation (use, enter) whose control transitively covers downstream operations. Rule values reference constants from other kernel subsystems (CLONE_NEW* for namespaces); unknown values are silently accepted because the allow-list denies them by default. See the "Ruleset restriction models" section in the kernel documentation for the full design rationale. Add two namespace hooks: - hook_namespace_alloc() fires during unshare(CLONE_NEW*) and clone(CLONE_NEW*) via __ns_common_init(), and checks the namespace type against the domain's allowed set. - hook_namespace_install() fires during setns() via validate_ns(), performing the same type-based check. Both hooks set namespace_type in the audit data; hook_namespace_install() also sets inum for the target namespace. Both hooks perform a pure bitmask check: if the namespace's CLONE_NEW* type is not in the layer's allowed set, the operation is denied. No domain ancestry bypass, no namespace creator tracking, just a flat per-layer allowed-types bitmask. Add the perm_rules bitfield to struct layer_rights (introduced by a preceding commit) to store per-layer namespace type bitmasks. The 8-bit NS field maps to the 8 known namespace types via landlock_ns_type_to_bit(), keeping the storage compact. LANDLOCK_RULE_NAMESPACE uses struct landlock_namespace_attr with an allowed_perm field (matching the pattern of allowed_access in existing rule types) and a namespace_types bitmask of CLONE_NEW* flags. Unknown namespace type bits are silently accepted for forward compatibility; they have no effect since the allow-list denies by default. User namespace creation does not require capabilities, so Landlock can restrict it directly. Non-user namespace types require CAP_SYS_ADMIN before the Landlock check is reached; when both LANDLOCK_PERM_NAMESPACE_ENTER and LANDLOCK_PERM_CAPABILITY_USE are handled, both must allow the operation. 
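The flat per-layer check described above can be sketched in user space as follows. This is an illustrative model under stated assumptions, not the kernel implementation: all model_* and MODEL_* names are hypothetical, the MODEL_CLONE_NEW* values mirror the CLONE_NEW* flags from <linux/sched.h>, and the compact bit order is arbitrary here (the kernel derives it from FOR_EACH_NS_TYPE).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the CLONE_NEW* flag values; names are local to this sketch. */
#define MODEL_CLONE_NEWTIME 0x00000080ULL
#define MODEL_CLONE_NEWNS 0x00020000ULL
#define MODEL_CLONE_NEWCGROUP 0x02000000ULL
#define MODEL_CLONE_NEWUTS 0x04000000ULL
#define MODEL_CLONE_NEWIPC 0x08000000ULL
#define MODEL_CLONE_NEWUSER 0x10000000ULL
#define MODEL_CLONE_NEWPID 0x20000000ULL
#define MODEL_CLONE_NEWNET 0x40000000ULL

#define MODEL_PERM_NAMESPACE_ENTER (1ULL << 0)

/* Index in this table is the compact bit position (order is arbitrary). */
static const uint64_t model_ns_flags[] = {
	MODEL_CLONE_NEWCGROUP, MODEL_CLONE_NEWIPC,  MODEL_CLONE_NEWNS,
	MODEL_CLONE_NEWNET,    MODEL_CLONE_NEWPID,  MODEL_CLONE_NEWTIME,
	MODEL_CLONE_NEWUSER,   MODEL_CLONE_NEWUTS,
};
#define MODEL_NUM_NS (sizeof(model_ns_flags) / sizeof(model_ns_flags[0]))

static_assert(MODEL_NUM_NS == 8, "eight namespace types fit the 8-bit field");

/* Maps exactly one CLONE_NEW* flag to its compact bit; 0 if unknown. */
static uint64_t model_ns_type_to_bit(uint64_t ns_type)
{
	for (size_t i = 0; i < MODEL_NUM_NS; i++)
		if (model_ns_flags[i] == ns_type)
			return 1ULL << i;
	return 0;
}

/* Maps a CLONE_NEW* bitmask to compact bits; unknown bits have no effect. */
static uint64_t model_ns_types_to_bits(uint64_t ns_types)
{
	uint64_t bits = 0;

	for (size_t i = 0; i < MODEL_NUM_NS; i++)
		if (ns_types & model_ns_flags[i])
			bits |= 1ULL << i;
	return bits;
}

/* One layer: which permissions it handles and which ns types it allows. */
struct model_layer {
	uint64_t handled_perm;
	uint64_t allowed_ns;
};

/*
 * Walks from the youngest layer to the oldest.  Layers that do not handle
 * @perm_bit are skipped.  Returns the youngest denying layer + 1, or 0 if
 * every handling layer allows @request_bit.
 */
static size_t model_perm_is_denied(const struct model_layer *layers,
				   size_t num_layers, uint64_t perm_bit,
				   uint64_t request_bit)
{
	for (size_t i = num_layers; i-- > 0;) {
		if (!(layers[i].handled_perm & perm_bit))
			continue;
		if (!(layers[i].allowed_ns & request_bit))
			return i + 1;
	}
	return 0;
}
```

In this model an unknown namespace type maps to a zero request bit, so any layer that handles the permission denies it. That matches the deny-by-default behaviour the allow-list is meant to provide.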
Five KUnit tests verify the landlock_ns_type_to_bit() and landlock_ns_types_to_bits() conversion helpers. Cc: Christian Brauner Cc: G=C3=BCnther Noack Cc: Paul Moore Cc: Serge E. Hallyn Signed-off-by: Micka=C3=ABl Sala=C3=BCn --- include/uapi/linux/landlock.h | 58 +++++- security/landlock/Makefile | 1 + security/landlock/access.h | 42 ++++- security/landlock/audit.c | 4 + security/landlock/audit.h | 1 + security/landlock/cred.h | 42 +++++ security/landlock/limits.h | 7 + security/landlock/ns.c | 188 +++++++++++++++++++ security/landlock/ns.h | 74 ++++++++ security/landlock/ruleset.c | 11 +- security/landlock/ruleset.h | 25 ++- security/landlock/setup.c | 2 + security/landlock/syscalls.c | 70 ++++++- tools/testing/selftests/landlock/base_test.c | 2 +- 14 files changed, 509 insertions(+), 18 deletions(-) create mode 100644 security/landlock/ns.c create mode 100644 security/landlock/ns.h diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h index f88fa1f68b77..b76e656241df 100644 --- a/include/uapi/linux/landlock.h +++ b/include/uapi/linux/landlock.h @@ -51,6 +51,14 @@ struct landlock_ruleset_attr { * resources (e.g. IPCs). */ __u64 scoped; + /** + * @handled_perm: Bitmask of permissions (cf. `Permission flags`_) + * that this ruleset handles. Each permission controls a broad + * operation enforced at a kernel chokepoint: all instances of + * that operation are denied unless explicitly allowed by a rule. + * See Documentation/security/landlock.rst for the rationale. + */ + __u64 handled_perm; }; =20 /** @@ -153,6 +161,11 @@ enum landlock_rule_type { * landlock_net_port_attr . */ LANDLOCK_RULE_NET_PORT, + /** + * @LANDLOCK_RULE_NAMESPACE: Type of a &struct + * landlock_namespace_attr . + */ + LANDLOCK_RULE_NAMESPACE, }; =20 /** @@ -206,6 +219,24 @@ struct landlock_net_port_attr { __u64 port; }; =20 +/** + * struct landlock_namespace_attr - Namespace type definition + * + * Argument of sys_landlock_add_rule() with %LANDLOCK_RULE_NAMESPACE. 
+ */ +struct landlock_namespace_attr { + /** + * @allowed_perm: Must be set to %LANDLOCK_PERM_NAMESPACE_ENTER. + */ + __u64 allowed_perm; + /** + * @namespace_types: Bitmask of namespace types (``CLONE_NEW*`` flags) + * that should be allowed to be entered under this rule. Unknown bits + * are silently ignored for forward compatibility. + */ + __u64 namespace_types; +}; + /** * DOC: fs_access * @@ -379,6 +410,31 @@ struct landlock_net_port_attr { /* clang-format off */ #define LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET (1ULL << 0) #define LANDLOCK_SCOPE_SIGNAL (1ULL << 1) -/* clang-format on*/ +/* clang-format on */ + +/** + * DOC: perm + * + * Permission flags + * ~~~~~~~~~~~~~~~~ + * + * These flags restrict broad operations enforced at kernel chokepoints. + * Each flag names a gateway operation whose control transitively covers + * an open-ended set of downstream operations. Handled permissions that + * are not explicitly allowed by a rule are denied by default. Rule + * values reference constants from other kernel subsystems; unknown values + * are silently accepted for forward compatibility since the allow-list + * denies them by default. + * See Documentation/security/landlock.rst for design details. + * + * - %LANDLOCK_PERM_NAMESPACE_ENTER: Restrict entering (creating or joining + * via :manpage:`setns(2)`) specific namespace types. A process in a + * Landlock domain that handles this permission is denied from entering + * namespace types that are not explicitly allowed by a + * %LANDLOCK_RULE_NAMESPACE rule. 
+ */ +/* clang-format off */ +#define LANDLOCK_PERM_NAMESPACE_ENTER (1ULL << 0) +/* clang-format on */ =20 #endif /* _UAPI_LINUX_LANDLOCK_H */ diff --git a/security/landlock/Makefile b/security/landlock/Makefile index ffa7646d99f3..734aed4ac1bf 100644 --- a/security/landlock/Makefile +++ b/security/landlock/Makefile @@ -8,6 +8,7 @@ landlock-y :=3D \ cred.o \ task.o \ fs.o \ + ns.o \ tsync.o =20 landlock-$(CONFIG_INET) +=3D net.o diff --git a/security/landlock/access.h b/security/landlock/access.h index b3e147771a0e..9c67987a77ae 100644 --- a/security/landlock/access.h +++ b/security/landlock/access.h @@ -42,6 +42,8 @@ static_assert(BITS_PER_TYPE(access_mask_t) >=3D LANDLOCK_= NUM_ACCESS_FS); static_assert(BITS_PER_TYPE(access_mask_t) >=3D LANDLOCK_NUM_ACCESS_NET); /* Makes sure all scoped rights can be stored. */ static_assert(BITS_PER_TYPE(access_mask_t) >=3D LANDLOCK_NUM_SCOPE); +/* Makes sure all permission types can be stored. */ +static_assert(BITS_PER_TYPE(access_mask_t) >=3D LANDLOCK_NUM_PERM); /* Makes sure for_each_set_bit() and for_each_clear_bit() calls are OK. */ static_assert(sizeof(unsigned long) >=3D sizeof(access_mask_t)); =20 @@ -50,6 +52,7 @@ struct access_masks { access_mask_t fs : LANDLOCK_NUM_ACCESS_FS; access_mask_t net : LANDLOCK_NUM_ACCESS_NET; access_mask_t scope : LANDLOCK_NUM_SCOPE; + access_mask_t perm : LANDLOCK_NUM_PERM; }; =20 union access_masks_all { @@ -61,14 +64,47 @@ union access_masks_all { static_assert(sizeof(typeof_member(union access_masks_all, masks)) =3D=3D sizeof(typeof_member(union access_masks_all, all))); =20 +/** + * struct perm_rules - Per-layer allowed bitmasks for permission types + * + * Compact bitfield struct holding the allowed bitmasks for permission + * types that use flat (non-tree) per-layer storage. All fields share + * a single 64-bit storage unit. + */ +struct perm_rules { + /** + * @ns: Allowed namespace types. 
Each bit corresponds to a + * sequential index assigned by the ``_LANDLOCK_NS_*`` enum + * (derived from ``FOR_EACH_NS_TYPE``). Bits are converted from + * ``CLONE_NEW*`` flags at rule-add time via + * ``landlock_ns_types_to_bits()`` and at enforcement time via + * ``landlock_ns_type_to_bit()``. + */ + u64 ns : LANDLOCK_NUM_PERM_NS; +}; + +static_assert(sizeof(struct perm_rules) =3D=3D sizeof(u64)); + /** * struct layer_rights - Per-layer access configuration * - * Wraps the handled-access bitfields together with any additional per-lay= er - * data (e.g. allowed bitmasks added by future patches). This is the elem= ent - * type of the &struct landlock_ruleset.layers FAM. + * Wraps the handled-access bitfields together with per-layer allowed + * bitmasks. This is the element type of the &struct + * landlock_ruleset.layers FAM. + * + * Unlike filesystem and network access rights, which are tracked per-obje= ct + * in red-black trees, namespace types use a flat bitmask because their + * keyspace is small and bounded (~8 namespace types). A single rule adds + * to the allowed set via bitwise OR; at enforcement time each layer is + * checked directly (no tree lookup needed). */ struct layer_rights { + /** + * @allowed: Per-layer allowed bitmasks for permission types. + * Placed before @handled to avoid an internal padding hole + * (8-byte perm_rules followed by 4-byte access_masks). + */ + struct perm_rules allowed; /** * @handled: Bitmask of access rights handled (i.e. restricted) by * this layer. 
diff --git a/security/landlock/audit.c b/security/landlock/audit.c index 60ff217ab95b..46a635893914 100644 --- a/security/landlock/audit.c +++ b/security/landlock/audit.c @@ -78,6 +78,10 @@ get_blocker(const enum landlock_request_type type, case LANDLOCK_REQUEST_SCOPE_SIGNAL: WARN_ON_ONCE(access_bit !=3D -1); return "scope.signal"; + + case LANDLOCK_REQUEST_NAMESPACE: + WARN_ON_ONCE(access_bit !=3D -1); + return "perm.namespace_enter"; } =20 WARN_ON_ONCE(1); diff --git a/security/landlock/audit.h b/security/landlock/audit.h index 56778331b58c..e9e52fb628f5 100644 --- a/security/landlock/audit.h +++ b/security/landlock/audit.h @@ -21,6 +21,7 @@ enum landlock_request_type { LANDLOCK_REQUEST_NET_ACCESS, LANDLOCK_REQUEST_SCOPE_ABSTRACT_UNIX_SOCKET, LANDLOCK_REQUEST_SCOPE_SIGNAL, + LANDLOCK_REQUEST_NAMESPACE, }; =20 /* diff --git a/security/landlock/cred.h b/security/landlock/cred.h index 3e2a7e88710e..68067ff53ead 100644 --- a/security/landlock/cred.h +++ b/security/landlock/cred.h @@ -153,6 +153,48 @@ landlock_get_applicable_subject(const struct cred *con= st cred, return NULL; } =20 +/** + * landlock_perm_is_denied - Check if a permission bitmask request is deni= ed + * + * @domain: The enforced domain. + * @perm_bit: The LANDLOCK_PERM_* flag to check. + * @request_value: Compact bitmask to look for (e.g. result of + * ``landlock_ns_type_to_bit(CLONE_NEWNET)``). + * + * Iterate from the youngest layer to the oldest. For each layer that + * handles @perm_bit, check whether @request_value is present in the + * layer's allowed bitmask. Return on the first (youngest) denying + * layer. + * + * Return: The youngest denying layer + 1, or 0 if allowed. 
+ */ +static inline size_t +landlock_perm_is_denied(const struct landlock_ruleset *const domain, + const access_mask_t perm_bit, const u64 request_value) +{ + ssize_t layer; + + for (layer =3D domain->num_layers - 1; layer >=3D 0; layer--) { + u64 allowed; + + if (!(domain->layers[layer].handled.perm & perm_bit)) + continue; + + switch (perm_bit) { + case LANDLOCK_PERM_NAMESPACE_ENTER: + allowed =3D domain->layers[layer].allowed.ns; + break; + default: + WARN_ON_ONCE(1); + return layer + 1; + } + + if (!(allowed & request_value)) + return layer + 1; + } + return 0; +} + __init void landlock_add_cred_hooks(void); =20 #endif /* _SECURITY_LANDLOCK_CRED_H */ diff --git a/security/landlock/limits.h b/security/landlock/limits.h index eb584f47288d..e361b653fcf5 100644 --- a/security/landlock/limits.h +++ b/security/landlock/limits.h @@ -12,6 +12,7 @@ =20 #include #include +#include #include =20 /* clang-format off */ @@ -31,6 +32,12 @@ #define LANDLOCK_MASK_SCOPE ((LANDLOCK_LAST_SCOPE << 1) - 1) #define LANDLOCK_NUM_SCOPE __const_hweight64(LANDLOCK_MASK_SCOPE) =20 +#define LANDLOCK_LAST_PERM LANDLOCK_PERM_NAMESPACE_ENTER +#define LANDLOCK_MASK_PERM ((LANDLOCK_LAST_PERM << 1) - 1) +#define LANDLOCK_NUM_PERM __const_hweight64(LANDLOCK_MASK_PERM) + +#define LANDLOCK_NUM_PERM_NS __const_hweight64((u64)(CLONE_NS_ALL)) + #define LANDLOCK_LAST_RESTRICT_SELF LANDLOCK_RESTRICT_SELF_TSYNC #define LANDLOCK_MASK_RESTRICT_SELF ((LANDLOCK_LAST_RESTRICT_SELF << 1) - = 1) =20 diff --git a/security/landlock/ns.c b/security/landlock/ns.c new file mode 100644 index 000000000000..fd9e00a295d2 --- /dev/null +++ b/security/landlock/ns.c @@ -0,0 +1,188 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Landlock - Namespace hooks + * + * Copyright =C2=A9 2026 Cloudflare + */ + +#include +#include +#include +#include +#include +#include + +#include "audit.h" +#include "cred.h" +#include "limits.h" +#include "ns.h" +#include "ruleset.h" +#include "setup.h" + +/* Ensures the audit inum field can 
hold ns_common.inum without truncation= . */ +static_assert(sizeof(((struct common_audit_data *)NULL)->u.ns.inum) >=3D + sizeof(((struct ns_common *)NULL)->inum)); + +static const struct access_masks ns_perm =3D { + .perm =3D LANDLOCK_PERM_NAMESPACE_ENTER, +}; + +/** + * hook_namespace_alloc - Check namespace entry permission for creation + * + * @ns: The namespace being initialized. + * + * Checks if the current domain allows entering (creating) this namespace + * type. Fires during unshare(2) and clone(2) via __ns_common_init() in + * kernel/nscommon.c. + * + * Return: 0 if allowed, -EPERM if namespace creation is denied. + */ +static int hook_namespace_alloc(struct ns_common *const ns) +{ + const struct landlock_cred_security *subject; + size_t denied_layer; + + WARN_ON_ONCE(!(CLONE_NS_ALL & ns->ns_type)); + + subject =3D + landlock_get_applicable_subject(current_cred(), ns_perm, NULL); + if (!subject) + return 0; + + denied_layer =3D landlock_perm_is_denied( + subject->domain, LANDLOCK_PERM_NAMESPACE_ENTER, + landlock_ns_type_to_bit(ns->ns_type)); + if (!denied_layer) + return 0; + + landlock_log_denial(subject, &(struct landlock_request){ + .type =3D LANDLOCK_REQUEST_NAMESPACE, + .audit.type =3D LSM_AUDIT_DATA_NS, + .audit.u.ns.ns_type =3D ns->ns_type, + .layer_plus_one =3D denied_layer, + }); + return -EPERM; +} + +/** + * hook_namespace_install - Check namespace entry permission + * + * @nsset: The namespace set being modified. + * @ns: The namespace being entered. + * + * Checks if the current domain restricts entering this namespace type. + * Fires during setns(2) via validate_ns() in kernel/nsproxy.c. + * Uses the same type-based check as hook_namespace_alloc(): the + * restriction is on which namespace types the process can enter, + * regardless of who created the namespace. + * + * Return: 0 if entry is allowed, -EPERM if denied. 
+ */ +static int hook_namespace_install(const struct nsset *nsset, + struct ns_common *ns) +{ + const struct landlock_cred_security *subject; + size_t denied_layer; + + WARN_ON_ONCE(!(CLONE_NS_ALL & ns->ns_type)); + + subject =3D + landlock_get_applicable_subject(current_cred(), ns_perm, NULL); + if (!subject) + return 0; + + denied_layer =3D landlock_perm_is_denied( + subject->domain, LANDLOCK_PERM_NAMESPACE_ENTER, + landlock_ns_type_to_bit(ns->ns_type)); + if (!denied_layer) + return 0; + + landlock_log_denial(subject, &(struct landlock_request){ + .type =3D LANDLOCK_REQUEST_NAMESPACE, + .audit.type =3D LSM_AUDIT_DATA_NS, + .audit.u.ns.ns_type =3D ns->ns_type, + .audit.u.ns.inum =3D ns->inum, + .layer_plus_one =3D denied_layer, + }); + return -EPERM; +} + +static struct security_hook_list landlock_hooks[] __ro_after_init =3D { + LSM_HOOK_INIT(namespace_alloc, hook_namespace_alloc), + LSM_HOOK_INIT(namespace_install, hook_namespace_install), +}; + +__init void landlock_add_ns_hooks(void) +{ + security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks), + &landlock_lsmid); +} + +#ifdef CONFIG_SECURITY_LANDLOCK_KUNIT_TEST + +#include + +/* clang-format off */ +#define _TEST_NS_BIT(struct_name, flag) \ + do { \ + const u64 bit =3D landlock_ns_type_to_bit(flag); \ + KUNIT_EXPECT_NE(test, 0ULL, bit); \ + KUNIT_EXPECT_EQ(test, 0ULL, seen &bit); \ + seen |=3D bit; \ + } while (0); +/* clang-format on */ + +static void test_ns_type_to_bit(struct kunit *const test) +{ + u64 seen =3D 0; + + FOR_EACH_NS_TYPE(_TEST_NS_BIT) + + KUNIT_EXPECT_EQ(test, GENMASK_ULL(LANDLOCK_NUM_PERM_NS - 1, 0), seen); +} + +static void test_ns_type_to_bit_unknown(struct kunit *const test) +{ + KUNIT_EXPECT_EQ(test, 0ULL, landlock_ns_type_to_bit(CLONE_THREAD)); +} + +static void test_ns_types_to_bits_all(struct kunit *const test) +{ + KUNIT_EXPECT_EQ(test, GENMASK_ULL(LANDLOCK_NUM_PERM_NS - 1, 0), + landlock_ns_types_to_bits(CLONE_NS_ALL)); +} + +/* clang-format off */ +#define 
_TEST_NS_SINGLE(struct_name, flag) \ + KUNIT_EXPECT_EQ(test, landlock_ns_type_to_bit(flag), \ + landlock_ns_types_to_bits(flag)); +/* clang-format on */ + +static void test_ns_types_to_bits_single(struct kunit *const test) +{ + FOR_EACH_NS_TYPE(_TEST_NS_SINGLE) +} + +static void test_ns_types_to_bits_zero(struct kunit *const test) +{ + KUNIT_EXPECT_EQ(test, 0ULL, landlock_ns_types_to_bits(0)); +} + +static struct kunit_case test_cases[] =3D { + KUNIT_CASE(test_ns_type_to_bit), + KUNIT_CASE(test_ns_type_to_bit_unknown), + KUNIT_CASE(test_ns_types_to_bits_all), + KUNIT_CASE(test_ns_types_to_bits_single), + KUNIT_CASE(test_ns_types_to_bits_zero), + {} +}; + +static struct kunit_suite test_suite =3D { + .name =3D "landlock_ns", + .test_cases =3D test_cases, +}; + +kunit_test_suite(test_suite); + +#endif /* CONFIG_SECURITY_LANDLOCK_KUNIT_TEST */ diff --git a/security/landlock/ns.h b/security/landlock/ns.h new file mode 100644 index 000000000000..c731ecc08f8c --- /dev/null +++ b/security/landlock/ns.h @@ -0,0 +1,74 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Landlock - Namespace hooks + * + * Copyright =C2=A9 2026 Cloudflare + */ + +#ifndef _SECURITY_LANDLOCK_NS_H +#define _SECURITY_LANDLOCK_NS_H + +#include +#include +#include +#include +#include + +#include "limits.h" + +/* _LANDLOCK_NS_CLONE_NEWCGROUP, */ +#define _LANDLOCK_NS_ENUM(struct_name, flag) _LANDLOCK_NS_##flag, + +/* _LANDLOCK_NS_CLONE_NEWCGROUP =3D 0, */ +enum { + FOR_EACH_NS_TYPE(_LANDLOCK_NS_ENUM) _LANDLOCK_NUM_NS_TYPES, +}; + +static_assert(_LANDLOCK_NUM_NS_TYPES =3D=3D LANDLOCK_NUM_PERM_NS); + +/* + * case CLONE_NEWCGROUP: + * return BIT_ULL(_LANDLOCK_NS_CLONE_NEWCGROUP); + */ +/* clang-format off */ +#define _LANDLOCK_NS_CASE(struct_name, flag) \ + case flag: \ + return BIT_ULL(_LANDLOCK_NS_##flag); +/* clang-format on */ + +static inline __attribute_const__ u64 +landlock_ns_type_to_bit(const unsigned long ns_type) +{ + switch (ns_type) { + FOR_EACH_NS_TYPE(_LANDLOCK_NS_CASE) + default: + 
WARN_ON_ONCE(1); + return 0; + } +} + +/* + * if (ns_types & CLONE_NEWCGROUP) + * bits |=3D BIT_ULL(_LANDLOCK_NS_CLONE_NEWCGROUP); + */ +/* clang-format off */ +#define _LANDLOCK_NS_CONVERT(struct_name, flag) \ + do { \ + if (ns_types & (flag)) \ + bits |=3D BIT_ULL(_LANDLOCK_NS_##flag); \ + } while (0); +/* clang-format on */ + +static inline __attribute_const__ u64 +landlock_ns_types_to_bits(const u64 ns_types) +{ + u64 bits =3D 0; + + WARN_ON_ONCE(ns_types & ~CLONE_NS_ALL); + FOR_EACH_NS_TYPE(_LANDLOCK_NS_CONVERT) + return bits; +} + +__init void landlock_add_ns_hooks(void); + +#endif /* _SECURITY_LANDLOCK_NS_H */ diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c index a7f8be37ec31..7321e2f19b03 100644 --- a/security/landlock/ruleset.c +++ b/security/landlock/ruleset.c @@ -53,15 +53,14 @@ static struct landlock_ruleset *create_ruleset(const u3= 2 num_layers) return new_ruleset; } =20 -struct landlock_ruleset * -landlock_create_ruleset(const access_mask_t fs_access_mask, - const access_mask_t net_access_mask, - const access_mask_t scope_mask) +struct landlock_ruleset *landlock_create_ruleset( + const access_mask_t fs_access_mask, const access_mask_t net_access_mask, + const access_mask_t scope_mask, const access_mask_t perm_mask) { struct landlock_ruleset *new_ruleset; =20 /* Informs about useless ruleset. 
*/ - if (!fs_access_mask && !net_access_mask && !scope_mask) + if (!fs_access_mask && !net_access_mask && !scope_mask && !perm_mask) return ERR_PTR(-ENOMSG); new_ruleset =3D create_ruleset(1); if (IS_ERR(new_ruleset)) @@ -72,6 +71,8 @@ landlock_create_ruleset(const access_mask_t fs_access_mas= k, landlock_add_net_access_mask(new_ruleset, net_access_mask, 0); if (scope_mask) landlock_add_scope_mask(new_ruleset, scope_mask, 0); + if (perm_mask) + landlock_add_perm_mask(new_ruleset, perm_mask, 0); return new_ruleset; } =20 diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h index 900c47eb0216..747261391c00 100644 --- a/security/landlock/ruleset.h +++ b/security/landlock/ruleset.h @@ -190,10 +190,9 @@ struct landlock_ruleset { }; }; =20 -struct landlock_ruleset * -landlock_create_ruleset(const access_mask_t access_mask_fs, - const access_mask_t access_mask_net, - const access_mask_t scope_mask); +struct landlock_ruleset *landlock_create_ruleset( + const access_mask_t access_mask_fs, const access_mask_t access_mask_net, + const access_mask_t scope_mask, const access_mask_t perm_mask); =20 void landlock_put_ruleset(struct landlock_ruleset *const ruleset); void landlock_put_ruleset_deferred(struct landlock_ruleset *const ruleset); @@ -303,6 +302,24 @@ landlock_get_scope_mask(const struct landlock_ruleset = *const ruleset, return ruleset->layers[layer_level].handled.scope; } =20 +static inline void +landlock_add_perm_mask(struct landlock_ruleset *const ruleset, + const access_mask_t perm_mask, const u16 layer_level) +{ + access_mask_t mask =3D perm_mask & LANDLOCK_MASK_PERM; + + /* Should already be checked in sys_landlock_create_ruleset(). 
*/ + WARN_ON_ONCE(perm_mask !=3D mask); + ruleset->layers[layer_level].handled.perm |=3D mask; +} + +static inline access_mask_t +landlock_get_perm_mask(const struct landlock_ruleset *const ruleset, + const u16 layer_level) +{ + return ruleset->layers[layer_level].handled.perm; +} + bool landlock_unmask_layers(const struct landlock_rule *const rule, struct layer_access_masks *masks); =20 diff --git a/security/landlock/setup.c b/security/landlock/setup.c index 47dac1736f10..a7ed776b41b4 100644 --- a/security/landlock/setup.c +++ b/security/landlock/setup.c @@ -17,6 +17,7 @@ #include "fs.h" #include "id.h" #include "net.h" +#include "ns.h" #include "setup.h" #include "task.h" =20 @@ -68,6 +69,7 @@ static int __init landlock_init(void) landlock_add_task_hooks(); landlock_add_fs_hooks(); landlock_add_net_hooks(); + landlock_add_ns_hooks(); landlock_init_id(); landlock_initialized =3D true; pr_info("Up and running.\n"); diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c index 2aa7b50d875f..152d952e98f6 100644 --- a/security/landlock/syscalls.c +++ b/security/landlock/syscalls.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include #include @@ -34,6 +35,7 @@ #include "fs.h" #include "limits.h" #include "net.h" +#include "ns.h" #include "ruleset.h" #include "setup.h" #include "tsync.h" @@ -95,7 +97,9 @@ static void build_check_abi(void) struct landlock_ruleset_attr ruleset_attr; struct landlock_path_beneath_attr path_beneath_attr; struct landlock_net_port_attr net_port_attr; + struct landlock_namespace_attr namespace_attr; size_t ruleset_size, path_beneath_size, net_port_size; + size_t namespace_size; =20 /* * For each user space ABI structures, first checks that there is no @@ -105,8 +109,9 @@ static void build_check_abi(void) ruleset_size =3D sizeof(ruleset_attr.handled_access_fs); ruleset_size +=3D sizeof(ruleset_attr.handled_access_net); ruleset_size +=3D sizeof(ruleset_attr.scoped); + ruleset_size +=3D 
sizeof(ruleset_attr.handled_perm); BUILD_BUG_ON(sizeof(ruleset_attr) !=3D ruleset_size); - BUILD_BUG_ON(sizeof(ruleset_attr) !=3D 24); + BUILD_BUG_ON(sizeof(ruleset_attr) !=3D 32); =20 path_beneath_size =3D sizeof(path_beneath_attr.allowed_access); path_beneath_size +=3D sizeof(path_beneath_attr.parent_fd); @@ -117,6 +122,11 @@ static void build_check_abi(void) net_port_size +=3D sizeof(net_port_attr.port); BUILD_BUG_ON(sizeof(net_port_attr) !=3D net_port_size); BUILD_BUG_ON(sizeof(net_port_attr) !=3D 16); + + namespace_size =3D sizeof(namespace_attr.allowed_perm); + namespace_size +=3D sizeof(namespace_attr.namespace_types); + BUILD_BUG_ON(sizeof(namespace_attr) !=3D namespace_size); + BUILD_BUG_ON(sizeof(namespace_attr) !=3D 16); } =20 /* Ruleset handling */ @@ -166,7 +176,7 @@ static const struct file_operations ruleset_fops =3D { * If the change involves a fix that requires userspace awareness, also up= date * the errata documentation in Documentation/userspace-api/landlock.rst . */ -const int landlock_abi_version =3D 8; +const int landlock_abi_version =3D 9; =20 /** * sys_landlock_create_ruleset - Create a new ruleset @@ -249,10 +259,16 @@ SYSCALL_DEFINE3(landlock_create_ruleset, if ((ruleset_attr.scoped | LANDLOCK_MASK_SCOPE) !=3D LANDLOCK_MASK_SCOPE) return -EINVAL; =20 + /* Checks permission content (and 32-bits cast). */ + if ((ruleset_attr.handled_perm | LANDLOCK_MASK_PERM) !=3D + LANDLOCK_MASK_PERM) + return -EINVAL; + /* Checks arguments and transforms to kernel struct. 
*/ ruleset =3D landlock_create_ruleset(ruleset_attr.handled_access_fs, ruleset_attr.handled_access_net, - ruleset_attr.scoped); + ruleset_attr.scoped, + ruleset_attr.handled_perm); if (IS_ERR(ruleset)) return PTR_ERR(ruleset); =20 @@ -390,13 +406,57 @@ static int add_rule_net_port(struct landlock_ruleset = *ruleset, net_port_attr.allowed_access); } =20 +static int add_rule_namespace(struct landlock_ruleset *const ruleset, + const void __user *const rule_attr) +{ + struct landlock_namespace_attr ns_attr; + int res; + access_mask_t mask; + + /* Copies raw user space buffer. */ + res =3D copy_from_user(&ns_attr, rule_attr, sizeof(ns_attr)); + if (res) + return -EFAULT; + + /* Informs about useless rule: empty allowed_perm. */ + if (!ns_attr.allowed_perm) + return -ENOMSG; + + /* The allowed_perm must match LANDLOCK_PERM_NAMESPACE_ENTER. */ + if (ns_attr.allowed_perm !=3D LANDLOCK_PERM_NAMESPACE_ENTER) + return -EINVAL; + + /* Checks that allowed_perm matches the @ruleset constraints. */ + mask =3D landlock_get_perm_mask(ruleset, 0); + if (!(mask & LANDLOCK_PERM_NAMESPACE_ENTER)) + return -EINVAL; + + /* Informs about useless rule: empty namespace_types. */ + if (!ns_attr.namespace_types) + return -ENOMSG; + + /* + * Stores only the namespace types this kernel knows about. + * Unknown bits are silently accepted for forward compatibility: + * user space compiled against newer headers can pass new + * CLONE_NEW* flags without getting EINVAL on older kernels. + * Unknown bits have no effect because no hook checks them. + */ + mutex_lock(&ruleset->lock); + ruleset->layers[0].allowed.ns |=3D landlock_ns_types_to_bits( + ns_attr.namespace_types & CLONE_NS_ALL); + mutex_unlock(&ruleset->lock); + return 0; +} + /** * sys_landlock_add_rule - Add a new rule to a ruleset * * @ruleset_fd: File descriptor tied to the ruleset that should be extended * with the new rule. 
* @rule_type: Identify the structure type pointed to by @rule_attr: - * %LANDLOCK_RULE_PATH_BENEATH or %LANDLOCK_RULE_NET_PORT. + * %LANDLOCK_RULE_PATH_BENEATH, %LANDLOCK_RULE_NET_PORT, or + * %LANDLOCK_RULE_NAMESPACE. * @rule_attr: Pointer to a rule (matching the @rule_type). * @flags: Must be 0. * @@ -446,6 +506,8 @@ SYSCALL_DEFINE4(landlock_add_rule, const int, ruleset_f= d, return add_rule_path_beneath(ruleset, rule_attr); case LANDLOCK_RULE_NET_PORT: return add_rule_net_port(ruleset, rule_attr); + case LANDLOCK_RULE_NAMESPACE: + return add_rule_namespace(ruleset, rule_attr); default: return -EINVAL; } diff --git a/tools/testing/selftests/landlock/base_test.c b/tools/testing/s= elftests/landlock/base_test.c index 0fea236ef4bd..30d37234086c 100644 --- a/tools/testing/selftests/landlock/base_test.c +++ b/tools/testing/selftests/landlock/base_test.c @@ -76,7 +76,7 @@ TEST(abi_version) const struct landlock_ruleset_attr ruleset_attr =3D { .handled_access_fs =3D LANDLOCK_ACCESS_FS_READ_FILE, }; - ASSERT_EQ(8, landlock_create_ruleset(NULL, 0, + ASSERT_EQ(9, landlock_create_ruleset(NULL, 0, LANDLOCK_CREATE_RULESET_VERSION)); =20 ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr, 0, --=20 2.53.0 From nobody Tue Apr 7 16:29:57 2026 Received: from smtp-8fac.mail.infomaniak.ch (smtp-8fac.mail.infomaniak.ch [83.166.143.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E74DD347507 for ; Thu, 12 Mar 2026 10:14:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=83.166.143.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773310491; cv=none; b=B49vhCPlIld0ojR0U0pNgB6Dl1LCnDXzYDiRJhCXX809ghkWG+bLAQ5oe6QZ8zc6P2aMlhSANfEPa8N/Jvi7HBSXmb8dBw1eeKAai9CtVrNeZqSI9Io8nRUD0L3gepAMf9yyQmBsCQkbXspQDUPbpMVQY9a73Ic11G8NB5EU8A0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; 
t=1773310491; c=relaxed/simple; bh=SZhr6EkEdi4nJlRHNXcYGAF/KDnpNZXC6fG4YwBCNj0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=suYiSrvukwADSMlNCay0jQOdeLOcb8MgUFOnb3Hw8C/BS+Qhkd9H/alGnLFWkGh7DRrQswriYg2vB5ldJhD/UoD43TyAGfyKDl6kapPGJjfNHtAU8ODuOxyKvFySEvkjy9dtpMk1EUEVtoCGjX9pMMxaLvwbdKypX83sA86DK7k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=digikod.net; spf=pass smtp.mailfrom=digikod.net; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b=dk7TVQys; arc=none smtp.client-ip=83.166.143.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=digikod.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=digikod.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b="dk7TVQys" Received: from smtp-3-0001.mail.infomaniak.ch (smtp-3-0001.mail.infomaniak.ch [10.4.36.108]) by smtp-3-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4fWjsm3G3qzB98; Thu, 12 Mar 2026 11:05:16 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=digikod.net; s=20191114; t=1773309916; bh=RyM3fOjZ2E9hDyjrGtvii5q6YbgiepGquJ5pTciMM7A=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=dk7TVQys2ODJ0WI8NBmpfE5XrC6XEIoBNxQVw8VgCN3eAHqL+yiMHDq86J2+XwJM/ 1wnE0GISXB7KekzIlONfjJp8OtkHF+P75i+fCcVLiVJs+VxDoXuV8JoPdMRTznWjyC nkkuovNLXFM4h0BOYXY2CnX7AFzn9fxwOA7vDniM= Received: from unknown by smtp-3-0001.mail.infomaniak.ch (Postfix) with ESMTPA id 4fWjsl6HVqzB6Y; Thu, 12 Mar 2026 11:05:15 +0100 (CET) From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= To: Christian Brauner , =?UTF-8?q?G=C3=BCnther=20Noack?= , Paul Moore , "Serge E . 
Hallyn" Cc: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= , Justin Suess , Lennart Poettering , Mikhail Ivanov , Nicolas Bouchinet , Shervin Oloumi , Tingmao Wang , kernel-team@cloudflare.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org Subject: [RFC PATCH v1 06/11] landlock: Enforce capability restrictions Date: Thu, 12 Mar 2026 11:04:39 +0100 Message-ID: <20260312100444.2609563-7-mic@digikod.net> In-Reply-To: <20260312100444.2609563-1-mic@digikod.net> References: <20260312100444.2609563-1-mic@digikod.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Infomaniak-Routing: alpha Add Landlock enforcement for capability use via the LSM capable hook. This lets a sandboxed process restrict which Linux capabilities it can exercise, using LANDLOCK_PERM_CAPABILITY_USE and per-capability rules. The capable hook is purely restrictive: it runs after cap_capable() (LSM_ORDER_FIRST), so it can deny capabilities that commoncap would allow, but it can never grant capabilities that commoncap denied. Add hook_capable() that uses landlock_perm_is_denied() to perform a pure bitmask check: if the capability is not in the layer's allowed set, the check is denied. No domain ancestry bypass, no cross-namespace discriminant, just a flat per-layer allowed-caps bitmask, matching the same pattern used by LANDLOCK_PERM_NAMESPACE_ENTER. Adding the 41-bit capability bitfield to struct perm_rules brings it to 49 out of 64 bits used (41 caps + 8 namespace types, 15 bits padding), keeping struct layer_rights at 16 bytes (8 bytes perm_rules + 4 bytes access_masks + 4 bytes tail padding) and the layers[] array at 256 bytes maximum. 
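The size accounting above can be checked with a small stand-alone sketch. This is not the kernel code: the macro names, the single-member access_masks stand-in, the member order of layer_rights and the 16-layer limit (inferred from 256 / 16) are assumptions made for illustration, and the resulting sizes assume a common 64-bit ABI with GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Figures from the commit message; the macro names are stand-ins. */
#define NUM_PERM_CAP 41 /* CAP_LAST_CAP + 1 */
#define NUM_PERM_NS 8 /* number of namespace types */
#define MAX_LAYERS 16 /* assumption: 256 bytes / 16 bytes per layer */

/* Both bitfields share one 64-bit storage unit, caps first as in the patch. */
struct perm_rules {
	uint64_t caps : NUM_PERM_CAP;
	uint64_t ns : NUM_PERM_NS;
};

/* Hypothetical stand-in for the 4-byte handled access masks. */
struct access_masks {
	uint32_t bits;
};

/* Assumed member order: 8 bytes + 4 bytes + 4 bytes of tail padding. */
struct layer_rights {
	struct perm_rules allowed;
	struct access_masks handled;
};

static_assert(sizeof(struct perm_rules) == sizeof(uint64_t),
	      "perm_rules must fit a single 64-bit unit");
static_assert(sizeof(struct layer_rights) == 16,
	      "layer_rights is expected to stay at 16 bytes");

/* Bits left unused in the 64-bit storage unit. */
static unsigned int perm_padding_bits(void)
{
	return 64 - (NUM_PERM_CAP + NUM_PERM_NS);
}

/* Upper bound of the layers[] array under the assumed layer limit. */
static size_t layers_array_size(void)
{
	return MAX_LAYERS * sizeof(struct layer_rights);
}
```

Under these assumptions perm_padding_bits() gives the 15 bits of padding and layers_array_size() the 256-byte bound quoted above. On ABIs where 64-bit integers are only 4-byte aligned, the second static assertion would not hold.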
The caps bitfield is placed first in struct perm_rules (before the ns bitfield) because capabilities use a direct BIT_ULL(cap) mapping that benefits from starting at bit 0 of the storage unit. Non-user namespace operations require both LANDLOCK_PERM_NAMESPACE_ENTER (type allowed) and LANDLOCK_PERM_CAPABILITY_USE (CAP_SYS_ADMIN allowed) when both permissions are handled. This follows naturally from the kernel calling capable(CAP_SYS_ADMIN) before namespace operations: both hooks fire independently and audit logs identify which permission was denied. The enforcement is purely at exercise time via the capable hook, not by modifying the credential's capability sets. Stripping denied capabilities would give processes an accurate capget(2) view of their usable capabilities, but no LSM other than commoncap modifies capability sets; Landlock follows this convention and restricts use without altering what the process holds. A sandboxed process inside a user namespace will see all capabilities via capget(2) but will receive -EPERM when attempting to use any denied capability. Cc: Christian Brauner Cc: G=C3=BCnther Noack Cc: Paul Moore Cc: Serge E. 
Hallyn Signed-off-by: Micka=C3=ABl Sala=C3=BCn --- include/uapi/linux/landlock.h | 31 ++++++++ security/landlock/Makefile | 1 + security/landlock/access.h | 15 +++- security/landlock/audit.c | 4 + security/landlock/audit.h | 1 + security/landlock/cap.c | 142 ++++++++++++++++++++++++++++++++++ security/landlock/cap.h | 49 ++++++++++++ security/landlock/cred.h | 3 + security/landlock/limits.h | 4 +- security/landlock/setup.c | 2 + security/landlock/syscalls.c | 58 +++++++++++++- 11 files changed, 302 insertions(+), 8 deletions(-) create mode 100644 security/landlock/cap.c create mode 100644 security/landlock/cap.h diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h index b76e656241df..0e73be459d47 100644 --- a/include/uapi/linux/landlock.h +++ b/include/uapi/linux/landlock.h @@ -166,6 +166,11 @@ enum landlock_rule_type { * landlock_namespace_attr . */ LANDLOCK_RULE_NAMESPACE, + /** + * @LANDLOCK_RULE_CAPABILITY: Type of a &struct + * landlock_capability_attr . + */ + LANDLOCK_RULE_CAPABILITY, }; =20 /** @@ -237,6 +242,24 @@ struct landlock_namespace_attr { __u64 namespace_types; }; =20 +/** + * struct landlock_capability_attr - Capability definition + * + * Argument of sys_landlock_add_rule() with %LANDLOCK_RULE_CAPABILITY. + */ +struct landlock_capability_attr { + /** + * @allowed_perm: Must be set to %LANDLOCK_PERM_CAPABILITY_USE. + */ + __u64 allowed_perm; + /** + * @capabilities: Bitmask of capabilities (``1ULL << CAP_*``) that + * should be allowed for use under this rule. Bits above + * ``CAP_LAST_CAP`` are silently ignored for forward compatibility. + */ + __u64 capabilities; +}; + /** * DOC: fs_access * @@ -432,9 +455,17 @@ struct landlock_namespace_attr { * Landlock domain that handles this permission is denied from entering * namespace types that are not explicitly allowed by a * %LANDLOCK_RULE_NAMESPACE rule. + * - %LANDLOCK_PERM_CAPABILITY_USE: Restrict the use of specific Linux + * capabilities. 
A process in a Landlock domain that handles this + * permission is denied from exercising capabilities that are not + * explicitly allowed by a %LANDLOCK_RULE_CAPABILITY rule. This hook + * is purely restrictive: it can deny capabilities that the kernel + * would otherwise grant, but it can never grant capabilities that the + * kernel already denied. */ /* clang-format off */ #define LANDLOCK_PERM_NAMESPACE_ENTER (1ULL << 0) +#define LANDLOCK_PERM_CAPABILITY_USE (1ULL << 1) /* clang-format on */ =20 #endif /* _UAPI_LINUX_LANDLOCK_H */ diff --git a/security/landlock/Makefile b/security/landlock/Makefile index 734aed4ac1bf..63311d556f93 100644 --- a/security/landlock/Makefile +++ b/security/landlock/Makefile @@ -9,6 +9,7 @@ landlock-y :=3D \ task.o \ fs.o \ ns.o \ + cap.o \ tsync.o =20 landlock-$(CONFIG_INET) +=3D net.o diff --git a/security/landlock/access.h b/security/landlock/access.h index 9c67987a77ae..65227b3064db 100644 --- a/security/landlock/access.h +++ b/security/landlock/access.h @@ -72,6 +72,13 @@ static_assert(sizeof(typeof_member(union access_masks_al= l, masks)) =3D=3D * a single 64-bit storage unit. */ struct perm_rules { + /** + * @caps: Allowed capabilities. Each bit corresponds to a + * ``CAP_*`` value (e.g. ``CAP_NET_RAW`` =3D bit 13). Bits are + * stored directly (sequential mapping) and masked with + * ``CAP_VALID_MASK`` at rule-add time. + */ + u64 caps : LANDLOCK_NUM_PERM_CAP; /** * @ns: Allowed namespace types. Each bit corresponds to a * sequential index assigned by the ``_LANDLOCK_NS_*`` enum @@ -93,10 +100,10 @@ static_assert(sizeof(struct perm_rules) =3D=3D sizeof(= u64)); * landlock_ruleset.layers FAM. * * Unlike filesystem and network access rights, which are tracked per-obje= ct - * in red-black trees, namespace types use a flat bitmask because their - * keyspace is small and bounded (~8 namespace types). 
A single rule adds - * to the allowed set via bitwise OR; at enforcement time each layer is - * checked directly (no tree lookup needed). + * in red-black trees, namespace types and capabilities use flat bitmasks + * because their keyspaces are small and bounded (~8 namespace types, 41 + * capabilities). A single rule adds to the allowed set via bitwise OR; at + * enforcement time each layer is checked directly (no tree lookup needed). */ struct layer_rights { /** diff --git a/security/landlock/audit.c b/security/landlock/audit.c index 46a635893914..24b7800ec479 100644 --- a/security/landlock/audit.c +++ b/security/landlock/audit.c @@ -82,6 +82,10 @@ get_blocker(const enum landlock_request_type type, case LANDLOCK_REQUEST_NAMESPACE: WARN_ON_ONCE(access_bit !=3D -1); return "perm.namespace_enter"; + + case LANDLOCK_REQUEST_CAPABILITY: + WARN_ON_ONCE(access_bit !=3D -1); + return "perm.capability_use"; } =20 WARN_ON_ONCE(1); diff --git a/security/landlock/audit.h b/security/landlock/audit.h index e9e52fb628f5..fe5d701ea45d 100644 --- a/security/landlock/audit.h +++ b/security/landlock/audit.h @@ -22,6 +22,7 @@ enum landlock_request_type { LANDLOCK_REQUEST_SCOPE_ABSTRACT_UNIX_SOCKET, LANDLOCK_REQUEST_SCOPE_SIGNAL, LANDLOCK_REQUEST_NAMESPACE, + LANDLOCK_REQUEST_CAPABILITY, }; =20 /* diff --git a/security/landlock/cap.c b/security/landlock/cap.c new file mode 100644 index 000000000000..536e579f63a9 --- /dev/null +++ b/security/landlock/cap.c @@ -0,0 +1,142 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Landlock - Capability hooks + * + * Copyright =C2=A9 2026 Cloudflare + */ + +#include +#include +#include +#include +#include + +#include "audit.h" +#include "cap.h" +#include "cred.h" +#include "limits.h" +#include "ruleset.h" +#include "setup.h" + +static const struct access_masks cap_perm =3D { + .perm =3D LANDLOCK_PERM_CAPABILITY_USE, +}; + +/** + * hook_capable - Deny capability use for Landlock-sandboxed processes + * + * @cred: Credentials being checked. 
+ * @ns: User namespace for the capability check. + * @cap: Capability number (CAP_*). + * @opts: Capability check options. CAP_OPT_NOAUDIT suppresses audit logg= ing. + * + * Pure bitmask check: denies the capability if it is not in the layer's + * allowed set. This hook is purely restrictive: it runs after + * cap_capable() (LSM_ORDER_FIRST), so it can deny capabilities that + * commoncap would allow, but it can never grant capabilities that + * commoncap denied. + * + * Return: 0 if allowed, -EPERM if capability use is denied. + */ +static int hook_capable(const struct cred *cred, struct user_namespace *ns, + int cap, unsigned int opts) +{ + const struct landlock_cred_security *subject; + size_t denied_layer; + + subject =3D landlock_get_applicable_subject(cred, cap_perm, NULL); + if (!subject) + return 0; + + denied_layer =3D landlock_perm_is_denied(subject->domain, + LANDLOCK_PERM_CAPABILITY_USE, + landlock_cap_to_bit(cap)); + if (!denied_layer) + return 0; + + /* + * Respects CAP_OPT_NOAUDIT to suppress audit records for + * capability probes (e.g., ns_capable_noaudit(), + * has_capability_noaudit()). 
+ */ + if (!(opts & CAP_OPT_NOAUDIT)) + landlock_log_denial(subject, + &(struct landlock_request){ + .type =3D LANDLOCK_REQUEST_CAPABILITY, + .audit.type =3D LSM_AUDIT_DATA_CAP, + .audit.u.cap =3D cap, + .layer_plus_one =3D denied_layer, + }); + + return -EPERM; +} + +static struct security_hook_list landlock_hooks[] __ro_after_init =3D { + LSM_HOOK_INIT(capable, hook_capable), +}; + +__init void landlock_add_cap_hooks(void) +{ + security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks), + &landlock_lsmid); +} + +#ifdef CONFIG_SECURITY_LANDLOCK_KUNIT_TEST + +#include + +static void test_cap_to_bit(struct kunit *const test) +{ + KUNIT_EXPECT_EQ(test, BIT_ULL(0), landlock_cap_to_bit(0)); + KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_NET_RAW), + landlock_cap_to_bit(CAP_NET_RAW)); + KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_SYS_ADMIN), + landlock_cap_to_bit(CAP_SYS_ADMIN)); + KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_LAST_CAP), + landlock_cap_to_bit(CAP_LAST_CAP)); +} + +static void test_cap_to_bit_invalid(struct kunit *const test) +{ + KUNIT_EXPECT_EQ(test, 0ULL, landlock_cap_to_bit(-1)); + KUNIT_EXPECT_EQ(test, 0ULL, landlock_cap_to_bit(CAP_LAST_CAP + 1)); +} + +static void test_caps_to_bits_valid(struct kunit *const test) +{ + KUNIT_EXPECT_EQ(test, (u64)CAP_VALID_MASK, + landlock_caps_to_bits(CAP_VALID_MASK)); + KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_NET_RAW), + landlock_caps_to_bits(BIT_ULL(CAP_NET_RAW))); +} + +static void test_caps_to_bits_unknown(struct kunit *const test) +{ + KUNIT_EXPECT_EQ(test, 0ULL, + landlock_caps_to_bits(BIT_ULL(CAP_LAST_CAP + 1))); +} + +static void test_caps_to_bits_zero(struct kunit *const test) +{ + KUNIT_EXPECT_EQ(test, 0ULL, landlock_caps_to_bits(0)); +} + +static struct kunit_case test_cases[] =3D { + /* clang-format off */ + KUNIT_CASE(test_cap_to_bit), + KUNIT_CASE(test_cap_to_bit_invalid), + KUNIT_CASE(test_caps_to_bits_valid), + KUNIT_CASE(test_caps_to_bits_unknown), + KUNIT_CASE(test_caps_to_bits_zero), + {} + /* clang-format on */ +}; + +static struct 
kunit_suite test_suite =3D { + .name =3D "landlock_cap", + .test_cases =3D test_cases, +}; + +kunit_test_suite(test_suite); + +#endif /* CONFIG_SECURITY_LANDLOCK_KUNIT_TEST */ diff --git a/security/landlock/cap.h b/security/landlock/cap.h new file mode 100644 index 000000000000..334b6974fb95 --- /dev/null +++ b/security/landlock/cap.h @@ -0,0 +1,49 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Landlock - Capability hooks + * + * Copyright =C2=A9 2026 Cloudflare + */ + +#ifndef _SECURITY_LANDLOCK_CAP_H +#define _SECURITY_LANDLOCK_CAP_H + +#include +#include +#include +#include +#include + +/** + * landlock_cap_to_bit - Convert a capability number to a compact bitmask + * + * @cap: Capability number (CAP_*). + * + * Return: BIT_ULL(@cap), or 0 if @cap is invalid (with a WARN). + */ +static inline __attribute_const__ u64 landlock_cap_to_bit(const int cap) +{ + if (WARN_ON_ONCE(!cap_valid(cap))) + return 0; + + return BIT_ULL(cap); +} + +/** + * landlock_caps_to_bits - Validate and mask a capability bitmask + * + * @capabilities: Bitmask of capabilities (e.g. from user space). + * + * Return: @capabilities masked to known capabilities. Warns if unknown + * bits are present (callers must pre-mask for user input). 
+ */ +static inline __attribute_const__ u64 +landlock_caps_to_bits(const u64 capabilities) +{ + WARN_ON_ONCE(capabilities & ~CAP_VALID_MASK); + return capabilities & CAP_VALID_MASK; +} + +__init void landlock_add_cap_hooks(void); + +#endif /* _SECURITY_LANDLOCK_CAP_H */ diff --git a/security/landlock/cred.h b/security/landlock/cred.h index 68067ff53ead..257197facbae 100644 --- a/security/landlock/cred.h +++ b/security/landlock/cred.h @@ -184,6 +184,9 @@ landlock_perm_is_denied(const struct landlock_ruleset *= const domain, case LANDLOCK_PERM_NAMESPACE_ENTER: allowed =3D domain->layers[layer].allowed.ns; break; + case LANDLOCK_PERM_CAPABILITY_USE: + allowed =3D domain->layers[layer].allowed.caps; + break; default: WARN_ON_ONCE(1); return layer + 1; diff --git a/security/landlock/limits.h b/security/landlock/limits.h index e361b653fcf5..43e832c0deb0 100644 --- a/security/landlock/limits.h +++ b/security/landlock/limits.h @@ -11,6 +11,7 @@ #define _SECURITY_LANDLOCK_LIMITS_H =20 #include +#include #include #include #include @@ -32,11 +33,12 @@ #define LANDLOCK_MASK_SCOPE ((LANDLOCK_LAST_SCOPE << 1) - 1) #define LANDLOCK_NUM_SCOPE __const_hweight64(LANDLOCK_MASK_SCOPE) =20 -#define LANDLOCK_LAST_PERM LANDLOCK_PERM_NAMESPACE_ENTER +#define LANDLOCK_LAST_PERM LANDLOCK_PERM_CAPABILITY_USE #define LANDLOCK_MASK_PERM ((LANDLOCK_LAST_PERM << 1) - 1) #define LANDLOCK_NUM_PERM __const_hweight64(LANDLOCK_MASK_PERM) =20 #define LANDLOCK_NUM_PERM_NS __const_hweight64((u64)(CLONE_NS_ALL)) +#define LANDLOCK_NUM_PERM_CAP (CAP_LAST_CAP + 1) =20 #define LANDLOCK_LAST_RESTRICT_SELF LANDLOCK_RESTRICT_SELF_TSYNC #define LANDLOCK_MASK_RESTRICT_SELF ((LANDLOCK_LAST_RESTRICT_SELF << 1) - = 1) diff --git a/security/landlock/setup.c b/security/landlock/setup.c index a7ed776b41b4..971419d663bb 100644 --- a/security/landlock/setup.c +++ b/security/landlock/setup.c @@ -11,6 +11,7 @@ #include #include =20 +#include "cap.h" #include "common.h" #include "cred.h" #include "errata.h" @@ -70,6 +71,7 
@@ static int __init landlock_init(void) landlock_add_fs_hooks(); landlock_add_net_hooks(); landlock_add_ns_hooks(); + landlock_add_cap_hooks(); landlock_init_id(); landlock_initialized =3D true; pr_info("Up and running.\n"); diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c index 152d952e98f6..38a4bf92781a 100644 --- a/security/landlock/syscalls.c +++ b/security/landlock/syscalls.c @@ -30,6 +30,7 @@ #include #include =20 +#include "cap.h" #include "cred.h" #include "domain.h" #include "fs.h" @@ -98,8 +99,9 @@ static void build_check_abi(void) struct landlock_path_beneath_attr path_beneath_attr; struct landlock_net_port_attr net_port_attr; struct landlock_namespace_attr namespace_attr; + struct landlock_capability_attr capability_attr; size_t ruleset_size, path_beneath_size, net_port_size; - size_t namespace_size; + size_t namespace_size, capability_size; =20 /* * For each user space ABI structures, first checks that there is no @@ -127,6 +129,11 @@ static void build_check_abi(void) namespace_size +=3D sizeof(namespace_attr.namespace_types); BUILD_BUG_ON(sizeof(namespace_attr) !=3D namespace_size); BUILD_BUG_ON(sizeof(namespace_attr) !=3D 16); + + capability_size =3D sizeof(capability_attr.allowed_perm); + capability_size +=3D sizeof(capability_attr.capabilities); + BUILD_BUG_ON(sizeof(capability_attr) !=3D capability_size); + BUILD_BUG_ON(sizeof(capability_attr) !=3D 16); } =20 /* Ruleset handling */ @@ -449,14 +456,57 @@ static int add_rule_namespace(struct landlock_ruleset= *const ruleset, return 0; } =20 +static int add_rule_capability(struct landlock_ruleset *const ruleset, + const void __user *const rule_attr) +{ + struct landlock_capability_attr cap_attr; + int res; + access_mask_t mask; + + /* Copies raw user space buffer. */ + res =3D copy_from_user(&cap_attr, rule_attr, sizeof(cap_attr)); + if (res) + return -EFAULT; + + /* Informs about useless rule: empty allowed_perm. 
*/ + if (!cap_attr.allowed_perm) + return -ENOMSG; + + /* The allowed_perm must match LANDLOCK_PERM_CAPABILITY_USE. */ + if (cap_attr.allowed_perm !=3D LANDLOCK_PERM_CAPABILITY_USE) + return -EINVAL; + + /* Checks that allowed_perm matches the @ruleset constraints. */ + mask =3D landlock_get_perm_mask(ruleset, 0); + if (!(mask & LANDLOCK_PERM_CAPABILITY_USE)) + return -EINVAL; + + /* Informs about useless rule: empty capabilities. */ + if (!cap_attr.capabilities) + return -ENOMSG; + + /* + * Stores only the capabilities this kernel knows about. + * Unknown bits are silently accepted for forward compatibility: + * user space compiled against newer headers can pass new + * CAP_* bits without getting EINVAL on older kernels. + * Unknown bits have no effect because no hook checks them. + */ + mutex_lock(&ruleset->lock); + ruleset->layers[0].allowed.caps |=3D + landlock_caps_to_bits(cap_attr.capabilities & CAP_VALID_MASK); + mutex_unlock(&ruleset->lock); + return 0; +} + /** * sys_landlock_add_rule - Add a new rule to a ruleset * * @ruleset_fd: File descriptor tied to the ruleset that should be extended * with the new rule. * @rule_type: Identify the structure type pointed to by @rule_attr: - * %LANDLOCK_RULE_PATH_BENEATH, %LANDLOCK_RULE_NET_PORT, or - * %LANDLOCK_RULE_NAMESPACE. + * %LANDLOCK_RULE_PATH_BENEATH, %LANDLOCK_RULE_NET_PORT, + * %LANDLOCK_RULE_NAMESPACE, or %LANDLOCK_RULE_CAPABILITY. * @rule_attr: Pointer to a rule (matching the @rule_type). * @flags: Must be 0. 
* @@ -508,6 +558,8 @@ SYSCALL_DEFINE4(landlock_add_rule, const int, ruleset_f= d, return add_rule_net_port(ruleset, rule_attr); case LANDLOCK_RULE_NAMESPACE: return add_rule_namespace(ruleset, rule_attr); + case LANDLOCK_RULE_CAPABILITY: + return add_rule_capability(ruleset, rule_attr); default: return -EINVAL; } --=20 2.53.0 From nobody Tue Apr 7 16:29:57 2026 Received: from smtp-bc0f.mail.infomaniak.ch (smtp-bc0f.mail.infomaniak.ch [45.157.188.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 398423BE650 for ; Thu, 12 Mar 2026 10:13:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.157.188.15 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773310445; cv=none; b=G9iM7Ip68WUZdQ6EKilRW50cIeiSDv0oxQ9CJBV5sgBl7NjOCJTGm+HXiSI6Fq0AzOX8ypv56Zuvw8MUxv3xAwZPzBMma+RfCmoeXDJlbzJ9z34yZqmbOuIlb+owfc6IXMtYcrXUFLted04ZWfCwFjKzDOk8UdXyOfcPQHqyCDM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773310445; c=relaxed/simple; bh=i2d7dw+TCLA+boP78ErxydkDAcv2IhHH8PUa3oCh1LA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=ijVClst6CSrcotfwuZmPy/sQdwUYKNcxoQ4AY6UFUQdAv45cjjWDi/jaBpD3OXOxTcTUql/DIHiLC9oFuAcZFijniliSXW+RLGxdGpWuslWCaeOT/GUIs2/EeWR36oVCm+cfHqPFbW/jOv++PvBZ+9/ah+YgZMox/ki6bnvAeUg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=digikod.net; spf=pass smtp.mailfrom=digikod.net; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b=jqOA1yJ5; arc=none smtp.client-ip=45.157.188.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=digikod.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=digikod.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) 
header.d=digikod.net header.i=@digikod.net header.b="jqOA1yJ5" Received: from smtp-4-0001.mail.infomaniak.ch (unknown [IPv6:2001:1600:7:10::a6c]) by smtp-4-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4fWjsn5flVz1BG0; Thu, 12 Mar 2026 11:05:17 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=digikod.net; s=20191114; t=1773309917; bh=fD6HF1lmUl6p9lC1inUdLVxbMt2reJMKjj9GnF2j1AI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=jqOA1yJ5H2xTOJuPiahLQplZx0UN/fOvmyUVZ8EKHeRsefDW4ACg5TOTtIclSTltz BuT6slqEVLIGRie3I22Wm/Vw3t8Bi9g4/n5aAz0nXFzqrtjKW/XXlweoHoZdHPWqeG qmoCoakJGi1LQW/Bt8QlK/CRe+bwfT1rgT3qE0hc= Received: from unknown by smtp-4-0001.mail.infomaniak.ch (Postfix) with ESMTPA id 4fWjsn1KyRzVV; Thu, 12 Mar 2026 11:05:17 +0100 (CET) From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= To: Christian Brauner , =?UTF-8?q?G=C3=BCnther=20Noack?= , Paul Moore , "Serge E . Hallyn" Cc: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= , Justin Suess , Lennart Poettering , Mikhail Ivanov , Nicolas Bouchinet , Shervin Oloumi , Tingmao Wang , kernel-team@cloudflare.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org Subject: [RFC PATCH v1 07/11] selftests/landlock: Drain stale audit records on init Date: Thu, 12 Mar 2026 11:04:40 +0100 Message-ID: <20260312100444.2609563-8-mic@digikod.net> In-Reply-To: <20260312100444.2609563-1-mic@digikod.net> References: <20260312100444.2609563-1-mic@digikod.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Infomaniak-Routing: alpha Non-audit Landlock tests generate audit records as side effects when audit_enabled is non-zero (e.g. from boot configuration). These records accumulate in the kernel audit backlog while no audit daemon socket is open. 
When the next test opens a new netlink socket and registers as the audit daemon, the stale backlog is delivered, causing baseline record count checks to fail spuriously.

Fix this by draining all pending records in audit_init() right after setting the receive timeout. The 1-usec SO_RCVTIMEO causes audit_recv() to return -EAGAIN once the backlog is empty, naturally terminating the drain loop.

Domain deallocation records are emitted asynchronously from a work queue, so they may still arrive after the drain. Remove records.domain =3D=3D 0 checks from tests where a stale deallocation record from a previous test could cause spurious failures.

Also fix a socket file descriptor leak on error paths in audit_init(): if audit_set_status() or setsockopt() fails (e.g. when another audit daemon is already registered), close the socket before returning.

Fix off-by-one checks in matches_log_domain_allocated() and matches_log_domain_deallocated() where snprintf() truncation was detected with ">" instead of ">=3D" (snprintf() returns the length excluding the NUL terminator, so equality means truncation).
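
The snprintf() truncation rule can be illustrated with a minimal standalone sketch. format_pid() and its "pid=%d" template are hypothetical and not part of this patch; only the comparison mirrors the corrected check, and the block is written as plain C rather than in the quoted-printable encoding of the surrounding mail.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of the patch: formats "pid=<pid>" into buf.
 * Returns 0 on success, -E2BIG if the output was truncated, or -EINVAL if
 * snprintf() reported an error.
 */
static int format_pid(char *buf, size_t size, int pid)
{
	const int len = snprintf(buf, size, "pid=%d", pid);

	if (len < 0)
		return -EINVAL;

	/*
	 * snprintf() returns the length the complete string would have had,
	 * excluding the NUL terminator.  When len equals size, the last
	 * character was dropped to make room for the terminator, so the
	 * check must be ">=" and not ">".
	 */
	if ((size_t)len >= size)
		return -E2BIG;

	return 0;
}
```

With pid 1234 the formatted string is 8 characters long: a 9-byte buffer holds it in full, whereas an 8-byte buffer silently loses the final digit, which a ">" comparison would fail to report.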
Cc: G=C3=BCnther Noack Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and dom= ain IDs") Signed-off-by: Micka=C3=ABl Sala=C3=BCn Reviewed-by: G=C3=BCnther Noack --- tools/testing/selftests/landlock/audit.h | 29 +++++++++++++++---- tools/testing/selftests/landlock/audit_test.c | 2 -- 2 files changed, 23 insertions(+), 8 deletions(-) diff --git a/tools/testing/selftests/landlock/audit.h b/tools/testing/selft= ests/landlock/audit.h index 44eb433e9666..550acaafcc1e 100644 --- a/tools/testing/selftests/landlock/audit.h +++ b/tools/testing/selftests/landlock/audit.h @@ -309,7 +309,7 @@ static int __maybe_unused matches_log_domain_allocated(= int audit_fd, pid_t pid, =20 log_match_len =3D snprintf(log_match, sizeof(log_match), log_template, pid); - if (log_match_len > sizeof(log_match)) + if (log_match_len >=3D sizeof(log_match)) return -E2BIG; =20 return audit_match_record(audit_fd, AUDIT_LANDLOCK_DOMAIN, log_match, @@ -326,7 +326,7 @@ static int __maybe_unused matches_log_domain_deallocate= d( =20 log_match_len =3D snprintf(log_match, sizeof(log_match), log_template, num_denials); - if (log_match_len > sizeof(log_match)) + if (log_match_len >=3D sizeof(log_match)) return -E2BIG; =20 return audit_match_record(audit_fd, AUDIT_LANDLOCK_DOMAIN, log_match, @@ -379,19 +379,36 @@ static int audit_init(void) =20 err =3D audit_set_status(fd, AUDIT_STATUS_ENABLED, 1); if (err) - return err; + goto err_close; =20 err =3D audit_set_status(fd, AUDIT_STATUS_PID, getpid()); if (err) - return err; + goto err_close; =20 /* Sets a timeout for negative tests. */ err =3D setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &audit_tv_default, sizeof(audit_tv_default)); - if (err) - return -errno; + if (err) { + err =3D -errno; + goto err_close; + } + + /* + * Drains stale audit records that accumulated in the kernel backlog + * while no audit daemon socket was open. 
This happens when + * non-audit Landlock tests create domains or trigger denials while + * audit_enabled is non-zero (e.g. from boot configuration), or when + * domain deallocation records arrive asynchronously after a + * previous test's socket was closed. + */ + while (audit_recv(fd, NULL) =3D=3D 0) + ; =20 return fd; + +err_close: + close(fd); + return err; } =20 static int audit_init_filter_exe(struct audit_filter *filter, const char *= path) diff --git a/tools/testing/selftests/landlock/audit_test.c b/tools/testing/= selftests/landlock/audit_test.c index 46d02d49835a..f92ba6774faa 100644 --- a/tools/testing/selftests/landlock/audit_test.c +++ b/tools/testing/selftests/landlock/audit_test.c @@ -412,7 +412,6 @@ TEST_F(audit_flags, signal) } else { EXPECT_EQ(1, records.access); } - EXPECT_EQ(0, records.domain); =20 /* Updates filter rules to match the drop record. */ set_cap(_metadata, CAP_AUDIT_CONTROL); @@ -601,7 +600,6 @@ TEST_F(audit_exec, signal_and_open) /* Tests that there was no denial until now. 
*/ EXPECT_EQ(0, audit_count_records(self->audit_fd, &records)); EXPECT_EQ(0, records.access); - EXPECT_EQ(0, records.domain); =20 /* * Wait for the child to do a first denied action by layer1 and --=20 2.53.0 From nobody Tue Apr 7 16:29:57 2026 Received: from smtp-bc09.mail.infomaniak.ch (smtp-bc09.mail.infomaniak.ch [45.157.188.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9C0A23BED2A for ; Thu, 12 Mar 2026 10:05:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.157.188.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773309925; cv=none; b=PS1wDuvZXLdBw9m43/EvqOzkYjcyiJrftqx6p0Ne7LvClf9WUPVP20VK08gAFTSNsE0J17svUWQTfdsLY2XtBXRCyLTUxqbwnyGb2k6072W8ADrl0u1kdgi11C/1/gau8rCXddFjFWvSwN+EBXmlJFT9HSMJvL9Inek+JRcc5lw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773309925; c=relaxed/simple; bh=k4T11ArUUq4m3vRpCUPQ0LJ4dXcSeCNqqOCI3eakTzc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=h/aFHwaJxzPwTbE134bRfjhmbOsjEfV7Rm3ebPSvVqoOulvhcvu6SS7e2fyRm89iAVZkB3PqS5YUrahWCm3iXb2jqoJbBXvMSlkKmQ+NQpKA6mUF4XPUBo4vpgIa03CR5gqkxbXw6geHvX8I5Esg1UfGIC9w1kJGCxDoju7SE+k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=digikod.net; spf=pass smtp.mailfrom=digikod.net; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b=dGVfcKJl; arc=none smtp.client-ip=45.157.188.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=digikod.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=digikod.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b="dGVfcKJl" Received: from smtp-4-0000.mail.infomaniak.ch 
(smtp-4-0000.mail.infomaniak.ch [10.7.10.107]) by smtp-4-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4fWjsq1jQJz1BM1; Thu, 12 Mar 2026 11:05:19 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=digikod.net; s=20191114; t=1773309919; bh=ObW/Kbd3gF3fQaP496NjUW0pNPXP3EeFyVOjG6RLOeI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=dGVfcKJlaL89Zdkt7jCnytAvXqJoMIIlXmRKqhplVVvVTBUYFT1uPn1RswI/3CsUG 1ZTrfnk/0AVcgx+yxLQtizlpnDziD65E3JFY5tYL5NkEbO+nZaL4rCSbOzsnV7tq5f Acmvy2YyAr+CLQ2zXyxyNLoVeenM2ZFFdrp/1JK8= Received: from unknown by smtp-4-0000.mail.infomaniak.ch (Postfix) with ESMTPA id 4fWjsp4KjWz8jm; Thu, 12 Mar 2026 11:05:18 +0100 (CET) From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= To: Christian Brauner , =?UTF-8?q?G=C3=BCnther=20Noack?= , Paul Moore , "Serge E . Hallyn" Cc: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= , Justin Suess , Lennart Poettering , Mikhail Ivanov , Nicolas Bouchinet , Shervin Oloumi , Tingmao Wang , kernel-team@cloudflare.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org Subject: [RFC PATCH v1 08/11] selftests/landlock: Add namespace restriction tests Date: Thu, 12 Mar 2026 11:04:41 +0100 Message-ID: <20260312100444.2609563-9-mic@digikod.net> In-Reply-To: <20260312100444.2609563-1-mic@digikod.net> References: <20260312100444.2609563-1-mic@digikod.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Infomaniak-Routing: alpha Add tests covering the two namespace-related Landlock permission types: LANDLOCK_PERM_NAMESPACE_ENTER (namespace creation via unshare/clone and namespace entry via setns) and its interaction with LANDLOCK_PERM_CAPABILITY_USE. 
Rule validation tests verify that the kernel correctly accepts known CLONE_NEW* types, silently accepts unknown bits (including holes, upper-range bits, and bit 63) for forward compatibility, and rejects an empty namespace_types bitmask. Invalid allowed_perm combinations and non-zero flags are also covered. Namespace creation tests use FIXTURE_VARIANT to exercise all eight namespace types (user, UTS, IPC, mount, cgroup, PID, network, time) across allowed/denied and privileged/unprivileged combinations. This verifies that security_namespace_alloc() is correctly called for every type. Layer stacking tests verify that any-layer-denies semantics work correctly, including the allow-over-allow case. A combined test exercises both LANDLOCK_PERM_CAPABILITY_USE and LANDLOCK_PERM_NAMESPACE_ENTER in a single domain. Namespace entry tests verify that setns is subject to the same type-based LANDLOCK_PERM_NAMESPACE_ENTER check via security_namespace_install(), including cross-process setns denial and the two-permission interaction where both LANDLOCK_PERM_NAMESPACE_ENTER and LANDLOCK_PERM_CAPABILITY_USE must allow the operation for non-user namespaces. Audit tests verify that denied namespace creation, denied setns entry, and allowed operations produce the expected audit records (or none). Cc: Christian Brauner Cc: G=C3=BCnther Noack Cc: Paul Moore Cc: Serge E. 
Hallyn Signed-off-by: Micka=C3=ABl Sala=C3=BCn --- tools/testing/selftests/landlock/common.h | 23 + tools/testing/selftests/landlock/config | 5 + tools/testing/selftests/landlock/ns_test.c | 1379 +++++++++++++++++++ tools/testing/selftests/landlock/wrappers.h | 6 + 4 files changed, 1413 insertions(+) create mode 100644 tools/testing/selftests/landlock/ns_test.c diff --git a/tools/testing/selftests/landlock/common.h b/tools/testing/self= tests/landlock/common.h index 90551650299c..e7d1d1e9df74 100644 --- a/tools/testing/selftests/landlock/common.h +++ b/tools/testing/selftests/landlock/common.h @@ -128,6 +128,29 @@ static void __maybe_unused clear_ambient_cap( EXPECT_EQ(0, cap_get_ambient(cap)); } =20 +/* + * Returns true if the current process is in the initial user namespace. + * Compares the readlink targets of /proc/self/ns/user and /proc/1/ns/user. + */ +static bool __maybe_unused is_in_init_user_ns(void) +{ + char self_buf[64], init_buf[64]; + ssize_t self_len, init_len; + + self_len =3D readlink("/proc/self/ns/user", self_buf, sizeof(self_buf)); + if (self_len <=3D 0 || self_len >=3D (ssize_t)sizeof(self_buf)) + return false; + + init_len =3D readlink("/proc/1/ns/user", init_buf, sizeof(init_buf)); + if (init_len <=3D 0 || init_len >=3D (ssize_t)sizeof(init_buf)) + return false; + + if (self_len !=3D init_len) + return false; + + return memcmp(self_buf, init_buf, self_len) =3D=3D 0; +} + /* Receives an FD from a UNIX socket. Returns the received FD, or -errno. 
= */ static int __maybe_unused recv_fd(int usock) { diff --git a/tools/testing/selftests/landlock/config b/tools/testing/selfte= sts/landlock/config index 8fe9b461b1fd..d09b637bf6ca 100644 --- a/tools/testing/selftests/landlock/config +++ b/tools/testing/selftests/landlock/config @@ -3,6 +3,7 @@ CONFIG_AUDIT=3Dy CONFIG_CGROUPS=3Dy CONFIG_CGROUP_SCHED=3Dy CONFIG_INET=3Dy +CONFIG_IPC_NS=3Dy CONFIG_IPV6=3Dy CONFIG_KEYS=3Dy CONFIG_MPTCP=3Dy @@ -10,10 +11,14 @@ CONFIG_MPTCP_IPV6=3Dy CONFIG_NET=3Dy CONFIG_NET_NS=3Dy CONFIG_OVERLAY_FS=3Dy +CONFIG_PID_NS=3Dy CONFIG_PROC_FS=3Dy CONFIG_SECURITY=3Dy CONFIG_SECURITY_LANDLOCK=3Dy CONFIG_SHMEM=3Dy CONFIG_SYSFS=3Dy +CONFIG_TIME_NS=3Dy CONFIG_TMPFS=3Dy CONFIG_TMPFS_XATTR=3Dy +CONFIG_USER_NS=3Dy +CONFIG_UTS_NS=3Dy diff --git a/tools/testing/selftests/landlock/ns_test.c b/tools/testing/sel= ftests/landlock/ns_test.c new file mode 100644 index 000000000000..5d968dd9f4f5 --- /dev/null +++ b/tools/testing/selftests/landlock/ns_test.c @@ -0,0 +1,1379 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Landlock tests - Namespace restriction + * + * Copyright =C2=A9 2026 Cloudflare + */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "audit.h" +#include "common.h" + +/* + * Max length for /proc/self/ns/ paths (longest: + * "/proc/self/ns/cgroup"). + */ +#define NS_PROC_PATH_MAX 32 + +static int create_ns_ruleset(void) +{ + const struct landlock_ruleset_attr attr =3D { + .handled_perm =3D LANDLOCK_PERM_NAMESPACE_ENTER, + }; + + return landlock_create_ruleset(&attr, sizeof(attr), 0); +} + +static int add_ns_rule(int ruleset_fd, __u64 ns_type) +{ + const struct landlock_namespace_attr attr =3D { + .allowed_perm =3D LANDLOCK_PERM_NAMESPACE_ENTER, + .namespace_types =3D ns_type, + }; + + return landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE, &attr, 0); +} + +/* + * Returns the /proc/self/NS entry name for a given CLONE_NEW* type, or NU= LL + * if unknown. 
Used to check kernel support without side effects. + */ +static const char *ns_proc_name(__u64 ns_type) +{ + switch (ns_type) { + case CLONE_NEWNS: + return "mnt"; + case CLONE_NEWCGROUP: + return "cgroup"; + case CLONE_NEWUTS: + return "uts"; + case CLONE_NEWIPC: + return "ipc"; + case CLONE_NEWUSER: + return "user"; + case CLONE_NEWPID: + return "pid"; + case CLONE_NEWNET: + return "net"; + case CLONE_NEWTIME: + return "time"; + default: + return NULL; + } +} + +static bool ns_is_supported(__u64 ns_type, char *proc_path, size_t size) +{ + const char *ns_name; + + ns_name =3D ns_proc_name(ns_type); + if (!ns_name) + return false; + + snprintf(proc_path, size, "/proc/self/ns/%s", ns_name); + return access(proc_path, F_OK) =3D=3D 0; +} + +/* Rule validation tests */ + +TEST(add_rule_bad_attr) +{ + const struct landlock_ruleset_attr cap_only_attr =3D { + .handled_perm =3D LANDLOCK_PERM_CAPABILITY_USE, + }; + int ruleset_fd; + struct landlock_namespace_attr attr =3D {}; + + ruleset_fd =3D create_ns_ruleset(); + ASSERT_LE(0, ruleset_fd); + + /* Empty allowed_perm returns ENOMSG (useless deny rule). */ + attr.allowed_perm =3D 0; + attr.namespace_types =3D CLONE_NEWUTS; + ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE, + &attr, 0)); + ASSERT_EQ(ENOMSG, errno); + + /* allowed_perm with unhandled bit. */ + attr.allowed_perm =3D LANDLOCK_PERM_NAMESPACE_ENTER | + LANDLOCK_PERM_CAPABILITY_USE; + attr.namespace_types =3D CLONE_NEWUTS; + ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE, + &attr, 0)); + ASSERT_EQ(EINVAL, errno); + + /* allowed_perm with wrong type. */ + attr.allowed_perm =3D LANDLOCK_PERM_CAPABILITY_USE; + attr.namespace_types =3D CLONE_NEWUTS; + ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE, + &attr, 0)); + ASSERT_EQ(EINVAL, errno); + + /* + * Unknown namespace bits (e.g. bit 63) are silently accepted + * for forward compatibility. Only known CLONE_NEW* bits are stored. 
+ */ + attr.allowed_perm =3D LANDLOCK_PERM_NAMESPACE_ENTER; + attr.namespace_types =3D 1ULL << 63; + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE, + &attr, 0)); + + /* Useless rule: empty namespace_types bitmask. */ + attr.allowed_perm =3D LANDLOCK_PERM_NAMESPACE_ENTER; + attr.namespace_types =3D 0; + ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE, + &attr, 0)); + ASSERT_EQ(ENOMSG, errno); + + /* + * Bit 1 is not a CLONE_NEW* value but is silently accepted + * for forward compatibility (no hole rejection). + */ + attr.allowed_perm =3D LANDLOCK_PERM_NAMESPACE_ENTER; + attr.namespace_types =3D (1ULL << 1); + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE, + &attr, 0)); + + /* Multi-bit values are valid (bitmask allows multiple types). */ + attr.allowed_perm =3D LANDLOCK_PERM_NAMESPACE_ENTER; + attr.namespace_types =3D CLONE_NEWUTS | CLONE_NEWNET; + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE, + &attr, 0)); + + /* Non-zero flags must be rejected. */ + attr.allowed_perm =3D LANDLOCK_PERM_NAMESPACE_ENTER; + attr.namespace_types =3D CLONE_NEWUTS; + ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE, + &attr, 1)); + ASSERT_EQ(EINVAL, errno); + + EXPECT_EQ(0, close(ruleset_fd)); + + /* + * Ruleset handles PERM_CAPABILITY_USE but not PERM_NAMESPACE_ENTER: + * adding a namespace rule must be rejected. + */ + ruleset_fd =3D landlock_create_ruleset(&cap_only_attr, + sizeof(cap_only_attr), 0); + ASSERT_LE(0, ruleset_fd); + attr.allowed_perm =3D LANDLOCK_PERM_NAMESPACE_ENTER; + attr.namespace_types =3D CLONE_NEWUTS; + ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE, + &attr, 0)); + ASSERT_EQ(EINVAL, errno); + EXPECT_EQ(0, close(ruleset_fd)); +} + +/* + * Unknown namespace types in the upper range are silently accepted + * (allow-list: they have no effect since the kernel never checks them). 
+ */ +TEST(add_rule_unknown) +{ + int ruleset_fd; + struct landlock_namespace_attr attr =3D { + .allowed_perm =3D LANDLOCK_PERM_NAMESPACE_ENTER, + }; + + ruleset_fd =3D create_ns_ruleset(); + ASSERT_LE(0, ruleset_fd); + + /* + * Bit 31 is in the lower 32 bits but not a CLONE_NEW* value. + * Silently accepted for forward compatibility (no hole rejection). + */ + attr.namespace_types =3D 1ULL << 31; + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE, + &attr, 0)); + + /* Bit 32 is in the unknown upper range: silently accepted. */ + attr.namespace_types =3D 1ULL << 32; + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE, + &attr, 0)); + + EXPECT_EQ(0, close(ruleset_fd)); +} + +/* Namespace creation tests (variant-based positive/negative) */ + +/* clang-format off */ +FIXTURE(ns_create) { + char proc_path[NS_PROC_PATH_MAX]; +}; +/* clang-format on */ + +FIXTURE_VARIANT(ns_create) +{ + const __u64 namespace_types; + const bool is_sandboxed; + const bool has_rule; + const bool drop_all_caps; + const int expected_result; +}; + +/* + * Unsandboxed baseline: no Landlock domain is enforced. + * User namespace creation should succeed without any restriction. + */ +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, user_unsandboxed) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWUSER, + .is_sandboxed =3D false, + .has_rule =3D false, + .drop_all_caps =3D false, + .expected_result =3D 0, +}; + +/* + * User namespace creation denied: handled by Landlock but no rule + * allows CLONE_NEWUSER. + */ +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, user_denied) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWUSER, + .is_sandboxed =3D true, + .has_rule =3D false, + .drop_all_caps =3D false, + .expected_result =3D EPERM, +}; + +/* + * User namespace creation allowed: Landlock rule permits CLONE_NEWUSER. 
+ */ +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, user_allowed) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWUSER, + .is_sandboxed =3D true, + .has_rule =3D true, + .drop_all_caps =3D false, + .expected_result =3D 0, +}; + +/* + * User namespace creation while unprivileged: the process has no + * capabilities but unshare(CLONE_NEWUSER) is an unprivileged + * operation so it still succeeds. The Landlock rule allows it. + * For setns, the capability check (CAP_SYS_ADMIN) fails first + * since the process has no capabilities, yielding EPERM. + */ +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, user_unprivileged) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWUSER, + .is_sandboxed =3D true, + .has_rule =3D true, + .drop_all_caps =3D true, + .expected_result =3D 0, +}; + +/* + * Unsandboxed baseline for non-user namespace: no Landlock domain, + * process has CAP_SYS_ADMIN. UTS creation should succeed. + */ +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, uts_unsandboxed) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWUTS, + .is_sandboxed =3D false, + .has_rule =3D false, + .drop_all_caps =3D false, + .expected_result =3D 0, +}; + +/* + * Non-user namespace denied: process has CAP_SYS_ADMIN (passes + * ns_capable), but Landlock denies (no rule). + */ +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, uts_denied) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWUTS, + .is_sandboxed =3D true, + .has_rule =3D false, + .drop_all_caps =3D false, + .expected_result =3D EPERM, +}; + +/* + * Non-user namespace allowed: process has CAP_SYS_ADMIN and Landlock + * rule permits CLONE_NEWUTS. 
+ */ +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, uts_allowed) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWUTS, .is_sandboxed =3D true, .has_rule =3D = true, + .drop_all_caps =3D false, .expected_result =3D 0, +}; + +/* + * Unprivileged namespace creation: process lacks CAP_SYS_ADMIN, so the + * kernel denies creation regardless of Landlock rules. Landlock cannot + * authorize what the kernel denied (LSM hooks are restriction-only). + * The rule is present to verify Landlock does not change the error code. + */ +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, uts_unprivileged) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWUTS, + .is_sandboxed =3D true, + .has_rule =3D true, + .drop_all_caps =3D true, + .expected_result =3D EPERM, +}; + +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, ipc_denied) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWIPC, + .is_sandboxed =3D true, + .has_rule =3D false, + .drop_all_caps =3D false, + .expected_result =3D EPERM, +}; + +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, ipc_allowed) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWIPC, .is_sandboxed =3D true, .has_rule =3D = true, + .drop_all_caps =3D false, .expected_result =3D 0, +}; + +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, ipc_unprivileged) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWIPC, + .is_sandboxed =3D true, + .has_rule =3D true, + .drop_all_caps =3D true, + .expected_result =3D EPERM, +}; + +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, mnt_denied) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWNS, + .is_sandboxed =3D true, + .has_rule =3D false, + .drop_all_caps =3D false, + .expected_result =3D EPERM, +}; + +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, mnt_allowed) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWNS, .is_sandboxed =3D true, .has_rule =3D t= rue, + .drop_all_caps =3D false, .expected_result =3D 0, +}; + +/* 
clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, mnt_unprivileged) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWNS, + .is_sandboxed =3D true, + .has_rule =3D true, + .drop_all_caps =3D true, + .expected_result =3D EPERM, +}; + +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, cgroup_denied) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWCGROUP, + .is_sandboxed =3D true, + .has_rule =3D false, + .drop_all_caps =3D false, + .expected_result =3D EPERM, +}; + +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, cgroup_allowed) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWCGROUP, + .is_sandboxed =3D true, + .has_rule =3D true, + .drop_all_caps =3D false, + .expected_result =3D 0, +}; + +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, cgroup_unprivileged) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWCGROUP, + .is_sandboxed =3D true, + .has_rule =3D true, + .drop_all_caps =3D true, + .expected_result =3D EPERM, +}; + +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, pid_denied) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWPID, + .is_sandboxed =3D true, + .has_rule =3D false, + .drop_all_caps =3D false, + .expected_result =3D EPERM, +}; + +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, pid_allowed) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWPID, .is_sandboxed =3D true, .has_rule =3D = true, + .drop_all_caps =3D false, .expected_result =3D 0, +}; + +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, pid_unprivileged) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWPID, + .is_sandboxed =3D true, + .has_rule =3D true, + .drop_all_caps =3D true, + .expected_result =3D EPERM, +}; + +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, net_denied) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWNET, + .is_sandboxed =3D true, + .has_rule =3D false, + .drop_all_caps =3D false, + .expected_result =3D EPERM, +}; + +/* clang-format off */ 
+FIXTURE_VARIANT_ADD(ns_create, net_allowed) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWNET, .is_sandboxed =3D true, .has_rule =3D = true, + .drop_all_caps =3D false, .expected_result =3D 0, +}; + +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, net_unprivileged) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWNET, + .is_sandboxed =3D true, + .has_rule =3D true, + .drop_all_caps =3D true, + .expected_result =3D EPERM, +}; + +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, time_denied) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWTIME, + .is_sandboxed =3D true, + .has_rule =3D false, + .drop_all_caps =3D false, + .expected_result =3D EPERM, +}; + +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, time_allowed) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWTIME, + .is_sandboxed =3D true, + .has_rule =3D true, + .drop_all_caps =3D false, + .expected_result =3D 0, +}; + +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_create, time_unprivileged) { + /* clang-format on */ + .namespace_types =3D CLONE_NEWTIME, + .is_sandboxed =3D true, + .has_rule =3D true, + .drop_all_caps =3D true, + .expected_result =3D EPERM, +}; + +FIXTURE_SETUP(ns_create) +{ + if (!ns_is_supported(variant->namespace_types, self->proc_path, + sizeof(self->proc_path))) { + /* UML does not support the time namespace. 
*/ + if (variant->namespace_types =3D=3D CLONE_NEWTIME) + SKIP(return, "CLONE_NEWTIME not supported"); + + ASSERT_TRUE(false) + { + TH_LOG("Namespace type 0x%llx not supported", + (unsigned long long)variant->namespace_types); + } + } + + if (variant->drop_all_caps) + drop_caps(_metadata); + else + disable_caps(_metadata); +} + +FIXTURE_TEARDOWN(ns_create) +{ +} + +TEST_F(ns_create, unshare) +{ + int ruleset_fd, err; + + if (variant->is_sandboxed) { + ruleset_fd =3D create_ns_ruleset(); + ASSERT_LE(0, ruleset_fd); + + if (variant->has_rule) + ASSERT_EQ(0, add_ns_rule(ruleset_fd, + variant->namespace_types)); + + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + } + + /* + * Non-user namespaces need CAP_SYS_ADMIN for the privileged path. + * User namespaces and unprivileged tests skip this. + */ + if (!variant->drop_all_caps && + variant->namespace_types !=3D CLONE_NEWUSER) + set_cap(_metadata, CAP_SYS_ADMIN); + + err =3D unshare(variant->namespace_types); + if (variant->expected_result) { + EXPECT_EQ(-1, err); + EXPECT_EQ(variant->expected_result, errno); + } else { + EXPECT_EQ(0, err); + } + + if (!variant->drop_all_caps && + variant->namespace_types !=3D CLONE_NEWUSER) + clear_cap(_metadata, CAP_SYS_ADMIN); +} + +/* + * clone3 exercises a different kernel entry point than unshare: it goes + * through kernel_clone() -> copy_process() -> copy_namespaces() -> + * create_new_namespaces(). Both paths converge at __ns_common_init() -> + * security_namespace_alloc(), but the entry point and argument handling + * differ. 
+ */ +TEST_F(ns_create, clone3) +{ + int ruleset_fd, status; + pid_t pid; + struct clone_args args =3D {}; + + if (variant->is_sandboxed) { + ruleset_fd =3D create_ns_ruleset(); + ASSERT_LE(0, ruleset_fd); + + if (variant->has_rule) + ASSERT_EQ(0, add_ns_rule(ruleset_fd, + variant->namespace_types)); + + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + } + + if (!variant->drop_all_caps && + variant->namespace_types !=3D CLONE_NEWUSER) + set_cap(_metadata, CAP_SYS_ADMIN); + + args.flags =3D variant->namespace_types; + args.exit_signal =3D SIGCHLD; + pid =3D sys_clone3(&args, sizeof(args)); + if (pid =3D=3D 0) + _exit(EXIT_SUCCESS); + + if (variant->expected_result) { + EXPECT_EQ(-1, pid); + EXPECT_EQ(variant->expected_result, errno); + } else { + EXPECT_LE(0, pid); + ASSERT_EQ(pid, waitpid(pid, &status, 0)); + ASSERT_EQ(1, WIFEXITED(status)); + ASSERT_EQ(EXIT_SUCCESS, WEXITSTATUS(status)); + } + + if (!variant->drop_all_caps && + variant->namespace_types !=3D CLONE_NEWUSER) + clear_cap(_metadata, CAP_SYS_ADMIN); +} + +/* + * setns exercises the namespace install path: validate_ns() -> + * security_namespace_install() -> hook_namespace_install(). This is a + * different LSM hook than creation, so it must be tested separately for + * each type. + * + * Mount namespace setns requires both CAP_SYS_ADMIN and CAP_SYS_CHROOT + * (checked by mntns_install), so the allowed variant sets both. + */ +TEST_F(ns_create, setns) +{ + int ruleset_fd, ns_fd, err, expected; + + /* + * setns into the process's own user NS always returns EINVAL: + * userns_install() rejects re-entry before checking capabilities. + */ + if (variant->namespace_types =3D=3D CLONE_NEWUSER) { + expected =3D EINVAL; + } else { + expected =3D variant->expected_result; + } + + /* Open the NS FD before enforcing the domain. 
*/ + ns_fd =3D open(self->proc_path, O_RDONLY); + ASSERT_LE(0, ns_fd); + + if (variant->is_sandboxed) { + ruleset_fd =3D create_ns_ruleset(); + ASSERT_LE(0, ruleset_fd); + + if (variant->has_rule) + ASSERT_EQ(0, add_ns_rule(ruleset_fd, + variant->namespace_types)); + + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + } + + if (!variant->drop_all_caps) { + set_cap(_metadata, CAP_SYS_ADMIN); + /* + * mntns_install() requires CAP_SYS_CHROOT in addition to + * CAP_SYS_ADMIN. + */ + if (variant->namespace_types =3D=3D CLONE_NEWNS) + set_cap(_metadata, CAP_SYS_CHROOT); + } + + err =3D setns(ns_fd, variant->namespace_types); + if (expected) { + EXPECT_EQ(-1, err); + EXPECT_EQ(expected, errno); + } else { + EXPECT_EQ(0, err); + } + + if (!variant->drop_all_caps) { + clear_cap(_metadata, CAP_SYS_ADMIN); + if (variant->namespace_types =3D=3D CLONE_NEWNS) + clear_cap(_metadata, CAP_SYS_CHROOT); + } + + EXPECT_EQ(0, close(ns_fd)); +} + +/* Additional namespace creation tests */ + +/* + * When LANDLOCK_PERM_NAMESPACE_ENTER is not handled by any domain, namesp= ace + * creation must produce the same result as without Landlock. Unlike the + * unsandboxed variants of ns_create (which have no domain at all), this t= est + * verifies that a domain handling only FS access does not interfere with + * namespace operations. + */ +TEST(ns_create_unhandled) +{ + const struct landlock_ruleset_attr attr =3D { + .handled_access_fs =3D LANDLOCK_ACCESS_FS_READ_FILE, + }; + int ruleset_fd; + + disable_caps(_metadata); + + ruleset_fd =3D landlock_create_ruleset(&attr, sizeof(attr), 0); + ASSERT_LE(0, ruleset_fd); + + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + /* User namespace creation should still work (unhandled). */ + EXPECT_EQ(0, unshare(CLONE_NEWUSER)); +} + +/* + * Layer stacking: layer 1 always allows CLONE_NEWUSER. Layer 2 + * either allows (both layers agree -> success) or denies (any layer + * can deny -> failure). 
+ */ +/* clang-format off */ +FIXTURE(ns_stacking) {}; +/* clang-format on */ + +FIXTURE_VARIANT(ns_stacking) +{ + bool second_layer_allows; +}; + +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_stacking, deny) { + /* clang-format on */ + .second_layer_allows =3D false, +}; + +/* Both layers allow CLONE_NEWUSER -> operation succeeds. */ +/* clang-format off */ +FIXTURE_VARIANT_ADD(ns_stacking, allow) { + /* clang-format on */ + .second_layer_allows =3D true, +}; + +FIXTURE_SETUP(ns_stacking) +{ + disable_caps(_metadata); +} + +FIXTURE_TEARDOWN(ns_stacking) +{ +} + +/* + * Verify that a second Landlock layer cannot override the first layer's + * denial. Each layer stores its permission bitmask independently, and + * enforcement requires all layers to allow an operation. This ensures + * the correct intersection: layer 1 allows CLONE_NEWUSER, but if layer + * 2 does not also allow it, the operation is denied. + */ +TEST_F(ns_stacking, two_layers) +{ + int ruleset_fd; + + /* First layer: allow CLONE_NEWUSER. */ + ruleset_fd =3D create_ns_ruleset(); + ASSERT_LE(0, ruleset_fd); + ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWUSER)); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + /* Second layer: allow or deny depending on variant. */ + ruleset_fd =3D create_ns_ruleset(); + ASSERT_LE(0, ruleset_fd); + if (variant->second_layer_allows) + ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWUSER)); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + if (variant->second_layer_allows) { + EXPECT_EQ(0, unshare(CLONE_NEWUSER)); + } else { + EXPECT_EQ(-1, unshare(CLONE_NEWUSER)); + EXPECT_EQ(EPERM, errno); + } +} + +/* + * Combined capability and namespace permissions in a single domain. + * Verifies that both permission types can coexist and are enforced + * independently. 
+ */ +TEST(combined_cap_ns) +{ + const struct landlock_ruleset_attr attr =3D { + .handled_perm =3D LANDLOCK_PERM_CAPABILITY_USE | + LANDLOCK_PERM_NAMESPACE_ENTER, + }; + const struct landlock_capability_attr cap_attr =3D { + .allowed_perm =3D LANDLOCK_PERM_CAPABILITY_USE, + .capabilities =3D (1ULL << CAP_SYS_ADMIN), + }; + const struct landlock_namespace_attr ns_attr =3D { + .allowed_perm =3D LANDLOCK_PERM_NAMESPACE_ENTER, + .namespace_types =3D CLONE_NEWUSER, + }; + int ruleset_fd; + + /* Isolate hostname changes from other tests. */ + ASSERT_EQ(0, unshare(CLONE_NEWUTS)); + + disable_caps(_metadata); + + ruleset_fd =3D landlock_create_ruleset(&attr, sizeof(attr), 0); + ASSERT_LE(0, ruleset_fd); + + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY, + &cap_attr, 0)); + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE, + &ns_attr, 0)); + + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + /* CAP_SYS_ADMIN use allowed by capability rule. */ + set_cap(_metadata, CAP_SYS_ADMIN); + EXPECT_EQ(0, sethostname("test", 4)); + clear_cap(_metadata, CAP_SYS_ADMIN); + + /* CAP_SYS_CHROOT denied (not in allowed capability rules). */ + set_cap(_metadata, CAP_SYS_CHROOT); + EXPECT_EQ(-1, chroot("/")); + EXPECT_EQ(EPERM, errno); + + /* + * UTS namespace creation denied by Landlock (not in allowed namespace + * rules). CAP_SYS_ADMIN is needed for the kernel's ns_capable() + * check to pass, so that Landlock's hook is actually reached. + */ + set_cap(_metadata, CAP_SYS_ADMIN); + EXPECT_EQ(-1, unshare(CLONE_NEWUTS)); + EXPECT_EQ(EPERM, errno); + clear_cap(_metadata, CAP_SYS_ADMIN); + + /* User namespace creation allowed by namespace rule. */ + EXPECT_EQ(0, unshare(CLONE_NEWUSER)); +} + +/* + * Partial allow: one namespace type is allowed, another is denied. + * Verifies that rules are per-type. 
+ */ +TEST(ns_create_partial) +{ + int ruleset_fd; + + disable_caps(_metadata); + + ruleset_fd =3D create_ns_ruleset(); + ASSERT_LE(0, ruleset_fd); + + /* Only allow UTS namespace creation. */ + ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWUTS)); + + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + /* UTS namespace should be allowed. */ + set_cap(_metadata, CAP_SYS_ADMIN); + EXPECT_EQ(0, unshare(CLONE_NEWUTS)); + + /* User namespace should be denied (no rule). */ + EXPECT_EQ(-1, unshare(CLONE_NEWUSER)); + EXPECT_EQ(EPERM, errno); +} + +/* clang-format off */ +FIXTURE(setns_cross_process) {}; +/* clang-format on */ + +FIXTURE_VARIANT(setns_cross_process) +{ + bool is_sandboxed; + int expected_setns; +}; + +/* clang-format off */ +FIXTURE_VARIANT_ADD(setns_cross_process, denied) { + /* clang-format on */ + .is_sandboxed =3D true, + .expected_setns =3D EPERM, +}; + +/* clang-format off */ +FIXTURE_VARIANT_ADD(setns_cross_process, allowed) { + /* clang-format on */ + .is_sandboxed =3D false, + .expected_setns =3D 0, +}; + +FIXTURE_SETUP(setns_cross_process) +{ +} + +FIXTURE_TEARDOWN(setns_cross_process) +{ +} + +/* + * setns into a child's UTS namespace: when sandboxed with + * LANDLOCK_PERM_NAMESPACE_ENTER denying UTS, the rule-based check + * applies regardless of which process created the namespace. + */ +TEST_F(setns_cross_process, setns) +{ + int ruleset_fd, ns_fd, status; + pid_t child; + int pipe_parent[2], pipe_child[2]; + char buf, path[64]; + + disable_caps(_metadata); + + /* + * Enable dumpable so the parent can read /proc//ns/uts. + * Without this, ptrace access checks (PTRACE_MODE_READ) prevent + * opening another process's namespace entries. 
+ */ + ASSERT_EQ(0, prctl(PR_SET_DUMPABLE, 1, 0, 0, 0)); + + ASSERT_EQ(0, pipe2(pipe_parent, O_CLOEXEC)); + ASSERT_EQ(0, pipe2(pipe_child, O_CLOEXEC)); + + child =3D fork(); + ASSERT_LE(0, child); + + if (child =3D=3D 0) { + EXPECT_EQ(0, close(pipe_parent[1])); + EXPECT_EQ(0, close(pipe_child[0])); + + /* Child: create a UTS namespace. */ + set_cap(_metadata, CAP_SYS_ADMIN); + ASSERT_EQ(0, unshare(CLONE_NEWUTS)); + + drop_caps(_metadata); + ASSERT_EQ(0, prctl(PR_SET_DUMPABLE, 1, 0, 0, 0)); + + /* Signal parent that the namespace is ready. */ + ASSERT_EQ(1, write(pipe_child[1], ".", 1)); + + /* Wait for parent to finish testing. */ + ASSERT_EQ(1, read(pipe_parent[0], &buf, 1)); + _exit(_metadata->exit_code); + } + + EXPECT_EQ(0, close(pipe_parent[0])); + EXPECT_EQ(0, close(pipe_child[1])); + + /* Wait for child namespace. */ + ASSERT_EQ(1, read(pipe_child[0], &buf, 1)); + EXPECT_EQ(0, close(pipe_child[0])); + + /* Open the child's NS FD BEFORE creating the domain. */ + snprintf(path, sizeof(path), "/proc/%d/ns/uts", child); + ns_fd =3D open(path, O_RDONLY); + ASSERT_LE(0, ns_fd); + + if (variant->is_sandboxed) { + /* Create domain denying UTS entry (no allow rule). */ + ruleset_fd =3D create_ns_ruleset(); + ASSERT_LE(0, ruleset_fd); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + } + + set_cap(_metadata, CAP_SYS_ADMIN); + if (variant->expected_setns) { + EXPECT_EQ(-1, setns(ns_fd, CLONE_NEWUTS)); + EXPECT_EQ(variant->expected_setns, errno); + } else { + EXPECT_EQ(0, setns(ns_fd, CLONE_NEWUTS)); + } + clear_cap(_metadata, CAP_SYS_ADMIN); + EXPECT_EQ(0, close(ns_fd)); + + /* Release child. 
*/ + ASSERT_EQ(1, write(pipe_parent[1], ".", 1)); + EXPECT_EQ(0, close(pipe_parent[1])); + ASSERT_EQ(child, waitpid(child, &status, 0)); + ASSERT_EQ(1, WIFEXITED(status)); + ASSERT_EQ(EXIT_SUCCESS, WEXITSTATUS(status)); +} + +/* + * Verify that both LANDLOCK_PERM_NAMESPACE_ENTER and LANDLOCK_PERM_CAPABI= LITY_USE + * apply simultaneously: creating/entering a non-user namespace + * requires both the namespace type to be allowed AND CAP_SYS_ADMIN + * to be allowed. User namespace creation is the exception (no + * capable() call from the kernel). + */ +TEST(setns_and_create) +{ + int ruleset_fd, ns_fd; + const struct landlock_ruleset_attr attr =3D { + .handled_perm =3D LANDLOCK_PERM_NAMESPACE_ENTER | + LANDLOCK_PERM_CAPABILITY_USE, + }; + const struct landlock_namespace_attr ns_attr =3D { + .allowed_perm =3D LANDLOCK_PERM_NAMESPACE_ENTER, + .namespace_types =3D CLONE_NEWUTS, + }; + const struct landlock_capability_attr cap_attr =3D { + .allowed_perm =3D LANDLOCK_PERM_CAPABILITY_USE, + .capabilities =3D (1ULL << CAP_SYS_ADMIN), + }; + + disable_caps(_metadata); + + ruleset_fd =3D landlock_create_ruleset(&attr, sizeof(attr), 0); + ASSERT_LE(0, ruleset_fd); + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE, + &ns_attr, 0)); + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY, + &cap_attr, 0)); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + /* UTS unshare: allowed by NS rule + CAP_SYS_ADMIN allowed. */ + set_cap(_metadata, CAP_SYS_ADMIN); + ASSERT_EQ(0, unshare(CLONE_NEWUTS)); + + /* IPC unshare: denied by NS rule (type not allowed). */ + EXPECT_EQ(-1, unshare(CLONE_NEWIPC)); + EXPECT_EQ(EPERM, errno); + + /* setns into current UTS: allowed by NS rule. 
*/ + ns_fd =3D open("/proc/self/ns/uts", O_RDONLY); + ASSERT_LE(0, ns_fd); + EXPECT_EQ(0, setns(ns_fd, CLONE_NEWUTS)); + clear_cap(_metadata, CAP_SYS_ADMIN); + EXPECT_EQ(0, close(ns_fd)); + + /* + * User namespace creation: only LANDLOCK_PERM_NAMESPACE_ENTER needed + * (no capable() call from the kernel for user NS). Denied + * because CLONE_NEWUSER is not in the allowed namespace types. + */ + EXPECT_EQ(-1, unshare(CLONE_NEWUSER)); + EXPECT_EQ(EPERM, errno); +} + +/* + * Verify that LANDLOCK_PERM_CAPABILITY_USE can deny the CAP_SYS_ADMIN che= ck + * that the kernel performs before the Landlock namespace hook is + * reached. The NS type is allowed but the required capability is not, + * so the operation fails on the capability check. + * + * User namespace creation is the exception: no capable() call, so the + * operation succeeds with just LANDLOCK_PERM_NAMESPACE_ENTER. + */ +TEST(two_perm_cap_denied) +{ + const struct landlock_ruleset_attr attr =3D { + .handled_perm =3D LANDLOCK_PERM_NAMESPACE_ENTER | + LANDLOCK_PERM_CAPABILITY_USE, + }; + const struct landlock_namespace_attr ns_attr =3D { + .allowed_perm =3D LANDLOCK_PERM_NAMESPACE_ENTER, + .namespace_types =3D CLONE_NEWUTS | CLONE_NEWUSER, + }; + /* CAP_SYS_ADMIN is NOT allowed. 
*/ + const struct landlock_capability_attr cap_attr =3D { + .allowed_perm =3D LANDLOCK_PERM_CAPABILITY_USE, + .capabilities =3D (1ULL << CAP_SYS_CHROOT), + }; + int ruleset_fd; + + disable_caps(_metadata); + + ruleset_fd =3D landlock_create_ruleset(&attr, sizeof(attr), 0); + ASSERT_LE(0, ruleset_fd); + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE, + &ns_attr, 0)); + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY, + &cap_attr, 0)); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + /* + * UTS creation: the process holds CAP_SYS_ADMIN but Landlock + * denies it (not in the cap rule), so the kernel's + * ns_capable(CAP_SYS_ADMIN) gate fails before the namespace + * hook is reached. + */ + set_cap(_metadata, CAP_SYS_ADMIN); + EXPECT_EQ(-1, unshare(CLONE_NEWUTS)); + EXPECT_EQ(EPERM, errno); + clear_cap(_metadata, CAP_SYS_ADMIN); + + /* + * User NS creation: no capable() call from the kernel, so + * only LANDLOCK_PERM_NAMESPACE_ENTER applies. CLONE_NEWUSER is in the + * allowed set, so this succeeds. + */ + EXPECT_EQ(0, unshare(CLONE_NEWUSER)); +} + +/* + * Mount namespace setns is unique: the kernel checks both + * CAP_SYS_ADMIN and CAP_SYS_CHROOT in mntns_install(). Verify that + * allowing CAP_SYS_ADMIN alone is not sufficient. 
+ */ +TEST(two_perm_mnt_setns) +{ + const struct landlock_ruleset_attr attr =3D { + .handled_perm =3D LANDLOCK_PERM_NAMESPACE_ENTER | + LANDLOCK_PERM_CAPABILITY_USE, + }; + const struct landlock_namespace_attr ns_attr =3D { + .allowed_perm =3D LANDLOCK_PERM_NAMESPACE_ENTER, + .namespace_types =3D CLONE_NEWNS, + }; + const struct landlock_capability_attr cap_admin =3D { + .allowed_perm =3D LANDLOCK_PERM_CAPABILITY_USE, + .capabilities =3D (1ULL << CAP_SYS_ADMIN), + }; + const struct landlock_capability_attr cap_admin_chroot =3D { + .allowed_perm =3D LANDLOCK_PERM_CAPABILITY_USE, + .capabilities =3D (1ULL << CAP_SYS_ADMIN) | + (1ULL << CAP_SYS_CHROOT), + }; + int ruleset_fd, ns_fd; + + disable_caps(_metadata); + + /* Layer 1: allow mount NS + CAP_SYS_ADMIN only (no CAP_SYS_CHROOT). */ + ruleset_fd =3D landlock_create_ruleset(&attr, sizeof(attr), 0); + ASSERT_LE(0, ruleset_fd); + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE, + &ns_attr, 0)); + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY, + &cap_admin, 0)); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + ns_fd =3D open("/proc/self/ns/mnt", O_RDONLY); + ASSERT_LE(0, ns_fd); + + /* + * Fails: mntns_install() checks CAP_SYS_ADMIN (allowed) then + * CAP_SYS_CHROOT (denied by LANDLOCK_PERM_CAPABILITY_USE). + */ + set_cap(_metadata, CAP_SYS_ADMIN); + set_cap(_metadata, CAP_SYS_CHROOT); + EXPECT_EQ(-1, setns(ns_fd, CLONE_NEWNS)); + EXPECT_EQ(EPERM, errno); + clear_cap(_metadata, CAP_SYS_ADMIN); + clear_cap(_metadata, CAP_SYS_CHROOT); + + /* Layer 2: also allows CAP_SYS_CHROOT. 
*/ + ruleset_fd =3D landlock_create_ruleset(&attr, sizeof(attr), 0); + ASSERT_LE(0, ruleset_fd); + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE, + &ns_attr, 0)); + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY, + &cap_admin_chroot, 0)); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + /* + * Still fails: layer 1 still denies CAP_SYS_CHROOT. + * Landlock layer stacking means the most restrictive layer wins. + */ + set_cap(_metadata, CAP_SYS_ADMIN); + set_cap(_metadata, CAP_SYS_CHROOT); + EXPECT_EQ(-1, setns(ns_fd, CLONE_NEWNS)); + EXPECT_EQ(EPERM, errno); + clear_cap(_metadata, CAP_SYS_ADMIN); + clear_cap(_metadata, CAP_SYS_CHROOT); + EXPECT_EQ(0, close(ns_fd)); +} + +/* Audit tests */ + +static int matches_log_ns_create(int audit_fd, __u64 ns_type) +{ + static const char log_template[] =3D REGEX_LANDLOCK_PREFIX + " blockers=3Dperm\\.namespace_enter" + " namespace_type=3D0x%x" + " namespace_inum=3D0$"; + char log_match[sizeof(log_template) + 10]; + int log_match_len; + + log_match_len =3D snprintf(log_match, sizeof(log_match), log_template, + (unsigned int)ns_type); + if (log_match_len >=3D sizeof(log_match)) + return -E2BIG; + + return audit_match_record(audit_fd, AUDIT_LANDLOCK_ACCESS, log_match, + NULL); +} + +static int matches_log_ns_setns(int audit_fd, __u64 ns_type) +{ + static const char log_template[] =3D REGEX_LANDLOCK_PREFIX + " blockers=3Dperm\\.namespace_enter" + " namespace_type=3D0x%x" + " namespace_inum=3D[0-9]\\+$"; + char log_match[sizeof(log_template) + 10]; + int log_match_len; + + log_match_len =3D snprintf(log_match, sizeof(log_match), log_template, + (unsigned int)ns_type); + if (log_match_len >=3D sizeof(log_match)) + return -E2BIG; + + return audit_match_record(audit_fd, AUDIT_LANDLOCK_ACCESS, log_match, + NULL); +} + +FIXTURE(ns_audit) +{ + struct audit_filter audit_filter; + int audit_fd; +}; + +FIXTURE_SETUP(ns_audit) +{ + ASSERT_TRUE(is_in_init_user_ns()); + + 
disable_caps(_metadata); + + set_cap(_metadata, CAP_AUDIT_CONTROL); + self->audit_fd =3D audit_init_with_exe_filter(&self->audit_filter); + EXPECT_LE(0, self->audit_fd); + clear_cap(_metadata, CAP_AUDIT_CONTROL); +} + +FIXTURE_TEARDOWN(ns_audit) +{ + set_cap(_metadata, CAP_AUDIT_CONTROL); + EXPECT_EQ(0, audit_cleanup(self->audit_fd, &self->audit_filter)); +} + +/* + * Verifies that a denied namespace creation produces the expected audit + * record with the perm.namespace_enter blocker string and namespace_type. + */ +TEST_F(ns_audit, create_denied) +{ + struct audit_records records; + int ruleset_fd; + + ruleset_fd =3D create_ns_ruleset(); + ASSERT_LE(0, ruleset_fd); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + set_cap(_metadata, CAP_SYS_ADMIN); + EXPECT_EQ(-1, unshare(CLONE_NEWUTS)); + EXPECT_EQ(EPERM, errno); + clear_cap(_metadata, CAP_SYS_ADMIN); + + EXPECT_EQ(0, matches_log_ns_create(self->audit_fd, CLONE_NEWUTS)); + + /* + * No extra access records: the denial was already consumed by + * matches_log_ns_create above. One domain allocation record, + * emitted in the same event as the first access denial for this + * domain. + */ + EXPECT_EQ(0, audit_count_records(self->audit_fd, &records)); + EXPECT_EQ(0, records.access); + EXPECT_EQ(1, records.domain); +} + +TEST_F(ns_audit, create_allowed) +{ + struct audit_records records; + int ruleset_fd; + + ruleset_fd =3D create_ns_ruleset(); + ASSERT_LE(0, ruleset_fd); + ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWUTS)); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + set_cap(_metadata, CAP_SYS_ADMIN); + EXPECT_EQ(0, unshare(CLONE_NEWUTS)); + clear_cap(_metadata, CAP_SYS_ADMIN); + + /* No records: allowed operations never trigger audit logging. 
*/ + EXPECT_EQ(0, audit_count_records(self->audit_fd, &records)); + EXPECT_EQ(0, records.access); +} + +TEST_F(ns_audit, setns_allowed) +{ + struct audit_records records; + int ruleset_fd, ns_fd; + + ruleset_fd =3D create_ns_ruleset(); + ASSERT_LE(0, ruleset_fd); + ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWUTS)); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + ns_fd =3D open("/proc/self/ns/uts", O_RDONLY); + ASSERT_LE(0, ns_fd); + + /* Allowed: should succeed with no audit record. */ + set_cap(_metadata, CAP_SYS_ADMIN); + EXPECT_EQ(0, setns(ns_fd, CLONE_NEWUTS)); + clear_cap(_metadata, CAP_SYS_ADMIN); + EXPECT_EQ(0, close(ns_fd)); + + /* No records: allowed setns never triggers audit logging. */ + EXPECT_EQ(0, audit_count_records(self->audit_fd, &records)); + EXPECT_EQ(0, records.access); +} + +TEST_F(ns_audit, setns_denied) +{ + struct audit_records records; + int ruleset_fd, ns_fd; + + ruleset_fd =3D create_ns_ruleset(); + ASSERT_LE(0, ruleset_fd); + /* No rule allows UTS -> denied. */ + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + ns_fd =3D open("/proc/self/ns/uts", O_RDONLY); + ASSERT_LE(0, ns_fd); + + set_cap(_metadata, CAP_SYS_ADMIN); + EXPECT_EQ(-1, setns(ns_fd, CLONE_NEWUTS)); + EXPECT_EQ(EPERM, errno); + clear_cap(_metadata, CAP_SYS_ADMIN); + EXPECT_EQ(0, close(ns_fd)); + + /* Verify the audit record for setns denial. */ + EXPECT_EQ(0, matches_log_ns_setns(self->audit_fd, CLONE_NEWUTS)); + + /* + * No extra access records: the denial was already consumed by + * matches_log_ns_setns above. One domain allocation record, + * emitted in the same event as the first access denial for this + * domain. 
+ */ + EXPECT_EQ(0, audit_count_records(self->audit_fd, &records)); + EXPECT_EQ(0, records.access); + EXPECT_EQ(1, records.domain); +} + +TEST_F(ns_audit, unshare_denied) +{ + struct audit_records records; + int ruleset_fd; + + ruleset_fd =3D create_ns_ruleset(); + ASSERT_LE(0, ruleset_fd); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + /* Deny UTS namespace creation (no allow rule). */ + set_cap(_metadata, CAP_SYS_ADMIN); + EXPECT_EQ(-1, unshare(CLONE_NEWUTS)); + EXPECT_EQ(EPERM, errno); + clear_cap(_metadata, CAP_SYS_ADMIN); + + /* Verify the audit record for namespace creation denial. */ + EXPECT_EQ(0, matches_log_ns_create(self->audit_fd, CLONE_NEWUTS)); + + /* + * No extra access records: the denial was already consumed by + * matches_log_ns_create above. One domain allocation record, + * emitted in the same event as the first access denial for this + * domain. + */ + EXPECT_EQ(0, audit_count_records(self->audit_fd, &records)); + EXPECT_EQ(0, records.access); + EXPECT_EQ(1, records.domain); +} + +TEST_HARNESS_MAIN diff --git a/tools/testing/selftests/landlock/wrappers.h b/tools/testing/se= lftests/landlock/wrappers.h index 65548323e45d..a3266fdb43da 100644 --- a/tools/testing/selftests/landlock/wrappers.h +++ b/tools/testing/selftests/landlock/wrappers.h @@ -9,6 +9,7 @@ =20 #define _GNU_SOURCE #include +#include #include #include #include @@ -45,3 +46,8 @@ static inline pid_t sys_gettid(void) { return syscall(__NR_gettid); } + +static inline pid_t sys_clone3(struct clone_args *args, size_t size) +{ + return syscall(__NR_clone3, args, size); +} --=20 2.53.0
From nobody Tue Apr 7 16:29:57 2026 From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= To: Christian Brauner , =?UTF-8?q?G=C3=BCnther=20Noack?= , Paul Moore , "Serge E . Hallyn" Cc: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= , Justin Suess , Lennart Poettering , Mikhail Ivanov , Nicolas Bouchinet , Shervin Oloumi , Tingmao Wang , kernel-team@cloudflare.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org Subject: [RFC PATCH v1 09/11] selftests/landlock: Add capability restriction tests Date: Thu, 12 Mar 2026 11:04:42 +0100 Message-ID: <20260312100444.2609563-10-mic@digikod.net> In-Reply-To: <20260312100444.2609563-1-mic@digikod.net> References: <20260312100444.2609563-1-mic@digikod.net> X-Mailing-List: linux-kernel@vger.kernel.org MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable
Add tests to exercise LANDLOCK_PERM_CAPABILITY_USE enforcement. The tests verify that a sandboxed process is denied a handled capability when no rule grants it, and that an explicit rule restores the capability. Unknown capability values above CAP_LAST_CAP are checked to be silently accepted without effect, ensuring the allow-list stays future-proof when new capabilities are added. A stacking test creates two nested domains restricting different capability sets and confirms that both layers' rules are enforced. Invalid rule attributes (wrong flags, out-of-range values) are tested to return the expected errors. Two tests exercise non-standard capability gain paths. The first enforces a domain via CAP_SYS_ADMIN (no_new_privs is not set) and verifies that denied capabilities are blocked even when still in the effective set.
The second creates a user namespace under a Landlock domain to verify that capabilities gained through the kernel's user namespace ownership bypass (cap_capable_helper) are still restricted by the domain's rules. Audit tests verify that denied capabilities produce the correct audit record with the capability number, and that allowed capabilities generate no denial record. Test coverage for security/landlock is 90.7% of 2282 lines according to LLVM 21. Cc: Christian Brauner Cc: G=C3=BCnther Noack Cc: Paul Moore Cc: Serge E. Hallyn Signed-off-by: Micka=C3=ABl Sala=C3=BCn --- tools/testing/selftests/landlock/base_test.c | 18 + tools/testing/selftests/landlock/cap_test.c | 614 +++++++++++++++++++ 2 files changed, 632 insertions(+) create mode 100644 tools/testing/selftests/landlock/cap_test.c diff --git a/tools/testing/selftests/landlock/base_test.c b/tools/testing/s= elftests/landlock/base_test.c index 30d37234086c..a55e8111bbde 100644 --- a/tools/testing/selftests/landlock/base_test.c +++ b/tools/testing/selftests/landlock/base_test.c @@ -142,6 +142,24 @@ TEST(errata) ASSERT_EQ(EINVAL, errno); } =20 +#define PERM_LAST LANDLOCK_PERM_CAPABILITY_USE + +TEST(ruleset_with_unknown_perm) +{ + __u64 perm_mask; + + for (perm_mask =3D 1ULL << 63; perm_mask !=3D PERM_LAST; perm_mask >>=3D = 1) { + struct landlock_ruleset_attr ruleset_attr =3D { + .handled_perm =3D perm_mask, + }; + + /* Unknown handled_perm values must be rejected. */ + ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr, + sizeof(ruleset_attr), 0)); + ASSERT_EQ(EINVAL, errno); + } +} + /* Tests ordering of syscall argument checks. 
*/ TEST(create_ruleset_checks_ordering) { diff --git a/tools/testing/selftests/landlock/cap_test.c b/tools/testing/se= lftests/landlock/cap_test.c new file mode 100644 index 000000000000..7ae978dff808 --- /dev/null +++ b/tools/testing/selftests/landlock/cap_test.c @@ -0,0 +1,614 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Landlock tests - Capability restriction + * + * Copyright =C2=A9 2026 Cloudflare + */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "audit.h" +#include "common.h" + +static int create_cap_ruleset(void) +{ + const struct landlock_ruleset_attr attr =3D { + .handled_perm =3D LANDLOCK_PERM_CAPABILITY_USE, + }; + + return landlock_create_ruleset(&attr, sizeof(attr), 0); +} + +static int add_cap_rule(int ruleset_fd, __u64 cap) +{ + const struct landlock_capability_attr attr =3D { + .allowed_perm =3D LANDLOCK_PERM_CAPABILITY_USE, + .capabilities =3D (1ULL << cap), + }; + + return landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY, &attr, + 0); +} + +TEST(add_rule_bad_attr) +{ + const struct landlock_ruleset_attr ns_only_attr =3D { + .handled_perm =3D LANDLOCK_PERM_NAMESPACE_ENTER, + }; + int ruleset_fd; + struct landlock_capability_attr attr =3D {}; + + ruleset_fd =3D create_cap_ruleset(); + ASSERT_LE(0, ruleset_fd); + + /* Empty allowed_perm returns ENOMSG (useless deny rule). */ + attr.allowed_perm =3D 0; + attr.capabilities =3D (1ULL << CAP_NET_RAW); + ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY, + &attr, 0)); + ASSERT_EQ(ENOMSG, errno); + + /* Useless rule: empty capabilities bitmask. */ + attr.allowed_perm =3D LANDLOCK_PERM_CAPABILITY_USE; + attr.capabilities =3D 0; + ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY, + &attr, 0)); + ASSERT_EQ(ENOMSG, errno); + + /* allowed_perm with unhandled bit. 
*/ + attr.allowed_perm =3D LANDLOCK_PERM_CAPABILITY_USE | + LANDLOCK_PERM_NAMESPACE_ENTER; + attr.capabilities =3D (1ULL << CAP_NET_RAW); + ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY, + &attr, 0)); + ASSERT_EQ(EINVAL, errno); + + /* allowed_perm with wrong type. */ + attr.allowed_perm =3D LANDLOCK_PERM_NAMESPACE_ENTER; + attr.capabilities =3D (1ULL << CAP_NET_RAW); + ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY, + &attr, 0)); + ASSERT_EQ(EINVAL, errno); + + /* + * Unknown capability bits (e.g. bit 63) are silently accepted + * for forward compatibility. Only known bits are stored. + */ + attr.allowed_perm =3D LANDLOCK_PERM_CAPABILITY_USE; + attr.capabilities =3D 1ULL << 63; + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY, + &attr, 0)); + + /* Non-zero flags must be rejected. */ + attr.allowed_perm =3D LANDLOCK_PERM_CAPABILITY_USE; + attr.capabilities =3D (1ULL << CAP_NET_RAW); + ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY, + &attr, 1)); + ASSERT_EQ(EINVAL, errno); + + EXPECT_EQ(0, close(ruleset_fd)); + + /* + * Ruleset handles PERM_NAMESPACE_ENTER but not PERM_CAPABILITY_USE: + * adding a capability rule must be rejected. + */ + ruleset_fd =3D + landlock_create_ruleset(&ns_only_attr, sizeof(ns_only_attr), 0); + ASSERT_LE(0, ruleset_fd); + attr.allowed_perm =3D LANDLOCK_PERM_CAPABILITY_USE; + attr.capabilities =3D (1ULL << CAP_NET_RAW); + ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY, + &attr, 0)); + ASSERT_EQ(EINVAL, errno); + EXPECT_EQ(0, close(ruleset_fd)); +} + +/* + * Unknown capability values above CAP_LAST_CAP are silently accepted + * (allow-list: they have no effect since the kernel never checks them). 
+ */ +TEST(add_rule_unknown) +{ + int ruleset_fd; + struct landlock_capability_attr attr =3D { + .allowed_perm =3D LANDLOCK_PERM_CAPABILITY_USE, + }; + + ruleset_fd =3D create_cap_ruleset(); + ASSERT_LE(0, ruleset_fd); + + /* Just above CAP_LAST_CAP should succeed. */ + attr.capabilities =3D (1ULL << (CAP_LAST_CAP + 1)); + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY, + &attr, 0)); + + /* High values (below bit 63) should succeed. */ + attr.capabilities =3D (1ULL << 62); + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY, + &attr, 0)); + + EXPECT_EQ(0, close(ruleset_fd)); +} + +/* clang-format off */ +FIXTURE(cap_enforce) {}; +/* clang-format on */ + +FIXTURE_VARIANT(cap_enforce) +{ + const bool is_sandboxed; + const bool handle_caps; + const __u64 allowed_cap; + const int expected_sysadmin; + const int expected_chroot; +}; + +/* + * Unsandboxed baseline: no Landlock domain is enforced. + * Both capabilities should work normally. + */ +/* clang-format off */ +FIXTURE_VARIANT_ADD(cap_enforce, unsandboxed) { + /* clang-format on */ + .is_sandboxed =3D false, .handle_caps =3D false, .allowed_cap =3D 0, + .expected_sysadmin =3D 0, .expected_chroot =3D 0, +}; + +/* + * Denied: capabilities are handled but no rule allows them. + * All capability checks must be denied by Landlock even if the + * capability is effective. + */ +/* clang-format off */ +FIXTURE_VARIANT_ADD(cap_enforce, denied) { + /* clang-format on */ + .is_sandboxed =3D true, .handle_caps =3D true, .allowed_cap =3D = 0, + .expected_sysadmin =3D EPERM, .expected_chroot =3D EPERM, +}; + +/* + * Allowed: CAP_SYS_ADMIN is allowed by rule, CAP_SYS_CHROOT is not. + * Only the explicitly allowed capability should succeed. 
+ */ +/* clang-format off */ +FIXTURE_VARIANT_ADD(cap_enforce, allowed) { + /* clang-format on */ + .is_sandboxed =3D true, .handle_caps =3D true, + .allowed_cap =3D CAP_SYS_ADMIN, .expected_sysadmin =3D 0, + .expected_chroot =3D EPERM, +}; + +/* + * Unhandled: the ruleset does not handle LANDLOCK_PERM_CAPABILITY_USE + * at all (only handles FS access). Both capabilities should work + * since the domain does not restrict them. + */ +/* clang-format off */ +FIXTURE_VARIANT_ADD(cap_enforce, unhandled) { + /* clang-format on */ + .is_sandboxed =3D true, .handle_caps =3D false, .allowed_cap =3D 0, + .expected_sysadmin =3D 0, .expected_chroot =3D 0, +}; + +FIXTURE_SETUP(cap_enforce) +{ + disable_caps(_metadata); +} + +FIXTURE_TEARDOWN(cap_enforce) +{ +} + +/* + * Capability enforcement: tests the four fundamental enforcement + * scenarios (unsandboxed baseline, denied, allowed, unhandled) using + * two independent capability checks (sethostname for CAP_SYS_ADMIN, + * chroot for CAP_SYS_CHROOT). + */ +TEST_F(cap_enforce, use) +{ + int ruleset_fd; + + /* Isolate hostname changes from other tests. */ + set_cap(_metadata, CAP_SYS_ADMIN); + ASSERT_EQ(0, unshare(CLONE_NEWUTS)); + clear_cap(_metadata, CAP_SYS_ADMIN); + + if (variant->is_sandboxed) { + if (variant->handle_caps) { + ruleset_fd =3D create_cap_ruleset(); + } else { + const struct landlock_ruleset_attr attr =3D { + .handled_access_fs =3D + LANDLOCK_ACCESS_FS_READ_FILE, + }; + + ruleset_fd =3D + landlock_create_ruleset(&attr, sizeof(attr), 0); + } + ASSERT_LE(0, ruleset_fd); + + if (variant->allowed_cap) + ASSERT_EQ(0, add_cap_rule(ruleset_fd, + variant->allowed_cap)); + + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + } + + /* Test CAP_SYS_ADMIN via sethostname. 
*/ + set_cap(_metadata, CAP_SYS_ADMIN); + if (variant->expected_sysadmin) { + EXPECT_EQ(-1, sethostname("test", 4)); + EXPECT_EQ(variant->expected_sysadmin, errno); + } else { + EXPECT_EQ(0, sethostname("test", 4)); + } + clear_cap(_metadata, CAP_SYS_ADMIN); + + /* Test CAP_SYS_CHROOT via chroot. */ + set_cap(_metadata, CAP_SYS_CHROOT); + if (variant->expected_chroot) { + EXPECT_EQ(-1, chroot("/")); + EXPECT_EQ(variant->expected_chroot, errno); + } else { + EXPECT_EQ(0, chroot("/")); + } +} + +/* + * Layer stacking: layer 1 always allows CAP_SYS_ADMIN. Layer 2 + * either allows (both layers agree -> success) or denies (any layer + * can deny -> failure). + */ +/* clang-format off */ +FIXTURE(cap_stacking) {}; +/* clang-format on */ + +FIXTURE_VARIANT(cap_stacking) +{ + const bool is_sandboxed; + const bool second_layer_allows; + const bool second_layer_is_fs_only; + const int expected_sysadmin; + const int expected_chroot; +}; + +/* + * Unsandboxed baseline: no Landlock layers are stacked. + * Both capabilities should work normally. + */ +/* clang-format off */ +FIXTURE_VARIANT_ADD(cap_stacking, unsandboxed) { + /* clang-format on */ + .is_sandboxed =3D false, + .second_layer_allows =3D false, + .expected_sysadmin =3D 0, + .expected_chroot =3D 0, +}; + +/* clang-format off */ +FIXTURE_VARIANT_ADD(cap_stacking, deny) { + /* clang-format on */ + .is_sandboxed =3D true, + .second_layer_allows =3D false, + .expected_sysadmin =3D EPERM, + .expected_chroot =3D EPERM, +}; + +/* clang-format off */ +FIXTURE_VARIANT_ADD(cap_stacking, allow) { + /* clang-format on */ + .is_sandboxed =3D true, + .second_layer_allows =3D true, + .expected_sysadmin =3D 0, + .expected_chroot =3D EPERM, +}; + +/* + * Mixed layers: first layer handles PERM_CAPABILITY_USE (denies all + * caps), second layer is FS-only (does not handle it). The perm + * walker iterates from youngest (layer 1) to oldest (layer 0) and + * must skip the FS-only layer to find the denying layer beneath. 
+ */ +/* clang-format off */ +FIXTURE_VARIANT_ADD(cap_stacking, mixed_layers) { + /* clang-format on */ + .is_sandboxed =3D true, + .second_layer_is_fs_only =3D true, + .expected_sysadmin =3D EPERM, + .expected_chroot =3D EPERM, +}; + +FIXTURE_SETUP(cap_stacking) +{ + disable_caps(_metadata); +} + +FIXTURE_TEARDOWN(cap_stacking) +{ +} + +TEST_F(cap_stacking, two_layers) +{ + int ruleset_fd; + + if (variant->is_sandboxed) { + /* First layer: always handles PERM_CAPABILITY_USE. */ + ruleset_fd =3D create_cap_ruleset(); + ASSERT_LE(0, ruleset_fd); + if (!variant->second_layer_is_fs_only) + ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_SYS_ADMIN)); + + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + if (variant->second_layer_is_fs_only) { + /* + * Second layer: FS-only (does not handle + * PERM_CAPABILITY_USE). The perm walker must + * skip this layer. + */ + const struct landlock_ruleset_attr fs_attr =3D { + .handled_access_fs =3D + LANDLOCK_ACCESS_FS_READ_FILE, + }; + + ruleset_fd =3D landlock_create_ruleset( + &fs_attr, sizeof(fs_attr), 0); + } else { + /* Second layer: cap allow or deny. */ + ruleset_fd =3D create_cap_ruleset(); + if (variant->second_layer_allows) + ASSERT_EQ(0, add_cap_rule(ruleset_fd, + CAP_SYS_ADMIN)); + } + ASSERT_LE(0, ruleset_fd); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + } + + /* Test CAP_SYS_ADMIN via sethostname. */ + set_cap(_metadata, CAP_SYS_ADMIN); + if (variant->expected_sysadmin) { + EXPECT_EQ(-1, sethostname("test", 4)); + EXPECT_EQ(variant->expected_sysadmin, errno); + } else { + EXPECT_EQ(0, sethostname("test", 4)); + } + clear_cap(_metadata, CAP_SYS_ADMIN); + + /* Test CAP_SYS_CHROOT via chroot. 
*/ + set_cap(_metadata, CAP_SYS_CHROOT); + if (variant->expected_chroot) { + EXPECT_EQ(-1, chroot("/")); + EXPECT_EQ(variant->expected_chroot, errno); + } else { + EXPECT_EQ(0, chroot("/")); + } + clear_cap(_metadata, CAP_SYS_CHROOT); +} + +/* + * Verify that LANDLOCK_PERM_CAPABILITY_USE enforces when the domain is ap= plied + * without no_new_privs, using CAP_SYS_ADMIN for landlock_restrict_self() + * authorization instead. Privileged processes (e.g. container managers) + * can sandbox themselves this way. + */ +TEST(cap_without_nnp) +{ + int ruleset_fd; + + disable_caps(_metadata); + + ruleset_fd =3D create_cap_ruleset(); + ASSERT_LE(0, ruleset_fd); + + /* Allow CAP_SYS_CHROOT but not CAP_SYS_ADMIN. */ + ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_SYS_CHROOT)); + + /* + * Enforce WITHOUT NNP: landlock_restrict_self() succeeds when + * the caller has CAP_SYS_ADMIN (checked before the new domain + * takes effect). + */ + set_cap(_metadata, CAP_SYS_ADMIN); + ASSERT_EQ(0, landlock_restrict_self(ruleset_fd, 0)); + EXPECT_EQ(0, close(ruleset_fd)); + + /* + * CAP_SYS_ADMIN is still in effective set but Landlock denies it: + * cap_capable() returns 0, then hook_capable() returns -EPERM. + */ + EXPECT_EQ(-1, sethostname("test", 4)); + EXPECT_EQ(EPERM, errno); + + /* CAP_SYS_CHROOT is allowed by the rule. */ + set_cap(_metadata, CAP_SYS_CHROOT); + EXPECT_EQ(0, chroot("/")); +} + +/* + * Verify that capabilities gained through user namespace ownership are + * still restricted by LANDLOCK_PERM_CAPABILITY_USE. When a process creat= es a + * user namespace, the kernel grants CAP_FULL_SET in the new namespace + * via cap_capable_helper()'s ownership bypass. Landlock's hook_capable() + * must still deny capabilities not in the allowed set, ensuring that + * user namespace creation cannot be used to escape capability restriction= s. 
+ */ +TEST(cap_userns_ownership_bypass) +{ + pid_t child; + int status; + + child =3D fork(); + ASSERT_LE(0, child); + if (child =3D=3D 0) { + int ruleset_fd; + + disable_caps(_metadata); + + ruleset_fd =3D create_cap_ruleset(); + ASSERT_LE(0, ruleset_fd); + + /* Allow CAP_SYS_ADMIN only. */ + ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_SYS_ADMIN)); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + /* + * Create a user namespace. This is unprivileged and + * does not require capabilities. LANDLOCK_PERM_NAMESPACE_ENTER + * is not handled so namespace creation is unrestricted. + */ + ASSERT_EQ(0, unshare(CLONE_NEWUSER)); + + /* + * After unshare(CLONE_NEWUSER), the kernel set + * cap_effective =3D CAP_FULL_SET in the new namespace. + * Create a UTS namespace (requires CAP_SYS_ADMIN in + * the new user NS). Landlock allows CAP_SYS_ADMIN. + */ + ASSERT_EQ(0, unshare(CLONE_NEWUTS)) + { + TH_LOG("unshare(CLONE_NEWUTS): %s", strerror(errno)); + } + + /* + * sethostname checks against uts_ns->user_ns, which is + * now the new user NS. CAP_SYS_ADMIN is allowed. + */ + EXPECT_EQ(0, sethostname("test", 4)); + + /* + * chroot checks against current_user_ns(), which is + * the new user NS. The process has CAP_SYS_CHROOT in + * cap_effective (from user NS creation), so cap_capable() + * returns 0. But Landlock denies because no rule + * allows CAP_SYS_CHROOT. 
+ */ + EXPECT_EQ(-1, chroot("/")); + EXPECT_EQ(EPERM, errno); + + _exit(_metadata->exit_code); + return; + } + + ASSERT_EQ(child, waitpid(child, &status, 0)); + if (WIFSIGNALED(status) || !WIFEXITED(status) || + WEXITSTATUS(status) !=3D EXIT_SUCCESS) + _metadata->exit_code =3D KSFT_FAIL; +} + +/* Audit tests */ + +static int matches_log_cap(int audit_fd, int cap_number) +{ + static const char log_template[] =3D REGEX_LANDLOCK_PREFIX + " blockers=3Dperm\\.capability_use capability=3D%d $"; + char log_match[sizeof(log_template) + 10]; + int log_match_len; + + log_match_len =3D snprintf(log_match, sizeof(log_match), log_template, + cap_number); + if (log_match_len >=3D sizeof(log_match)) + return -E2BIG; + + return audit_match_record(audit_fd, AUDIT_LANDLOCK_ACCESS, log_match, + NULL); +} + +FIXTURE(cap_audit) +{ + struct audit_filter audit_filter; + int audit_fd; +}; + +FIXTURE_SETUP(cap_audit) +{ + ASSERT_TRUE(is_in_init_user_ns()); + + disable_caps(_metadata); + + set_cap(_metadata, CAP_AUDIT_CONTROL); + self->audit_fd =3D audit_init_with_exe_filter(&self->audit_filter); + EXPECT_LE(0, self->audit_fd); + clear_cap(_metadata, CAP_AUDIT_CONTROL); +} + +FIXTURE_TEARDOWN(cap_audit) +{ + set_cap(_metadata, CAP_AUDIT_CONTROL); + EXPECT_EQ(0, audit_cleanup(self->audit_fd, &self->audit_filter)); +} + +/* + * Verifies that a denied capability produces the expected audit record + * with the correct capability number and blocker string. + */ +TEST_F(cap_audit, denied) +{ + struct audit_records records; + int ruleset_fd; + + /* Baseline: chroot works before Landlock. */ + set_cap(_metadata, CAP_SYS_CHROOT); + ASSERT_EQ(0, chroot("/")); + clear_cap(_metadata, CAP_SYS_CHROOT); + + ruleset_fd =3D create_cap_ruleset(); + ASSERT_LE(0, ruleset_fd); + /* Allow CAP_AUDIT_CONTROL for child-side audit cleanup. 
*/ + ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_AUDIT_CONTROL)); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + /* Deny CAP_SYS_CHROOT (no allow rule). */ + set_cap(_metadata, CAP_SYS_CHROOT); + EXPECT_EQ(-1, chroot("/")); + EXPECT_EQ(EPERM, errno); + clear_cap(_metadata, CAP_SYS_CHROOT); + + EXPECT_EQ(0, matches_log_cap(self->audit_fd, CAP_SYS_CHROOT)); + + /* + * No extra access records: the denial was already consumed by + * matches_log_cap above. One domain allocation record, emitted + * in the same event as the first access denial for this domain. + */ + EXPECT_EQ(0, audit_count_records(self->audit_fd, &records)); + EXPECT_EQ(0, records.access); + EXPECT_EQ(1, records.domain); +} + +TEST_F(cap_audit, allowed) +{ + struct audit_records records; + int ruleset_fd; + + ruleset_fd =3D create_cap_ruleset(); + ASSERT_LE(0, ruleset_fd); + ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_SYS_ADMIN)); + /* Allow CAP_AUDIT_CONTROL for child-side audit cleanup. */ + ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_AUDIT_CONTROL)); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + set_cap(_metadata, CAP_SYS_ADMIN); + EXPECT_EQ(0, sethostname("test", 4)); + + /* No records: allowed operations never trigger audit logging. 
*/ + EXPECT_EQ(0, audit_count_records(self->audit_fd, &records)); + EXPECT_EQ(0, records.access); +} + +TEST_HARNESS_MAIN --=20 2.53.0 From nobody Tue Apr 7 16:29:57 2026 From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= To: Christian Brauner , =?UTF-8?q?G=C3=BCnther=20Noack?= , Paul Moore , "Serge E . Hallyn" Cc: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= , Justin Suess , Lennart Poettering , Mikhail Ivanov , Nicolas Bouchinet , Shervin Oloumi , Tingmao Wang , kernel-team@cloudflare.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org Subject: [RFC PATCH v1 10/11] samples/landlock: Add capability and namespace restriction support Date: Thu, 12 Mar 2026 11:04:43 +0100 Message-ID: <20260312100444.2609563-11-mic@digikod.net> In-Reply-To: <20260312100444.2609563-1-mic@digikod.net> References: <20260312100444.2609563-1-mic@digikod.net> X-Mailing-List: linux-kernel@vger.kernel.org MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Extend the sandboxer sample to demonstrate the new Landlock capability and namespace restriction features. The LL_CAPS environment variable takes a colon-delimited list of allowed capability numbers (e.g. "18" for CAP_SYS_CHROOT). The LL_NS variable takes a colon-delimited list of allowed namespace types by short name (e.g. "user:uts:net"). Update LANDLOCK_ABI_LAST to 9 and add best-effort degradation for older kernels.
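To illustrate the LL_CAPS format described above, here is a minimal standalone sketch; it is not the sandboxer code itself. caps_list_to_mask() is a hypothetical helper name, and the limit of 64 entries is an assumption taken from the width of a 64-bit capability mask. The sandboxer in this patch adds one rule per listed capability rather than building a single mask:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, for illustration only: converts a colon-delimited
 * list of capability numbers (e.g. "18:21") into a 64-bit mask with one
 * bit set per listed capability. Empty fields are skipped.
 *
 * Returns 0 on success, -1 on a malformed or out-of-range entry. On
 * error, *mask may hold a partial result and should be ignored.
 */
static int caps_list_to_mask(const char *list, uint64_t *mask)
{
	const char *p = list;

	*mask = 0;
	while (*p != '\0') {
		char *end;
		unsigned long long cap;

		/* Skips delimiters, which also skips empty fields. */
		if (*p == ':') {
			p++;
			continue;
		}

		/* Only plain decimal digits may start a field. */
		if (*p < '0' || *p > '9')
			return -1;

		errno = 0;
		cap = strtoull(p, &end, 10);
		/* Assumption: valid capability numbers fit in 64 bits. */
		if (errno || cap >= 64 || (*end != ':' && *end != '\0'))
			return -1;

		*mask |= UINT64_C(1) << cap;
		p = end;
	}
	return 0;
}
```

With this sketch, caps_list_to_mask("18:21", &mask) sets bits 18 (CAP_SYS_CHROOT) and 21 (CAP_SYS_ADMIN) in mask, while an entry such as "64" or "x" is rejected.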
Allow creating user and UTS namespaces but deny network namespaces (works as an unprivileged user). All capabilities are available (LL_CAPS is not set), but namespace creation is still restricted to the types listed in LL_NS. The first command succeeds because user and UTS types are in the allowed set, and sets the hostname inside the new UTS namespace. The second command fails because the network namespace type is not allowed by the LANDLOCK_PERM_NAMESPACE_ENTER rule: LL_FS_RO=3D/ LL_FS_RW=3D/proc LL_NS=3D"user:uts" \ ./sandboxer /bin/sh -c \ "unshare --user --uts --map-root-user hostname sandbox \ && ! unshare --user --net true" Allow only user namespace creation and CAP_SYS_CHROOT (18), denying all other capabilities and namespace types (works as an unprivileged user). An unprivileged process creates a user namespace (no capability required) and calls chroot inside it using the CAP_SYS_CHROOT granted within the new namespace: LL_FS_RO=3D/ LL_FS_RW=3D"" LL_NS=3D"user" LL_CAPS=3D"18" \ ./sandboxer /bin/sh -c \ "unshare --user --keep-caps chroot / true" Cc: Christian Brauner Cc: G=C3=BCnther Noack Cc: Paul Moore Cc: Serge E. Hallyn Signed-off-by: Micka=C3=ABl Sala=C3=BCn --- samples/landlock/sandboxer.c | 164 +++++++++++++++++++++++++++++++++-- 1 file changed, 155 insertions(+), 9 deletions(-) diff --git a/samples/landlock/sandboxer.c b/samples/landlock/sandboxer.c index 9f21088c0855..09c499703835 100644 --- a/samples/landlock/sandboxer.c +++ b/samples/landlock/sandboxer.c @@ -14,6 +14,8 @@ #include #include #include +#include +#include #include #include #include @@ -22,12 +24,16 @@ #include #include #include -#include =20 #if defined(__GLIBC__) #include #endif =20 +/* From include/linux/bits.h, not available in userspace. 
*/ +#ifndef BITS_PER_TYPE +#define BITS_PER_TYPE(type) (sizeof(type) * 8) +#endif + #ifndef landlock_create_ruleset static inline int landlock_create_ruleset(const struct landlock_ruleset_attr *const attr, @@ -60,6 +66,8 @@ static inline int landlock_restrict_self(const int rulese= t_fd, #define ENV_FS_RW_NAME "LL_FS_RW" #define ENV_TCP_BIND_NAME "LL_TCP_BIND" #define ENV_TCP_CONNECT_NAME "LL_TCP_CONNECT" +#define ENV_CAPS_NAME "LL_CAPS" +#define ENV_NS_NAME "LL_NS" #define ENV_SCOPED_NAME "LL_SCOPED" #define ENV_FORCE_LOG_NAME "LL_FORCE_LOG" #define ENV_DELIMITER ":" @@ -226,11 +234,125 @@ static int populate_ruleset_net(const char *const en= v_var, const int ruleset_fd, return ret; } =20 +static __u64 str2ns(const char *const name) +{ + static const struct { + const char *name; + __u64 value; + } ns_map[] =3D { + /* clang-format off */ + { "cgroup", CLONE_NEWCGROUP }, + { "ipc", CLONE_NEWIPC }, + { "mnt", CLONE_NEWNS }, + { "net", CLONE_NEWNET }, + { "pid", CLONE_NEWPID }, + { "time", CLONE_NEWTIME }, + { "user", CLONE_NEWUSER }, + { "uts", CLONE_NEWUTS }, + /* clang-format on */ + }; + size_t i; + + for (i =3D 0; i < sizeof(ns_map) / sizeof(ns_map[0]); i++) { + if (strcmp(name, ns_map[i].name) =3D=3D 0) + return ns_map[i].value; + } + return 0; +} + +static int populate_ruleset_caps(const char *const env_var, + const int ruleset_fd) +{ + int ret =3D 1; + char *env_cap_name, *env_cap_name_next, *strcap; + struct landlock_capability_attr cap_attr =3D { + .allowed_perm =3D LANDLOCK_PERM_CAPABILITY_USE, + }; + + env_cap_name =3D getenv(env_var); + if (!env_cap_name) + return 0; + env_cap_name =3D strdup(env_cap_name); + unsetenv(env_var); + + env_cap_name_next =3D env_cap_name; + while ((strcap =3D strsep(&env_cap_name_next, ENV_DELIMITER))) { + __u64 cap; + + if (strcmp(strcap, "") =3D=3D 0) + continue; + + if (str2num(strcap, &cap) || + cap >=3D BITS_PER_TYPE(cap_attr.capabilities)) { + fprintf(stderr, + "Failed to parse capability at \"%s\"\n", + strcap); + goto 
out_free_name; + } + cap_attr.capabilities =3D 1ULL << cap; + if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY, + &cap_attr, 0)) { + fprintf(stderr, + "Failed to update the ruleset with capability \"%llu\": %s\n", + (unsigned long long)cap, strerror(errno)); + goto out_free_name; + } + } + ret =3D 0; + +out_free_name: + free(env_cap_name); + return ret; +} + +static int populate_ruleset_ns(const char *const env_var, const int rulese= t_fd) +{ + int ret =3D 1; + char *env_ns_name, *env_ns_name_next, *strns; + struct landlock_namespace_attr ns_attr =3D { + .allowed_perm =3D LANDLOCK_PERM_NAMESPACE_ENTER, + }; + + env_ns_name =3D getenv(env_var); + if (!env_ns_name) + return 0; + env_ns_name =3D strdup(env_ns_name); + unsetenv(env_var); + + env_ns_name_next =3D env_ns_name; + while ((strns =3D strsep(&env_ns_name_next, ENV_DELIMITER))) { + __u64 ns_type; + + if (strcmp(strns, "") =3D=3D 0) + continue; + + ns_type =3D str2ns(strns); + if (!ns_type) { + fprintf(stderr, "Unknown namespace type \"%s\"\n", + strns); + goto out_free_name; + } + ns_attr.namespace_types =3D ns_type; + if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE, + &ns_attr, 0)) { + fprintf(stderr, + "Failed to update the ruleset with namespace \"%s\": %s\n", + strns, strerror(errno)); + goto out_free_name; + } + } + ret =3D 0; + +out_free_name: + free(env_ns_name); + return ret; +} + /* Returns true on error, false otherwise. 
*/ static bool check_ruleset_scope(const char *const env_var, struct landlock_ruleset_attr *ruleset_attr) { - char *env_type_scope, *env_type_scope_next, *ipc_scoping_name; + char *env_type_scope, *env_type_scope_next, *scope_name; bool error =3D false; bool abstract_scoping =3D false; bool signal_scoping =3D false; @@ -247,16 +369,14 @@ static bool check_ruleset_scope(const char *const env= _var, =20 env_type_scope =3D strdup(env_type_scope); env_type_scope_next =3D env_type_scope; - while ((ipc_scoping_name =3D - strsep(&env_type_scope_next, ENV_DELIMITER))) { - if (strcmp("a", ipc_scoping_name) =3D=3D 0 && !abstract_scoping) { + while ((scope_name =3D strsep(&env_type_scope_next, ENV_DELIMITER))) { + if (strcmp("a", scope_name) =3D=3D 0 && !abstract_scoping) { abstract_scoping =3D true; - } else if (strcmp("s", ipc_scoping_name) =3D=3D 0 && - !signal_scoping) { + } else if (strcmp("s", scope_name) =3D=3D 0 && !signal_scoping) { signal_scoping =3D true; } else { fprintf(stderr, "Unknown or duplicate scope \"%s\"\n", - ipc_scoping_name); + scope_name); error =3D true; goto out_free_name; } @@ -299,7 +419,7 @@ static bool check_ruleset_scope(const char *const env_v= ar, =20 /* clang-format on */ =20 -#define LANDLOCK_ABI_LAST 8 +#define LANDLOCK_ABI_LAST 9 =20 #define XSTR(s) #s #define STR(s) XSTR(s) @@ -322,6 +442,10 @@ static const char help[] =3D "means an empty list):\n" "* " ENV_TCP_BIND_NAME ": ports allowed to bind (server)\n" "* " ENV_TCP_CONNECT_NAME ": ports allowed to connect (client)\n" + "* " ENV_CAPS_NAME ": capability numbers allowed to use " + "(e.g. 
10 for CAP_NET_BIND_SERVICE, 21 for CAP_SYS_ADMIN)\n" + "* " ENV_NS_NAME ": namespace types allowed to enter " + "(cgroup, ipc, mnt, net, pid, time, user, uts)\n" "* " ENV_SCOPED_NAME ": actions denied on the outside of the landlock dom= ain\n" " - \"a\" to restrict opening abstract unix sockets\n" " - \"s\" to restrict sending signals\n" @@ -334,6 +458,8 @@ static const char help[] =3D ENV_FS_RW_NAME "=3D\"/dev/null:/dev/full:/dev/zero:/dev/pts:/tmp\" " ENV_TCP_BIND_NAME "=3D\"9418\" " ENV_TCP_CONNECT_NAME "=3D\"80:443\" " + ENV_CAPS_NAME "=3D\"21\" " + ENV_NS_NAME "=3D\"user:uts:net\" " ENV_SCOPED_NAME "=3D\"a:s\" " "%1$s bash -i\n" "\n" @@ -357,6 +483,8 @@ int main(const int argc, char *const argv[], char *cons= t *const envp) LANDLOCK_ACCESS_NET_CONNECT_TCP, .scoped =3D LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET | LANDLOCK_SCOPE_SIGNAL, + .handled_perm =3D LANDLOCK_PERM_CAPABILITY_USE | + LANDLOCK_PERM_NAMESPACE_ENTER, }; int supported_restrict_flags =3D LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON; int set_restrict_flags =3D 0; @@ -438,6 +566,10 @@ int main(const int argc, char *const argv[], char *con= st *const envp) ~LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON; __attribute__((fallthrough)); case 7: + __attribute__((fallthrough)); + case 8: + /* Removes permission support for ABI < 9 */ + ruleset_attr.handled_perm =3D 0; /* Must be printed for any ABI < LANDLOCK_ABI_LAST. */ fprintf(stderr, "Hint: You should update the running kernel " @@ -470,6 +602,14 @@ int main(const int argc, char *const argv[], char *con= st *const envp) ~LANDLOCK_ACCESS_NET_CONNECT_TCP; } =20 + /* Removes capability handling if not set by a user. */ + if (!getenv(ENV_CAPS_NAME)) + ruleset_attr.handled_perm &=3D ~LANDLOCK_PERM_CAPABILITY_USE; + + /* Removes namespace handling if not set by a user. 
*/ + if (!getenv(ENV_NS_NAME)) + ruleset_attr.handled_perm &=3D ~LANDLOCK_PERM_NAMESPACE_ENTER; + if (check_ruleset_scope(ENV_SCOPED_NAME, &ruleset_attr)) return 1; =20 @@ -514,6 +654,12 @@ int main(const int argc, char *const argv[], char *con= st *const envp) goto err_close_ruleset; } =20 + if (populate_ruleset_caps(ENV_CAPS_NAME, ruleset_fd)) + goto err_close_ruleset; + + if (populate_ruleset_ns(ENV_NS_NAME, ruleset_fd)) + goto err_close_ruleset; + if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) { perror("Failed to restrict privileges"); goto err_close_ruleset; --=20 2.53.0 From nobody Tue Apr 7 16:29:57 2026 From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= To: Christian Brauner , =?UTF-8?q?G=C3=BCnther=20Noack?= , Paul Moore , "Serge E . Hallyn" Cc: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= , Justin Suess , Lennart Poettering , Mikhail Ivanov , Nicolas Bouchinet , Shervin Oloumi , Tingmao Wang , kernel-team@cloudflare.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org Subject: [RFC PATCH v1 11/11] landlock: Add documentation for capability and namespace restrictions Date: Thu, 12 Mar 2026 11:04:44 +0100 Message-ID: <20260312100444.2609563-12-mic@digikod.net> In-Reply-To: <20260312100444.2609563-1-mic@digikod.net> References: <20260312100444.2609563-1-mic@digikod.net> X-Mailing-List: linux-kernel@vger.kernel.org MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Document the two new Landlock permission categories in the userspace API guide, admin guide, and kernel security documentation. The userspace API guide adds sections on capability restriction (LANDLOCK_PERM_CAPABILITY_USE with LANDLOCK_RULE_CAPABILITY), namespace restriction (LANDLOCK_PERM_NAMESPACE_ENTER with LANDLOCK_RULE_NAMESPACE covering creation via unshare/clone and entry via setns), and the backward-compatible degradation pattern for ABI < 9. A table documents the per-namespace-type capability requirements for both creation and entry. The admin guide adds the new perm.namespace_enter and perm.capability_use audit blocker names with their object identification fields (namespace_type, namespace_inum, capability). The kernel security documentation adds a "Ruleset restriction models" section defining the three models (handled_access_*, handled_perm, scoped), their coverage and compatibility properties, and the criteria for choosing between them for future features. It also documents composability with user namespaces and adds kernel-doc references for the new capability and namespace headers.
Cc: Christian Brauner Cc: G=C3=BCnther Noack Cc: Paul Moore Cc: Serge E. Hallyn Signed-off-by: Micka=C3=ABl Sala=C3=BCn --- Documentation/admin-guide/LSM/landlock.rst | 19 ++- Documentation/security/landlock.rst | 80 ++++++++++- Documentation/userspace-api/landlock.rst | 156 ++++++++++++++++++++- 3 files changed, 245 insertions(+), 10 deletions(-) diff --git a/Documentation/admin-guide/LSM/landlock.rst b/Documentation/adm= in-guide/LSM/landlock.rst index 9923874e2156..99c6a599ce9e 100644 --- a/Documentation/admin-guide/LSM/landlock.rst +++ b/Documentation/admin-guide/LSM/landlock.rst @@ -6,7 +6,7 @@ Landlock: system-wide management =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D =20 :Author: Micka=C3=ABl Sala=C3=BCn -:Date: January 2026 +:Date: March 2026 =20 Landlock can leverage the audit framework to log events. =20 @@ -59,14 +59,25 @@ AUDIT_LANDLOCK_ACCESS - scope.abstract_unix_socket - Abstract UNIX socket connection den= ied - scope.signal - Signal sending denied =20 + **perm.*** - Permission restrictions (ABI 9+): + - perm.namespace_enter - Namespace entry was denied (creation via + :manpage:`unshare(2)` / :manpage:`clone(2)` or joining via + :manpage:`setns(2)`); + ``namespace_type`` indicates the type (hex CLONE_NEW* bitmask), + ``namespace_inum`` identifies the target namespace for + :manpage:`setns(2)` operations + - perm.capability_use - Capability use was denied; + ``capability`` indicates the capability number + Multiple blockers can appear in a single event (comma-separated) when multiple access rights are missing. For example, creating a regular fi= le in a directory that lacks both ``make_reg`` and ``refer`` rights would= show ``blockers=3Dfs.make_reg,fs.refer``. =20 - The object identification fields (path, dev, ino for filesystem; opid, - ocomm for signals) depend on the type of access being blocked and prov= ide - context about what resource was involved in the denial. 
+ The object identification fields depend on the type of access being bl= ocked: + ``path``, ``dev``, ``ino`` for filesystem; ``opid``, ``ocomm`` for sig= nals; + ``namespace_type`` and ``namespace_inum`` for namespace operations; + ``capability`` for capability use. =20 =20 AUDIT_LANDLOCK_DOMAIN diff --git a/Documentation/security/landlock.rst b/Documentation/security/l= andlock.rst index 3e4d4d04cfae..cd3d640ca5c9 100644 --- a/Documentation/security/landlock.rst +++ b/Documentation/security/landlock.rst @@ -7,7 +7,7 @@ Landlock LSM: kernel documentation =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20 :Author: Micka=C3=ABl Sala=C3=BCn -:Date: September 2025 +:Date: March 2026 =20 Landlock's goal is to create scoped access-control (i.e. sandboxing). To harden a whole system, this feature should be available to any process, @@ -89,6 +89,72 @@ this is required to keep access controls consistent over= the whole system, and this avoids unattended bypasses through file descriptor passing (i.e. conf= used deputy attack). =20 +Composability with user namespaces +---------------------------------- + +Landlock domain-based scoping and the kernel's user namespace-based capabi= lity +scoping enforce isolation over independent hierarchies. Landlock checks d= omain +ancestry; the kernel's ``ns_capable()`` checks user namespace ancestry. T= hese +hierarchies are orthogonal: Landlock enforcement is deterministic with res= pect +to its own configuration, regardless of namespace or capability state, and= vice +versa. This orthogonality is a design invariant that must hold for all new +scoped features. + +Ruleset restriction models +-------------------------- + +Landlock provides three restriction models, each with different coverage +and compatibility properties. 
+ +Access rights (``handled_access_*``) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Access rights control **enumerated operations on kernel objects** +identified by a rule key (a file hierarchy or a network port). Each +``handled_access_*`` field declares a set of access rights that the +ruleset restricts. Multiple access rights share a single rule type. +Operations for which no access right exists yet remain uncontrolled; +new rights are added incrementally across ABI versions. + +Permissions (``handled_perm``) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Permissions control **broad operations enforced at single kernel +chokepoints**, achieving complete deny-by-default coverage. Each +``LANDLOCK_PERM_*`` flag maps to its own rule type. When a ruleset +handles a permission, all instances of that operation are denied unless +explicitly allowed by a rule. New kernel values (new ``CAP_*`` +capabilities, new ``CLONE_NEW*`` namespace types) are automatically +denied without any Landlock update. + +Each permission flag names a single gateway operation whose control +transitively covers an open-ended set of downstream operations: for +example, exercising a capability enables privileged operations across +many subsystems; entering a namespace enables gaining capabilities in a +new context. + +Permission rules identify what to allow using constants defined by other +kernel subsystems (``CAP_*``, ``CLONE_NEW*``). Unknown values are +silently ignored because deny-by-default ensures they are denied anyway. +In contrast, unknown ``LANDLOCK_PERM_*`` flags in ``handled_perm`` are +rejected (``-EINVAL``), since Landlock itself defines those flags. + +Scopes (``scoped``) +~~~~~~~~~~~~~~~~~~~~ + +Scopes restrict **cross-domain interactions** categorically, without +rules. Setting a scope flag (e.g. ``LANDLOCK_SCOPE_SIGNAL``) denies the +operation to targets outside the Landlock domain or its children. Like +permissions, scopes provide complete coverage of the controlled +operation.
+ +When adding new Landlock features, new operations on existing rule types +extend the corresponding ``handled_access_*`` field (e.g. a new +filesystem operation extends ``handled_access_fs``). A new object +category with multiple fine-grained operations would use a new +``handled_access_*`` field. New rule types that control a single +chokepoint operation use ``handled_perm``. + Tests =3D=3D=3D=3D=3D =20 @@ -110,6 +176,18 @@ Filesystem .. kernel-doc:: security/landlock/fs.h :identifiers: =20 +Namespace +--------- + +.. kernel-doc:: security/landlock/ns.h + :identifiers: + +Capability +---------- + +.. kernel-doc:: security/landlock/cap.h + :identifiers: + Process credential ------------------ =20 diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/users= pace-api/landlock.rst index 13134bccdd39..238d30a18162 100644 --- a/Documentation/userspace-api/landlock.rst +++ b/Documentation/userspace-api/landlock.rst @@ -8,7 +8,7 @@ Landlock: unprivileged access control =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20 :Author: Micka=C3=ABl Sala=C3=BCn -:Date: January 2026 +:Date: March 2026 =20 The goal of Landlock is to enable restriction of ambient rights (e.g. glob= al filesystem or network access) for a set of processes. Because Landlock @@ -33,7 +33,7 @@ A Landlock rule describes an action on an object which th= e process intends to perform. A set of rules is aggregated in a ruleset, which can then restri= ct the thread enforcing it, and its future children. =20 -The two existing types of rules are: +The existing types of rules are: =20 Filesystem rules For these rules, the object is a file hierarchy, @@ -44,6 +44,14 @@ Network rules (since ABI v4) For these rules, the object is a TCP port, and the related actions are defined with `network access rights`. 
 
+Capability rules (since ABI v9)
+    For these rules, the object is a set of Linux capabilities,
+    and the related actions are defined with `permission flags`.
+
+Namespace rules (since ABI v9)
+    For these rules, the object is a set of namespace types,
+    and the related actions are defined with `permission flags`.
+
 Defining and enforcing a security policy
 ----------------------------------------
 
@@ -84,6 +92,9 @@ to be explicit about the denied-by-default access rights.
         .scoped =
             LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
             LANDLOCK_SCOPE_SIGNAL,
+        .handled_perm =
+            LANDLOCK_PERM_CAPABILITY_USE |
+            LANDLOCK_PERM_NAMESPACE_ENTER,
     };
 
 Because we may not know which kernel version an application will be executed
@@ -127,6 +138,12 @@ version, and only use the available subset of access rights:
         /* Removes LANDLOCK_SCOPE_* for ABI < 6 */
         ruleset_attr.scoped &= ~(LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
                                  LANDLOCK_SCOPE_SIGNAL);
+        __attribute__((fallthrough));
+    case 6:
+    case 7:
+    case 8:
+        /* Removes permission support for ABI < 9 */
+        ruleset_attr.handled_perm = 0;
     }
 
 This enables the creation of an inclusive ruleset that will contain our rules.
@@ -191,6 +208,42 @@ number for a specific action: HTTPS connections.
     err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
                             &net_port, 0);
 
+For capability access-control, we can add rules that allow specific
+capabilities. For instance, to allow ``CAP_SYS_CHROOT`` (so the sandboxed
+process can call :manpage:`chroot(2)` inside a user namespace):
+
+.. code-block:: c
+
+    struct landlock_capability_attr cap_attr = {
+        .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+        .capabilities = (1ULL << CAP_SYS_CHROOT),
+    };
+
+    err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+                            &cap_attr, 0);
+
+For namespace access-control, we can add rules that allow entering specific
+namespace types (creating them via :manpage:`unshare(2)` / :manpage:`clone(2)`
+or joining them via :manpage:`setns(2)`). For instance, to allow creating user
+namespaces (which grants all capabilities inside the new namespace):
+
+.. code-block:: c
+
+    struct landlock_namespace_attr ns_attr = {
+        .allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
+        .namespace_types = CLONE_NEWUSER,
+    };
+
+    err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+                            &ns_attr, 0);
+
+Together, these two rules allow an unprivileged process to create a user
+namespace and call :manpage:`chroot(2)` inside it, while denying all other
+capabilities and namespace types. User namespace creation is the one operation
+that does not require ``CAP_SYS_ADMIN``, so no capability rule is needed for it.
+See `Capability and namespace restrictions`_ for details on capability
+requirements.
+
 When passing a non-zero ``flags`` argument to ``landlock_restrict_self()``, a
 similar backwards compatibility check is needed for the restrict flags
 (see sys_landlock_restrict_self() documentation for available flags):
@@ -354,10 +407,87 @@ The operations which can be scoped are:
     A :manpage:`sendto(2)` on a socket which was previously connected will not
     be restricted. This works for both datagram and stream sockets.
 
-IPC scoping does not support exceptions via :manpage:`landlock_add_rule(2)`.
+Scoping does not support exceptions via :manpage:`landlock_add_rule(2)`.
 If an operation is scoped within a domain, no rules can be added to allow access
 to resources or processes outside of the scope.
 
+Capability and namespace restrictions
+-------------------------------------
+
+See Documentation/security/landlock.rst for the design rationale behind
+the permission model (``handled_perm``) and how it differs from access
+rights (``handled_access_*``) and scopes (``scoped``).
+When a process creates a user namespace, the kernel grants all capabilities
+within that namespace. While these capabilities cannot directly bypass Landlock
+restrictions (Landlock enforces access controls independently of capability
+checks), they open kernel code paths that are normally unreachable to
+unprivileged users and may contain exploitable bugs.
+
+Landlock provides two complementary permissions to address this.
+``LANDLOCK_PERM_CAPABILITY_USE`` restricts which capabilities a process can use,
+even when it holds them. ``LANDLOCK_PERM_NAMESPACE_ENTER`` restricts which
+namespace types a process can create (via :manpage:`unshare(2)` or
+:manpage:`clone(2)`) or join (via :manpage:`setns(2)`). After creating a user
+namespace, the granted capabilities are scoped to namespaces owned by that user
+namespace or its descendants; to exercise a capability such as
+``CAP_NET_ADMIN``, the process must create a namespace of the corresponding type
+(e.g., a network namespace). Configuring both permissions together provides
+full coverage: ``LANDLOCK_PERM_CAPABILITY_USE`` restricts which capabilities are
+available, while ``LANDLOCK_PERM_NAMESPACE_ENTER`` restricts the namespaces in
+which they can be used.
+
+When a Landlock domain handles ``LANDLOCK_PERM_CAPABILITY_USE``, all Linux
+:manpage:`capabilities(7)` are denied by default unless a rule explicitly allows
+them. This is purely restrictive: Landlock can only deny capabilities that the
+traditional capability mechanism would have allowed, never grant additional ones.
+Rules are added with ``LANDLOCK_RULE_CAPABILITY`` using a
+&struct landlock_capability_attr. Each rule specifies a set of ``CAP_*`` values
+(as a bitmask) to allow. Capabilities above ``CAP_LAST_CAP`` are silently
+accepted but have no effect since the kernel never checks them; this means new
+capabilities introduced by future kernels are automatically denied.
+
+When a Landlock domain handles ``LANDLOCK_PERM_NAMESPACE_ENTER``, namespace
+creation and entry are denied by default unless a rule explicitly allows them.
+Rules are added with ``LANDLOCK_RULE_NAMESPACE`` using a
+&struct landlock_namespace_attr. Each rule specifies a set of ``CLONE_NEW*``
+flags to allow.
+
+In practice, unprivileged processes first create a user namespace (which requires
+no capability and grants all capabilities within it), then use those capabilities
+to create other namespace types. All non-user namespace types require
+``CAP_SYS_ADMIN`` for both creation and :manpage:`setns(2)` entry; mount
+namespace entry additionally requires ``CAP_SYS_CHROOT``. For
+:manpage:`setns(2)`, capabilities are checked relative to the target namespace,
+so a process in an ancestor user namespace naturally satisfies them; this
+includes joining user namespaces, which requires ``CAP_SYS_ADMIN``. When
+``LANDLOCK_PERM_CAPABILITY_USE`` is also handled, each of these capabilities
+must be explicitly allowed by a rule.
+
+When combining ``CLONE_NEWUSER`` with other ``CLONE_NEW*`` flags in a single
+:manpage:`unshare(2)` call, the ``CAP_SYS_ADMIN`` check targets the newly
+created user namespace, which is handled by ``LANDLOCK_PERM_NAMESPACE_ENTER``
+independently from ``LANDLOCK_PERM_CAPABILITY_USE``. Performing the user
+namespace creation and the additional namespace creation in two separate
+:manpage:`unshare(2)` calls requires a rule allowing ``CAP_SYS_ADMIN`` if the
+domain also handles ``LANDLOCK_PERM_CAPABILITY_USE``.
+
+More generally, Landlock domains and user namespaces form independent
+hierarchies: Landlock domains restrict what actions are allowed (each stacked
+layer narrows the permitted set), while user namespaces restrict where
+capabilities take effect (only within the process's own namespace and its
+descendants). Landlock access controls are fully determined by the domain
+configuration, regardless of the process's position in the user namespace
+hierarchy. When creating child user namespaces, it is recommended to also
+create a dedicated Landlock domain with restrictions relevant to each namespace
+context.
+
+Note that ``LANDLOCK_PERM_CAPABILITY_USE`` restricts the *use* of capabilities,
+not their presence in the process's credential. Capability sets can change
+after a domain is enforced through user namespace entry, :manpage:`execve(2)` of
+binaries with file capabilities, or :manpage:`capset(2)`. In all cases,
+:manpage:`capget(2)` will report the credential's capability sets, but any
+denied capability will fail with ``EPERM`` when exercised.
+
 Truncating files
 ----------------
 
@@ -515,7 +645,7 @@ Access rights
 -------------
 
 .. kernel-doc:: include/uapi/linux/landlock.h
-    :identifiers: fs_access net_access scope
+    :identifiers: fs_access net_access scope perm
 
 Creating a new ruleset
 ----------------------
@@ -534,7 +664,8 @@ Extending a ruleset
 
 .. kernel-doc:: include/uapi/linux/landlock.h
     :identifiers: landlock_rule_type landlock_path_beneath_attr
-                  landlock_net_port_attr
+                  landlock_net_port_attr landlock_capability_attr
+                  landlock_namespace_attr
 
 Enforcing a ruleset
 -------------------
@@ -685,6 +816,21 @@ enforce Landlock rulesets across all threads of the calling process
 using the ``LANDLOCK_RESTRICT_SELF_TSYNC`` flag passed to
 sys_landlock_restrict_self().
 
+Capability restriction (ABI < 9)
+--------------------------------
+
+Starting with the Landlock ABI version 9, it is possible to restrict
+:manpage:`capabilities(7)` with the new ``LANDLOCK_PERM_CAPABILITY_USE``
+permission flag and ``LANDLOCK_RULE_CAPABILITY`` rule type.
+
+Namespace restriction (ABI < 9)
+-------------------------------
+
+Starting with the Landlock ABI version 9, it is possible to restrict
+namespace creation (:manpage:`unshare(2)`, :manpage:`clone(2)`) and entry
+(:manpage:`setns(2)`) with the new ``LANDLOCK_PERM_NAMESPACE_ENTER`` permission
+flag and ``LANDLOCK_RULE_NAMESPACE`` rule type.
+
 .. _kernel_support:
 
 Kernel support
-- 
2.53.0
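
Reading aid (not part of the patch): the capability and namespace rule
examples in the landlock.rst changes above pass plain bitmasks, built as
``(1ULL << CAP_SYS_CHROOT)`` and ``CLONE_NEWUSER``. The following is a
minimal userspace sketch of that arithmetic and of the documented
deny-by-default behaviour. It is a model only, not the kernel
implementation: the helper names (sketch_cap_bit, sketch_cap_allowed,
sketch_ns_allowed) are hypothetical, and the rule that every requested
``CLONE_NEW*`` type must be covered by an allowed mask is an assumption
made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/capability.h> /* CAP_SYS_CHROOT, CAP_SYS_ADMIN */
#include <linux/sched.h>      /* CLONE_NEWUSER, CLONE_NEWNET */

/*
 * Hypothetical helper: returns the bit for one CAP_* number in a 64-bit
 * mask, mirroring the (1ULL << CAP_SYS_CHROOT) expression in the docs.
 */
static uint64_t sketch_cap_bit(int cap)
{
	assert(cap >= 0 && cap < 64);
	return 1ULL << cap;
}

/*
 * Hypothetical model of deny-by-default: a capability counts as usable
 * only if its bit is present in the mask collected from the rules.
 */
static int sketch_cap_allowed(uint64_t allowed_caps, int cap)
{
	return (allowed_caps & sketch_cap_bit(cap)) != 0;
}

/*
 * Hypothetical model (assumption): a request is accepted only if every
 * requested CLONE_NEW* bit is also set in the allowed mask.
 */
static int sketch_ns_allowed(uint64_t allowed_ns_types, uint64_t requested)
{
	return (requested & ~allowed_ns_types) == 0;
}
```

With a mask built from ``CAP_SYS_CHROOT`` only, sketch_cap_allowed()
accepts ``CAP_SYS_CHROOT`` and rejects ``CAP_SYS_ADMIN``. With
``CLONE_NEWUSER`` as the only allowed type, sketch_ns_allowed() rejects a
request that adds ``CLONE_NEWNET``.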