From: Alexey Gladkov
To: LKML, Linux Containers
Cc: Andrew Morton, Christian Brauner, "Eric W. Biederman", Kees Cook,
 Manfred Spraul
Subject: [PATCH v3 1/3] sysctl: Allow change system v ipc sysctls inside ipc namespace
Date: Wed, 21 Sep 2022 12:41:47 +0200
References: <202209211737.0Bu0F40t-lkp@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Rootless containers are not allowed to modify kernel IPC parameters.

All default limits are set to such high values that, in effect, there
are no limits at all. The limits are not inherited; they are
initialized to default values when a new ipc_namespace is created.

For a new ipc_namespace:

size_t       ipc_ns.shm_ctlmax      = SHMMAX; // (ULONG_MAX - (1UL << 24))
size_t       ipc_ns.shm_ctlall      = SHMALL; // (ULONG_MAX - (1UL << 24))
int          ipc_ns.shm_ctlmni      = IPCMNI; // (1 << 15)
int          ipc_ns.shm_rmid_forced = 0;
unsigned int ipc_ns.msg_ctlmax      = MSGMAX; // 8192
unsigned int ipc_ns.msg_ctlmni      = MSGMNI; // 32000
unsigned int ipc_ns.msg_ctlmnb      = MSGMNB; // 16384

shm_tot (the total number of shared pages) has also ceased to be
global: it is located in ipc_namespace and is not inherited from
anywhere.

Under these conditions, it cannot be said that these limits limit
anything. The real limiter for them is cgroups.

If we allow rootless containers to change these parameters, then they
can only be reduced.

Signed-off-by: Alexey Gladkov
---
 ipc/ipc_sysctl.c | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index ef313ecfb53a..31282e0a630d 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -190,25 +190,57 @@ static int set_is_seen(struct ctl_table_set *set)
 	return &current->nsproxy->ipc_ns->ipc_set == set;
 }
 
+static void ipc_set_ownership(struct ctl_table_header *head,
+			      struct ctl_table *table,
+			      kuid_t *uid, kgid_t *gid)
+{
+	struct ipc_namespace *ns =
+		container_of(head->set, struct ipc_namespace, ipc_set);
+
+	kuid_t ns_root_uid = make_kuid(ns->user_ns, 0);
+	kgid_t ns_root_gid = make_kgid(ns->user_ns, 0);
+
+	*uid = uid_valid(ns_root_uid) ? ns_root_uid : GLOBAL_ROOT_UID;
+	*gid = gid_valid(ns_root_gid) ? ns_root_gid : GLOBAL_ROOT_GID;
+}
+
 static int ipc_permissions(struct ctl_table_header *head, struct ctl_table *table)
 {
 	int mode = table->mode;
 
 #ifdef CONFIG_CHECKPOINT_RESTORE
-	struct ipc_namespace *ns = current->nsproxy->ipc_ns;
+	struct ipc_namespace *ns =
+		container_of(head->set, struct ipc_namespace, ipc_set);
 
 	if (((table->data == &ns->ids[IPC_SEM_IDS].next_id) ||
 	     (table->data == &ns->ids[IPC_MSG_IDS].next_id) ||
 	     (table->data == &ns->ids[IPC_SHM_IDS].next_id)) &&
 	    checkpoint_restore_ns_capable(ns->user_ns))
 		mode = 0666;
+	else
 #endif
-	return mode;
+	{
+		kuid_t ns_root_uid;
+		kgid_t ns_root_gid;
+
+		ipc_set_ownership(head, table, &ns_root_uid, &ns_root_gid);
+
+		if (uid_eq(current_euid(), ns_root_uid))
+			mode >>= 6;
+
+		else if (in_egroup_p(ns_root_gid))
+			mode >>= 3;
+	}
+
+	mode &= 7;
+
+	return (mode << 6) | (mode << 3) | mode;
 }
 
 static struct ctl_table_root set_root = {
 	.lookup = set_lookup,
 	.permissions = ipc_permissions,
+	.set_ownership = ipc_set_ownership,
 };
 
 bool setup_ipc_sysctls(struct ipc_namespace *ns)
-- 
2.33.4

From: Alexey Gladkov
To: LKML, Linux Containers
Cc: Andrew Morton, Christian Brauner, "Eric W. Biederman", Kees Cook,
 Manfred Spraul
Subject: [PATCH v3 2/3] sysctl: Allow to change limits for posix messages queues
Date: Wed, 21 Sep 2022 12:41:48 +0200
Message-Id: <7eb21211c8622e91d226e63416b1b93c079f60ee.1663756794.git.legion@kernel.org>
References: <202209211737.0Bu0F40t-lkp@intel.com>

All parameters of POSIX message queues (queues_max/msg_max/msgsize_max)
end up being limited by RLIMIT_MSGQUEUE. The code in mqueue_get_inode
is where that limiting happens.

RLIMIT_MSGQUEUE is bound to the user namespace and is counted
hierarchically.

We can therefore allow root in the user namespace to modify the POSIX
message queue parameters.

Signed-off-by: Alexey Gladkov
---
 ipc/mq_sysctl.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/ipc/mq_sysctl.c b/ipc/mq_sysctl.c
index fbf6a8b93a26..ff1054fbbacc 100644
--- a/ipc/mq_sysctl.c
+++ b/ipc/mq_sysctl.c
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 
 static int msg_max_limit_min = MIN_MSGMAX;
 static int msg_max_limit_max = HARD_MSGMAX;
@@ -76,8 +77,43 @@ static int set_is_seen(struct ctl_table_set *set)
 	return &current->nsproxy->ipc_ns->mq_set == set;
 }
 
+static void mq_set_ownership(struct ctl_table_header *head,
+			     struct ctl_table *table,
+			     kuid_t *uid, kgid_t *gid)
+{
+	struct ipc_namespace *ns =
+		container_of(head->set, struct ipc_namespace, mq_set);
+
+	kuid_t ns_root_uid = make_kuid(ns->user_ns, 0);
+	kgid_t ns_root_gid = make_kgid(ns->user_ns, 0);
+
+	*uid = uid_valid(ns_root_uid) ? ns_root_uid : GLOBAL_ROOT_UID;
+	*gid = gid_valid(ns_root_gid) ? ns_root_gid : GLOBAL_ROOT_GID;
+}
+
+static int mq_permissions(struct ctl_table_header *head, struct ctl_table *table)
+{
+	int mode = table->mode;
+	kuid_t ns_root_uid;
+	kgid_t ns_root_gid;
+
+	mq_set_ownership(head, table, &ns_root_uid, &ns_root_gid);
+
+	if (uid_eq(current_euid(), ns_root_uid))
+		mode >>= 6;
+
+	else if (in_egroup_p(ns_root_gid))
+		mode >>= 3;
+
+	mode &= 7;
+
+	return (mode << 6) | (mode << 3) | mode;
+}
+
 static struct ctl_table_root set_root = {
 	.lookup = set_lookup,
+	.permissions = mq_permissions,
+	.set_ownership = mq_set_ownership,
 };
 
 bool setup_mq_sysctls(struct ipc_namespace *ns)
-- 
2.33.4

From: Alexey Gladkov
To: LKML, Linux Containers, linux-doc@vger.kernel.org,
 linux-man@vger.kernel.org
Cc: Andrew Morton, Christian Brauner, "Eric W. Biederman", Kees Cook,
 Manfred Spraul
Subject: [PATCH v3 3/3] docs: Add information about ipc sysctls limitations
Date: Wed, 21 Sep 2022 12:41:49 +0200
References: <202209211737.0Bu0F40t-lkp@intel.com>

After 25b21cb2f6d6 ("[PATCH] IPC namespace core") and 4e9823111bdc
("[PATCH] IPC namespace - shm"), the shared memory page count stopped
being global and started being counted per ipc namespace. The
documentation and shmget(2) still say that shmall is a global option.

shmget(2):

SHMALL System-wide limit on the total amount of shared memory,
       measured in units of the system page size. On Linux, this
       limit can be read and modified via /proc/sys/kernel/shmall.

I think the changes made in 2006 should be documented.

Signed-off-by: Alexey Gladkov
Acked-by: "Eric W. Biederman"
---
 Documentation/admin-guide/sysctl/kernel.rst | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index ee6572b1edad..c8b89bd8f004 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -541,6 +541,9 @@ default (``MSGMNB``).
 ``msgmni`` is the maximum number of IPC queues. 32000 by default
 (``MSGMNI``).
 
+All of these parameters are set per ipc namespace. The maximum number of bytes
+in POSIX message queues is limited by ``RLIMIT_MSGQUEUE``. This limit is
+respected hierarchically in each user namespace.
 
 msg_next_id, sem_next_id, and shm_next_id (System V IPC)
 ========================================================
@@ -1181,15 +1184,20 @@ are doing anyway :)
 shmall
 ======
 
-This parameter sets the total amount of shared memory pages that
-can be used system wide. Hence, ``shmall`` should always be at least
-``ceil(shmmax/PAGE_SIZE)``.
+This parameter sets the total amount of shared memory pages that can be used
+inside an ipc namespace. Shared memory pages are counted separately for each
+ipc namespace, and the count is not inherited. Hence, ``shmall`` should always
+be at least ``ceil(shmmax/PAGE_SIZE)``.
 
 If you are not sure what the default ``PAGE_SIZE`` is on your Linux
 system, you can run the following command::
 
 	# getconf PAGE_SIZE
 
+To reduce or disable the ability to allocate shared memory, you must create a
+new ipc namespace, set this parameter to the required value and prohibit the
+creation of a new ipc namespace in the current user namespace. Alternatively,
+cgroups can be used.
 
 shmmax
 ======
-- 
2.33.4
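
The permission callbacks in patches 1/3 and 2/3 reduce the sysctl table
mode to a single rwx triplet and then replicate it into the owner, group
and other positions. The triplet is chosen by whether the caller is root
in the namespace's user namespace, is a member of that root group, or
neither. Below is a minimal user-space sketch of that arithmetic only.
fold_mode() and its two boolean parameters are hypothetical stand-ins for
the uid_eq(current_euid(), ns_root_uid) and in_egroup_p(ns_root_gid)
checks; they are not kernel APIs.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, for illustration only (not a kernel function).
 *
 * Select the permission triplet of `mode` that applies to the caller
 * and copy it into the owner, group and other positions.
 */
static int fold_mode(int mode, bool is_ns_root_uid, bool in_ns_root_gid)
{
	if (is_ns_root_uid)
		mode >>= 6;	/* use the owner bits */
	else if (in_ns_root_gid)
		mode >>= 3;	/* use the group bits */
	/* otherwise keep the "other" bits, already in the low triplet */

	mode &= 7;		/* isolate the selected rwx triplet */

	return (mode << 6) | (mode << 3) | mode;
}
```

Assuming a table entry with mode 0644, fold_mode(0644, true, false)
yields 0666, while a caller that only matches the group, or matches
nothing, gets 0444. The two checks are chained with else-if on purpose:
with two independent ifs, a caller matching both would shift by 9 bits
in total and end up with mode 0.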