From nobody Sat May 18 08:35:53 2024
From: Stas Sergeev
To: linux-kernel@vger.kernel.org
Cc: Stas Sergeev, Stefan Metzmacher, Eric Biederman, Alexander Viro,
 Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton, Chuck Lever,
 Alexander Aring, David Laight, linux-fsdevel@vger.kernel.org,
 linux-api@vger.kernel.org, Paolo Bonzini, Christian Göttsche
Subject: [PATCH 1/2] fs: reorganize path_openat()
Date: Wed, 24 Apr 2024 13:52:47 +0300
Message-ID: <20240424105248.189032-2-stsp2@yandex.ru>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20240424105248.189032-1-stsp2@yandex.ru>
References: <20240424105248.189032-1-stsp2@yandex.ru>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

This patch moves the call to alloc_empty_file() below the call to
path_init(). This change is needed by the next patch, which adds a
cred override for alloc_empty_file(). The needed cred info is only
available after the call to path_init().

No functional changes are intended by this patch.

Signed-off-by: Stas Sergeev

CC: Eric Biederman
CC: Alexander Viro
CC: Christian Brauner
CC: Jan Kara
CC: Andy Lutomirski
CC: David Laight
CC: linux-fsdevel@vger.kernel.org
CC: linux-kernel@vger.kernel.org
---
 fs/namei.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c5b2a25be7d0..413eef134234 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3781,23 +3781,30 @@ static struct file *path_openat(struct nameidata *nd,
 {
 	struct file *file;
 	int error;
+	u64 open_flags = op->open_flag;
 
-	file = alloc_empty_file(op->open_flag, current_cred());
-	if (IS_ERR(file))
-		return file;
-
-	if (unlikely(file->f_flags & __O_TMPFILE)) {
-		error = do_tmpfile(nd, flags, op, file);
-	} else if (unlikely(file->f_flags & O_PATH)) {
-		error = do_o_path(nd, flags, file);
+	if (unlikely(open_flags & (__O_TMPFILE | O_PATH))) {
+		file = alloc_empty_file(open_flags, current_cred());
+		if (IS_ERR(file))
+			return file;
+		if (open_flags & __O_TMPFILE)
+			error = do_tmpfile(nd, flags, op, file);
+		else
+			error = do_o_path(nd, flags, file);
 	} else {
 		const char *s = path_init(nd, flags);
-		while (!(error = link_path_walk(s, nd)) &&
-		       (s = open_last_lookups(nd, file, op)) != NULL)
-			;
+		file = alloc_empty_file(open_flags, current_cred());
+		error = PTR_ERR_OR_ZERO(file);
+		if (!error) {
+			while (!(error = link_path_walk(s, nd)) &&
+			       (s = open_last_lookups(nd, file, op)) != NULL)
+				;
+		}
 		if (!error)
 			error = do_open(nd, file, op);
 		terminate_walk(nd);
+		if (IS_ERR(file))
+			return file;
 	}
 	if (likely(!error)) {
 		if (likely(file->f_mode & FMODE_OPENED))
-- 
2.44.0

From nobody Sat May 18 08:35:53 2024
From: Stas Sergeev
To: linux-kernel@vger.kernel.org
Cc: Stas Sergeev, Stefan Metzmacher, Eric Biederman, Alexander Viro,
 Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton, Chuck Lever,
 Alexander Aring, David Laight, linux-fsdevel@vger.kernel.org,
 linux-api@vger.kernel.org, Paolo Bonzini, Christian Göttsche
Subject: [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag
Date: Wed, 24 Apr 2024 13:52:48 +0300
Message-ID: <20240424105248.189032-3-stsp2@yandex.ru>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20240424105248.189032-1-stsp2@yandex.ru>
References: <20240424105248.189032-1-stsp2@yandex.ru>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

This flag performs the open operation with the fs credentials
(fsuid, fsgid, group_info) that were in effect when dir_fd was opened.
This allows the process to pre-open some directories and then
change eUID (and all other UIDs/GIDs) to a less-privileged user,
retaining the ability to open/create files within these directories.

Design goal:
The idea is to provide very lightweight sandboxing, where the process
can restrict its access to the set of pre-opened directories without
using any heavyweight techniques such as chroot within namespaces.
This patch is just a first step toward such sandboxing. If things go
well, the same extension can be added to more syscalls in the future.
These should include at least unlinkat(), renameat2() and the
not-yet-upstreamed setxattrat().

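For illustration only, a userspace caller could look roughly like the
sketch below. It is not part of the patch. The names how_sketch,
make_inherit_how() and open_with_dir_creds() are hypothetical, the
struct is a local mirror of struct open_how (three 64-bit fields), and
the OA2_INHERIT_CRED value is the one proposed in this series.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value proposed by this series; not present in mainline headers. */
#ifndef OA2_INHERIT_CRED
#define OA2_INHERIT_CRED (1ULL << 32)
#endif
/* Same value as RESOLVE_BENEATH in <linux/openat2.h>. */
#ifndef RESOLVE_BENEATH
#define RESOLVE_BENEATH 0x08
#endif
/* openat2(2) syscall number, for older libc headers. */
#ifndef SYS_openat2
#define SYS_openat2 437
#endif

/* Hypothetical local mirror of struct open_how, to stay self-contained. */
struct how_sketch {
	uint64_t flags;
	uint64_t mode;
	uint64_t resolve;
};

/* Build the openat2() argument: caller flags plus OA2_INHERIT_CRED,
 * with a scoped lookup mode, which the patch requires. */
struct how_sketch make_inherit_how(uint64_t flags, uint64_t mode)
{
	struct how_sketch how;

	memset(&how, 0, sizeof(how));
	how.flags = flags | OA2_INHERIT_CRED;
	how.mode = mode;	/* must be 0 unless O_CREAT/O_TMPFILE is set */
	how.resolve = RESOLVE_BENEATH;
	return how;
}

/* Open a relative path below dirfd using the creds dirfd was opened with.
 * Returns a new fd, or -1 with errno set. */
int open_with_dir_creds(int dirfd, const char *path,
			uint64_t flags, uint64_t mode)
{
	struct how_sketch how;

	assert(path != NULL);
	how = make_inherit_how(flags, mode);
	return (int)syscall(SYS_openat2, dirfd, path, &how, sizeof(how));
}
```

The intended sequence would be: open the directory with
O_DIRECTORY | O_CLOEXEC while still privileged, drop privileges (for
example with setresgid() and setresuid()), then call
open_with_dir_creds(dirfd, "file", O_WRONLY | O_CREAT, 0600).
On a kernel without this series, openat2() rejects the unknown flag
with EINVAL.
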
Security considerations:
- Only the bare minimum set of credentials is overridden:
  fsuid, fsgid and group_info. The rest, for example capabilities,
  are not overridden, to avoid unneeded security risks.
- To avoid sandbox escapes, this patch requires a restricted lookup
  mode, namely RESOLVE_BENEATH or RESOLVE_IN_ROOT. Without one of
  them the open fails with -EPERM.
- To avoid leaking creds across exec, this patch requires the
  O_CLOEXEC flag on the directory fd.
- Magic /proc symlinks are rejected, as suggested by
  Andy Lutomirski.

Use cases:
Virtual machines that deal with untrusted code can use this instead
of more heavyweight approaches.
Currently the approach is being tested on a dosemu2 VM.

Signed-off-by: Stas Sergeev

CC: Stefan Metzmacher
CC: Eric Biederman
CC: Alexander Viro
CC: Andy Lutomirski
CC: Christian Brauner
CC: Jan Kara
CC: Jeff Layton
CC: Chuck Lever
CC: Alexander Aring
CC: linux-fsdevel@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: Paolo Bonzini
CC: Christian Göttsche
---
 fs/internal.h                |  2 +-
 fs/namei.c                   | 56 ++++++++++++++++++++++++++++++++++--
 fs/open.c                    | 10 ++++++-
 include/linux/fcntl.h        |  2 ++
 include/uapi/linux/openat2.h |  3 ++
 5 files changed, 69 insertions(+), 4 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 7ca738904e34..692b53b19aad 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -169,7 +169,7 @@ static inline void sb_end_ro_state_change(struct super_block *sb)
  * open.c
  */
 struct open_flags {
-	int open_flag;
+	u64 open_flag;
 	umode_t mode;
 	int acc_mode;
 	int intent;
diff --git a/fs/namei.c b/fs/namei.c
index 413eef134234..aeb9f504538e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -586,6 +586,9 @@ struct nameidata {
 	int		dfd;
 	vfsuid_t	dir_vfsuid;
 	umode_t		dir_mode;
+	kuid_t		dir_open_fsuid;
+	kgid_t		dir_open_fsgid;
+	struct group_info *dir_open_groups;
 } __randomize_layout;
 
 #define ND_ROOT_PRESET 1
@@ -695,6 +698,8 @@ static void terminate_walk(struct nameidata *nd)
 	nd->depth = 0;
 	nd->path.mnt = NULL;
 	nd->path.dentry = NULL;
+	if (nd->dir_open_groups)
+		put_group_info(nd->dir_open_groups);
 }
 
 /* path_put is needed afterwards regardless of success or failure */
@@ -2414,6 +2419,9 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 			get_fs_pwd(current->fs, &nd->path);
 			nd->inode = nd->path.dentry->d_inode;
 		}
+		nd->dir_open_fsuid = current_cred()->fsuid;
+		nd->dir_open_fsgid = current_cred()->fsgid;
+		nd->dir_open_groups = get_current_groups();
 	} else {
 		/* Caller must check execute permissions on the starting path component */
 		struct fd f = fdget_raw(nd->dfd);
@@ -2437,6 +2445,10 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 			path_get(&nd->path);
 			nd->inode = nd->path.dentry->d_inode;
 		}
+		nd->dir_open_fsuid = f.file->f_cred->fsuid;
+		nd->dir_open_fsgid = f.file->f_cred->fsgid;
+		nd->dir_open_groups = get_group_info(
+				f.file->f_cred->group_info);
 		fdput(f);
 	}
 
@@ -3776,6 +3788,29 @@ static int do_o_path(struct nameidata *nd, unsigned flags, struct file *file)
 	return error;
 }
 
+static const struct cred *openat2_override_creds(struct nameidata *nd)
+{
+	const struct cred *old_cred;
+	struct cred *override_cred;
+
+	override_cred = prepare_creds();
+	if (!override_cred)
+		return NULL;
+
+	override_cred->fsuid = nd->dir_open_fsuid;
+	override_cred->fsgid = nd->dir_open_fsgid;
+	override_cred->group_info = nd->dir_open_groups;
+
+	override_cred->non_rcu = 1;
+
+	old_cred = override_creds(override_cred);
+
+	/* override_cred() gets its own ref */
+	put_cred(override_cred);
+
+	return old_cred;
+}
+
 static struct file *path_openat(struct nameidata *nd,
 			const struct open_flags *op, unsigned flags)
 {
@@ -3793,8 +3828,23 @@ static struct file *path_openat(struct nameidata *nd,
 			error = do_o_path(nd, flags, file);
 	} else {
 		const char *s = path_init(nd, flags);
-		file = alloc_empty_file(open_flags, current_cred());
-		error = PTR_ERR_OR_ZERO(file);
+		const struct cred *old_cred = NULL;
+
+		error = 0;
+		if (open_flags & OA2_INHERIT_CRED) {
+			/* Only work with O_CLOEXEC dirs. */
+			if (!get_close_on_exec(nd->dfd))
+				error = -EPERM;
+
+			if (!error)
+				old_cred = openat2_override_creds(nd);
+		}
+		if (!error) {
+			file = alloc_empty_file(open_flags, current_cred());
+			error = PTR_ERR_OR_ZERO(file);
+		} else {
+			file = ERR_PTR(error);
+		}
 		if (!error) {
 			while (!(error = link_path_walk(s, nd)) &&
 			       (s = open_last_lookups(nd, file, op)) != NULL)
@@ -3802,6 +3852,8 @@ static struct file *path_openat(struct nameidata *nd,
 		}
 		if (!error)
 			error = do_open(nd, file, op);
+		if (old_cred)
+			revert_creds(old_cred);
 		terminate_walk(nd);
 		if (IS_ERR(file))
 			return file;
diff --git a/fs/open.c b/fs/open.c
index ee8460c83c77..c871ff8fc6e3 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1225,7 +1225,7 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op)
 	 * values before calling build_open_flags(), but openat2(2) checks all
 	 * of its arguments.
 	 */
-	if (flags & ~VALID_OPEN_FLAGS)
+	if (flags & ~VALID_OPENAT2_FLAGS)
 		return -EINVAL;
 	if (how->resolve & ~VALID_RESOLVE_FLAGS)
 		return -EINVAL;
@@ -1326,6 +1326,14 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op)
 		lookup_flags |= LOOKUP_CACHED;
 	}
 
+	if (flags & OA2_INHERIT_CRED) {
+		/* Inherit creds only with scoped look-up modes. */
+		if (!(lookup_flags & LOOKUP_IS_SCOPED))
+			return -EPERM;
+		/* Reject /proc "magic" links if inheriting creds. */
+		lookup_flags |= LOOKUP_NO_MAGICLINKS;
+	}
+
 	op->lookup_flags = lookup_flags;
 	return 0;
 }
diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
index a332e79b3207..b71f8b162102 100644
--- a/include/linux/fcntl.h
+++ b/include/linux/fcntl.h
@@ -12,6 +12,8 @@
 	 FASYNC | O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | \
 	 O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE)
 
+#define VALID_OPENAT2_FLAGS (VALID_OPEN_FLAGS | OA2_INHERIT_CRED)
+
 /* List of all valid flags for the how->resolve argument: */
 #define VALID_RESOLVE_FLAGS \
 	(RESOLVE_NO_XDEV | RESOLVE_NO_MAGICLINKS | RESOLVE_NO_SYMLINKS | \
diff --git a/include/uapi/linux/openat2.h b/include/uapi/linux/openat2.h
index a5feb7604948..cdd676a10b62 100644
--- a/include/uapi/linux/openat2.h
+++ b/include/uapi/linux/openat2.h
@@ -40,4 +40,7 @@ struct open_how {
 					return -EAGAIN if that's not
 					possible. */
 
+/* openat2-specific flags go to upper 4 bytes. */
+#define OA2_INHERIT_CRED		(1ULL << 32)
+
 #endif /* _UAPI_LINUX_OPENAT2_H */
-- 
2.44.0