[v5] ceph: support idmapped mounts

[PATCH v5 00/14] ceph: support idmapped mounts

Posted by Alexander Mikhalitsyn 2 years, 8 months ago

Dear friends,

This patchset was originally developed by Christian Brauner but I'll continue
to push it forward. Christian allowed me to do that :)

This feature is already actively used/tested with the LXD/LXC project.

Git tree (based on https://github.com/ceph/ceph-client.git master):
v5: https://github.com/mihalicyn/linux/commits/fs.idmapped.ceph.v5
current: https://github.com/mihalicyn/linux/tree/fs.idmapped.ceph

In version 3 I changed only two commits:
- fs: export mnt_idmap_get/mnt_idmap_put
- ceph: allow idmapped setattr inode op
and added a new one:
- ceph: pass idmap to __ceph_setattr

In version 4 I reworked the ("ceph: stash idmapping in mdsc request")
commit. Now we take the idmap reference right where req->r_mnt_idmap
is filled. This is a safer approach and prevents a possible refcount underflow
on error paths where __register_request wasn't called but
ceph_mdsc_release_request is called.
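
As a rough illustration of the v4 change described above, here is a small
userspace sketch. It is not the kernel code: struct idmap, struct request and
all four helpers are hypothetical stand-ins for struct mnt_idmap,
the MDS request, mnt_idmap_get/mnt_idmap_put and the release path. The
point it tries to show is that the reference is taken in the same place
where the field is filled, so the release path stays balanced even when it
runs before registration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mnt_idmap: only a reference count. */
struct idmap {
	int refcount;
};

/* Hypothetical stand-in for the MDS request that stashes the idmapping. */
struct request {
	struct idmap *r_mnt_idmap;
};

/* Take a reference; a NULL idmap is tolerated and passed through. */
static struct idmap *idmap_get(struct idmap *m)
{
	if (m)
		m->refcount++;
	return m;
}

/* Drop a reference; an underflow would trip the assertion. */
static void idmap_put(struct idmap *m)
{
	if (m) {
		assert(m->refcount > 0);
		m->refcount--;
	}
}

/* The reference is taken exactly where the field is filled. */
static void request_set_idmap(struct request *req, struct idmap *m)
{
	req->r_mnt_idmap = idmap_get(m);
}

/* Safe on error paths where the idmap was never set on the request. */
static void request_release(struct request *req)
{
	idmap_put(req->r_mnt_idmap);
	req->r_mnt_idmap = NULL;
}
```

With this pairing, releasing a request whose field was never filled drops
nothing, and releasing one that was filled drops exactly one reference.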

Changelog for version 5:
- a few commits were squashed into one (as suggested by Xiubo Li)
- started passing an idmapping everywhere (if possible), so the caller's
UID/GID will be mapped almost everywhere (as suggested by Xiubo Li)

I can confirm that this version passes xfstests.

Links to previous versions:
v1: https://lore.kernel.org/all/20220104140414.155198-1-brauner@kernel.org/
v2: https://lore.kernel.org/lkml/20230524153316.476973-1-aleksandr.mikhalitsyn@canonical.com/
tree: https://github.com/mihalicyn/linux/commits/fs.idmapped.ceph.v2
v3: https://lore.kernel.org/lkml/20230607152038.469739-1-aleksandr.mikhalitsyn@canonical.com/#t
v4: https://lore.kernel.org/lkml/20230607180958.645115-1-aleksandr.mikhalitsyn@canonical.com/#t
tree: https://github.com/mihalicyn/linux/commits/fs.idmapped.ceph.v4

Kind regards,
Alex

Original description from Christian:
========================================================================
This patch series enables cephfs to support idmapped mounts, i.e. the
ability to alter ownership information on a per-mount basis.

Container managers such as LXD support sharing data via cephfs between
the host and unprivileged containers and between unprivileged containers.
They may all use different idmappings. Idmapped mounts can be used to
create mounts with the idmapping used for the container (or a different
one specific to the use-case).

There are in fact more use-cases such as remapping ownership for
mountpoints on the host itself to grant or restrict access to different
users or to make it possible to enforce that programs running as root
will write with a non-zero {g,u}id to disk.

The patch series is simple overall and few changes are needed to cephfs.
There is one cephfs specific issue that I would like to discuss and
solve which I explain in detail in:

[PATCH 02/12] ceph: handle idmapped mounts in create_request_message()

It has to do with how to handle MDS servers which have id-based access
restrictions configured. I would ask you to please take a look at the
explanation in the aforementioned patch.

The patch series passes the vfs and idmapped mount testsuite as part of
xfstests. To run it you will need a config like:

[ceph]
export FSTYP=ceph
export TEST_DIR=/mnt/test
export TEST_DEV=10.103.182.10:6789:/
export TEST_FS_MOUNT_OPTS="-o name=admin,secret=$password"

and then simply call

sudo ./check -g idmapped

========================================================================
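
The per-mount ownership remapping described above can be pictured with a
small userspace model. This is only a sketch under stated assumptions:
struct id_extent, map_mount_to_fs() and ID_INVALID are hypothetical names,
and the real kernel helpers and data structures differ. It shows the "root
writes with a non-zero id to disk" case as a range translation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ID_INVALID ((uint32_t)-1)

/*
 * One contiguous range (hypothetical layout): ids
 * [mount_first, mount_first + count) as seen through the mount correspond
 * to [fs_first, fs_first + count) as stored by the filesystem.
 */
struct id_extent {
	uint32_t mount_first;
	uint32_t fs_first;
	uint32_t count;
};

/* Translate an id seen through the mount into the id stored on disk. */
static uint32_t map_mount_to_fs(const struct id_extent *ext, size_t n,
				uint32_t id)
{
	assert(ext != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (id >= ext[i].mount_first &&
		    id - ext[i].mount_first < ext[i].count)
			return ext[i].fs_first + (id - ext[i].mount_first);
	}
	/* No extent covers this id: it has no mapping in this mount. */
	return ID_INVALID;
}
```

For example, with a single extent { 0, 1000, 65536 }, id 0 in the mount is
written as 1000, and an id outside the range has no mapping at all.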

Alexander Mikhalitsyn (5):
  fs: export mnt_idmap_get/mnt_idmap_put
  ceph: pass idmap to __ceph_setattr
  ceph: pass idmap to ceph_do_getattr
  ceph: pass idmap to __ceph_setxattr
  ceph: pass idmap to ceph_open/ioctl_set_layout

Christian Brauner (9):
  ceph: stash idmapping in mdsc request
  ceph: handle idmapped mounts in create_request_message()
  ceph: pass an idmapping to mknod/symlink/mkdir/rename
  ceph: allow idmapped getattr inode op
  ceph: allow idmapped permission inode op
  ceph: allow idmapped setattr inode op
  ceph/acl: allow idmapped set_acl inode op
  ceph/file: allow idmapped atomic_open inode op
  ceph: allow idmapped mounts

 fs/ceph/acl.c                 |  8 ++++----
 fs/ceph/addr.c                |  3 ++-
 fs/ceph/caps.c                |  3 ++-
 fs/ceph/dir.c                 |  4 ++++
 fs/ceph/export.c              |  2 +-
 fs/ceph/file.c                | 21 ++++++++++++++-----
 fs/ceph/inode.c               | 38 +++++++++++++++++++++--------------
 fs/ceph/ioctl.c               |  9 +++++++--
 fs/ceph/mds_client.c          | 27 +++++++++++++++++++++----
 fs/ceph/mds_client.h          |  1 +
 fs/ceph/quota.c               |  2 +-
 fs/ceph/super.c               |  6 +++---
 fs/ceph/super.h               | 14 ++++++++-----
 fs/ceph/xattr.c               | 18 +++++++++--------
 fs/mnt_idmapping.c            |  2 ++
 include/linux/mnt_idmapping.h |  3 +++
 16 files changed, 111 insertions(+), 50 deletions(-)

-- 
2.34.1

Re: [PATCH v5 00/14] ceph: support idmapped mounts

Posted by Xiubo Li 2 years, 8 months ago

On 6/8/23 23:42, Alexander Mikhalitsyn wrote:
> Dear friends,
>
> This patchset was originally developed by Christian Brauner but I'll continue
> to push it forward. Christian allowed me to do that :)
>
> This feature is already actively used/tested with LXD/LXC project.
>
> Git tree (based on https://github.com/ceph/ceph-client.git master):

Could you rebase these patches onto the 'testing' branch?

You have still missed several places, for example the following cases:


    1    269  fs/ceph/addr.c <<ceph_netfs_issue_op_inline>>
              req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_GETATTR, mode);
    2    389  fs/ceph/dir.c <<ceph_readdir>>
              req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
    3    789  fs/ceph/dir.c <<ceph_lookup>>
              req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS);
    ...


For these requests you also need to set the real idmap.


Thanks

- Xiubo




Re: [PATCH v5 00/14] ceph: support idmapped mounts

Posted by Aleksandr Mikhalitsyn 2 years, 8 months ago

On Fri, Jun 9, 2023 at 3:57 AM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 6/8/23 23:42, Alexander Mikhalitsyn wrote:
> > Dear friends,
> >
> > This patchset was originally developed by Christian Brauner but I'll continue
> > to push it forward. Christian allowed me to do that :)
> >
> > This feature is already actively used/tested with LXD/LXC project.
> >
> > Git tree (based on https://github.com/ceph/ceph-client.git master):

Hi Xiubo!

>
> Could you rebase these patches to 'testing' branch ?

Will do in -v6.

>
> And you still have missed several places, for example the following cases:
>
>
>     1    269  fs/ceph/addr.c <<ceph_netfs_issue_op_inline>>
>               req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_GETATTR,
> mode);

+

>     2    389  fs/ceph/dir.c <<ceph_readdir>>
>               req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);

+

>     3    789  fs/ceph/dir.c <<ceph_lookup>>
>               req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS);

We don't have an idmapping passed to lookup from the VFS layer. As I
mentioned before, it's just impossible now.

I've checked all places that call ceph_mdsc_create_request and passed an
idmapping everywhere it is possible (in v6, which I will send soon).

>     ...
>
>
> For this requests you also need to set the real idmap.

Thanks,
Alex


Re: [PATCH v5 00/14] ceph: support idmapped mounts

Posted by Christian Brauner 2 years, 8 months ago

On Fri, Jun 09, 2023 at 10:59:19AM +0200, Aleksandr Mikhalitsyn wrote:
> On Fri, Jun 9, 2023 at 3:57 AM Xiubo Li <xiubli@redhat.com> wrote:
> >
> >
> > On 6/8/23 23:42, Alexander Mikhalitsyn wrote:
> > > Dear friends,
> > >
> > > This patchset was originally developed by Christian Brauner but I'll continue
> > > to push it forward. Christian allowed me to do that :)
> > >
> > > This feature is already actively used/tested with LXD/LXC project.
> > >
> > > Git tree (based on https://github.com/ceph/ceph-client.git master):
> 
> Hi Xiubo!
> 
> >
> > Could you rebase these patches to 'testing' branch ?
> 
> Will do in -v6.
> 
> >
> > And you still have missed several places, for example the following cases:
> >
> >
> >     1    269  fs/ceph/addr.c <<ceph_netfs_issue_op_inline>>
> >               req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_GETATTR,
> > mode);
> 
> +
> 
> >     2    389  fs/ceph/dir.c <<ceph_readdir>>
> >               req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
> 
> +
> 
> >     3    789  fs/ceph/dir.c <<ceph_lookup>>
> >               req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS);
> 
> We don't have an idmapping passed to lookup from the VFS layer. As I
> mentioned before, it's just impossible now.

->lookup() doesn't deal with idmappings and really can't, otherwise you
risk ending up with inode aliasing, which is really not something you
want. IOW, you can't fill in inode->i_{g,u}id based on a mount's
idmapping as inode->i_{g,u}id absolutely needs to be a filesystem wide
value. So better not even risk exposing the idmapping in there at all.

Re: [PATCH v5 00/14] ceph: support idmapped mounts

Posted by Aleksandr Mikhalitsyn 2 years, 8 months ago

On Fri, Jun 9, 2023 at 12:00 PM Christian Brauner <brauner@kernel.org> wrote:
>
> On Fri, Jun 09, 2023 at 10:59:19AM +0200, Aleksandr Mikhalitsyn wrote:
> > On Fri, Jun 9, 2023 at 3:57 AM Xiubo Li <xiubli@redhat.com> wrote:
> > >
> > >
> > > On 6/8/23 23:42, Alexander Mikhalitsyn wrote:
> > > > Dear friends,
> > > >
> > > > This patchset was originally developed by Christian Brauner but I'll continue
> > > > to push it forward. Christian allowed me to do that :)
> > > >
> > > > This feature is already actively used/tested with LXD/LXC project.
> > > >
> > > > Git tree (based on https://github.com/ceph/ceph-client.git master):
> >
> > Hi Xiubo!
> >
> > >
> > > Could you rebase these patches to 'testing' branch ?
> >
> > Will do in -v6.
> >
> > >
> > > And you still have missed several places, for example the following cases:
> > >
> > >
> > >     1    269  fs/ceph/addr.c <<ceph_netfs_issue_op_inline>>
> > >               req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_GETATTR,
> > > mode);
> >
> > +
> >
> > >     2    389  fs/ceph/dir.c <<ceph_readdir>>
> > >               req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
> >
> > +
> >
> > >     3    789  fs/ceph/dir.c <<ceph_lookup>>
> > >               req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS);
> >
> > We don't have an idmapping passed to lookup from the VFS layer. As I
> > mentioned before, it's just impossible now.
>
> ->lookup() doesn't deal with idmappings and really can't otherwise you
> risk ending up with inode aliasing which is really not something you
> want. IOW, you can't fill in inode->i_{g,u}id based on a mount's
> idmapping as inode->i_{g,u}id absolutely needs to be a filesystem wide
> value. So better not even risk exposing the idmapping in there at all.

Thanks for adding, Christian!

I agree, every time we use an idmapping we need to be careful with
what we map. AFAIU, inode->i_{g,u}id should be based on the filesystem
idmapping (not the mount's), but in this case Xiubo wants
current_fs{u,g}id to be mapped according to an idmapping.
Anyway, it's impossible right now and IMHO, until we have a practical
use case where UID/GID-based path restriction is used in combination
with idmapped mounts, it's not worth making such big changes in the
VFS layer.

Maybe I'm wrong, but it seems like UID/GID-based path restriction is
not a widespread feature and I can hardly imagine it being used with
container workloads (for instance), because it would require always
keeping the MDS permissions configuration in sync with the possible
UID/GID ranges on the client. It looks like a nightmare for a sysadmin.
It is useful when cephfs is used as external storage on the host, but if you
share cephfs with a few containers with different user namespace idmappings...

Kind regards,
Alex

Re: [PATCH v5 00/14] ceph: support idmapped mounts

Posted by Xiubo Li 2 years, 8 months ago

On 6/9/23 18:12, Aleksandr Mikhalitsyn wrote:
> On Fri, Jun 9, 2023 at 12:00 PM Christian Brauner <brauner@kernel.org> wrote:
>> On Fri, Jun 09, 2023 at 10:59:19AM +0200, Aleksandr Mikhalitsyn wrote:
>>> On Fri, Jun 9, 2023 at 3:57 AM Xiubo Li <xiubli@redhat.com> wrote:
>>>>
>>>> On 6/8/23 23:42, Alexander Mikhalitsyn wrote:
>>>>> Dear friends,
>>>>>
>>>>> This patchset was originally developed by Christian Brauner but I'll continue
>>>>> to push it forward. Christian allowed me to do that :)
>>>>>
>>>>> This feature is already actively used/tested with LXD/LXC project.
>>>>>
>>>>> Git tree (based on https://github.com/ceph/ceph-client.git master):
>>> Hi Xiubo!
>>>
>>>> Could you rebase these patches to 'testing' branch ?
>>> Will do in -v6.
>>>
>>>> And you still have missed several places, for example the following cases:
>>>>
>>>>
>>>>      1    269  fs/ceph/addr.c <<ceph_netfs_issue_op_inline>>
>>>>                req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_GETATTR,
>>>> mode);
>>> +
>>>
>>>>      2    389  fs/ceph/dir.c <<ceph_readdir>>
>>>>                req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
>>> +
>>>
>>>>      3    789  fs/ceph/dir.c <<ceph_lookup>>
>>>>                req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS);
>>> We don't have an idmapping passed to lookup from the VFS layer. As I
>>> mentioned before, it's just impossible now.
>> ->lookup() doesn't deal with idmappings and really can't otherwise you
>> risk ending up with inode aliasing which is really not something you
>> want. IOW, you can't fill in inode->i_{g,u}id based on a mount's
>> idmapping as inode->i_{g,u}id absolutely needs to be a filesystem wide
>> value. So better not even risk exposing the idmapping in there at all.
> Thanks for adding, Christian!
>
> I agree, every time when we use an idmapping we need to be careful with
> what we map. AFAIU, inode->i_{g,u}id should be based on the filesystem
> idmapping (not mount),
> but in this case, Xiubo want's current_fs{u,g}id to be mapped
> according to an idmapping.
> Anyway, it's impossible at now and IMHO, until we don't have any
> practical use case where
> UID/GID-based path restriction is used in combination with idmapped
> mounts it's not worth to
> make such big changes in the VFS layer.
>
> May be I'm not right, but it seems like UID/GID-based path restriction
> is not a widespread
> feature and I can hardly imagine it to be used with the container
> workloads (for instance),
> because it will require to always keep in sync MDS permissions
> configuration with the
> possible UID/GID ranges on the client. It looks like a nightmare for sysadmin.
> It is useful when cephfs is used as an external storage on the host, but if you
> share cephfs with a few containers with different user namespaces idmapping...

Hmm, but this will break the MDS permission check in cephfs in the
lookup case. If we really can't support it, we should make lookups
bypass the check anyway, or some ops may fail and won't work as expected.

@Greg

For lookup requests we can't get the mapped UID/GID from the idmapping
as we do for all the other requests, and the MDS permission check needs
it. Is it okay to disable the check for this case? I am
afraid this will break the MDS permissions logic.

Any ideas?

Thanks

- Xiubo


> Kind regards,
> Alex
>

Re: [PATCH v5 00/14] ceph: support idmapped mounts

Posted by Gregory Farnum 2 years, 7 months ago

On Mon, Jun 12, 2023 at 6:43 PM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 6/9/23 18:12, Aleksandr Mikhalitsyn wrote:
> > On Fri, Jun 9, 2023 at 12:00 PM Christian Brauner <brauner@kernel.org> wrote:
> >> On Fri, Jun 09, 2023 at 10:59:19AM +0200, Aleksandr Mikhalitsyn wrote:
> >>> On Fri, Jun 9, 2023 at 3:57 AM Xiubo Li <xiubli@redhat.com> wrote:
> >>>>
> >>>> On 6/8/23 23:42, Alexander Mikhalitsyn wrote:
> >>>>> Dear friends,
> >>>>>
> >>>>> This patchset was originally developed by Christian Brauner but I'll continue
> >>>>> to push it forward. Christian allowed me to do that :)
> >>>>>
> >>>>> This feature is already actively used/tested with LXD/LXC project.
> >>>>>
> >>>>> Git tree (based on https://github.com/ceph/ceph-client.git master):
> >>> Hi Xiubo!
> >>>
> >>>> Could you rebase these patches to 'testing' branch ?
> >>> Will do in -v6.
> >>>
> >>>> And you still have missed several places, for example the following cases:
> >>>>
> >>>>
> >>>>      1    269  fs/ceph/addr.c <<ceph_netfs_issue_op_inline>>
> >>>>                req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_GETATTR,
> >>>> mode);
> >>> +
> >>>
> >>>>      2    389  fs/ceph/dir.c <<ceph_readdir>>
> >>>>                req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
> >>> +
> >>>
> >>>>      3    789  fs/ceph/dir.c <<ceph_lookup>>
> >>>>                req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS);
> >>> We don't have an idmapping passed to lookup from the VFS layer. As I
> >>> mentioned before, it's just impossible now.
> >> ->lookup() doesn't deal with idmappings and really can't otherwise you
> >> risk ending up with inode aliasing which is really not something you
> >> want. IOW, you can't fill in inode->i_{g,u}id based on a mount's
> >> idmapping as inode->i_{g,u}id absolutely needs to be a filesystem wide
> >> value. So better not even risk exposing the idmapping in there at all.
> > Thanks for adding, Christian!
> >
> > I agree, every time when we use an idmapping we need to be careful with
> > what we map. AFAIU, inode->i_{g,u}id should be based on the filesystem
> > idmapping (not mount),
> > but in this case, Xiubo want's current_fs{u,g}id to be mapped
> > according to an idmapping.
> > Anyway, it's impossible at now and IMHO, until we don't have any
> > practical use case where
> > UID/GID-based path restriction is used in combination with idmapped
> > mounts it's not worth to
> > make such big changes in the VFS layer.
> >
> > May be I'm not right, but it seems like UID/GID-based path restriction
> > is not a widespread
> > feature and I can hardly imagine it to be used with the container
> > workloads (for instance),
> > because it will require to always keep in sync MDS permissions
> > configuration with the
> > possible UID/GID ranges on the client. It looks like a nightmare for sysadmin.
> > It is useful when cephfs is used as an external storage on the host, but if you
> > share cephfs with a few containers with different user namespaces idmapping...
>
> Hmm, while this will break the MDS permission check in cephfs then in
> lookup case. If we really couldn't support it we should make it to
> escape the check anyway or some OPs may fail and won't work as expected.

I don't pretend to know the details of the VFS (or even our Linux
client implementation), but I'm confused that this is apparently so
hard. It looks to me like we currently always fill in the "caller_uid"
with "from_kuid(&init_user_ns, req->r_cred->fsuid)". Is this actually
valid to begin with? If it is, why can't the uid mapping be applied on
that?

As both the client and the server share authority over the inode's
state (including things like mode bits and owners), and need to do
permission checking, being able to tell the server the relevant actor
is inherently necessary. We also let admins restrict keys to
particular UID/GID combinations as they wish, and it's not the most
popular feature but it does get deployed. I would really expect a user
of UID mapping to be one of the *most* likely to employ such a
facility...maybe not with containers, but certainly end-user homedirs
and shared spaces.
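
A simplified model of why the reported caller matters for a UID-restricted
key is sketched below. It is an illustration only, not the MDS
implementation: struct key_restriction and caller_allowed() are
hypothetical, and the rule "uid must match, and the gid must be listed if
any gids are configured" is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical restriction attached to a client key. */
struct key_restriction {
	bool restricted;	/* false: any caller is accepted */
	uint32_t uid;		/* the only uid allowed when restricted */
	const uint32_t *gids;	/* allowed gids; none means no gid limit */
	size_t ngids;
};

/* Decide whether a request carrying caller_uid/caller_gid passes. */
static bool caller_allowed(const struct key_restriction *r,
			   uint32_t caller_uid, uint32_t caller_gid)
{
	assert(r != NULL);

	if (!r->restricted)
		return true;
	if (caller_uid != r->uid)
		return false;
	if (r->ngids == 0)
		return true;
	for (size_t i = 0; i < r->ngids; i++) {
		if (r->gids[i] == caller_gid)
			return true;
	}
	return false;
}
```

In this model, a request from a key restricted to uid 1000 is refused when
it carries any other caller uid, which is the situation a lookup sent
without a mapped caller could run into.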

Disabling the MDS auth checks is really not an option. I guess we
could require any user employing idmapping to not be uid-restricted,
and set the anonymous UID (does that work, Xiubo, or was it the broken
one? In which case we'd have to default to root?). But that seems a
bit janky to me.
-Greg

> @Greg
>
> For the lookup requests the idmapping couldn't get the mapped UID/GID
> just like all the other requests, which is needed by the MDS permission
> check. Is that okay to make it disable the check for this case ? I am
> afraid this will break the MDS permssions logic.
>
> Any idea ?
>
> Thanks
>
> - Xiubo
>
>
> > Kind regards,
> > Alex
> >
>

Re: [PATCH v5 00/14] ceph: support idmapped mounts

Posted by Xiubo Li 2 years, 7 months ago

On 6/13/23 22:53, Gregory Farnum wrote:
> On Mon, Jun 12, 2023 at 6:43 PM Xiubo Li <xiubli@redhat.com> wrote:
>>
>> On 6/9/23 18:12, Aleksandr Mikhalitsyn wrote:
>>> On Fri, Jun 9, 2023 at 12:00 PM Christian Brauner <brauner@kernel.org> wrote:
>>>> On Fri, Jun 09, 2023 at 10:59:19AM +0200, Aleksandr Mikhalitsyn wrote:
>>>>> On Fri, Jun 9, 2023 at 3:57 AM Xiubo Li <xiubli@redhat.com> wrote:
>>>>>> On 6/8/23 23:42, Alexander Mikhalitsyn wrote:
>>>>>>> Dear friends,
>>>>>>>
>>>>>>> This patchset was originally developed by Christian Brauner but I'll continue
>>>>>>> to push it forward. Christian allowed me to do that :)
>>>>>>>
>>>>>>> This feature is already actively used/tested with LXD/LXC project.
>>>>>>>
>>>>>>> Git tree (based on https://github.com/ceph/ceph-client.git master):
>>>>> Hi Xiubo!
>>>>>
>>>>>> Could you rebase these patches to 'testing' branch ?
>>>>> Will do in -v6.
>>>>>
>>>>>> And you still have missed several places, for example the following cases:
>>>>>>
>>>>>>
>>>>>>       1    269  fs/ceph/addr.c <<ceph_netfs_issue_op_inline>>
>>>>>>                 req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_GETATTR,
>>>>>> mode);
>>>>> +
>>>>>
>>>>>>       2    389  fs/ceph/dir.c <<ceph_readdir>>
>>>>>>                 req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
>>>>> +
>>>>>
>>>>>>       3    789  fs/ceph/dir.c <<ceph_lookup>>
>>>>>>                 req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS);
>>>>> We don't have an idmapping passed to lookup from the VFS layer. As I
>>>>> mentioned before, it's just impossible now.
>>>> ->lookup() doesn't deal with idmappings and really can't otherwise you
>>>> risk ending up with inode aliasing which is really not something you
>>>> want. IOW, you can't fill in inode->i_{g,u}id based on a mount's
>>>> idmapping as inode->i_{g,u}id absolutely needs to be a filesystem wide
>>>> value. So better not even risk exposing the idmapping in there at all.
>>> Thanks for adding, Christian!
>>>
>>> I agree, every time when we use an idmapping we need to be careful with
>>> what we map. AFAIU, inode->i_{g,u}id should be based on the filesystem
>>> idmapping (not mount),
>>> but in this case, Xiubo want's current_fs{u,g}id to be mapped
>>> according to an idmapping.
>>> Anyway, it's impossible at now and IMHO, until we don't have any
>>> practical use case where
>>> UID/GID-based path restriction is used in combination with idmapped
>>> mounts it's not worth to
>>> make such big changes in the VFS layer.
>>>
>>> May be I'm not right, but it seems like UID/GID-based path restriction
>>> is not a widespread
>>> feature and I can hardly imagine it to be used with the container
>>> workloads (for instance),
>>> because it will require to always keep in sync MDS permissions
>>> configuration with the
>>> possible UID/GID ranges on the client. It looks like a nightmare for sysadmin.
>>> It is useful when cephfs is used as an external storage on the host, but if you
>>> share cephfs with a few containers with different user namespaces idmapping...
>> Hmm, while this will break the MDS permission check in cephfs then in
>> lookup case. If we really couldn't support it we should make it to
>> escape the check anyway or some OPs may fail and won't work as expected.
> I don't pretend to know the details of the VFS (or even our linux
> client implementation), but I'm confused that this is apparently so
> hard. It looks to me like we currently always fill in the "caller_uid"
> with "from_kuid(&init_user_ns, req->r_cred->fsuid))". Is this actually
> valid to begin with? If it is, why can't the uid mapping be applied on
> that?
>
> As both the client and the server share authority over the inode's
> state (including things like mode bits and owners), and need to do
> permission checking, being able to tell the server the relevant actor
> is inherently necessary. We also let admins restrict keys to
> particular UID/GID combinations as they wish, and it's not the most
> popular feature but it does get deployed. I would really expect a user
> of UID mapping to be one of the *most* likely to employ such a
> facility...maybe not with containers, but certainly end-user homedirs
> and shared spaces.
>
> Disabling the MDS auth checks is really not an option. I guess we
> could require any user employing idmapping to not be uid-restricted,
> and set the anonymous UID (does that work, Xiubo, or was it the broken
> one? In which case we'd have to default to root?). But that seems a
> bit janky to me.

Yeah, this also seems risky.

Instead of disabling the MDS auth checks there is another option: we
can prevent the kclient from being mounted, or the idmapping from being
applied. But this still has issues, such as what happens if admins set
the MDS auth caps after the idmap has been applied to the kclients?

IMO there are 2 options: the best way is to fix this in the VFS if
possible. Otherwise, add an option to disable the corresponding MDS auth
caps in ceph if users want to use the idmap feature.

Thanks

- Xiubo

> -Greg
>
>> @Greg
>>
>> For the lookup requests the idmapping couldn't get the mapped UID/GID
>> just like all the other requests, which is needed by the MDS permission
>> check. Is that okay to make it disable the check for this case ? I am
>> afraid this will break the MDS permssions logic.
>>
>> Any idea ?
>>
>> Thanks
>>
>> - Xiubo
>>
>>
>>> Kind regards,
>>> Alex
>>>

Re: [PATCH v5 00/14] ceph: support idmapped mounts

Posted by Aleksandr Mikhalitsyn 2 years, 7 months ago

On Wed, Jun 14, 2023 at 3:53 AM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 6/13/23 22:53, Gregory Farnum wrote:
> > On Mon, Jun 12, 2023 at 6:43 PM Xiubo Li <xiubli@redhat.com> wrote:
> >>
> >> On 6/9/23 18:12, Aleksandr Mikhalitsyn wrote:
> >>> On Fri, Jun 9, 2023 at 12:00 PM Christian Brauner <brauner@kernel.org> wrote:
> >>>> On Fri, Jun 09, 2023 at 10:59:19AM +0200, Aleksandr Mikhalitsyn wrote:
> >>>>> On Fri, Jun 9, 2023 at 3:57 AM Xiubo Li <xiubli@redhat.com> wrote:
> >>>>>> On 6/8/23 23:42, Alexander Mikhalitsyn wrote:
> >>>>>>> Dear friends,
> >>>>>>>
> >>>>>>> This patchset was originally developed by Christian Brauner but I'll continue
> >>>>>>> to push it forward. Christian allowed me to do that :)
> >>>>>>>
> >>>>>>> This feature is already actively used/tested with LXD/LXC project.
> >>>>>>>
> >>>>>>> Git tree (based on https://github.com/ceph/ceph-client.git master):
> >>>>> Hi Xiubo!
> >>>>>
> >>>>>> Could you rebase these patches to 'testing' branch ?
> >>>>> Will do in -v6.
> >>>>>
> >>>>>> And you still have missed several places, for example the following cases:
> >>>>>>
> >>>>>>
> >>>>>>       1    269  fs/ceph/addr.c <<ceph_netfs_issue_op_inline>>
> >>>>>>                 req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_GETATTR,
> >>>>>> mode);
> >>>>> +
> >>>>>
> >>>>>>       2    389  fs/ceph/dir.c <<ceph_readdir>>
> >>>>>>                 req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
> >>>>> +
> >>>>>
> >>>>>>       3    789  fs/ceph/dir.c <<ceph_lookup>>
> >>>>>>                 req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS);
> >>>>> We don't have an idmapping passed to lookup from the VFS layer. As I
> >>>>> mentioned before, it's just impossible now.
> >>>> ->lookup() doesn't deal with idmappings and really can't otherwise you
> >>>> risk ending up with inode aliasing which is really not something you
> >>>> want. IOW, you can't fill in inode->i_{g,u}id based on a mount's
> >>>> idmapping as inode->i_{g,u}id absolutely needs to be a filesystem wide
> >>>> value. So better not even risk exposing the idmapping in there at all.
> >>> Thanks for adding, Christian!
> >>>
> >>> I agree, every time when we use an idmapping we need to be careful with
> >>> what we map. AFAIU, inode->i_{g,u}id should be based on the filesystem
> >>> idmapping (not mount),
> >>> but in this case, Xiubo want's current_fs{u,g}id to be mapped
> >>> according to an idmapping.
> >>> Anyway, it's impossible at now and IMHO, until we don't have any
> >>> practical use case where
> >>> UID/GID-based path restriction is used in combination with idmapped
> >>> mounts it's not worth to
> >>> make such big changes in the VFS layer.
> >>>
> >>> May be I'm not right, but it seems like UID/GID-based path restriction
> >>> is not a widespread
> >>> feature and I can hardly imagine it to be used with the container
> >>> workloads (for instance),
> >>> because it will require to always keep in sync MDS permissions
> >>> configuration with the
> >>> possible UID/GID ranges on the client. It looks like a nightmare for sysadmin.
> >>> It is useful when cephfs is used as an external storage on the host, but if you
> >>> share cephfs with a few containers with different user namespaces idmapping...
> >> Hmm, while this will break the MDS permission check in cephfs then in
> >> lookup case. If we really couldn't support it we should make it to
> >> escape the check anyway or some OPs may fail and won't work as expected.
> > I don't pretend to know the details of the VFS (or even our linux
> > client implementation), but I'm confused that this is apparently so
> > hard. It looks to me like we currently always fill in the "caller_uid"
> > with "from_kuid(&init_user_ns, req->r_cred->fsuid))". Is this actually
> > valid to begin with? If it is, why can't the uid mapping be applied on
> > that?
> >
> > As both the client and the server share authority over the inode's
> > state (including things like mode bits and owners), and need to do
> > permission checking, being able to tell the server the relevant actor
> > is inherently necessary. We also let admins restrict keys to
> > particular UID/GID combinations as they wish, and it's not the most
> > popular feature but it does get deployed. I would really expect a user
> > of UID mapping to be one of the *most* likely to employ such a
> > facility...maybe not with containers, but certainly end-user homedirs
> > and shared spaces.
> >
> > Disabling the MDS auth checks is really not an option. I guess we
> > could require any user employing idmapping to not be uid-restricted,
> > and set the anonymous UID (does that work, Xiubo, or was it the broken
> > one? In which case we'd have to default to root?). But that seems a
> > bit janky to me.
>
> Yeah, this also seems risky.
>
> Instead disabling the MDS auth checks there is another option, which is
> we can prevent  the kclient to be mounted or the idmapping to be
> applied. But this still have issues, such as what if admins set the MDS
> auth caps after idmap applied to the kclients ?

Hi Xiubo,

I thought about this too and came to the same conclusion: UID/GID-based
restrictions can be applied dynamically, so detecting them at mount time
does not help much.

>
> IMO there have 2 options: the best way is to fix this in VFS if
> possible. Else to add one option to disable the corresponding MDS auth
> caps in ceph if users want to support the idmap feature.

Dear colleagues,
Dear Xiubo,

Let me try to summarize the previous discussions about cephfs idmapped
mount support.

The discussion about the need to map the caller's UID/GID started with
the first version of this patchset, in thread [1]. Let me quote
Christian here:
> Since the idmapping is a property of the mount and not a property of the
> caller the caller's fs{g,u}id aren't mapped. What is mapped are the
> inode's i{g,u}id when accessed from a particular mount.
>
> The fs{g,u}id are only ever mapped when a new filesystem object is
> created. So if I have an idmapped mount that makes it so that files
> owned by 1000 on-disk appear to be owned by uid 0 then a user with uid 0
> creating a new file will create files with uid 1000 on-disk when going
> through that mount. For cephfs that'd be the uid we would be sending
> with creation requests as I've currently written it.

This is a key part of this discussion. Idmapped mounts are not a way to
proxy the caller's UID/GID; they are designed to map the UID/GID of an
inode's owner. Yes, these concepts look really close, and at first
glance they seem equivalent. But they are not.
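
To make Christian's example concrete, here is a minimal userspace sketch.
It is not kernel code: struct toy_mnt_idmap and both helpers are
hypothetical names invented for illustration, and the model assumes a
single mapping extent. It only shows the two directions in which a mount
idmapping is applied to an inode owner.

```c
#include <assert.h>
#include <stdint.h>

#define INVALID_ID UINT32_MAX

/* Hypothetical single-extent mount idmapping (not a kernel structure):
 * on-disk IDs [disk_first, disk_first + count) are shown through the
 * mount as [mnt_first, mnt_first + count). */
struct toy_mnt_idmap {
	uint32_t disk_first;
	uint32_t mnt_first;
	uint32_t count;
};

/* Inode owner as stored on disk -> owner as seen through the mount. */
static uint32_t toy_owner_seen(const struct toy_mnt_idmap *m, uint32_t disk_id)
{
	if (disk_id < m->disk_first || disk_id - m->disk_first >= m->count)
		return INVALID_ID;
	return m->mnt_first + (disk_id - m->disk_first);
}

/* Creator's fsuid on the mount -> owner written to disk for a new object. */
static uint32_t toy_owner_stored(const struct toy_mnt_idmap *m, uint32_t fsid)
{
	if (fsid < m->mnt_first || fsid - m->mnt_first >= m->count)
		return INVALID_ID;
	return m->disk_first + (fsid - m->mnt_first);
}
```

With a mapping of { .disk_first = 1000, .mnt_first = 0, .count = 1 },
toy_owner_seen(&m, 1000) yields 0 and toy_owner_stored(&m, 0) yields
1000, matching the case in the quote. IDs outside the extent come back
as INVALID_ID.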

From my understanding, if someone wants to verify the caller's UID/GID
then they should take the unmapped UID/GID and verify that. It doesn't
matter whether the caller goes through an idmapped mount or not:
from_kuid(&init_user_ns, req->r_cred->fsuid) is literally "the UID of
the caller in the root user namespace". But a cephfs mount can be used
from any user namespace (yes, cephfs can't be mounted in user
namespaces, but a mount can be inherited during CLONE_NEWNS, or used as
a detached mount with open_tree/move_mount).
What I want to say with this example is that even now, without idmapped
mounts, we have a closely related problem: UID/GID-based restrictions
are based on the host's (!) root user namespace UID/GIDs even if the
caller sits inside a user namespace. And we don't care, right? Why is
it a problem with idmapped mounts? If someone wants to control the
caller's UID/GID on the MDS side, they just need to take the host's
UID/GIDs and use them in the permission rules. That's it.
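
The user-namespace point can be sketched the same way. This is a toy
userspace model, not the kernel's implementation: struct toy_userns and
toy_caller_uid_on_wire() are hypothetical, and the namespace is assumed
to have one extent starting at ID 0. It only illustrates that the value
sent to the MDS is the host-side ID.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user namespace with one mapping extent: IDs [0, count)
 * inside the namespace correspond to host IDs
 * [host_first, host_first + count). */
struct toy_userns {
	uint32_t host_first;
	uint32_t count;
};

/* Models what is sent as caller_uid today: the caller's fsuid expressed
 * in the initial (host) user namespace, whatever namespace the caller
 * lives in. Returns UINT32_MAX if the ID has no mapping. */
static uint32_t toy_caller_uid_on_wire(const struct toy_userns *ns,
				       uint32_t uid_in_ns)
{
	if (uid_in_ns >= ns->count)
		return UINT32_MAX;
	return ns->host_first + uid_in_ns;
}
```

For a container namespace of { .host_first = 100000, .count = 65536 },
root inside the container (UID 0) is reported as 100000, so an MDS rule
would have to be written in terms of 100000, not 0.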

The next point is that technically idmapped mounts don't break anything:
if someone starts using idmapped mounts with UID/GID-based restrictions
they will get -EACCES. Why is this a problem? The user will check the
configuration, read the note about idmapped mounts in the cephfs
documentation, and find a warning that the two are not fully compatible
right now.

IMHO, there is only one real problem (which makes UID/GID-based
restrictions not fully compatible with idmapped mounts): we have to map
the caller's UID/GID according to the mount idmapping when we create a
new inode (mknod, mkdir, symlink, open(O_CREAT)). But that's only
because the caller's UID/GID is used as the owner's UID/GID for the
newly created inode. Ideally, we would have two fields in the ceph
request, one for the caller's UID/GID and another for the inode owner's
UID/GID. But this requires a cephfs protocol modification (yes, it's a
bit painful. But global VFS changes are painful too!). As Christian
pointed out, this is the reason why he went this way in the first
version of the patchset.
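
A rough sketch of what "two fields" could look like, purely as an
illustration. struct toy_create_request, its field names and the
id_shift parameter are all assumptions made up for this example; they
are not the cephfs wire format, and id_shift is only a stand-in for a
real mount idmapping.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical create request that carries the caller and the new
 * inode's owner separately (NOT the actual cephfs protocol layout). */
struct toy_create_request {
	uint32_t caller_uid;	/* unmapped; what MDS auth checks would use */
	uint32_t caller_gid;
	uint32_t owner_uid;	/* mapped through the mount; new inode owner */
	uint32_t owner_gid;
};

/* id_shift stands in for the mount idmapping: an ID seen on the mount
 * is stored on disk as ID + id_shift. */
static struct toy_create_request toy_fill_create(uint32_t fsuid,
						 uint32_t fsgid,
						 uint32_t id_shift)
{
	struct toy_create_request req = {
		.caller_uid = fsuid,
		.caller_gid = fsgid,
		.owner_uid = fsuid + id_shift,
		.owner_gid = fsgid + id_shift,
	};
	return req;
}
```

With separate fields the permission check and the ownership of the new
inode no longer compete for the same value: toy_fill_create(0, 0, 1000)
keeps caller_uid at 0 while owner_uid becomes 1000.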

Maybe I'm wrong, but both options for fixing this properly, VFS API
changes or a cephfs protocol modification, are too expensive as long as
we don't have real requestors with a good use case for idmapped mounts +
UID/GID-based permissions. We already have a real and good use case for
idmapped mounts in cephfs: LXD/LXC.
IMHO, it's better to move this forward step by step, because VFS
API/cephfs protocol changes will take a really long time and it's not
obvious that they're worth it. Moreover, it's not even clear that a VFS
API change is the right way to deal with this problem. A cephfs
protocol change seems to me like the more proper way here. At the same
time I fully understand that you are not happy about this option.

Just to conclude, we don't have any kind of cephfs degradation here:
users without idmappings will not be affected, and users who start
using mount idmappings with cephfs will be aware of this limitation.

[1] https://lore.kernel.org/all/20220105141023.vrrbfhti5apdvkz7@wittgenstein/

Kind regards,
Alex

>
> Thanks
>
> - Xiubo
>
> > -Greg
> >
> >> @Greg
> >>
> >> For the lookup requests the idmapping couldn't get the mapped UID/GID
> >> just like all the other requests, which is needed by the MDS permission
> >> check. Is that okay to make it disable the check for this case ? I am
> >> afraid this will break the MDS permssions logic.
> >>
> >> Any idea ?
> >>
> >> Thanks
> >>
> >> - Xiubo
> >>
> >>
> >>> Kind regards,
> >>> Alex
> >>>
>

Re: [PATCH v5 00/14] ceph: support idmapped mounts

Posted by Aleksandr Mikhalitsyn 2 years, 7 months ago

On Tue, Jun 13, 2023 at 4:54 PM Gregory Farnum <gfarnum@redhat.com> wrote:
>
> On Mon, Jun 12, 2023 at 6:43 PM Xiubo Li <xiubli@redhat.com> wrote:
> >
> >
> > On 6/9/23 18:12, Aleksandr Mikhalitsyn wrote:
> > > On Fri, Jun 9, 2023 at 12:00 PM Christian Brauner <brauner@kernel.org> wrote:
> > >> On Fri, Jun 09, 2023 at 10:59:19AM +0200, Aleksandr Mikhalitsyn wrote:
> > >>> On Fri, Jun 9, 2023 at 3:57 AM Xiubo Li <xiubli@redhat.com> wrote:
> > >>>>
> > >>>> On 6/8/23 23:42, Alexander Mikhalitsyn wrote:
> > >>>>> Dear friends,
> > >>>>>
> > >>>>> This patchset was originally developed by Christian Brauner but I'll continue
> > >>>>> to push it forward. Christian allowed me to do that :)
> > >>>>>
> > >>>>> This feature is already actively used/tested with LXD/LXC project.
> > >>>>>
> > >>>>> Git tree (based on https://github.com/ceph/ceph-client.git master):
> > >>> Hi Xiubo!
> > >>>
> > >>>> Could you rebase these patches to 'testing' branch ?
> > >>> Will do in -v6.
> > >>>
> > >>>> And you still have missed several places, for example the following cases:
> > >>>>
> > >>>>
> > >>>>      1    269  fs/ceph/addr.c <<ceph_netfs_issue_op_inline>>
> > >>>>                req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_GETATTR,
> > >>>> mode);
> > >>> +
> > >>>
> > >>>>      2    389  fs/ceph/dir.c <<ceph_readdir>>
> > >>>>                req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
> > >>> +
> > >>>
> > >>>>      3    789  fs/ceph/dir.c <<ceph_lookup>>
> > >>>>                req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS);
> > >>> We don't have an idmapping passed to lookup from the VFS layer. As I
> > >>> mentioned before, it's just impossible now.
> > >> ->lookup() doesn't deal with idmappings and really can't otherwise you
> > >> risk ending up with inode aliasing which is really not something you
> > >> want. IOW, you can't fill in inode->i_{g,u}id based on a mount's
> > >> idmapping as inode->i_{g,u}id absolutely needs to be a filesystem wide
> > >> value. So better not even risk exposing the idmapping in there at all.
> > > Thanks for adding, Christian!
> > >
> > > I agree, every time when we use an idmapping we need to be careful with
> > > what we map. AFAIU, inode->i_{g,u}id should be based on the filesystem
> > > idmapping (not mount),
> > > but in this case, Xiubo want's current_fs{u,g}id to be mapped
> > > according to an idmapping.
> > > Anyway, it's impossible at now and IMHO, until we don't have any
> > > practical use case where
> > > UID/GID-based path restriction is used in combination with idmapped
> > > mounts it's not worth to
> > > make such big changes in the VFS layer.
> > >
> > > May be I'm not right, but it seems like UID/GID-based path restriction
> > > is not a widespread
> > > feature and I can hardly imagine it to be used with the container
> > > workloads (for instance),
> > > because it will require to always keep in sync MDS permissions
> > > configuration with the
> > > possible UID/GID ranges on the client. It looks like a nightmare for sysadmin.
> > > It is useful when cephfs is used as an external storage on the host, but if you
> > > share cephfs with a few containers with different user namespaces idmapping...
> >
> > Hmm, while this will break the MDS permission check in cephfs then in
> > lookup case. If we really couldn't support it we should make it to
> > escape the check anyway or some OPs may fail and won't work as expected.

Dear Gregory,

Thanks for the fast reply!

>
> I don't pretend to know the details of the VFS (or even our linux
> client implementation), but I'm confused that this is apparently so
> hard. It looks to me like we currently always fill in the "caller_uid"
> with "from_kuid(&init_user_ns, req->r_cred->fsuid))". Is this actually
> valid to begin with? If it is, why can't the uid mapping be applied on
> that?

Applying an idmapping is not hard, it's as simple as replacing
from_kuid(&init_user_ns, req->r_cred->fsuid)
with
from_vfsuid(req->r_mnt_idmap, &init_user_ns, VFSUIDT_INIT(req->r_cred->fsuid))

but the problem is that we don't have req->r_mnt_idmap for all requests.
For instance, we don't get idmap arguments from the VFS layer for the
->lookup operation and many others. There are reasons for that
(Christian has covered some of them).
So it's not about me being too lazy to implement that. It's a real pain ;-)
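
The consequence can be shown with a small userspace model. The names
below (toy_idmap, toy_request, toy_caller_uid) are hypothetical and only
mimic the shape of the problem; the real kernel helpers and types are
not used, and the "shift" field is an assumed stand-in for a mount
idmapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a mount idmapping: mount ID + shift = filesystem ID. */
struct toy_idmap {
	uint32_t shift;
};

/* Stand-in for an MDS request. mnt_idmap is NULL for operations where
 * the VFS hands the filesystem no idmapping (lookup, for example). */
struct toy_request {
	const struct toy_idmap *mnt_idmap;
	uint32_t fsuid;
};

/* caller_uid as it would be filled in: mapped when an idmapping is
 * available, unmapped otherwise. */
static uint32_t toy_caller_uid(const struct toy_request *req)
{
	if (!req->mnt_idmap)
		return req->fsuid;
	return req->fsuid + req->mnt_idmap->shift;
}
```

The same caller with fsuid 0 is reported as 1000 on a create-like
request that carries a { .shift = 1000 } idmapping, but as 0 on a
lookup-like request with mnt_idmap == NULL. That mismatch is what a
UID-restricted MDS cap would trip over.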

>
> As both the client and the server share authority over the inode's
> state (including things like mode bits and owners), and need to do
> permission checking, being able to tell the server the relevant actor
> is inherently necessary. We also let admins restrict keys to
> particular UID/GID combinations as they wish, and it's not the most
> popular feature but it does get deployed. I would really expect a user
> of UID mapping to be one of the *most* likely to employ such a
> facility...maybe not with containers, but certainly end-user homedirs
> and shared spaces.
>
> Disabling the MDS auth checks is really not an option. I guess we
> could require any user employing idmapping to not be uid-restricted,
> and set the anonymous UID (does that work, Xiubo, or was it the broken
> one? In which case we'd have to default to root?). But that seems a
> bit janky to me.

That's an interesting point about the anonymous UID, but at the same
time we use the caller's fs UID/GID values as the owner's UID/GID for
newly created inodes. This means we can't use the anonymous UID
everywhere, otherwise all new files/directories would be owned by the
anonymous user.

> -Greg

Kind regards,
Alex

>
> > @Greg
> >
> > For the lookup requests the idmapping couldn't get the mapped UID/GID
> > just like all the other requests, which is needed by the MDS permission
> > check. Is that okay to make it disable the check for this case ? I am
> > afraid this will break the MDS permssions logic.
> >
> > Any idea ?
> >
> > Thanks
> >
> > - Xiubo
> >
> >
> > > Kind regards,
> > > Alex
> > >
> >
>

Re: [PATCH v5 00/14] ceph: support idmapped mounts

Posted by Aleksandr Mikhalitsyn 2 years, 7 months ago

On Tue, Jun 13, 2023 at 3:43 AM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 6/9/23 18:12, Aleksandr Mikhalitsyn wrote:
> > On Fri, Jun 9, 2023 at 12:00 PM Christian Brauner <brauner@kernel.org> wrote:
> >> On Fri, Jun 09, 2023 at 10:59:19AM +0200, Aleksandr Mikhalitsyn wrote:
> >>> On Fri, Jun 9, 2023 at 3:57 AM Xiubo Li <xiubli@redhat.com> wrote:
> >>>>
> >>>> On 6/8/23 23:42, Alexander Mikhalitsyn wrote:
> >>>>> Dear friends,
> >>>>>
> >>>>> This patchset was originally developed by Christian Brauner but I'll continue
> >>>>> to push it forward. Christian allowed me to do that :)
> >>>>>
> >>>>> This feature is already actively used/tested with LXD/LXC project.
> >>>>>
> >>>>> Git tree (based on https://github.com/ceph/ceph-client.git master):
> >>> Hi Xiubo!
> >>>
> >>>> Could you rebase these patches to 'testing' branch ?
> >>> Will do in -v6.
> >>>
> >>>> And you still have missed several places, for example the following cases:
> >>>>
> >>>>
> >>>>      1    269  fs/ceph/addr.c <<ceph_netfs_issue_op_inline>>
> >>>>                req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_GETATTR,
> >>>> mode);
> >>> +
> >>>
> >>>>      2    389  fs/ceph/dir.c <<ceph_readdir>>
> >>>>                req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
> >>> +
> >>>
> >>>>      3    789  fs/ceph/dir.c <<ceph_lookup>>
> >>>>                req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS);
> >>> We don't have an idmapping passed to lookup from the VFS layer. As I
> >>> mentioned before, it's just impossible now.
> >> ->lookup() doesn't deal with idmappings and really can't otherwise you
> >> risk ending up with inode aliasing which is really not something you
> >> want. IOW, you can't fill in inode->i_{g,u}id based on a mount's
> >> idmapping as inode->i_{g,u}id absolutely needs to be a filesystem wide
> >> value. So better not even risk exposing the idmapping in there at all.
> > Thanks for adding, Christian!
> >
> > I agree, every time when we use an idmapping we need to be careful with
> > what we map. AFAIU, inode->i_{g,u}id should be based on the filesystem
> > idmapping (not mount),
> > but in this case, Xiubo want's current_fs{u,g}id to be mapped
> > according to an idmapping.
> > Anyway, it's impossible at now and IMHO, until we don't have any
> > practical use case where
> > UID/GID-based path restriction is used in combination with idmapped
> > mounts it's not worth to
> > make such big changes in the VFS layer.
> >
> > May be I'm not right, but it seems like UID/GID-based path restriction
> > is not a widespread
> > feature and I can hardly imagine it to be used with the container
> > workloads (for instance),
> > because it will require to always keep in sync MDS permissions
> > configuration with the
> > possible UID/GID ranges on the client. It looks like a nightmare for sysadmin.
> > It is useful when cephfs is used as an external storage on the host, but if you
> > share cephfs with a few containers with different user namespaces idmapping...
>
> Hmm, while this will break the MDS permission check in cephfs then in
> lookup case. If we really couldn't support it we should make it to
> escape the check anyway or some OPs may fail and won't work as expected.

Hi Xiubo!

Disabling UID/GID checks on the MDS side looks reasonable. IMHO the
most important checks are:
- open
- mknod/mkdir/symlink/rename
and for these checks we already have an idmapping.

Also, I want to add that it's a little unusual for permission checks to
be done against the caller's UID/GID.
Usually, if we have opened a file descriptor and, for instance, passed
it through a unix socket, then the holder of the file descriptor can
use it in accordance with the open flags (O_RDONLY, O_RDWR, ...).
We also have ->f_cred on struct file, which contains the credentials of
the file opener, and permission checks are usually done based on that.
But in cephfs we always use the syscall caller's credentials. This
makes a cephfs file descriptor "not transferable" in terms of
permission checks.
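
The difference between the two models, reduced to a toy. The types and
function names are hypothetical; only the idea of an f_cred stored at
open time comes from the text above, and nothing here is the actual
cephfs or VFS code.

```c
#include <assert.h>
#include <stdint.h>

struct toy_cred {
	uint32_t fsuid;
};

/* Stand-in for struct file: remembers who opened the descriptor. */
struct toy_file {
	struct toy_cred f_cred;
};

/* Opener-based model: whoever holds the descriptor acts with the
 * credentials captured at open time. */
static uint32_t toy_uid_checked_opener(const struct toy_file *file,
				       const struct toy_cred *caller)
{
	(void)caller;
	return file->f_cred.fsuid;
}

/* Caller-based model (what the mail describes for cephfs): every
 * operation is checked against whoever issues the syscall. */
static uint32_t toy_uid_checked_caller(const struct toy_file *file,
				       const struct toy_cred *caller)
{
	(void)file;
	return caller->fsuid;
}
```

If UID 1000 opens the file and passes the descriptor to a process
running as UID 2000, the opener-based model still checks 1000, while
the caller-based model checks 2000.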

Kind regards,
Alex

>
> @Greg
>
> For the lookup requests the idmapping couldn't get the mapped UID/GID
> just like all the other requests, which is needed by the MDS permission
> check. Is that okay to make it disable the check for this case ? I am
> afraid this will break the MDS permssions logic.
>
> Any idea ?
>
> Thanks
>
> - Xiubo
>
>
> > Kind regards,
> > Alex
> >
>

Re: [PATCH v5 00/14] ceph: support idmapped mounts

Posted by Christian Brauner 2 years, 7 months ago

On Tue, Jun 13, 2023 at 02:46:02PM +0200, Aleksandr Mikhalitsyn wrote:
> On Tue, Jun 13, 2023 at 3:43 AM Xiubo Li <xiubli@redhat.com> wrote:
> >
> >
> > On 6/9/23 18:12, Aleksandr Mikhalitsyn wrote:
> > > On Fri, Jun 9, 2023 at 12:00 PM Christian Brauner <brauner@kernel.org> wrote:
> > >> On Fri, Jun 09, 2023 at 10:59:19AM +0200, Aleksandr Mikhalitsyn wrote:
> > >>> On Fri, Jun 9, 2023 at 3:57 AM Xiubo Li <xiubli@redhat.com> wrote:
> > >>>>
> > >>>> On 6/8/23 23:42, Alexander Mikhalitsyn wrote:
> > >>>>> Dear friends,
> > >>>>>
> > >>>>> This patchset was originally developed by Christian Brauner but I'll continue
> > >>>>> to push it forward. Christian allowed me to do that :)
> > >>>>>
> > >>>>> This feature is already actively used/tested with LXD/LXC project.
> > >>>>>
> > >>>>> Git tree (based on https://github.com/ceph/ceph-client.git master):
> > >>> Hi Xiubo!
> > >>>
> > >>>> Could you rebase these patches to 'testing' branch ?
> > >>> Will do in -v6.
> > >>>
> > >>>> And you still have missed several places, for example the following cases:
> > >>>>
> > >>>>
> > >>>>      1    269  fs/ceph/addr.c <<ceph_netfs_issue_op_inline>>
> > >>>>                req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_GETATTR,
> > >>>> mode);
> > >>> +
> > >>>
> > >>>>      2    389  fs/ceph/dir.c <<ceph_readdir>>
> > >>>>                req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
> > >>> +
> > >>>
> > >>>>      3    789  fs/ceph/dir.c <<ceph_lookup>>
> > >>>>                req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS);
> > >>> We don't have an idmapping passed to lookup from the VFS layer. As I
> > >>> mentioned before, it's just impossible now.
> > >> ->lookup() doesn't deal with idmappings and really can't otherwise you
> > >> risk ending up with inode aliasing which is really not something you
> > >> want. IOW, you can't fill in inode->i_{g,u}id based on a mount's
> > >> idmapping as inode->i_{g,u}id absolutely needs to be a filesystem wide
> > >> value. So better not even risk exposing the idmapping in there at all.
> > > Thanks for adding, Christian!
> > >
> > > I agree, every time when we use an idmapping we need to be careful with
> > > what we map. AFAIU, inode->i_{g,u}id should be based on the filesystem
> > > idmapping (not mount),
> > > but in this case, Xiubo want's current_fs{u,g}id to be mapped
> > > according to an idmapping.
> > > Anyway, it's impossible at now and IMHO, until we don't have any
> > > practical use case where
> > > UID/GID-based path restriction is used in combination with idmapped
> > > mounts it's not worth to
> > > make such big changes in the VFS layer.
> > >
> > > May be I'm not right, but it seems like UID/GID-based path restriction
> > > is not a widespread
> > > feature and I can hardly imagine it to be used with the container
> > > workloads (for instance),
> > > because it will require to always keep in sync MDS permissions
> > > configuration with the
> > > possible UID/GID ranges on the client. It looks like a nightmare for sysadmin.
> > > It is useful when cephfs is used as an external storage on the host, but if you
> > > share cephfs with a few containers with different user namespaces idmapping...
> >
> > Hmm, while this will break the MDS permission check in cephfs then in
> > lookup case. If we really couldn't support it we should make it to
> > escape the check anyway or some OPs may fail and won't work as expected.
> 
> Hi Xiubo!
> 
> Disabling UID/GID checks on the MDS side looks reasonable. IMHO the
> most important checks are:
> - open
> - mknod/mkdir/symlink/rename
> and for these checks we already have an idmapping.
> 
> Also, I want to add that it's a little bit unusual when permission
> checks are done against the caller UID/GID.

The server side permission checking based on the sender's fs{g,u}id is
rather esoteric imho. So I would just disable it for idmapped mounts.

> Usually, if we have opened a file descriptor and, for instance, passed
> this file descriptor through a unix socket then
> file descriptor holder will be able to use it in accordance with the
> flags (O_RDONLY, O_RDWR, ...).
> We also have ->f_cred on the struct file that contains credentials of
> the file opener and permission checks are usually done
> based on this. But in cephfs we are always using syscall caller's
> credentials. It makes cephfs file descriptor "not transferable"
> in terms of permission checks.

Yeah, that's another good point.