[PATCH] ceph: fix cap ref leak via netfs init_request

Patrick Donnelly posted 1 patch 1 month, 3 weeks ago
There is a newer version of this series
fs/ceph/addr.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
[PATCH] ceph: fix cap ref leak via netfs init_request
Posted by Patrick Donnelly 1 month, 3 weeks ago
From: Patrick Donnelly <pdonnell@redhat.com>

Log recovered from a user's cluster:

    <7>[ 5413.970692] ceph:  get_cap_refs 00000000958c114b ret 1 got Fr
    <7>[ 5413.970695] ceph:  start_read 00000000958c114b, no cache cap
    ...
    <7>[ 5473.934609] ceph:   my wanted = Fr, used = Fr, dirty -
    <7>[ 5473.934616] ceph:  revocation: pAsLsXsFr -> pAsLsXs (revoking Fr)
    <7>[ 5473.934632] ceph:  __ceph_caps_issued 00000000958c114b cap 00000000f7784259 issued pAsLsXs
    <7>[ 5473.934638] ceph:  check_caps 10000000e68.fffffffffffffffe file_want - used Fr dirty - flushing - issued pAsLsXs revoking Fr retain pAsLsXsFsr  AUTHONLY NOINVAL FLUSH_FORCE

The MDS subsequently complains that the kernel client is late releasing caps.

Closes: https://tracker.ceph.com/issues/67008
Fixes: a5c9dc4451394b2854493944dcc0ff71af9705a3 ("ceph: Make ceph_init_request() check caps on readahead")
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Cc: stable@vger.kernel.org
---
 fs/ceph/addr.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 53fef258c2bc..702c6a730b70 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -489,8 +489,11 @@ static int ceph_init_request(struct netfs_io_request *rreq, struct file *file)
 	rreq->io_streams[0].sreq_max_len = fsc->mount_options->rsize;
 
 out:
-	if (ret < 0)
+	if (ret < 0) {
+		if (got)
+			ceph_put_cap_refs(ceph_inode(inode), got);
 		kfree(priv);
+	}
 
 	return ret;
 }

base-commit: e32cde8d2bd7d251a8f9b434143977ddf13dcec6
-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
Re: [PATCH] ceph: fix cap ref leak via netfs init_request
Posted by Ilya Dryomov 1 month, 3 weeks ago
On Wed, Oct 2, 2024 at 10:08 PM Patrick Donnelly <batrick@batbytes.com> wrote:
>
> From: Patrick Donnelly <pdonnell@redhat.com>
>
> Log recovered from a user's cluster:
>
>     <7>[ 5413.970692] ceph:  get_cap_refs 00000000958c114b ret 1 got Fr
>     <7>[ 5413.970695] ceph:  start_read 00000000958c114b, no cache cap

Hi Patrick,

Noting that start_read() was removed in kernel 5.13 in commit
49870056005c ("ceph: convert ceph_readpages to ceph_readahead").

>     ...
>     <7>[ 5473.934609] ceph:   my wanted = Fr, used = Fr, dirty -
>     <7>[ 5473.934616] ceph:  revocation: pAsLsXsFr -> pAsLsXs (revoking Fr)
>     <7>[ 5473.934632] ceph:  __ceph_caps_issued 00000000958c114b cap 00000000f7784259 issued pAsLsXs
>     <7>[ 5473.934638] ceph:  check_caps 10000000e68.fffffffffffffffe file_want - used Fr dirty - flushing - issued pAsLsXs revoking Fr retain pAsLsXsFsr  AUTHONLY NOINVAL FLUSH_FORCE
>
> The MDS subsequently complains that the kernel client is late releasing caps.
>
> Closes: https://tracker.ceph.com/issues/67008
> Fixes: a5c9dc4451394b2854493944dcc0ff71af9705a3 ("ceph: Make ceph_init_request() check caps on readahead")

I think it's worth going into a bit more detail here because
superficially this commit just replaced

    ret = ceph_try_get_caps(inode, CEPH_CAP_FILE_RD, want, true, &got);
    if (ret < 0)
            dout("start_read %p, error getting cap\n", inode);
    else if (!(got & want))
            dout("start_read %p, no cache cap\n", inode);

    if (ret <= 0)
            return;

in ceph_readahead() with

    ret = ceph_try_get_caps(inode, CEPH_CAP_FILE_RD, want, true, &got);
    if (ret < 0) {
            dout("start_read %p, error getting cap\n", inode);
            return ret;
    }

    if (!(got & want)) {
            dout("start_read %p, no cache cap\n", inode);
            return -EACCES;
    }
    if (ret == 0)
            return -EACCES;

in ceph_init_request().  Neither called ceph_put_cap_refs() before
bailing.  It was commit 49870056005c ("ceph: convert ceph_readpages to
ceph_readahead") that turned a direct call to ceph_put_cap_refs() in
start_read() to one in ceph_readahead_cleanup() (later renamed to
ceph_netfs_free_request()).

The actual problem is that netfs_alloc_request() just frees rreq if
init_request() callout fails and ceph_netfs_free_request() is never
called, right?  If so, I'd mention that explicitly and possibly also
reference commit 2de160417315 ("netfs: Change ->init_request() to
return an error code") which introduced that.

> Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
> Cc: stable@vger.kernel.org
> ---
>  fs/ceph/addr.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 53fef258c2bc..702c6a730b70 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -489,8 +489,11 @@ static int ceph_init_request(struct netfs_io_request *rreq, struct file *file)
>         rreq->io_streams[0].sreq_max_len = fsc->mount_options->rsize;
>
>  out:
> -       if (ret < 0)
> +       if (ret < 0) {
> +               if (got)
> +                       ceph_put_cap_refs(ceph_inode(inode), got);
>                 kfree(priv);
> +       }
>
>         return ret;
>  }

The change itself looks fine.  Great catch!

Thanks,

                Ilya
Re: [PATCH] ceph: fix cap ref leak via netfs init_request
Posted by David Howells 1 month, 3 weeks ago
Ilya Dryomov <idryomov@gmail.com> wrote:

> The actual problem is that netfs_alloc_request() just frees rreq if
> init_request() callout fails and ceph_netfs_free_request() is never
> called, right?

I could make it call ->free_request() in the case that ->init_request()
returns an error, though I'd prefer that the cleanup be done in
->init_request() rather than passing a partially set-up state to
->free_request().

David
Re: [PATCH] ceph: fix cap ref leak via netfs init_request
Posted by Patrick Donnelly 1 month, 3 weeks ago
On Wed, Oct 2, 2024 at 5:52 PM Ilya Dryomov <idryomov@gmail.com> wrote:
>
> On Wed, Oct 2, 2024 at 10:08 PM Patrick Donnelly <batrick@batbytes.com> wrote:
> >
> > From: Patrick Donnelly <pdonnell@redhat.com>
> >
> > Log recovered from a user's cluster:
> >
> >     <7>[ 5413.970692] ceph:  get_cap_refs 00000000958c114b ret 1 got Fr
> >     <7>[ 5413.970695] ceph:  start_read 00000000958c114b, no cache cap
>
> Hi Patrick,
>
> Noting that start_read() was removed in kernel 5.13 in commit
> 49870056005c ("ceph: convert ceph_readpages to ceph_readahead").
>
> >     ...
> >     <7>[ 5473.934609] ceph:   my wanted = Fr, used = Fr, dirty -
> >     <7>[ 5473.934616] ceph:  revocation: pAsLsXsFr -> pAsLsXs (revoking Fr)
> >     <7>[ 5473.934632] ceph:  __ceph_caps_issued 00000000958c114b cap 00000000f7784259 issued pAsLsXs
> >     <7>[ 5473.934638] ceph:  check_caps 10000000e68.fffffffffffffffe file_want - used Fr dirty - flushing - issued pAsLsXs revoking Fr retain pAsLsXsFsr  AUTHONLY NOINVAL FLUSH_FORCE
> >
> > The MDS subsequently complains that the kernel client is late releasing caps.
> >
> > Closes: https://tracker.ceph.com/issues/67008
> > Fixes: a5c9dc4451394b2854493944dcc0ff71af9705a3 ("ceph: Make ceph_init_request() check caps on readahead")
>
> I think it's worth going into a bit more detail here because
> superficially this commit just replaced
>
>     ret = ceph_try_get_caps(inode, CEPH_CAP_FILE_RD, want, true, &got);
>     if (ret < 0)
>             dout("start_read %p, error getting cap\n", inode);
>     else if (!(got & want))
>             dout("start_read %p, no cache cap\n", inode);
>
>     if (ret <= 0)
>             return;
>
> in ceph_readahead() with
>
>     ret = ceph_try_get_caps(inode, CEPH_CAP_FILE_RD, want, true, &got);
>     if (ret < 0) {
>             dout("start_read %p, error getting cap\n", inode);
>             return ret;
>     }
>
>     if (!(got & want)) {
>             dout("start_read %p, no cache cap\n", inode);
>             return -EACCES;
>     }
>     if (ret == 0)
>             return -EACCES;
>
> in ceph_init_request().  Neither called ceph_put_cap_refs() before
> bailing.  It was commit 49870056005c ("ceph: convert ceph_readpages to
> ceph_readahead") that turned a direct call to ceph_put_cap_refs() in
> start_read() to one in ceph_readahead_cleanup() (later renamed to
> ceph_netfs_free_request()).
>
> The actual problem is that netfs_alloc_request() just frees rreq if
> init_request() callout fails and ceph_netfs_free_request() is never
> called, right?  If so, I'd mention that explicitly and possibly also
> reference commit 2de160417315 ("netfs: Change ->init_request() to
> return an error code") which introduced that.

Yes, this looks right. To be clear, we were passing "got" as the
"priv" pointer but it was thrown out when 2de160417315 changed the
error handling. Furthermore, a5c9dc445 made it even worse by
discarding "got" completely on error.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D