From nobody Sat Oct 4 22:37:14 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A345E2EA72F for ; Tue, 12 Aug 2025 09:57:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754992673; cv=none; b=lpJr5R2FkQ/QxdYZpMUoBiAKFhdu5nYY03AThN94MzsZ6K6tmLzYkR/N9r1Vqxty8Yn21Qu6Pz/DavzJfE2II0V3eEcCbHI9NdtB9Ly9J3PjrPczUqbREqscHijadPj5GKIp5pLIuiAnWBC8UFz+awvh1yvrwY8r9JGHAaq1ARI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754992673; c=relaxed/simple; bh=SklPlbBvDoSPnXKeDj95J8xaphdcnRklEvsYG5qYoik=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=ewLobNpBHnpHvqvLNgvgF34TS3MeNtTN7gfuXT7HSyt5iz8WRQSmXs5G9yYtZHMryhLOTCAtHAkafE5znHkAcMGOdoXTm9wecGPqztRhnySu+P+YDDUqRVtfj5kNvrg07CajQZ4eZOkLvbhGTUnKRral9Yzv7j6F0e+gr5LaK24= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=QMtM2cQG; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="QMtM2cQG" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1754992668; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8/+35zgVMREp5iu4WHb0LfnIA1EFTWDo15YR87u+kvI=; b=QMtM2cQGruZe4WWnM4e1tjOICyEKChtfnLZ9GqWOB077c8v8rGri3UyaPGT8mxpTGl9J+0 C24+1DyCCDc7+S4P4L3neNPe9wijJrkkhDsiaatIYQpeSGIbhH1oHCX1sY2u6Nj78OSLzD Y6JjYKIvGVI2AalqmYvja12I6WriMpY= Received: from mail-ej1-f72.google.com (mail-ej1-f72.google.com [209.85.218.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-631-rB2lcbdfMvm38t09ZqZwIg-1; Tue, 12 Aug 2025 05:57:46 -0400 X-MC-Unique: rB2lcbdfMvm38t09ZqZwIg-1 X-Mimecast-MFC-AGG-ID: rB2lcbdfMvm38t09ZqZwIg_1754992666 Received: by mail-ej1-f72.google.com with SMTP id a640c23a62f3a-af92f13d2abso349426866b.2 for ; Tue, 12 Aug 2025 02:57:46 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1754992666; x=1755597466; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=8/+35zgVMREp5iu4WHb0LfnIA1EFTWDo15YR87u+kvI=; b=unxnth72vAeNOvgE2ARbiLjeTw2m6bHZIXhzBxFAgPCEpcCYNTiuoXkfnYO/hINu4n Mfs9fxM9YF2up3WjeWht8fsTt+86Vt4SRon8DN/roWJBaZPT/kiolTDApNQPcxQVfjLS J/JJBYGu1vdJFjuQUnFypamvHVh2A9StG4sX0fwtE7tYekvQ/eNb0r4IzMtj4iRPhtft pKDGnNXUbofH2YucY2UyUKLEzdDSWNgDTHUVlS7We0a/b5PjgXoi+oOaJrSgmngmf/Dw /yYqwficnqlfRE/Chryos+DvLxf1rSd4csWrpOsGjvXwclzRrNDDGIInt7KoLB0nRuoo rTTw== X-Gm-Message-State: AOJu0Yzj56Is+Fl1IaEi1pH6kYLsftSO0hpNfIBiOX6AunC3AEjcj/Df kuMjMLQqesDyyAcMNK59qTZrMjy8yk7rf5qRB2iBmoIlybXyK5Z6HRBjP7DKlbuNbUpiRDhaWAt tYQeNaCsQN61eJIH2hMrOlkwy0QXTR+CywtAajmdKKo0fPZnAM1eY8MGeRFvDHXub9A== X-Gm-Gg: ASbGncvmPhULEJC6f2FNgUTaHCtqR2Lbj72nwsTvkPDa8cHLokOGP9d6XS7yehlE9US l5O/ep+oJpoFq2W7li7RYTcOjo5NrCkVbeGKPWi5+hDCesRlHtUuKWQcsYLgBem1Y/aoSz+LJ+/ xI8AMXhk1guryPRZe1eLTyaFOvRRthm/mtzXsa/KMzKY6wdV03YVw3lKm8NaOjt9bh3HIa7JGPm jSJYQZ3+ZiJ3E8fdOIvqVRRys8K4mEPc15VHaKe1tDPO28PxVjh3SM9ExniPAPDZB3JN6B4x+HV YU8sKTBJjU78W2fcfYH8DtqbUuFdrxrR29RuaFVPUv09AVwIDzagqzxdcPZbn0I2LMHMqV32mQ= = X-Received: by 2002:a17:907:3e94:b0:af9:a4de:f092 with SMTP id a640c23a62f3a-afa1e1279a7mr263601266b.55.1754992665298; Tue, 12 Aug 2025 02:57:45 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFbp8tb+iGKIzm9Y8BOOySGRPuBHGHBQarfskNX2G2WxlpFe/XIuH3JTRIQ3ji8UW0emz5hXw== X-Received: by 2002:a17:907:3e94:b0:af9:a4de:f092 with SMTP id a640c23a62f3a-afa1e1279a7mr263598566b.55.1754992664739; Tue, 12 Aug 2025 02:57:44 -0700 (PDT) Received: from cluster.. (4f.55.790d.ip4.static.sl-reverse.com. [13.121.85.79]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-af91a21c0a4sm2180821466b.109.2025.08.12.02.57.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Aug 2025 02:57:44 -0700 (PDT) From: Alex Markuze To: ceph-devel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Slava.Dubeyko@ibm.com, idryomov@gmail.com, Alex Markuze Subject: [PATCH v4 1/2] ceph: fix client race condition validating r_parent before applying state Date: Tue, 12 Aug 2025 09:57:38 +0000 Message-Id: <20250812095739.3194337-2-amarkuze@redhat.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20250812095739.3194337-1-amarkuze@redhat.com> References: <20250812095739.3194337-1-amarkuze@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add validation to ensure the cached parent directory inode matches the directory info in MDS replies. This prevents client-side race conditions where concurrent operations (e.g. rename) cause r_parent to become stale between request initiation and reply processing, which could lead to applying state changes to incorrect directory inodes. Signed-off-by: Alex Markuze Reviewed-by: Viacheslav Dubeyko --- fs/ceph/debugfs.c | 14 ++-- fs/ceph/dir.c | 17 ++--- fs/ceph/file.c | 24 +++---- fs/ceph/inode.c | 7 +- fs/ceph/mds_client.c | 165 ++++++++++++++++++++++++++----------------- fs/ceph/mds_client.h | 18 +++-- 6 files changed, 139 insertions(+), 106 deletions(-) diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c index fdd404fc8112..f3fe786b4143 100644 --- a/fs/ceph/debugfs.c +++ b/fs/ceph/debugfs.c @@ -55,8 +55,6 @@ static int mdsc_show(struct seq_file *s, void *p) struct ceph_mds_client *mdsc =3D fsc->mdsc; struct ceph_mds_request *req; struct rb_node *rp; - int pathlen =3D 0; - u64 pathbase; char *path; =20 mutex_lock(&mdsc->mutex); @@ -81,8 +79,8 @@ static int mdsc_show(struct seq_file *s, void *p) if (req->r_inode) { seq_printf(s, " #%llx", ceph_ino(req->r_inode)); } else if (req->r_dentry) { - path =3D ceph_mdsc_build_path(mdsc, req->r_dentry, &pathlen, - &pathbase, 0); + struct ceph_path_info path_info; + path =3D ceph_mdsc_build_path(mdsc, req->r_dentry, &path_info, 0); if (IS_ERR(path)) path =3D NULL; spin_lock(&req->r_dentry->d_lock); @@ -91,7 +89,7 @@ static int mdsc_show(struct seq_file *s, void *p) req->r_dentry, path ? path : ""); spin_unlock(&req->r_dentry->d_lock); - ceph_mdsc_free_path(path, pathlen); + ceph_mdsc_free_path_info(&path_info); } else if (req->r_path1) { seq_printf(s, " #%llx/%s", req->r_ino1.ino, req->r_path1); @@ -100,8 +98,8 @@ static int mdsc_show(struct seq_file *s, void *p) } =20 if (req->r_old_dentry) { - path =3D ceph_mdsc_build_path(mdsc, req->r_old_dentry, &pathlen, - &pathbase, 0); + struct ceph_path_info path_info; + path =3D ceph_mdsc_build_path(mdsc, req->r_old_dentry, &path_info, 0); if (IS_ERR(path)) path =3D NULL; spin_lock(&req->r_old_dentry->d_lock); @@ -111,7 +109,7 @@ static int mdsc_show(struct seq_file *s, void *p) req->r_old_dentry, path ? path : ""); spin_unlock(&req->r_old_dentry->d_lock); - ceph_mdsc_free_path(path, pathlen); + ceph_mdsc_free_path_info(&path_info); } else if (req->r_path2 && req->r_op !=3D CEPH_MDS_OP_SYMLINK) { if (req->r_ino2.ino) seq_printf(s, " #%llx/%s", req->r_ino2.ino, diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c index 1535b6540e9d..08c3129cd25d 100644 --- a/fs/ceph/dir.c +++ b/fs/ceph/dir.c @@ -1271,10 +1271,8 @@ static void ceph_async_unlink_cb(struct ceph_mds_cli= ent *mdsc, =20 /* If op failed, mark everyone involved for errors */ if (result) { - int pathlen =3D 0; - u64 base =3D 0; - char *path =3D ceph_mdsc_build_path(mdsc, dentry, &pathlen, - &base, 0); + struct ceph_path_info path_info =3D {0}; + char *path =3D ceph_mdsc_build_path(mdsc, dentry, &path_info, 0); =20 /* mark error on parent + clear complete */ mapping_set_error(req->r_parent->i_mapping, result); @@ -1288,8 +1286,8 @@ static void ceph_async_unlink_cb(struct ceph_mds_clie= nt *mdsc, mapping_set_error(req->r_old_inode->i_mapping, result); =20 pr_warn_client(cl, "failure path=3D(%llx)%s result=3D%d!\n", - base, IS_ERR(path) ? "<>" : path, result); - ceph_mdsc_free_path(path, pathlen); + path_info.vino.ino, IS_ERR(path) ? "<>" : path, result); + ceph_mdsc_free_path_info(&path_info); } out: iput(req->r_old_inode); @@ -1347,8 +1345,6 @@ static int ceph_unlink(struct inode *dir, struct dent= ry *dentry) int err =3D -EROFS; int op; char *path; - int pathlen; - u64 pathbase; =20 if (ceph_snap(dir) =3D=3D CEPH_SNAPDIR) { /* rmdir .snap/foo is RMSNAP */ @@ -1367,14 +1363,15 @@ static int ceph_unlink(struct inode *dir, struct de= ntry *dentry) if (!dn) { try_async =3D false; } else { - path =3D ceph_mdsc_build_path(mdsc, dn, &pathlen, &pathbase, 0); + struct ceph_path_info path_info; + path =3D ceph_mdsc_build_path(mdsc, dn, &path_info, 0); if (IS_ERR(path)) { try_async =3D false; err =3D 0; } else { err =3D ceph_mds_check_access(mdsc, path, MAY_WRITE); } - ceph_mdsc_free_path(path, pathlen); + ceph_mdsc_free_path_info(&path_info); dput(dn); =20 /* For none EACCES cases will let the MDS do the mds auth check */ diff --git a/fs/ceph/file.c b/fs/ceph/file.c index c92ef1fc5b8a..d222dd192fec 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -368,8 +368,6 @@ int ceph_open(struct inode *inode, struct file *file) int flags, fmode, wanted; struct dentry *dentry; char *path; - int pathlen; - u64 pathbase; bool do_sync =3D false; int mask =3D MAY_READ; =20 @@ -399,14 +397,15 @@ int ceph_open(struct inode *inode, struct file *file) if (!dentry) { do_sync =3D true; } else { - path =3D ceph_mdsc_build_path(mdsc, dentry, &pathlen, &pathbase, 0); + struct ceph_path_info path_info; + path =3D ceph_mdsc_build_path(mdsc, dentry, &path_info, 0); if (IS_ERR(path)) { do_sync =3D true; err =3D 0; } else { err =3D ceph_mds_check_access(mdsc, path, mask); } - ceph_mdsc_free_path(path, pathlen); + ceph_mdsc_free_path_info(&path_info); dput(dentry); =20 /* For none EACCES cases will let the MDS do the mds auth check */ @@ -613,15 +612,13 @@ static void ceph_async_create_cb(struct ceph_mds_clie= nt *mdsc, mapping_set_error(req->r_parent->i_mapping, result); =20 if (result) { - int pathlen =3D 0; - u64 base =3D 0; - char *path =3D ceph_mdsc_build_path(mdsc, req->r_dentry, &pathlen, - &base, 0); + struct ceph_path_info path_info =3D {0}; + char *path =3D ceph_mdsc_build_path(mdsc, req->r_dentry, &path_info, 0); =20 pr_warn_client(cl, "async create failure path=3D(%llx)%s result=3D%d!\n", - base, IS_ERR(path) ? "<>" : path, result); - ceph_mdsc_free_path(path, pathlen); + path_info.vino.ino, IS_ERR(path) ? "<>" : path, result); + ceph_mdsc_free_path_info(&path_info); =20 ceph_dir_clear_complete(req->r_parent); if (!d_unhashed(dentry)) @@ -789,8 +786,6 @@ int ceph_atomic_open(struct inode *dir, struct dentry *= dentry, int mask; int err; char *path; - int pathlen; - u64 pathbase; =20 doutc(cl, "%p %llx.%llx dentry %p '%pd' %s flags %d mode 0%o\n", dir, ceph_vinop(dir), dentry, dentry, @@ -812,7 +807,8 @@ int ceph_atomic_open(struct inode *dir, struct dentry *= dentry, if (!dn) { try_async =3D false; } else { - path =3D ceph_mdsc_build_path(mdsc, dn, &pathlen, &pathbase, 0); + struct ceph_path_info path_info; + path =3D ceph_mdsc_build_path(mdsc, dn, &path_info, 0); if (IS_ERR(path)) { try_async =3D false; err =3D 0; @@ -824,7 +820,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *= dentry, mask |=3D MAY_WRITE; err =3D ceph_mds_check_access(mdsc, path, mask); } - ceph_mdsc_free_path(path, pathlen); + ceph_mdsc_free_path_info(&path_info); dput(dn); =20 /* For none EACCES cases will let the MDS do the mds auth check */ diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index 4365d17ce6d7..36981950a368 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -2502,22 +2502,21 @@ int __ceph_setattr(struct mnt_idmap *idmap, struct = inode *inode, int truncate_retry =3D 20; /* The RMW will take around 50ms */ struct dentry *dentry; char *path; - int pathlen; - u64 pathbase; bool do_sync =3D false; =20 dentry =3D d_find_alias(inode); if (!dentry) { do_sync =3D true; } else { - path =3D ceph_mdsc_build_path(mdsc, dentry, &pathlen, &pathbase, 0); + struct ceph_path_info path_info; + path =3D ceph_mdsc_build_path(mdsc, dentry, &path_info, 0); if (IS_ERR(path)) { do_sync =3D true; err =3D 0; } else { err =3D ceph_mds_check_access(mdsc, path, MAY_WRITE); } - ceph_mdsc_free_path(path, pathlen); + ceph_mdsc_free_path_info(&path_info); dput(dentry); =20 /* For none EACCES cases will let the MDS do the mds auth check */ diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index 9eed6d73a508..f31278859119 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -2729,7 +2729,7 @@ static u8 *get_fscrypt_altname(const struct ceph_mds_= request *req, u32 *plen) * foo/.snap/bar -> foo//bar */ char *ceph_mdsc_build_path(struct ceph_mds_client *mdsc, struct dentry *de= ntry, - int *plen, u64 *pbase, int for_wire) + struct ceph_path_info *path_info, int for_wire) { struct ceph_client *cl =3D mdsc->fsc->client; struct dentry *cur; @@ -2839,16 +2839,30 @@ char *ceph_mdsc_build_path(struct ceph_mds_client *= mdsc, struct dentry *dentry, return ERR_PTR(-ENAMETOOLONG); } =20 - *pbase =3D base; - *plen =3D PATH_MAX - 1 - pos; + /* Initialize the output structure */ + memset(path_info, 0, sizeof(*path_info)); + + path_info->vino.ino =3D base; + path_info->pathlen =3D PATH_MAX - 1 - pos; + path_info->path =3D path + pos; + path_info->freepath =3D true; + + /* Set snap from dentry if available */ + if (d_inode(dentry)) + path_info->vino.snap =3D ceph_snap(d_inode(dentry)); + else + path_info->vino.snap =3D CEPH_NOSNAP; + doutc(cl, "on %p %d built %llx '%.*s'\n", dentry, d_count(dentry), - base, *plen, path + pos); + base, PATH_MAX - 1 - pos, path + pos); return path + pos; } =20 + + static int build_dentry_path(struct ceph_mds_client *mdsc, struct dentry *= dentry, - struct inode *dir, const char **ppath, int *ppathlen, - u64 *pino, bool *pfreepath, bool parent_locked) + struct inode *dir, struct ceph_path_info *path_info, + bool parent_locked) { char *path; =20 @@ -2857,41 +2871,47 @@ static int build_dentry_path(struct ceph_mds_client= *mdsc, struct dentry *dentry dir =3D d_inode_rcu(dentry->d_parent); if (dir && parent_locked && ceph_snap(dir) =3D=3D CEPH_NOSNAP && !IS_ENCRYPTED(dir)) { - *pino =3D ceph_ino(dir); + path_info->vino.ino =3D ceph_ino(dir); + path_info->vino.snap =3D ceph_snap(dir); rcu_read_unlock(); - *ppath =3D dentry->d_name.name; - *ppathlen =3D dentry->d_name.len; + path_info->path =3D dentry->d_name.name; + path_info->pathlen =3D dentry->d_name.len; + path_info->freepath =3D false; return 0; } rcu_read_unlock(); - path =3D ceph_mdsc_build_path(mdsc, dentry, ppathlen, pino, 1); + path =3D ceph_mdsc_build_path(mdsc, dentry, path_info, 1); if (IS_ERR(path)) return PTR_ERR(path); - *ppath =3D path; - *pfreepath =3D true; + /* + * ceph_mdsc_build_path already fills path_info, including snap handling. + */ return 0; } =20 -static int build_inode_path(struct inode *inode, - const char **ppath, int *ppathlen, u64 *pino, - bool *pfreepath) +static int build_inode_path(struct inode *inode, struct ceph_path_info *pa= th_info) { struct ceph_mds_client *mdsc =3D ceph_sb_to_mdsc(inode->i_sb); struct dentry *dentry; char *path; =20 if (ceph_snap(inode) =3D=3D CEPH_NOSNAP) { - *pino =3D ceph_ino(inode); - *ppathlen =3D 0; + path_info->vino.ino =3D ceph_ino(inode); + path_info->vino.snap =3D ceph_snap(inode); + path_info->pathlen =3D 0; + path_info->freepath =3D false; return 0; } dentry =3D d_find_alias(inode); - path =3D ceph_mdsc_build_path(mdsc, dentry, ppathlen, pino, 1); + path =3D ceph_mdsc_build_path(mdsc, dentry, path_info, 1); dput(dentry); if (IS_ERR(path)) return PTR_ERR(path); - *ppath =3D path; - *pfreepath =3D true; + /* + * ceph_mdsc_build_path already fills path_info, including snap from dent= ry. + * Override with inode's snap since that's what this function is for. + */ + path_info->vino.snap =3D ceph_snap(inode); return 0; } =20 @@ -2901,26 +2921,31 @@ static int build_inode_path(struct inode *inode, */ static int set_request_path_attr(struct ceph_mds_client *mdsc, struct inod= e *rinode, struct dentry *rdentry, struct inode *rdiri, - const char *rpath, u64 rino, const char **ppath, - int *pathlen, u64 *ino, bool *freepath, + const char *rpath, u64 rino, struct ceph_path_info *path_info, bool parent_locked) { struct ceph_client *cl =3D mdsc->fsc->client; int r =3D 0; =20 + /* Initialize the output structure */ + memset(path_info, 0, sizeof(*path_info)); + if (rinode) { - r =3D build_inode_path(rinode, ppath, pathlen, ino, freepath); + r =3D build_inode_path(rinode, path_info); doutc(cl, " inode %p %llx.%llx\n", rinode, ceph_ino(rinode), ceph_snap(rinode)); } else if (rdentry) { - r =3D build_dentry_path(mdsc, rdentry, rdiri, ppath, pathlen, ino, - freepath, parent_locked); - doutc(cl, " dentry %p %llx/%.*s\n", rdentry, *ino, *pathlen, *ppath); + r =3D build_dentry_path(mdsc, rdentry, rdiri, path_info, parent_locked); + doutc(cl, " dentry %p %llx/%.*s\n", rdentry, path_info->vino.ino, + path_info->pathlen, path_info->path); } else if (rpath || rino) { - *ino =3D rino; - *ppath =3D rpath; - *pathlen =3D rpath ? strlen(rpath) : 0; - doutc(cl, " path %.*s\n", *pathlen, rpath); + path_info->vino.ino =3D rino; + path_info->vino.snap =3D CEPH_NOSNAP; + path_info->path =3D rpath; + path_info->pathlen =3D rpath ? strlen(rpath) : 0; + path_info->freepath =3D false; + + doutc(cl, " path %.*s\n", path_info->pathlen, rpath); } =20 return r; @@ -2997,11 +3022,8 @@ static struct ceph_msg *create_request_message(struc= t ceph_mds_session *session, struct ceph_client *cl =3D mdsc->fsc->client; struct ceph_msg *msg; struct ceph_mds_request_head_legacy *lhead; - const char *path1 =3D NULL; - const char *path2 =3D NULL; - u64 ino1 =3D 0, ino2 =3D 0; - int pathlen1 =3D 0, pathlen2 =3D 0; - bool freepath1 =3D false, freepath2 =3D false; + struct ceph_path_info path_info1 =3D {0}; + struct ceph_path_info path_info2 =3D {0}; struct dentry *old_dentry =3D NULL; int len; u16 releases; @@ -3011,25 +3033,42 @@ static struct ceph_msg *create_request_message(stru= ct ceph_mds_session *session, u16 request_head_version =3D mds_supported_head_version(session); kuid_t caller_fsuid =3D req->r_cred->fsuid; kgid_t caller_fsgid =3D req->r_cred->fsgid; + bool parent_locked =3D test_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_fla= gs); =20 - ret =3D set_request_path_attr(mdsc, req->r_inode, req->r_dentry, - req->r_parent, req->r_path1, req->r_ino1.ino, - &path1, &pathlen1, &ino1, &freepath1, - test_bit(CEPH_MDS_R_PARENT_LOCKED, - &req->r_req_flags)); + ret =3D set_request_path_attr(mdsc, req->r_inode, req->r_dentry, + req->r_parent, req->r_path1, req->r_ino1.ino, + &path_info1, parent_locked); if (ret < 0) { msg =3D ERR_PTR(ret); goto out; } =20 + /* + * When the parent directory's i_rwsem is *not* locked, req->r_parent may + * have become stale (e.g. after a concurrent rename) between the time the + * dentry was looked up and now. If we detect that the stored r_parent + * does not match the inode number we just encoded for the request, switch + * to the correct inode so that the MDS receives a valid parent reference. + */ + if (!parent_locked && + req->r_parent && path_info1.vino.ino && ceph_ino(req->r_parent) !=3D= path_info1.vino.ino) { + struct inode *correct_dir =3D ceph_get_inode(mdsc->fsc->sb, path_info1.v= ino, NULL); + if (!IS_ERR(correct_dir)) { + WARN_ONCE(1, "ceph: r_parent mismatch (had %llx wanted %llx) - updatin= g\n", + ceph_ino(req->r_parent), path_info1.vino.ino); + iput(req->r_parent); + req->r_parent =3D correct_dir; + } + } + /* If r_old_dentry is set, then assume that its parent is locked */ if (req->r_old_dentry && !(req->r_old_dentry->d_flags & DCACHE_DISCONNECTED)) old_dentry =3D req->r_old_dentry; ret =3D set_request_path_attr(mdsc, NULL, old_dentry, - req->r_old_dentry_dir, - req->r_path2, req->r_ino2.ino, - &path2, &pathlen2, &ino2, &freepath2, true); + req->r_old_dentry_dir, + req->r_path2, req->r_ino2.ino, + &path_info2, true); if (ret < 0) { msg =3D ERR_PTR(ret); goto out_free1; @@ -3060,7 +3099,7 @@ static struct ceph_msg *create_request_message(struct= ceph_mds_session *session, =20 /* filepaths */ len +=3D 2 * (1 + sizeof(u32) + sizeof(u64)); - len +=3D pathlen1 + pathlen2; + len +=3D path_info1.pathlen + path_info2.pathlen; =20 /* cap releases */ len +=3D sizeof(struct ceph_mds_request_release) * @@ -3068,9 +3107,9 @@ static struct ceph_msg *create_request_message(struct= ceph_mds_session *session, !!req->r_old_inode_drop + !!req->r_old_dentry_drop); =20 if (req->r_dentry_drop) - len +=3D pathlen1; + len +=3D path_info1.pathlen; if (req->r_old_dentry_drop) - len +=3D pathlen2; + len +=3D path_info2.pathlen; =20 /* MClientRequest tail */ =20 @@ -3183,8 +3222,8 @@ static struct ceph_msg *create_request_message(struct= ceph_mds_session *session, lhead->ino =3D cpu_to_le64(req->r_deleg_ino); lhead->args =3D req->r_args; =20 - ceph_encode_filepath(&p, end, ino1, path1); - ceph_encode_filepath(&p, end, ino2, path2); + ceph_encode_filepath(&p, end, path_info1.vino.ino, path_info1.path); + ceph_encode_filepath(&p, end, path_info2.vino.ino, path_info2.path); =20 /* make note of release offset, in case we need to replay */ req->r_request_release_offset =3D p - msg->front.iov_base; @@ -3247,11 +3286,9 @@ static struct ceph_msg *create_request_message(struc= t ceph_mds_session *session, msg->hdr.data_off =3D cpu_to_le16(0); =20 out_free2: - if (freepath2) - ceph_mdsc_free_path((char *)path2, pathlen2); + ceph_mdsc_free_path_info(&path_info2); out_free1: - if (freepath1) - ceph_mdsc_free_path((char *)path1, pathlen1); + ceph_mdsc_free_path_info(&path_info1); out: return msg; out_err: @@ -4608,24 +4645,20 @@ static int reconnect_caps_cb(struct inode *inode, i= nt mds, void *arg) struct ceph_pagelist *pagelist =3D recon_state->pagelist; struct dentry *dentry; struct ceph_cap *cap; - char *path; - int pathlen =3D 0, err; - u64 pathbase; + struct ceph_path_info path_info =3D {0}; + int err; u64 snap_follows; =20 dentry =3D d_find_primary(inode); if (dentry) { /* set pathbase to parent dir when msg_version >=3D 2 */ - path =3D ceph_mdsc_build_path(mdsc, dentry, &pathlen, &pathbase, + char *path =3D ceph_mdsc_build_path(mdsc, dentry, &path_info, recon_state->msg_version >=3D 2); dput(dentry); if (IS_ERR(path)) { err =3D PTR_ERR(path); goto out_err; } - } else { - path =3D NULL; - pathbase =3D 0; } =20 spin_lock(&ci->i_ceph_lock); @@ -4658,7 +4691,7 @@ static int reconnect_caps_cb(struct inode *inode, int= mds, void *arg) rec.v2.wanted =3D cpu_to_le32(__ceph_caps_wanted(ci)); rec.v2.issued =3D cpu_to_le32(cap->issued); rec.v2.snaprealm =3D cpu_to_le64(ci->i_snap_realm->ino); - rec.v2.pathbase =3D cpu_to_le64(pathbase); + rec.v2.pathbase =3D cpu_to_le64(path_info.vino.ino); rec.v2.flock_len =3D (__force __le32) ((ci->i_ceph_flags & CEPH_I_ERROR_FILELOCK) ? 0 : 1); } else { @@ -4673,7 +4706,7 @@ static int reconnect_caps_cb(struct inode *inode, int= mds, void *arg) ts =3D inode_get_atime(inode); ceph_encode_timespec64(&rec.v1.atime, &ts); rec.v1.snaprealm =3D cpu_to_le64(ci->i_snap_realm->ino); - rec.v1.pathbase =3D cpu_to_le64(pathbase); + rec.v1.pathbase =3D cpu_to_le64(path_info.vino.ino); } =20 if (list_empty(&ci->i_cap_snaps)) { @@ -4735,7 +4768,7 @@ static int reconnect_caps_cb(struct inode *inode, int= mds, void *arg) sizeof(struct ceph_filelock); rec.v2.flock_len =3D cpu_to_le32(struct_len); =20 - struct_len +=3D sizeof(u32) + pathlen + sizeof(rec.v2); + struct_len +=3D sizeof(u32) + path_info.pathlen + sizeof(rec.v2); =20 if (struct_v >=3D 2) struct_len +=3D sizeof(u64); /* snap_follows */ @@ -4759,7 +4792,7 @@ static int reconnect_caps_cb(struct inode *inode, int= mds, void *arg) ceph_pagelist_encode_8(pagelist, 1); ceph_pagelist_encode_32(pagelist, struct_len); } - ceph_pagelist_encode_string(pagelist, path, pathlen); + ceph_pagelist_encode_string(pagelist, (char *)path_info.path, path_info.= pathlen); ceph_pagelist_append(pagelist, &rec, sizeof(rec.v2)); ceph_locks_to_pagelist(flocks, pagelist, num_fcntl_locks, num_flock_locks); @@ -4770,17 +4803,17 @@ static int reconnect_caps_cb(struct inode *inode, i= nt mds, void *arg) } else { err =3D ceph_pagelist_reserve(pagelist, sizeof(u64) + sizeof(u32) + - pathlen + sizeof(rec.v1)); + path_info.pathlen + sizeof(rec.v1)); if (err) goto out_err; =20 ceph_pagelist_encode_64(pagelist, ceph_ino(inode)); - ceph_pagelist_encode_string(pagelist, path, pathlen); + ceph_pagelist_encode_string(pagelist, (char *)path_info.path, path_info.= pathlen); ceph_pagelist_append(pagelist, &rec, sizeof(rec.v1)); } =20 out_err: - ceph_mdsc_free_path(path, pathlen); + ceph_mdsc_free_path_info(&path_info); if (!err) recon_state->nr_caps++; return err; diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h index 3e2a6fa7c19a..0428a5eaf28c 100644 --- a/fs/ceph/mds_client.h +++ b/fs/ceph/mds_client.h @@ -617,14 +617,24 @@ extern int ceph_mds_check_access(struct ceph_mds_clie= nt *mdsc, char *tpath, =20 extern void ceph_mdsc_pre_umount(struct ceph_mds_client *mdsc); =20 -static inline void ceph_mdsc_free_path(char *path, int len) +/* + * Structure to group path-related output parameters for build_*_path func= tions + */ +struct ceph_path_info { + const char *path; + int pathlen; + struct ceph_vino vino; + bool freepath; +}; + +static inline void ceph_mdsc_free_path_info(const struct ceph_path_info *p= ath_info) { - if (!IS_ERR_OR_NULL(path)) - __putname(path - (PATH_MAX - 1 - len)); + if (path_info && path_info->freepath && !IS_ERR_OR_NULL(path_info->path)) + __putname((char *)path_info->path - (PATH_MAX - 1 - path_info->pathlen)); } =20 extern char *ceph_mdsc_build_path(struct ceph_mds_client *mdsc, - struct dentry *dentry, int *plen, u64 *base, + struct dentry *dentry, struct ceph_path_info *path_info, int for_wire); =20 extern void __ceph_mdsc_drop_dentry_lease(struct dentry *dentry); --=20 2.34.1 From nobody Sat Oct 4 22:37:14 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 45DE42E974D for ; Tue, 12 Aug 2025 09:57:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754992673; cv=none; b=lSjgjS2gH5f2k8UDlTgGtdkm3HbLJ+x0uG625w8HSYwL2KOK1qmVtcILdMu+N7ocO35T8PV0OVWWt/hKeRxcC0RJCCZtugX9TGQrcCN7baEKIiccR/4NZCrPqOz842+cvD0rYAmz97WxNb4YZtE2pnpN8kUYM0YS3MVkcZYYvZ0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754992673; c=relaxed/simple; bh=dZaQ1edKSllTGuRXQiHFANbzt+3XERk5gU5vSoToyzM=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version:Content-Type; b=WSKv+DFSln481K+zl86wJbw5DRwodA9dAU5EZ0ypaFUwd6elhmVmBgl9Hptw4/f1riN4STOl5ioQvXc7goTnPNh15MbmpJW9IhuT9F+9hi084X45Kr0hSoFP0hu2lZOWoqeE1rIG8fOWF/xAs7jaVq8NGyMiBH22u2ZgPS2ih+s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=Iqiy7xzB; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Iqiy7xzB" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1754992670; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8JhP0X98VGM6Ic3v1hqPyqTx6SwefbyP25WkheESX9U=; b=Iqiy7xzBHYGpTobDYZIM82BjPAOrFpuBKuC26TmPjtUDtAPAZH8/x7feRj2swdp64lxDqx v1vP+0BuPViZqtVtBNmF1cUr/DXYA2ICfcEMn1x8KWBK9oLP19QPRmoxQab0rFHfJ4a/Rg 0TXkYB+PkpVPCnDte2ec9XBB/cGH/AM= Received: from mail-ej1-f71.google.com (mail-ej1-f71.google.com [209.85.218.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-146-Rq0yFYSLO_OZ1Z3D2M5-RQ-1; Tue, 12 Aug 2025 05:57:48 -0400 X-MC-Unique: Rq0yFYSLO_OZ1Z3D2M5-RQ-1 X-Mimecast-MFC-AGG-ID: Rq0yFYSLO_OZ1Z3D2M5-RQ_1754992667 Received: by mail-ej1-f71.google.com with SMTP id a640c23a62f3a-ade6db50b9cso456309466b.1 for ; Tue, 12 Aug 2025 02:57:48 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1754992667; x=1755597467; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=8JhP0X98VGM6Ic3v1hqPyqTx6SwefbyP25WkheESX9U=; b=FOSkd0Xe/fr+ImmXByx3nAx8ZO/J6RA/0H/M8C1XXDfcGcHFRT1Usp/2k/ug/hI5u6 TMeP66LZqPoDXl6cZY202llr396qE6QYc1iila/mFYYjFi/rrF2KdUn2jGLGm+473EXI bxyojsnIOziMNLz61Vq2jGaZTf0mM6zYLtSFXg1Mt/poyM0vjGCjGQw6sHTrnCIwAYiX 4mACBt4PPJZCh/U24+BnvGDrGMxZcRNvB1ZZiYOG0Zg7zTP84/j6cD1FJ5GntGCDgDpk mUUx3Gi1pd7tqdEo9q2iCGY9igOY3zs4QQTqU8xXhtfuguSf1uA6excvqNRkVDqhGbHs 3EBw== X-Gm-Message-State: AOJu0Ywtb8H+5/PCdjPs31trMbvHUVcpvzjC2GAH3pczWqDYdDLChsZG 62U6cpqbZJh27HNtVELMoKs3M0Yh3N0IGB4zYTQhBBLmnzhnWelu3U9HqG+BX9X1eJs5AEQwtES fBQWdtTrNgkOFXIQK7rBMjCu0PLGEBWTVn9aM/ZyFd//cb+UobdsrgdGZXcABz+UAUUk/UG5Ztf os X-Gm-Gg: ASbGncsAEpB/rHaV0nrRVa2JMjDP68yeLjCkUe1PmdqR0MvLVfHwAJyXS65bBmFkDFm p1/J69Oijp+v4UWLfU2/PCPzyPMq0iFrVJ6K4aOyO74jev0P1LGNP9MLLEwGXCq3X7cXm3H6M9z RlWpZ9te34A0Zn2D+jQCMX0B5vRB7Mq+ZU/EWK94rshWPYWVnOrhZPA0U3TgR86ouqqTA/YyW1u 5TPdmEOemKuI3A+ZNL8s1mU+SAJ6KXCxusVMJEK78NYmNg8leYvDRqXuI8SBv37+hAhDOVhiHw+ jhCZ6aPgzTc2EdYmSL/p+V0FWI+pAXeA2WgFeNIftUQdEKG5xiFzvkp7K0aiVW1ppxCBo4FC/Q= = X-Received: by 2002:a17:907:3f13:b0:af9:adc8:66e3 with SMTP id a640c23a62f3a-afa1e1ee7c5mr182423666b.60.1754992666935; Tue, 12 Aug 2025 02:57:46 -0700 (PDT) X-Google-Smtp-Source: AGHT+IG+Hk34CBIigzIRSaCUmDcaKHxHvVydbgGKP3jDnDcJjoC7dJmBUl0JwjuTXTLzPv1PW3JbPA== X-Received: by 2002:a17:907:3f13:b0:af9:adc8:66e3 with SMTP id a640c23a62f3a-afa1e1ee7c5mr182420266b.60.1754992666413; Tue, 12 Aug 2025 02:57:46 -0700 (PDT) Received: from cluster.. (4f.55.790d.ip4.static.sl-reverse.com. [13.121.85.79]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-af91a21c0a4sm2180821466b.109.2025.08.12.02.57.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Aug 2025 02:57:46 -0700 (PDT) From: Alex Markuze To: ceph-devel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Slava.Dubeyko@ibm.com, idryomov@gmail.com, Alex Markuze Subject: [PATCH v4 2/2] ceph: fix client race condition where r_parent becomes stale before sending message Date: Tue, 12 Aug 2025 09:57:39 +0000 Message-Id: <20250812095739.3194337-3-amarkuze@redhat.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20250812095739.3194337-1-amarkuze@redhat.com> References: <20250812095739.3194337-1-amarkuze@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable When the parent directory's i_rwsem is not locked, req->r_parent may become stale due to concurrent operations (e.g. rename) between dentry lookup and message creation. Validate that r_parent matches the encoded parent inode and update to the correct inode if a mismatch is detected. Signed-off-by: Alex Markuze Reviewed-by: Viacheslav Dubeyko --- fs/ceph/inode.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 50 insertions(+), 2 deletions(-) diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index 36981950a368..42110d133f15 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -56,6 +56,51 @@ static int ceph_set_ino_cb(struct inode *inode, void *da= ta) return 0; } =20 +/* + * Check if the parent inode matches the vino from directory reply info + */ +static inline bool ceph_vino_matches_parent(struct inode *parent, struct c= eph_vino vino) +{ + return ceph_ino(parent) =3D=3D vino.ino && ceph_snap(parent) =3D=3D vino.= snap; +} + +/* + * Validate that the directory inode referenced by @req->r_parent matches = the + * inode number and snapshot id contained in the reply's directory record.= If + * they do not match =E2=80=93 which can theoretically happen if the paren= t dentry was + * moved between the time the request was issued and the reply arrived =E2= =80=93 fall + * back to looking up the correct inode in the inode cache. + * + * A reference is *always* returned. Callers that receive a different ino= de + * than the original @parent are responsible for dropping the extra refere= nce + * once the reply has been processed. + */ +static struct inode *ceph_get_reply_dir(struct super_block *sb, + struct inode *parent, + struct ceph_mds_reply_info_parsed *= rinfo) +{ + struct ceph_vino vino; + + if (unlikely(!rinfo->diri.in)) + return parent; /* nothing to compare against */ + + /* If we didn't have a cached parent inode to begin with, just bail ou= t. */ + if (!parent) + return NULL; + + vino.ino =3D le64_to_cpu(rinfo->diri.in->ino); + vino.snap =3D le64_to_cpu(rinfo->diri.in->snapid); + + if (likely(ceph_vino_matches_parent(parent, vino))) + return parent; /* matches =E2=80=93 use the original reference */ + + /* Mismatch =E2=80=93 this should be rare. Emit a WARN and obtain the= correct inode. */ + WARN_ONCE(1, "ceph: reply dir mismatch (parent valid %llx.%llx reply %= llx.%llx)\n", + ceph_ino(parent), ceph_snap(parent), vino.ino, vino.snap); + + return ceph_get_inode(sb, vino, NULL); +} + /** * ceph_new_inode - allocate a new inode in advance of an expected create * @dir: parent directory for new inode @@ -1548,8 +1593,11 @@ int ceph_fill_trace(struct super_block *sb, struct c= eph_mds_request *req) } =20 if (rinfo->head->is_dentry) { - struct inode *dir =3D req->r_parent; - + /* + * r_parent may be stale, in cases when R_PARENT_LOCKED is not set, + * so we need to get the correct inode + */ + struct inode *dir =3D ceph_get_reply_dir(sb, req->r_parent, rinfo); if (dir) { err =3D ceph_fill_inode(dir, NULL, &rinfo->diri, rinfo->dirfrag, session, -1, --=20 2.34.1