From nobody Mon Feb 9 00:40:23 2026
From: Alex Markuze
To: ceph-devel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Slava.Dubeyko@ibm.com, idryomov@gmail.com, Alex Markuze
Subject: [PATCH v2 1/2] ceph: fix client race condition validating r_parent before applying state
Date: Mon, 4 Aug 2025 09:59:41 +0000
Message-Id: <20250804095942.2167541-2-amarkuze@redhat.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20250804095942.2167541-1-amarkuze@redhat.com>
References: <20250804095942.2167541-1-amarkuze@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Add validation to ensure the cached parent directory inode matches the
directory info in MDS replies. This prevents client-side race conditions
where concurrent operations (e.g. rename) cause r_parent to become stale
between request initiation and reply processing, which could lead to
applying state changes to incorrect directory inodes.
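
As a rough userspace illustration of the check described above (not the
kernel code: `struct vino`, `struct dir_inode` and `pick_reply_dir` are
hypothetical stand-ins, and the array scan merely models an inode cache
lookup), the cached parent is used only when both its inode number and
its snapshot id match the directory record in the reply:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ceph_vino: inode number plus snapshot id. */
struct vino {
	uint64_t ino;
	uint64_t snap;
};

/* Hypothetical stand-in for a cached directory inode. */
struct dir_inode {
	struct vino vino;
};

/* True when the cached parent is the directory the reply refers to. */
static bool vino_matches_parent(const struct dir_inode *parent, struct vino v)
{
	return parent->vino.ino == v.ino && parent->vino.snap == v.snap;
}

/*
 * Pick the directory that reply state should be applied to.  @cache and
 * @ncache model the inode cache; they are only scanned when the cached
 * parent turns out to be stale.  Returns NULL if nothing suitable exists.
 */
static const struct dir_inode *
pick_reply_dir(const struct dir_inode *parent, const struct vino *reply_dir,
	       const struct dir_inode *cache, size_t ncache)
{
	size_t i;

	if (!reply_dir)
		return parent;	/* nothing to compare against */
	if (!parent)
		return NULL;	/* no cached parent to validate */
	if (vino_matches_parent(parent, *reply_dir))
		return parent;	/* cached parent is still correct */

	/* Stale parent: fall back to the directory named in the reply. */
	for (i = 0; i < ncache; i++)
		if (vino_matches_parent(&cache[i], *reply_dir))
			return &cache[i];
	return NULL;
}
```

Comparing the snapshot id as well as the inode number matters because the
same inode number can be visible under more than one snapshot.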
---
 fs/ceph/debugfs.c    |   6 +-
 fs/ceph/mds_client.c | 152 +++++++++++++++++++++++++++----------------
 fs/ceph/mds_client.h |  12 +++-
 3 files changed, 110 insertions(+), 60 deletions(-)

diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index 2ffb29108176..35e621f41039 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -88,8 +88,8 @@ static int mdsc_show(struct seq_file *s, void *p)
 		if (req->r_inode) {
 			seq_printf(s, " #%llx", ceph_ino(req->r_inode));
 		} else if (req->r_dentry) {
-			path = ceph_mdsc_build_path(mdsc, req->r_dentry, &pathlen,
-						    &pathbase, 0);
+			struct ceph_path_info path_info;
+			path = ceph_mdsc_build_path(mdsc, req->r_dentry, &path_info, 0);
 			if (IS_ERR(path))
 				path = NULL;
 			spin_lock(&req->r_dentry->d_lock);
@@ -98,7 +98,7 @@ static int mdsc_show(struct seq_file *s, void *p)
 				   req->r_dentry,
 				   path ? path : "");
 			spin_unlock(&req->r_dentry->d_lock);
-			ceph_mdsc_free_path(path, pathlen);
+			ceph_mdsc_free_path(path, path_info.pathlen);
 		} else if (req->r_path1) {
 			seq_printf(s, " #%llx/%s", req->r_ino1.ino,
 				   req->r_path1);
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 8d9fc5e18b17..d2ae862c3dda 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2732,7 +2732,7 @@ static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
  *   foo/.snap/bar -> foo//bar
  */
 char *ceph_mdsc_build_path(struct ceph_mds_client *mdsc, struct dentry *dentry,
-			   int *plen, u64 *pbase, int for_wire)
+			   struct ceph_path_info *path_info, int for_wire)
 {
 	struct ceph_client *cl = mdsc->fsc->client;
 	struct dentry *cur;
@@ -2843,17 +2843,31 @@ char *ceph_mdsc_build_path(struct ceph_mds_client *mdsc, struct dentry *dentry,
 		return ERR_PTR(-ENAMETOOLONG);
 	}
 
-	*pbase = base;
-	*plen = PATH_MAX - 1 - pos;
-	CEPH_SAN_STRNCPY(result_str, sizeof(result_str), path + pos, *plen);
+	/* Initialize the output structure */
+	memset(path_info, 0, sizeof(*path_info));
+
+	path_info->vino.ino = base;
+	path_info->pathlen = PATH_MAX - 1 - pos;
+	path_info->path = path + pos;
+	path_info->freepath = true;
+
+	/* Set snap from dentry if available */
+	if (d_inode(dentry))
+		path_info->vino.snap = ceph_snap(d_inode(dentry));
+	else
+		path_info->vino.snap = CEPH_NOSNAP;
+
+	CEPH_SAN_STRNCPY(result_str, sizeof(result_str), path_info->path, path_info->pathlen);
 	boutc(cl, "on %p %d built %llx '%s'\n", dentry, d_count(dentry),
-	      base, result_str);
+	      path_info->vino.ino, result_str);
 	return path + pos;
 }
 
+
+
 static int build_dentry_path(struct ceph_mds_client *mdsc, struct dentry *dentry,
-			     struct inode *dir, const char **ppath, int *ppathlen,
-			     u64 *pino, bool *pfreepath, bool parent_locked)
+			     struct inode *dir, struct ceph_path_info *path_info,
+			     bool parent_locked)
 {
 	char *path;
 
@@ -2862,41 +2876,47 @@ static int build_dentry_path(struct ceph_mds_client *mdsc, struct dentry *dentry
 		dir = d_inode_rcu(dentry->d_parent);
 	if (dir && parent_locked && ceph_snap(dir) == CEPH_NOSNAP &&
 	    !IS_ENCRYPTED(dir)) {
-		*pino = ceph_ino(dir);
+		path_info->vino.ino = ceph_ino(dir);
+		path_info->vino.snap = ceph_snap(dir);
 		rcu_read_unlock();
-		*ppath = dentry->d_name.name;
-		*ppathlen = dentry->d_name.len;
+		path_info->path = dentry->d_name.name;
+		path_info->pathlen = dentry->d_name.len;
+		path_info->freepath = false;
 		return 0;
 	}
 	rcu_read_unlock();
-	path = ceph_mdsc_build_path(mdsc, dentry, ppathlen, pino, 1);
+	path = ceph_mdsc_build_path(mdsc, dentry, path_info, 1);
 	if (IS_ERR(path))
 		return PTR_ERR(path);
-	*ppath = path;
-	*pfreepath = true;
+	/*
+	 * ceph_mdsc_build_path already fills path_info, including snap handling.
+	 */
 	return 0;
 }
 
-static int build_inode_path(struct inode *inode,
-			    const char **ppath, int *ppathlen, u64 *pino,
-			    bool *pfreepath)
+static int build_inode_path(struct inode *inode, struct ceph_path_info *path_info)
 {
 	struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(inode->i_sb);
 	struct dentry *dentry;
 	char *path;
 
 	if (ceph_snap(inode) == CEPH_NOSNAP) {
-		*pino = ceph_ino(inode);
-		*ppathlen = 0;
+		path_info->vino.ino = ceph_ino(inode);
+		path_info->vino.snap = ceph_snap(inode);
+		path_info->pathlen = 0;
+		path_info->freepath = false;
 		return 0;
 	}
 	dentry = d_find_alias(inode);
-	path = ceph_mdsc_build_path(mdsc, dentry, ppathlen, pino, 1);
+	path = ceph_mdsc_build_path(mdsc, dentry, path_info, 1);
 	dput(dentry);
 	if (IS_ERR(path))
 		return PTR_ERR(path);
-	*ppath = path;
-	*pfreepath = true;
+	/*
+	 * ceph_mdsc_build_path already fills path_info, including snap from dentry.
+	 * Override with inode's snap since that's what this function is for.
+	 */
+	path_info->vino.snap = ceph_snap(inode);
 	return 0;
 }
 
@@ -2906,28 +2926,31 @@ static int build_inode_path(struct inode *inode,
  */
 static int set_request_path_attr(struct ceph_mds_client *mdsc, struct inode *rinode,
 				 struct dentry *rdentry, struct inode *rdiri,
-				 const char *rpath, u64 rino, const char **ppath,
-				 int *pathlen, u64 *ino, bool *freepath,
+				 const char *rpath, u64 rino, struct ceph_path_info *path_info,
 				 bool parent_locked)
 {
 	struct ceph_client *cl = mdsc->fsc->client;
 	char result_str[128];
 	int r = 0;
 
+	/* Initialize the output structure */
+	memset(path_info, 0, sizeof(*path_info));
+
 	if (rinode) {
-		r = build_inode_path(rinode, ppath, pathlen, ino, freepath);
+		r = build_inode_path(rinode, path_info);
 		boutc(cl, " inode %p %llx.%llx\n", rinode, ceph_ino(rinode),
-		      ceph_snap(rinode));
+		ceph_snap(rinode));
 	} else if (rdentry) {
-		r = build_dentry_path(mdsc, rdentry, rdiri, ppath, pathlen, ino,
-					freepath, parent_locked);
-		CEPH_SAN_STRNCPY(result_str, sizeof(result_str), *ppath, *pathlen);
-		boutc(cl, " dentry %p %llx/%s\n", rdentry, *ino, result_str);
+		r = build_dentry_path(mdsc, rdentry, rdiri, path_info, parent_locked);
+		CEPH_SAN_STRNCPY(result_str, sizeof(result_str), path_info->path, path_info->pathlen);
+		boutc(cl, " dentry %p %llx/%s\n", rdentry, path_info->vino.ino, result_str);
 	} else if (rpath || rino) {
-		*ino = rino;
-		*ppath = rpath;
-		*pathlen = rpath ? strlen(rpath) : 0;
-		CEPH_SAN_STRNCPY(result_str, sizeof(result_str), rpath, *pathlen);
+		path_info->vino.ino = rino;
+		path_info->vino.snap = CEPH_NOSNAP;
+		path_info->path = rpath;
+		path_info->pathlen = rpath ? strlen(rpath) : 0;
+		path_info->freepath = false;
+		CEPH_SAN_STRNCPY(result_str, sizeof(result_str), rpath, path_info->pathlen);
 		boutc(cl," path %s\n", result_str);
 	}
 
@@ -3005,11 +3028,8 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 	struct ceph_client *cl = mdsc->fsc->client;
 	struct ceph_msg *msg;
 	struct ceph_mds_request_head_legacy *lhead;
-	const char *path1 = NULL;
-	const char *path2 = NULL;
-	u64 ino1 = 0, ino2 = 0;
-	int pathlen1 = 0, pathlen2 = 0;
-	bool freepath1 = false, freepath2 = false;
+	struct ceph_path_info path_info1 = {0};
+	struct ceph_path_info path_info2 = {0};
 	struct dentry *old_dentry = NULL;
 	int len;
 	u16 releases;
@@ -3019,25 +3039,42 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 	u16 request_head_version = mds_supported_head_version(session);
 	kuid_t caller_fsuid = req->r_cred->fsuid;
 	kgid_t caller_fsgid = req->r_cred->fsgid;
+	bool parent_locked = test_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
 
-	ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
-			      req->r_parent, req->r_path1, req->r_ino1.ino,
-			      &path1, &pathlen1, &ino1, &freepath1,
-			      test_bit(CEPH_MDS_R_PARENT_LOCKED,
-					&req->r_req_flags));
+	ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
+				    req->r_parent, req->r_path1, req->r_ino1.ino,
+				    &path_info1, parent_locked);
 	if (ret < 0) {
 		msg = ERR_PTR(ret);
 		goto out;
 	}
 
+	/*
+	 * When the parent directory's i_rwsem is *not* locked, req->r_parent may
+	 * have become stale (e.g. after a concurrent rename) between the time the
+	 * dentry was looked up and now. If we detect that the stored r_parent
+	 * does not match the inode number we just encoded for the request, switch
+	 * to the correct inode so that the MDS receives a valid parent reference.
+	 */
+	if (!parent_locked &&
+	    req->r_parent && path_info1.vino.ino && ceph_ino(req->r_parent) != path_info1.vino.ino) {
+		struct inode *correct_dir = ceph_get_inode(mdsc->fsc->sb, path_info1.vino, NULL);
+		if (!IS_ERR(correct_dir)) {
+			WARN(1, "ceph: r_parent mismatch (had %llx wanted %llx) - updating\n",
+			     ceph_ino(req->r_parent), path_info1.vino.ino);
+			iput(req->r_parent);
+			req->r_parent = correct_dir;
+		}
+	}
+
 	/* If r_old_dentry is set, then assume that its parent is locked */
 	if (req->r_old_dentry &&
 	    !(req->r_old_dentry->d_flags & DCACHE_DISCONNECTED))
 		old_dentry = req->r_old_dentry;
 	ret = set_request_path_attr(mdsc, NULL, old_dentry,
-			      req->r_old_dentry_dir,
-			      req->r_path2, req->r_ino2.ino,
-			      &path2, &pathlen2, &ino2, &freepath2, true);
+				    req->r_old_dentry_dir,
+				    req->r_path2, req->r_ino2.ino,
+				    &path_info2, true);
 	if (ret < 0) {
 		msg = ERR_PTR(ret);
 		goto out_free1;
@@ -3068,7 +3105,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 
 	/* filepaths */
 	len += 2 * (1 + sizeof(u32) + sizeof(u64));
-	len += pathlen1 + pathlen2;
+	len += path_info1.pathlen + path_info2.pathlen;
 
 	/* cap releases */
 	len += sizeof(struct ceph_mds_request_release) *
@@ -3076,9 +3113,9 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 		 !!req->r_old_inode_drop + !!req->r_old_dentry_drop);
 
 	if (req->r_dentry_drop)
-		len += pathlen1;
+		len += path_info1.pathlen;
 	if (req->r_old_dentry_drop)
-		len += pathlen2;
+		len += path_info2.pathlen;
 
 	/* MClientRequest tail */
 
@@ -3191,8 +3228,8 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 	lhead->ino = cpu_to_le64(req->r_deleg_ino);
 	lhead->args = req->r_args;
 
-	ceph_encode_filepath(&p, end, ino1, path1);
-	ceph_encode_filepath(&p, end, ino2, path2);
+	ceph_encode_filepath(&p, end, path_info1.vino.ino, path_info1.path);
+	ceph_encode_filepath(&p, end, path_info2.vino.ino, path_info2.path);
 
 	/* make note of release offset, in case we need to replay */
 	req->r_request_release_offset = p - msg->front.iov_base;
@@ -3255,11 +3292,11 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 	msg->hdr.data_off = cpu_to_le16(0);
 
 out_free2:
-	if (freepath2)
-		ceph_mdsc_free_path((char *)path2, pathlen2);
+	if (path_info2.freepath)
+		ceph_mdsc_free_path((char *)path_info2.path, path_info2.pathlen);
 out_free1:
-	if (freepath1)
-		ceph_mdsc_free_path((char *)path1, pathlen1);
+	if (path_info1.freepath)
+		ceph_mdsc_free_path((char *)path_info1.path, path_info1.pathlen);
 out:
 	return msg;
 out_err:
@@ -4623,14 +4660,17 @@ static int reconnect_caps_cb(struct inode *inode, int mds, void *arg)
 
 	dentry = d_find_primary(inode);
 	if (dentry) {
+		struct ceph_path_info path_info;
 		/* set pathbase to parent dir when msg_version >= 2 */
-		path = ceph_mdsc_build_path(mdsc, dentry, &pathlen, &pathbase,
+		path = ceph_mdsc_build_path(mdsc, dentry, &path_info,
 					    recon_state->msg_version >= 2);
 		dput(dentry);
 		if (IS_ERR(path)) {
 			err = PTR_ERR(path);
 			goto out_err;
 		}
+		pathlen = path_info.pathlen;
+		pathbase = path_info.vino.ino;
 	} else {
 		path = NULL;
 		pathbase = 0;
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 3e2a6fa7c19a..c4c1ea8d5f5e 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -623,8 +623,18 @@ static inline void ceph_mdsc_free_path(char *path, int len)
 		__putname(path - (PATH_MAX - 1 - len));
 }
 
+/*
+ * Structure to group path-related output parameters for build_*_path functions
+ */
+struct ceph_path_info {
+	const char *path;
+	int pathlen;
+	struct ceph_vino vino;
+	bool freepath;
+};
+
 extern char *ceph_mdsc_build_path(struct ceph_mds_client *mdsc,
-				  struct dentry *dentry, int *plen, u64 *base,
+				  struct dentry *dentry, struct ceph_path_info *path_info,
 				  int for_wire);
 
 extern void __ceph_mdsc_drop_dentry_lease(struct dentry *dentry);
-- 
2.34.1

From nobody Mon Feb 9 00:40:23 2026
From: Alex Markuze
To: ceph-devel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Slava.Dubeyko@ibm.com, idryomov@gmail.com, Alex Markuze
Subject: [PATCH v2 2/2] ceph: fix client race condition where r_parent becomes stale before sending message
Date: Mon, 4 Aug 2025 09:59:42 +0000
Message-Id: <20250804095942.2167541-3-amarkuze@redhat.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20250804095942.2167541-1-amarkuze@redhat.com>
References: <20250804095942.2167541-1-amarkuze@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

When the parent directory's i_rwsem is not locked, req->r_parent may
become stale due to concurrent operations (e.g. rename) between dentry
lookup and message creation. Validate that r_parent matches the encoded
parent inode and update to the correct inode if a mismatch is detected.
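
A minimal userspace sketch of that request-side check, with hypothetical
names (`struct request`, `revalidate_parent`) and plain inode numbers in
place of inode references; it models the decision only and is not the
kernel implementation:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the request fields involved in the check. */
struct request {
	uint64_t parent_ino;	/* cached parent (models r_parent); 0 = none */
	bool parent_locked;	/* models the CEPH_MDS_R_PARENT_LOCKED flag */
};

/*
 * Returns true if the cached parent was stale and has been replaced by
 * @encoded_ino, the inode number just encoded into the outgoing message.
 */
static bool revalidate_parent(struct request *req, uint64_t encoded_ino)
{
	if (req->parent_locked)
		return false;	/* parent is locked: the check is skipped */
	if (!req->parent_ino || !encoded_ino)
		return false;	/* nothing to compare */
	if (req->parent_ino == encoded_ino)
		return false;	/* cached parent is still current */

	req->parent_ino = encoded_ino;	/* switch to the encoded parent */
	return true;
}
```

In the kernel the replacement also involves taking a reference on the new
inode and dropping the one held on the stale parent, which this model
leaves out.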
---
 fs/ceph/inode.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 50 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 814f9e9656a0..7da648b5e901 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -56,6 +56,51 @@ static int ceph_set_ino_cb(struct inode *inode, void *data)
 	return 0;
 }
 
+/*
+ * Check if the parent inode matches the vino from directory reply info
+ */
+static inline bool ceph_vino_matches_parent(struct inode *parent, struct ceph_vino vino)
+{
+	return ceph_ino(parent) == vino.ino && ceph_snap(parent) == vino.snap;
+}
+
+/*
+ * Validate that the directory inode referenced by @req->r_parent matches the
+ * inode number and snapshot id contained in the reply's directory record. If
+ * they do not match – which can theoretically happen if the parent dentry was
+ * moved between the time the request was issued and the reply arrived – fall
+ * back to looking up the correct inode in the inode cache.
+ *
+ * A reference is *always* returned. Callers that receive a different inode
+ * than the original @parent are responsible for dropping the extra reference
+ * once the reply has been processed.
+ */
+static struct inode *ceph_get_reply_dir(struct super_block *sb,
+                                       struct inode *parent,
+                                       struct ceph_mds_reply_info_parsed *rinfo)
+{
+    struct ceph_vino vino;
+
+    if (unlikely(!rinfo->diri.in))
+        return parent; /* nothing to compare against */
+
+    /* If we didn't have a cached parent inode to begin with, just bail out. */
+    if (!parent)
+        return NULL;
+
+    vino.ino = le64_to_cpu(rinfo->diri.in->ino);
+    vino.snap = le64_to_cpu(rinfo->diri.in->snapid);
+
+    if (likely(ceph_vino_matches_parent(parent, vino)))
+        return parent; /* matches – use the original reference */
+
+    /* Mismatch – this should be rare. Emit a WARN and obtain the correct inode. */
+    WARN(1, "ceph: reply dir mismatch (parent valid %llx.%llx reply %llx.%llx)\n",
+         ceph_ino(parent), ceph_snap(parent), vino.ino, vino.snap);
+
+    return ceph_get_inode(sb, vino, NULL);
+}
+
 /**
  * ceph_new_inode - allocate a new inode in advance of an expected create
  * @dir: parent directory for new inode
@@ -1548,8 +1593,11 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
 	}
 
 	if (rinfo->head->is_dentry) {
-		struct inode *dir = req->r_parent;
-
+		/*
+		 * r_parent may be stale, in cases when R_PARENT_LOCKED is not set,
+		 * so we need to get the correct inode
+		 */
+		struct inode *dir = ceph_get_reply_dir(sb, req->r_parent, rinfo);
 		if (dir) {
 			err = ceph_fill_inode(dir, NULL, &rinfo->diri,
 					      rinfo->dirfrag, session, -1,
-- 
2.34.1