From: Sam Edwards
To: Xiubo Li, Ilya Dryomov, Jeff Layton
Cc: ceph-devel@vger.kernel.org, linux-kernel@vger.kernel.org, Sam Edwards
Subject: [RFC PATCH] libceph: Handle sparse-read replies lacking data length
Date: Mon, 12 Jan 2026 19:31:13 -0800
Message-ID: <20260113033113.149842-1-CFSworks@gmail.com>

When the OSD replies to a sparse-read request but no extents matched
the read (because the object is empty, the read requested a region
backed by no extents, ...), it is expected to reply with two 32-bit
zeroes: one indicating that there are no extents, the other that the
total number of bytes read is zero.

In certain circumstances (e.g. on Ceph 19.2.3, when the requested
object is in an EC pool), the OSD sends back only one 32-bit zero.
The sparse-read state machine will then end up reading something else
(such as the data CRC in the footer) and get stuck in a retry loop like:

  libceph: [0] got 0 extents
  libceph: data len 142248331 != extent len 0
  libceph: osd0 (1)...:6801 socket error on read
  libceph: data len 142248331 != extent len 0
  libceph: osd0 (1)...:6801 socket error on read

This is probably a bug in the OSD, but even so, the kernel must handle
it to avoid misinterpreting replies and entering a retry loop.

Detect this condition when the extent count is zero by checking the
`payload_len` field of the op reply. If it is only big enough to hold
the extent count, conclude that the data length was omitted and skip
to the next op (which is what the state machine would have done
immediately upon reading and validating the data length, had it been
present).
---
Hi list,

RFC: This patch is submitted for comment only. I've tested it for
about two weeks now and am satisfied that it prevents the hang, but
the current approach decodes the entire op reply body while still in
the data-gathering step, which is suboptimal; feedback on cleaner
alternatives is welcome!

I have not searched for, nor opened, a report with Ceph proper; I'd
like a second pair of eyes to confirm that this is indeed an OSD bug
before I proceed with that.

Reproducer (Ceph 19.2.3, CephFS with an EC pool already created):

  mount -o sparseread ... /mnt/cephfs
  cd /mnt/cephfs
  mkdir ec/
  setfattr -n ceph.dir.layout.pool -v 'cephfs-data-ecpool' ec/
  echo 'Hello world' > ec/sparsely-packed
  truncate -s 1048576 ec/sparsely-packed
  # Read from a hole-backed region via sparse read
  dd if=ec/sparsely-packed bs=16 skip=10000 count=1 iflag=direct | xxd
  # The read hangs and triggers the retry loop described in the patch

Hope this works,
Sam

PS: I would also like to write a pair of patches to our messenger
v1/v2 clients to check explicitly that sparse reads consume exactly
the number of bytes in the data section, as I see there have already
been previous bugs (including CVE-2023-52636) where the sparse-read
machinery gets out of sync with the incoming TCP stream. Has this
already been proposed?
---
 net/ceph/osd_client.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 1a7be2f615dc..e9e898a2415f 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -5840,7 +5840,25 @@ static int osd_sparse_read(struct ceph_connection *con,
 			sr->sr_state = CEPH_SPARSE_READ_DATA_LEN;
 			break;
 		}
-		/* No extents? Read data len */
+
+		/*
+		 * No extents? Read data len (which we expect is 0) if present.
+		 *
+		 * Sometimes the OSD will omit this for zero-extent replies
+		 * (e.g. in Ceph 19.2.3 when the object is in an EC pool) which
+		 * is likely a bug in the OSD, but nonetheless we must handle
+		 * it to avoid misinterpreting the reply.
+		 */
+		struct MOSDOpReply m;
+		ret = decode_MOSDOpReply(con->in_msg, &m);
+		if (ret)
+			return ret;
+		if (m.outdata_len[o->o_sparse_op_idx] == sizeof(sr->sr_count)) {
+			dout("[%d] missing data length\n", o->o_osd);
+			sr->sr_state = CEPH_SPARSE_READ_HDR;
+			goto next_op;
+		}
+		fallthrough;
 	case CEPH_SPARSE_READ_DATA_LEN:
 		convert_extent_map(sr);
-- 
2.51.2