From nobody Mon Jun 8 06:38:01 2026
From: Breno Leitao
Date: Fri, 05 Jun 2026 03:31:38 -0700
Subject: [PATCH net-next v2 1/3] rds: mark snapshot pages dirty in rds_info_getsockopt()
Message-Id: <20260605-getsock_more-v2-1-80f38cdb8706@debian.org>
References: <20260605-getsock_more-v2-0-80f38cdb8706@debian.org>
In-Reply-To: <20260605-getsock_more-v2-0-80f38cdb8706@debian.org>
To: Allison Henderson, "David S. Miller", Eric Dumazet, Jakub Kicinski,
 Paolo Abeni, Simon Horman, Shuah Khan
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
 linux-rdma@vger.kernel.org, rds-devel@oss.oracle.com,
 linux-kselftest@vger.kernel.org, Breno Leitao, kernel-team@meta.com
X-Mailer: b4 0.14.3

rds_info_getsockopt() pins the destination user pages with FOLL_WRITE, and
the RDS_INFO_* producers memcpy() the snapshot into them through
kmap_atomic(). Because that copy goes through a kernel mapping rather than
the user mapping, the dirty bit on the user PTE is never set, and
unpin_user_pages() releases the pages without marking them dirty. A
file-backed destination page can then be reclaimed without writeback,
silently discarding the copied data.

Use unpin_user_pages_dirty_lock() with make_dirty=true so the modified
pages are marked dirty before they are unpinned.
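The failure mode described above only shows up when the destination buffer is file-backed. As a minimal userspace sketch (not part of this patch), such a buffer could be built by mapping a file with MAP_SHARED; the helper name demo_map_file_backed() is hypothetical and exists only for illustration:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: grow the file at @path to @len bytes and map it
 * shared and writable, so that stores into the mapping land in page-cache
 * pages. Returns MAP_FAILED on any error.
 */
static void *demo_map_file_backed(const char *path, size_t len)
{
	void *p;
	int fd;

	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return MAP_FAILED;

	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return MAP_FAILED;
	}

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping keeps its own reference to the file */
	return p;
}
```

Passing the returned pointer as optval to an RDS_INFO_* getsockopt() makes the kernel write the snapshot into page-cache pages, which is the situation where a missing dirty mark could let reclaim drop the data.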

Signed-off-by: Breno Leitao
---
 net/rds/info.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rds/info.c b/net/rds/info.c
index f1b29994934a..17061f6ff74e 100644
--- a/net/rds/info.c
+++ b/net/rds/info.c
@@ -235,7 +235,7 @@ int rds_info_getsockopt(struct socket *sock, int optname, char __user *optval,
 
 out:
 	if (pages)
-		unpin_user_pages(pages, nr_pages);
+		unpin_user_pages_dirty_lock(pages, nr_pages, true);
 	kfree(pages);
 
 	return ret;
-- 
2.53.0-Meta

From nobody Mon Jun 8 06:38:01 2026
From: Breno Leitao
Date: Fri, 05 Jun 2026 03:31:39 -0700
Subject: [PATCH net-next v2 2/3] selftests: net: rds: add getsockopt() conversion test
Message-Id: <20260605-getsock_more-v2-2-80f38cdb8706@debian.org>
References: <20260605-getsock_more-v2-0-80f38cdb8706@debian.org>
In-Reply-To: <20260605-getsock_more-v2-0-80f38cdb8706@debian.org>

Add a kselftest that exercises the RDS getsockopt() paths converted to the
getsockopt_iter() / sockopt_t callback:

 - RDS_RECVERR and SO_RDS_TRANSPORT, which return their int value through
   copy_to_iter() and report the written length in opt->optlen.

 - RDS_INFO_*, which obtains the userspace buffer pages with
   iov_iter_extract_pages() (including a non-zero starting page offset)
   and lets the info producers copy the snapshot in under a spinlock.
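The RDS_INFO_* cases in this selftest rely on a two-call pattern: probe with a zero-length buffer to learn the snapshot size, then fetch into a buffer of that size. A minimal userspace sketch of that pattern follows. It is illustrative only and not part of the selftest: the helper name demo_rds_counters_snapshot() is hypothetical, and the DEMO_* constants are assumed to mirror SOL_RDS and RDS_INFO_COUNTERS from <linux/rds.h>.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/socket.h>

/* Assumed to match SOL_RDS and RDS_INFO_COUNTERS in <linux/rds.h>. */
#define DEMO_SOL_RDS		276
#define DEMO_RDS_INFO_COUNTERS	10000

/*
 * Hypothetical helper: fetch an RDS_INFO_COUNTERS snapshot from @fd.
 *
 * On success, returns the value getsockopt() returned (0 if the snapshot
 * is empty), stores a malloc()ed buffer in *out and its length in
 * *out_len; the caller frees *out. On failure, returns -1 with errno set.
 */
static int demo_rds_counters_snapshot(int fd, void **out, socklen_t *out_len)
{
	socklen_t need = 0;
	void *buf;
	int rc;

	*out = NULL;
	*out_len = 0;

	/* Probe: a zero-length call fails with ENOSPC and reports the size. */
	rc = getsockopt(fd, DEMO_SOL_RDS, DEMO_RDS_INFO_COUNTERS, NULL, &need);
	if (rc < 0 && errno != ENOSPC)
		return -1;
	if (need == 0)
		return 0;	/* nothing to copy */

	buf = malloc(need);
	if (!buf)
		return -1;

	/* Fetch: can still fail with ENOSPC if the snapshot grew meanwhile. */
	*out_len = need;
	rc = getsockopt(fd, DEMO_SOL_RDS, DEMO_RDS_INFO_COUNTERS, buf, out_len);
	if (rc < 0) {
		int saved = errno;

		free(buf);
		*out_len = 0;
		errno = saved;
		return -1;
	}

	*out = buf;
	return rc;
}
```

A caller would open an AF_RDS socket, pass its descriptor, and retry on ENOSPC. The selftest additionally places the buffer at an unaligned offset to exercise the multi-page path, which this sketch leaves out.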

Signed-off-by: Breno Leitao
Reviewed-by: Allison Henderson
---
 tools/testing/selftests/net/rds/.gitignore   |   1 +
 tools/testing/selftests/net/rds/Makefile     |   4 +
 tools/testing/selftests/net/rds/getsockopt.c | 208 +++++++++++++++++++++++++++
 3 files changed, 213 insertions(+)

diff --git a/tools/testing/selftests/net/rds/.gitignore b/tools/testing/selftests/net/rds/.gitignore
index 1c6f04e2aa11..7ca4b1440f51 100644
--- a/tools/testing/selftests/net/rds/.gitignore
+++ b/tools/testing/selftests/net/rds/.gitignore
@@ -1 +1,2 @@
 include.sh
+getsockopt
diff --git a/tools/testing/selftests/net/rds/Makefile b/tools/testing/selftests/net/rds/Makefile
index fe363be8e358..0700d8298eec 100644
--- a/tools/testing/selftests/net/rds/Makefile
+++ b/tools/testing/selftests/net/rds/Makefile
@@ -5,6 +5,8 @@ all:
 
 TEST_PROGS := run.sh
 
+TEST_GEN_PROGS := getsockopt
+
 TEST_FILES := \
 	include.sh \
 	settings \
@@ -16,4 +18,6 @@ EXTRA_CLEAN := \
 	/tmp/rds_logs \
 # end of EXTRA_CLEAN
 
+CFLAGS += $(KHDR_INCLUDES)
+
 include ../../lib.mk
diff --git a/tools/testing/selftests/net/rds/getsockopt.c b/tools/testing/selftests/net/rds/getsockopt.c
new file mode 100644
index 000000000000..93ff252c69b8
--- /dev/null
+++ b/tools/testing/selftests/net/rds/getsockopt.c
@@ -0,0 +1,208 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Exercise the RDS getsockopt() paths that were converted to the
+ * getsockopt_iter() / sockopt_t callback.
+ *
+ * Three distinct paths are covered:
+ *
+ * - RDS_RECVERR and SO_RDS_TRANSPORT, which now return their int value
+ *   through copy_to_iter() and report the written length in opt->optlen.
+ *
+ * - RDS_INFO_*, which pins the userspace buffer with
+ *   iov_iter_extract_pages() (including a non-zero starting page offset)
+ *   and lets the info producers memcpy the snapshot in under a spinlock.
+ *
+ * The kvec (in-kernel buffer) -> -EOPNOTSUPP path of rds_info_getsockopt()
+ * is not reachable from a userspace getsockopt() and so is not tested here.
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "../../kselftest_harness.h"
+
+#ifndef AF_RDS
+#define AF_RDS 21
+#endif
+
+FIXTURE(rds) {
+	int fd;
+};
+
+FIXTURE_SETUP(rds)
+{
+	self->fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
+	if (self->fd < 0)
+		SKIP(return, "AF_RDS unavailable (errno %d) - load the rds module",
+		     errno);
+}
+
+FIXTURE_TEARDOWN(rds)
+{
+	if (self->fd >= 0)
+		close(self->fd);
+}
+
+/* RDS_RECVERR defaults to 0 and is reported back as a 4-byte int. */
+TEST_F(rds, recverr_default)
+{
+	socklen_t len = sizeof(int);
+	int val = 0xdeadbeef;
+
+	ASSERT_EQ(0, getsockopt(self->fd, SOL_RDS, RDS_RECVERR, &val, &len));
+	EXPECT_EQ(sizeof(int), len);
+	EXPECT_EQ(0, val);
+}
+
+/* A value set via setsockopt() must be readable back unchanged. */
+TEST_F(rds, recverr_set_get)
+{
+	socklen_t len = sizeof(int);
+	int val = 1;
+
+	ASSERT_EQ(0, setsockopt(self->fd, SOL_RDS, RDS_RECVERR, &val, len));
+
+	val = 0;
+	ASSERT_EQ(0, getsockopt(self->fd, SOL_RDS, RDS_RECVERR, &val, &len));
+	EXPECT_EQ(sizeof(int), len);
+	EXPECT_EQ(1, val);
+}
+
+/* A buffer smaller than an int is rejected with EINVAL, not silently. */
+TEST_F(rds, recverr_short_buffer)
+{
+	socklen_t len = sizeof(int) - 1;
+	char buf[sizeof(int)];
+
+	EXPECT_EQ(-1, getsockopt(self->fd, SOL_RDS, RDS_RECVERR, buf, &len));
+	EXPECT_EQ(EINVAL, errno);
+}
+
+/* An unbound socket reports RDS_TRANS_NONE for SO_RDS_TRANSPORT. */
+TEST_F(rds, transport_unbound)
+{
+	socklen_t len = sizeof(int);
+	int val = 0;
+
+	ASSERT_EQ(0, getsockopt(self->fd, SOL_RDS, SO_RDS_TRANSPORT, &val,
+				&len));
+	EXPECT_EQ(sizeof(int), len);
+	EXPECT_EQ(RDS_TRANS_NONE, (unsigned int)val);
+}
+
+TEST_F(rds, transport_short_buffer)
+{
+	socklen_t len = sizeof(int) - 1;
+	char buf[sizeof(int)];
+
+	EXPECT_EQ(-1, getsockopt(self->fd, SOL_RDS, SO_RDS_TRANSPORT, buf,
+				 &len));
+	EXPECT_EQ(EINVAL, errno);
+}
+
+/*
+ * RDS_INFO_COUNTERS with a zero-length buffer is the "probe" call: it must
+ * fail with ENOSPC and report the required snapshot size in optlen.
+ */
+TEST_F(rds, info_counters_probe)
+{
+	socklen_t len = 0;
+
+	EXPECT_EQ(-1, getsockopt(self->fd, SOL_RDS, RDS_INFO_COUNTERS, NULL,
+				 &len));
+	EXPECT_EQ(ENOSPC, errno);
+	EXPECT_GT(len, 0);
+	/* The snapshot is an array of fixed-size counter records. */
+	EXPECT_EQ(0, len % (socklen_t)sizeof(struct rds_info_counter));
+}
+
+/*
+ * A real snapshot into an unaligned userspace buffer exercises the
+ * iov_iter_extract_pages() path, including the non-zero offset0 handling
+ * that the patch reworked. Place the buffer at a non-page-aligned address
+ * spanning into the next page to make sure multi-page pinning works too.
+ */
+TEST_F(rds, info_counters_snapshot)
+{
+	struct rds_info_counter *ctr;
+	socklen_t need = 0, len;
+	long pagesz = sysconf(_SC_PAGESIZE);
+	size_t offset, map_len;
+	unsigned int i, n;
+	char *region, *buf;
+	int ret;
+
+	/* Probe for the required size. */
+	getsockopt(self->fd, SOL_RDS, RDS_INFO_COUNTERS, NULL, &need);
+	ASSERT_GT(need, 0);
+
+	/*
+	 * Place the buffer at a non-page-aligned offset that runs past the
+	 * first page boundary, and size the mapping from the probed length so
+	 * the test keeps working if the counter set grows.
+	 */
+	offset = pagesz - 64;
+	map_len = ((offset + need + pagesz - 1) / pagesz) * pagesz;
+
+	region = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
+		      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	ASSERT_NE(MAP_FAILED, region);
+
+	buf = region + offset;
+
+	/*
+	 * On success the RDS_INFO path returns the positive per-element size
+	 * (lens.each) rather than 0, and writes the full snapshot length back
+	 * into optlen.
+	 */
+	len = need;
+	ret = getsockopt(self->fd, SOL_RDS, RDS_INFO_COUNTERS, buf, &len);
+	ASSERT_GE(ret, 0) {
+		TH_LOG("getsockopt snapshot failed: errno %d", errno);
+	}
+	EXPECT_EQ(sizeof(struct rds_info_counter), ret);
+	EXPECT_EQ(need, len);
+
+	/* The counter names must be NUL-terminated, non-empty strings. */
+	ctr = (struct rds_info_counter *)buf;
+	n = len / sizeof(*ctr);
+	ASSERT_GT(n, 0);
+	for (i = 0; i < n; i++) {
+		size_t namelen = strnlen((char *)ctr[i].name,
+					 sizeof(ctr[i].name));
+
+		EXPECT_GT(namelen, 0);
+		EXPECT_LT(namelen, sizeof(ctr[i].name));
+	}
+
+	munmap(region, map_len);
+}
+
+/*
+ * A non-zero but too-small buffer must report ENOSPC and the full required
+ * length, without corrupting memory past the buffer.
+ */
+TEST_F(rds, info_counters_short_buffer)
+{
+	socklen_t need = 0, len;
+	char small[sizeof(struct rds_info_counter)];
+
+	getsockopt(self->fd, SOL_RDS, RDS_INFO_COUNTERS, NULL, &need);
+	ASSERT_GT(need, 0);
+
+	/* Ask with a buffer guaranteed smaller than the full snapshot. */
+	if (need <= (socklen_t)sizeof(small))
+		SKIP(return, "snapshot fits in one record; nothing to test");
+
+	len = 1; /* < sizeof(struct rds_info_counter) */
+	EXPECT_EQ(-1, getsockopt(self->fd, SOL_RDS, RDS_INFO_COUNTERS, small,
+				 &len));
+	EXPECT_EQ(ENOSPC, errno);
+	EXPECT_EQ(need, len);
+}
+
+TEST_HARNESS_MAIN
-- 
2.53.0-Meta

From nobody Mon Jun 8 06:38:01 2026
From: Breno Leitao
Date: Fri, 05 Jun 2026 03:31:40 -0700
Subject: [PATCH net-next v2 3/3] rds: convert to getsockopt_iter
Message-Id: <20260605-getsock_more-v2-3-80f38cdb8706@debian.org>
References: <20260605-getsock_more-v2-0-80f38cdb8706@debian.org>
In-Reply-To: <20260605-getsock_more-v2-0-80f38cdb8706@debian.org>

Convert the RDS socket getsockopt implementation to use the new
getsockopt_iter callback with sockopt_t.

Key changes:
- Replace (char __user *optval, int __user *optlen) with sockopt_t *opt
- Use opt->optlen for buffer length (input) and returned size (output)
- Use copy_to_iter() instead of put_user()/copy_to_user()

The RDS_INFO_* snapshot path in rds_info_getsockopt() used to pin the
userspace buffer with pin_user_pages_fast() on the raw optval address; the
info producers then memcpy into those pages under a spinlock via
kmap_atomic() and so must not fault.
Obtain the same page array and starting offset from opt->iter_out with
iov_iter_extract_pages(), which pins for write because iter_out is
ITER_DEST. The page array is preallocated here (sized with
iov_iter_npages()) and passed in, so iov_iter_extract_pages() fills it in
place rather than allocating one for us; RDS therefore keeps ownership of
the array on every return path and frees it itself. The rds_info_iterator /
rds_info_copy machinery and all producer callbacks are unchanged.

Kernel buffers (ITER_KVEC) are not page-backed in a way the info producers
can use, so the RDS_INFO path returns -EOPNOTSUPP for them; this matches
the previous behaviour, where a kernel-buffer getsockopt hit the
WARN_ONCE() path in do_sock_getsockopt() and returned -EOPNOTSUPP. The
simple RDS_RECVERR and SO_RDS_TRANSPORT options keep working for kernel
buffers via copy_to_iter().

Signed-off-by: Breno Leitao
Reviewed-by: Allison Henderson
---
 net/rds/af_rds.c | 36 +++++++++++++++------------
 net/rds/info.c   | 76 ++++++++++++++++++++++++++++++++------------------------
 net/rds/info.h   |  3 +--
 3 files changed, 65 insertions(+), 50 deletions(-)

diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index 6f4f9cf352bd..d5defe9172e3 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -37,6 +37,7 @@
 #include
 #include
 #include
+#include
 #include
 
 #include "rds.h"
@@ -485,35 +486,36 @@ static int rds_setsockopt(struct socket *sock, int level, int optname,
 }
 
 static int rds_getsockopt(struct socket *sock, int level, int optname,
-			  char __user *optval, int __user *optlen)
+			  sockopt_t *opt)
 {
 	struct rds_sock *rs = rds_sk_to_rs(sock->sk);
 	int ret = -ENOPROTOOPT, len;
 	int trans;
+	int val;
 
 	if (level != SOL_RDS)
 		goto out;
 
-	if (get_user(len, optlen)) {
-		ret = -EFAULT;
-		goto out;
-	}
+	len = opt->optlen;
 
 	switch (optname) {
 	case RDS_INFO_FIRST ... RDS_INFO_LAST:
-		ret = rds_info_getsockopt(sock, optname, optval,
-					  optlen);
+		ret = rds_info_getsockopt(sock, optname, opt);
 		break;
 
 	case RDS_RECVERR:
-		if (len < sizeof(int))
+		if (len < sizeof(int)) {
 			ret = -EINVAL;
-		else
-		if (put_user(rs->rs_recverr, (int __user *) optval) ||
-		    put_user(sizeof(int), optlen))
+			break;
+		}
+		val = rs->rs_recverr;
+		if (copy_to_iter(&val, sizeof(int), &opt->iter_out) !=
+		    sizeof(int)) {
 			ret = -EFAULT;
-		else
+		} else {
+			opt->optlen = sizeof(int);
 			ret = 0;
+		}
 		break;
 	case SO_RDS_TRANSPORT:
 		if (len < sizeof(int)) {
@@ -522,11 +524,13 @@ static int rds_getsockopt(struct socket *sock, int level, int optname,
 		}
 		trans = (rs->rs_transport ? rs->rs_transport->t_type :
 			 RDS_TRANS_NONE); /* unbound */
-		if (put_user(trans, (int __user *)optval) ||
-		    put_user(sizeof(int), optlen))
+		if (copy_to_iter(&trans, sizeof(int), &opt->iter_out) !=
+		    sizeof(int)) {
 			ret = -EFAULT;
-		else
+		} else {
+			opt->optlen = sizeof(int);
 			ret = 0;
+		}
 		break;
 	default:
 		break;
@@ -653,7 +657,7 @@ static const struct proto_ops rds_proto_ops = {
 	.listen =	sock_no_listen,
 	.shutdown =	sock_no_shutdown,
 	.setsockopt =	rds_setsockopt,
-	.getsockopt =	rds_getsockopt,
+	.getsockopt_iter =	rds_getsockopt,
 	.sendmsg =	rds_sendmsg,
 	.recvmsg =	rds_recvmsg,
 	.mmap =		sock_no_mmap,
diff --git a/net/rds/info.c b/net/rds/info.c
index 17061f6ff74e..21b32eb16559 100644
--- a/net/rds/info.c
+++ b/net/rds/info.c
@@ -35,6 +35,7 @@
 #include
 #include
 #include
+#include
 
 #include "rds.h"
 
@@ -144,60 +145,68 @@ void rds_info_copy(struct rds_info_iterator *iter, void *data,
 EXPORT_SYMBOL_GPL(rds_info_copy);
 
 /*
- * @optval points to the userspace buffer that the information snapshot
- * will be copied into.
- *
- * @optlen on input is the size of the buffer in userspace.  @optlen
- * on output is the size of the requested snapshot in bytes.
+ * @opt->iter_out describes the buffer that the information snapshot will be
+ * copied into, and @opt->optlen is the size of that buffer on input.  On
+ * output @opt->optlen is set to the size of the requested snapshot in bytes.
  *
  * This function returns -errno if there is a failure, particularly -ENOSPC
- * if the given userspace buffer was not large enough to fit the snapshot.
- * On success it returns the positive number of bytes of each array element
- * in the snapshot.
+ * if the given buffer was not large enough to fit the snapshot.  On success
+ * it returns the positive number of bytes of each array element in the
+ * snapshot.
  */
-int rds_info_getsockopt(struct socket *sock, int optname, char __user *optval,
-			int __user *optlen)
+int rds_info_getsockopt(struct socket *sock, int optname, sockopt_t *opt)
 {
 	struct rds_info_iterator iter;
 	struct rds_info_lengths lens;
 	unsigned long nr_pages = 0;
-	unsigned long start;
 	rds_info_func func;
 	struct page **pages = NULL;
+	size_t offset0 = 0;
+	int npages = 0;
 	int ret;
 	int len;
 	int total;
 
-	if (get_user(len, optlen)) {
-		ret = -EFAULT;
-		goto out;
-	}
+	len = opt->optlen;
 
 	/* check for all kinds of wrapping and the like */
-	start = (unsigned long)optval;
-	if (len < 0 || len > INT_MAX - PAGE_SIZE + 1 || start + len < start) {
+	if (len < 0 || len > INT_MAX - PAGE_SIZE + 1) {
 		ret = -EINVAL;
 		goto out;
 	}
 
+	/* The info producers write into the pages with kmap_atomic() while
+	 * holding a spinlock, so they need a genuine page-backed user buffer.
+	 */
+	if (!user_backed_iter(&opt->iter_out)) {
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
+
 	/* a 0 len call is just trying to probe its length */
 	if (len == 0)
 		goto call_func;
 
-	nr_pages = (PAGE_ALIGN(start + len) - (start & PAGE_MASK))
-			>> PAGE_SHIFT;
-
-	pages = kmalloc_objs(struct page *, nr_pages);
+	/*
+	 * Preallocate the page array and pass it in so that
+	 * iov_iter_extract_pages() fills it in place rather than allocating
+	 * one for us.  Handing it a non-NULL array keeps ownership of the
+	 * array with us on every return path, instead of depending on the
+	 * iterator code to allocate and hand it back.
+	 */
+	npages = iov_iter_npages(&opt->iter_out, INT_MAX);
+	pages = kvmalloc_array(npages, sizeof(*pages), GFP_KERNEL);
 	if (!pages) {
 		ret = -ENOMEM;
 		goto out;
 	}
-	ret = pin_user_pages_fast(start, nr_pages, FOLL_WRITE, pages);
-	if (ret != nr_pages) {
-		if (ret > 0)
-			nr_pages = ret;
-		else
-			nr_pages = 0;
+
+	ret = iov_iter_extract_pages(&opt->iter_out, &pages, len, npages,
+				     0, &offset0);
+	if (ret < 0)
+		goto out;
+	nr_pages = DIV_ROUND_UP(offset0 + ret, PAGE_SIZE);
+	if (ret != len) {
 		ret = -EAGAIN; /* XXX ? */
 		goto out;
 	}
@@ -213,7 +222,7 @@ int rds_info_getsockopt(struct socket *sock, int optname, char __user *optval,
 
 	iter.pages = pages;
 	iter.addr = NULL;
-	iter.offset = start & (PAGE_SIZE - 1);
+	iter.offset = offset0;
 
 	func(sock, len, &iter, &lens);
 	BUG_ON(lens.each == 0);
@@ -230,13 +239,16 @@ int rds_info_getsockopt(struct socket *sock, int optname, char __user *optval,
 		ret = lens.each;
 	}
 
-	if (put_user(len, optlen))
-		ret = -EFAULT;
+	opt->optlen = len;
 
 out:
-	if (pages)
+	/*
+	 * iov_iter_extract_pages() pins only user-backed (ubuf) iters;
+	 * iov_iter_extract_will_pin() reports whether an unpin is owed here.
+	 */
+	if (pages && iov_iter_extract_will_pin(&opt->iter_out))
 		unpin_user_pages_dirty_lock(pages, nr_pages, true);
-	kfree(pages);
+	kvfree(pages);
 
 	return ret;
 }
diff --git a/net/rds/info.h b/net/rds/info.h
index a069b51c4679..1aab62ab6d00 100644
--- a/net/rds/info.h
+++ b/net/rds/info.h
@@ -21,8 +21,7 @@ typedef void (*rds_info_func)(struct socket *sock, unsigned int len,
 
 void rds_info_register_func(int optname, rds_info_func func);
 void rds_info_deregister_func(int optname, rds_info_func func);
-int rds_info_getsockopt(struct socket *sock, int optname, char __user *optval,
-			int __user *optlen);
+int rds_info_getsockopt(struct socket *sock, int optname, sockopt_t *opt);
 void rds_info_copy(struct rds_info_iterator *iter, void *data,
 		   unsigned long bytes);
 void rds_info_iter_unmap(struct rds_info_iterator *iter);
-- 
2.53.0-Meta
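
The nr_pages computation in the info.c hunk replaces the old address-based formula with one based on the starting offset reported by iov_iter_extract_pages(). As a small userspace sketch, assuming a 4096-byte page and using hypothetical demo_* names in place of the kernel's PAGE_SIZE, PAGE_MASK, PAGE_ALIGN() and DIV_ROUND_UP(), the two forms can be compared directly:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for the demo; the kernel uses PAGE_SIZE. */
#define DEMO_PAGE_SIZE	4096UL
#define DEMO_PAGE_MASK	(~(DEMO_PAGE_SIZE - 1))

/* New form: DIV_ROUND_UP(offset0 + bytes, PAGE_SIZE). */
static unsigned long demo_pages_new(size_t offset0, size_t bytes)
{
	assert(offset0 < DEMO_PAGE_SIZE);	/* offset within the first page */
	return (offset0 + bytes + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
}

/* Old form: (PAGE_ALIGN(start + len) - (start & PAGE_MASK)) >> PAGE_SHIFT. */
static unsigned long demo_pages_old(unsigned long start, size_t len)
{
	unsigned long end = (start + len + DEMO_PAGE_SIZE - 1) & DEMO_PAGE_MASK;

	return (end - (start & DEMO_PAGE_MASK)) / DEMO_PAGE_SIZE;
}
```

Under that assumed page size, the selftest's buffer placement (offset pagesz - 64) gives demo_pages_new(4032, 64) == 1 and demo_pages_new(4032, 65) == 2, so any snapshot longer than 64 bytes spans at least two pages. That is the multi-page case the test is meant to reach.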