From nobody Wed Dec 17 09:16:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EB008267F57; Tue, 8 Apr 2025 11:45:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744112735; cv=none; b=XTUvxka45kamdKhdfIDm04lCRNd6ZSkb/hhyldgqxgVRIhMEEWr6C47WfYw5i1f22TRQ8m0i09DL5EuifAK0Me9PmwJ0oYm+bdcRgE3syqt/MqdYnTPGWC42a4HnGqjkSQ6y92u41u2tZhAjKKbCo5cT0vni6E0gPIB7XtONufk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744112735; c=relaxed/simple; bh=boIwNzAemyOsPgk8LQh6CNl9rgNqmlDp1qRbFRb8oHQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=f/NJbtymph3ZcpVLwqXErgi9AvdMzCKiVCdXUBioUrVyChNqiPFzl9TvvSTc692x+lCfyYPpQYlwQPBc5q0y/9CLresKh3rj1r8L8wUdG16TaOxDfl7gdfkSl1QVrN9W8KJyvvEsJDyAupL2uBERIWaatc0GPkbfezgk/EO7GRg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linuxfoundation.org header.i=@linuxfoundation.org header.b=VIX+8SpH; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linuxfoundation.org header.i=@linuxfoundation.org header.b="VIX+8SpH" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6FF8DC4CEE5; Tue, 8 Apr 2025 11:45:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1744112734; bh=boIwNzAemyOsPgk8LQh6CNl9rgNqmlDp1qRbFRb8oHQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=VIX+8SpHaI/DzCPiNte61sYEhgABb46pz5BhVSeC2o+2R6Lgtu00qcWV7j1ZXBm7q eSAvDP8HdjyftWa3L1j1AlNdPCM4u8R8eHoXR3+hq9LcVEXABPZ45S0TkeUWd0MeWH gpgf5jiQuwtyglTZ02H5S3kNQsowrtKmNq6ROphA= From: Greg Kroah-Hartman To: stable@vger.kernel.org Cc: Greg Kroah-Hartman , patches@lists.linux.dev, Roman Gushchin , Jason Gunthorpe , Leon Romanovsky , Maher Sanalla , linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org, Parav Pandit , Sasha Levin Subject: [PATCH 5.15 179/279] RDMA/core: Dont expose hw_counters outside of init net namespace Date: Tue, 8 Apr 2025 12:49:22 +0200 Message-ID: <20250408104831.161960989@linuxfoundation.org> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250408104826.319283234@linuxfoundation.org> References: <20250408104826.319283234@linuxfoundation.org> User-Agent: quilt/0.68 X-stable: review X-Patchwork-Hint: ignore Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" 5.15-stable review patch. If anyone has any objections, please let me know. Reviewed-by: Parav Pandit ------------------ From: Roman Gushchin [ Upstream commit a1ecb30f90856b0be4168ad51b8875148e285c1f ] Commit 467f432a521a ("RDMA/core: Split port and device counter sysfs attributes") accidentally almost exposed hw counters to non-init net namespaces. It didn't expose them fully, as an attempt to read any of those counters leads to a crash like this one: [42021.807566] BUG: kernel NULL pointer dereference, address: 0000000000000= 028 [42021.814463] #PF: supervisor read access in kernel mode [42021.819549] #PF: error_code(0x0000) - not-present page [42021.824636] PGD 0 P4D 0 [42021.827145] Oops: 0000 [#1] SMP PTI [42021.830598] CPU: 82 PID: 2843922 Comm: switchto-defaul Kdump: loaded Tai= nted: G S W I XXX [42021.841697] Hardware name: XXX [42021.849619] RIP: 0010:hw_stat_device_show+0x1e/0x40 [ib_core] [42021.855362] Code: 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f = 44 00 00 49 89 d0 4c 8b 5e 20 48 8b 8f b8 04 00 00 48 81 c7 f0 fa ff ff <48= > 8b 41 28 48 29 ce 48 83 c6 d0 48 c1 ee 04 69 d6 ab aa aa aa 48 [42021.873931] RSP: 0018:ffff97fe90f03da0 EFLAGS: 00010287 [42021.879108] RAX: ffff9406988a8c60 RBX: ffff940e1072d438 RCX: 00000000000= 00000 [42021.886169] RDX: ffff94085f1aa000 RSI: ffff93c6cbbdbcb0 RDI: ffff940c751= 7aef0 [42021.893230] RBP: ffff97fe90f03e70 R08: ffff94085f1aa000 R09: 00000000000= 00000 [42021.900294] R10: ffff94085f1aa000 R11: ffffffffc0775680 R12: ffffffff87c= a2530 [42021.907355] R13: ffff940651602840 R14: ffff93c6cbbdbcb0 R15: ffff94085f1= aa000 [42021.914418] FS: 00007fda1a3b9700(0000) GS:ffff94453fb80000(0000) knlGS:= 0000000000000000 [42021.922423] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [42021.928130] CR2: 0000000000000028 CR3: 00000042dcfb8003 CR4: 00000000003= 726f0 [42021.935194] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 00000000000= 00000 [42021.942257] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 00000000000= 00400 [42021.949324] Call Trace: [42021.951756] [42021.953842] [] ? show_regs+0x64/0x70 [42021.959030] [] ? __die+0x78/0xc0 [42021.963874] [] ? page_fault_oops+0x2b5/0x3b0 [42021.969749] [] ? exc_page_fault+0x1a2/0x3c0 [42021.975549] [] ? asm_exc_page_fault+0x26/0x30 [42021.981517] [] ? __pfx_show_hw_stats+0x10/0x10 [ib_co= re] [42021.988482] [] ? hw_stat_device_show+0x1e/0x40 [ib_co= re] [42021.995438] [] dev_attr_show+0x1e/0x50 [42022.000803] [] sysfs_kf_seq_show+0x81/0xe0 [42022.006508] [] seq_read_iter+0xf4/0x410 [42022.011954] [] vfs_read+0x16e/0x2f0 [42022.017058] [] ksys_read+0x6e/0xe0 [42022.022073] [] do_syscall_64+0x6a/0xa0 [42022.027441] [] entry_SYSCALL_64_after_hwframe+0x78/0x= e2 The problem can be reproduced using the following steps: ip netns add foo ip netns exec foo bash cat /sys/class/infiniband/mlx4_0/hw_counters/* The panic occurs because of casting the device pointer into an ib_device pointer using container_of() in hw_stat_device_show() is wrong and leads to a memory corruption. However the real problem is that hw counters should never been exposed outside of the non-init net namespace. Fix this by saving the index of the corresponding attribute group (it might be 1 or 2 depending on the presence of driver-specific attributes) and zeroing the pointer to hw_counters group for compat devices during the initialization. With this fix applied hw_counters are not available in a non-init net namespace: find /sys/class/infiniband/mlx4_0/ -name hw_counters /sys/class/infiniband/mlx4_0/ports/1/hw_counters /sys/class/infiniband/mlx4_0/ports/2/hw_counters /sys/class/infiniband/mlx4_0/hw_counters ip netns add foo ip netns exec foo bash find /sys/class/infiniband/mlx4_0/ -name hw_counters Fixes: 467f432a521a ("RDMA/core: Split port and device counter sysfs attrib= utes") Signed-off-by: Roman Gushchin Cc: Jason Gunthorpe Cc: Leon Romanovsky Cc: Maher Sanalla Cc: linux-rdma@vger.kernel.org Cc: linux-kernel@vger.kernel.org Link: https://patch.msgid.link/20250227165420.3430301-1-roman.gushchin@linu= x.dev Reviewed-by: Parav Pandit Signed-off-by: Leon Romanovsky Signed-off-by: Sasha Levin --- drivers/infiniband/core/device.c | 9 +++++++++ drivers/infiniband/core/sysfs.c | 1 + include/rdma/ib_verbs.h | 1 + 3 files changed, 11 insertions(+) diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/dev= ice.c index 5d1ce55fda71e..241245e25f004 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -542,6 +542,8 @@ static struct class ib_class =3D { static void rdma_init_coredev(struct ib_core_device *coredev, struct ib_device *dev, struct net *net) { + bool is_full_dev =3D &dev->coredev =3D=3D coredev; + /* This BUILD_BUG_ON is intended to catch layout change * of union of ib_core_device and device. * dev must be the first element as ib_core and providers @@ -553,6 +555,13 @@ static void rdma_init_coredev(struct ib_core_device *c= oredev, =20 coredev->dev.class =3D &ib_class; coredev->dev.groups =3D dev->groups; + + /* + * Don't expose hw counters outside of the init namespace. + */ + if (!is_full_dev && dev->hw_stats_attr_index) + coredev->dev.groups[dev->hw_stats_attr_index] =3D NULL; + device_initialize(&coredev->dev); coredev->owner =3D dev; INIT_LIST_HEAD(&coredev->port_list); diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysf= s.c index afc59048c40c8..f68673c370d2e 100644 --- a/drivers/infiniband/core/sysfs.c +++ b/drivers/infiniband/core/sysfs.c @@ -976,6 +976,7 @@ int ib_setup_device_attrs(struct ib_device *ibdev) for (i =3D 0; i !=3D ARRAY_SIZE(ibdev->groups); i++) if (!ibdev->groups[i]) { ibdev->groups[i] =3D &data->group; + ibdev->hw_stats_attr_index =3D i; return 0; } WARN(true, "struct ib_device->groups is too small"); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index fa13bf15feb3e..f4257c2e96b6d 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -2707,6 +2707,7 @@ struct ib_device { * It is a NULL terminated array. */ const struct attribute_group *groups[4]; + u8 hw_stats_attr_index; =20 u64 uverbs_cmd_mask; =20 --=20 2.39.5