From nobody Fri Nov 29 19:48:03 2024
From: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Subject: [RFC 1/6] percpu-refcount: Add managed mode for RCU released objects
Date: Mon, 16 Sep 2024 10:38:06 +0530
Message-ID: <20240916050811.473556-2-Neeraj.Upadhyay@amd.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240916050811.473556-1-Neeraj.Upadhyay@amd.com>
References: <20240916050811.473556-1-Neeraj.Upadhyay@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Add a new "managed mode" to percpu refcounts, to track the initial
reference drop for refs which use an RCU grace period for their object
reclaim. The typical usage pattern for such refs is:

	// Called with elevated refcount
	get()
		p = get_ptr();
		kref_get(&p->count);
		return p;

	get()
		rcu_read_lock();
		p = get_ptr();
		if (p && !kref_get_unless_zero(&p->count))
			p = NULL;
		rcu_read_unlock();
		return p;

	release()
		remove_ptr(p);
		call_rcu(&p->rcu, freep);

	release()
		remove_ptr(p);
		kfree_rcu(p, rcu);

Currently, percpu ref requires users to call percpu_ref_kill() when the
object's usage enters a shutdown phase. After the kill operation, ref
increments and decrements are performed on an atomic counter. For cases
where the ref is still actively acquired and released after
percpu_ref_kill(), percpu ref therefore provides no performance benefit
over a plain atomic reference counter.

Managed mode offloads tracking of the ref kill to a manager thread, so
users do not need to call percpu_ref_kill() explicitly. This avoids the
suboptimal performance of a percpu ref that is actively acquired and
released after the percpu_ref_kill() operation.

A percpu ref can be initialized as managed either during
percpu_ref_init(), by passing the PERCPU_REF_REL_MANAGED flag, or a
reinitable ref can be switched to managed mode later using
percpu_ref_switch_to_managed(). The deferred switch to managed mode is
useful for cases like module initialization errors, where an initialized
percpu ref's initial reference is dropped before the object becomes
active and is referenced by other contexts. One such case is AppArmor
labels which are not yet associated with a namespace: these labels are
freed without waiting for an RCU grace period, so managed mode cannot be
used for them until their initialization has completed.
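As a rough usage sketch (not code added by this patch), an object that
is already reclaimed via an RCU grace period could opt into managed mode
as below. 'struct foo' and the foo_*() helpers are hypothetical; only
PERCPU_REF_REL_MANAGED and the percpu_ref_*() calls come from this
series:

	#include <linux/percpu-refcount.h>
	#include <linux/rcupdate.h>
	#include <linux/slab.h>

	struct foo {
		struct percpu_ref ref;
		struct rcu_head rcu;
	};

	static void foo_release(struct percpu_ref *ref)
	{
		struct foo *p = container_of(ref, struct foo, ref);

		/* Free only after an RCU grace period. */
		kfree_rcu(p, rcu);
	}

	static struct foo *foo_create(gfp_t gfp)
	{
		struct foo *p = kzalloc(sizeof(*p), gfp);

		if (!p)
			return NULL;
		/* Managed mode: the manager thread tracks the initial ref drop. */
		if (percpu_ref_init(&p->ref, foo_release,
				    PERCPU_REF_REL_MANAGED, gfp)) {
			kfree(p);
			return NULL;
		}
		return p;
	}

	static void foo_unpublish(struct foo *p)
	{
		/* Drop the initial reference; no percpu_ref_kill() is needed. */
		percpu_ref_put(&p->ref);
	}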
Following are the allowed initialization modes for a managed ref:

              Atomic   Percpu   Dead   Reinit   Managed
 Managed-ref    Y        N       Y       Y         Y

Following are the allowed transitions for a managed ref:

   To -->  A    P    P(RI)  M    D    D(RI)  D(RI/M)  KLL  REI  RES
 From
   A       y    n    y      y    n    y      y        y    y    y
   P       n    n    n      n    y    n      n        y    n    n
   M       n    n    n      y    n    n      y        n    y    y
   P(RI)   y    n    y      y    n    y      y        y    y    y
   D(RI)   y    n    y      y    n    y      y        -    y    y
   D(RI/M) n    n    n      y    n    n      y        -    y    y

Modes:
  A       - Atomic
  P       - PerCPU
  M       - Managed
  P(RI)   - PerCPU with ReInit
  D(RI)   - Dead with ReInit
  D(RI/M) - Dead with ReInit and Managed

PerCPU Ref Ops:
  KLL - Kill
  REI - Reinit
  RES - Resurrect

Once a percpu ref is switched to managed mode, it cannot be switched to
any other active mode. On reinit/resurrect, a managed ref is
reinitialized in managed mode.

Signed-off-by: Neeraj Upadhyay
---
 .../admin-guide/kernel-parameters.txt |  12 +
 include/linux/percpu-refcount.h       |  13 +
 lib/percpu-refcount.c                 | 358 +++++++++++++++++-
 3 files changed, 364 insertions(+), 19 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 09126bb8cc9f..0f02a1b04fe9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4665,6 +4665,18 @@
			allocator. This parameter is primarily for debugging
			and performance comparison.
 
+	percpu_refcount.max_scan_count=	[KNL]
+			Specifies the maximum number of percpu ref nodes which
+			are processed in one run of percpu ref manager thread.
+
+			Default: 100
+
+	percpu_refcount.scan_interval=	[KNL]
+			Specifies the duration (ms) between two runs of manager
+			thread.
+
+			Default: 5000 ms
+
 	pirq=		[SMP,APIC] Manual mp-table setup
 			See Documentation/arch/x86/i386/IO-APIC.rst.
 
diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index d73a1c08c3e3..e6aea81b3d01 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -68,6 +68,11 @@ enum {
 	__PERCPU_REF_FLAG_BITS	= 2,
 };
 
+/* Auxiliary flags */
+enum {
+	__PERCPU_REL_MANAGED	= 1LU << 0,	/* operating in managed mode */
+};
+
 /* @flags for percpu_ref_init() */
 enum {
 	/*
@@ -90,6 +95,10 @@ enum {
 	 * Allow switching from atomic mode to percpu mode.
 	 */
 	PERCPU_REF_ALLOW_REINIT	= 1 << 2,
+	/*
+	 * Manage release of the percpu ref.
+ */ + PERCPU_REF_REL_MANAGED =3D 1 << 3, }; =20 struct percpu_ref_data { @@ -100,6 +109,9 @@ struct percpu_ref_data { bool allow_reinit:1; struct rcu_head rcu; struct percpu_ref *ref; + unsigned int aux_flags; + struct llist_node node; + }; =20 struct percpu_ref { @@ -126,6 +138,7 @@ void percpu_ref_switch_to_atomic(struct percpu_ref *ref, percpu_ref_func_t *confirm_switch); void percpu_ref_switch_to_atomic_sync(struct percpu_ref *ref); void percpu_ref_switch_to_percpu(struct percpu_ref *ref); +int percpu_ref_switch_to_managed(struct percpu_ref *ref); void percpu_ref_kill_and_confirm(struct percpu_ref *ref, percpu_ref_func_t *confirm_kill); void percpu_ref_resurrect(struct percpu_ref *ref); diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c index 668f6aa6a75d..7b97f9728c5b 100644 --- a/lib/percpu-refcount.c +++ b/lib/percpu-refcount.c @@ -5,6 +5,9 @@ #include #include #include +#include +#include +#include #include #include =20 @@ -38,6 +41,7 @@ =20 static DEFINE_SPINLOCK(percpu_ref_switch_lock); static DECLARE_WAIT_QUEUE_HEAD(percpu_ref_switch_waitq); +static LLIST_HEAD(percpu_ref_manage_head); =20 static unsigned long __percpu *percpu_count_ptr(struct percpu_ref *ref) { @@ -45,6 +49,8 @@ static unsigned long __percpu *percpu_count_ptr(struct pe= rcpu_ref *ref) (ref->percpu_count_ptr & ~__PERCPU_REF_ATOMIC_DEAD); } =20 +int percpu_ref_switch_to_managed(struct percpu_ref *ref); + /** * percpu_ref_init - initialize a percpu refcount * @ref: percpu_ref to initialize @@ -80,6 +86,9 @@ int percpu_ref_init(struct percpu_ref *ref, percpu_ref_fu= nc_t *release, return -ENOMEM; } =20 + if (flags & PERCPU_REF_REL_MANAGED) + flags |=3D PERCPU_REF_ALLOW_REINIT; + data->force_atomic =3D flags & PERCPU_REF_INIT_ATOMIC; data->allow_reinit =3D flags & PERCPU_REF_ALLOW_REINIT; =20 @@ -101,10 +110,73 @@ int percpu_ref_init(struct percpu_ref *ref, percpu_re= f_func_t *release, data->confirm_switch =3D NULL; data->ref =3D ref; ref->data =3D data; + init_llist_node(&data->node); + + if (flags & PERCPU_REF_REL_MANAGED) + percpu_ref_switch_to_managed(ref); + return 0; } EXPORT_SYMBOL_GPL(percpu_ref_init); =20 +static bool percpu_ref_is_managed(struct percpu_ref *ref) +{ + return (ref->data->aux_flags & __PERCPU_REL_MANAGED) !=3D 0; +} + +static void __percpu_ref_switch_mode(struct percpu_ref *ref, + percpu_ref_func_t *confirm_switch); + +static int __percpu_ref_switch_to_managed(struct percpu_ref *ref) +{ + unsigned long __percpu *percpu_count; + struct percpu_ref_data *data; + int ret =3D -1; + + data =3D ref->data; + + if (WARN_ONCE(!percpu_ref_tryget(ref), "Percpu ref is not active")) + return ret; + + if (WARN_ONCE(!data->allow_reinit, "Percpu ref does not allow switch")) + goto err_switch_managed; + + if (WARN_ONCE(percpu_ref_is_managed(ref), "Percpu ref is already managed"= )) + goto err_switch_managed; + + data->aux_flags |=3D __PERCPU_REL_MANAGED; + data->force_atomic =3D false; + if (!__ref_is_percpu(ref, &percpu_count)) + __percpu_ref_switch_mode(ref, NULL); + /* Ensure ordering of percpu mode switch and node scan */ + smp_mb(); + llist_add(&data->node, &percpu_ref_manage_head); + + return 0; + +err_switch_managed: + percpu_ref_put(ref); + return ret; +} + +/** + * percpu_ref_switch_to_managed - Switch an unmanaged ref to percpu mode. 
+ * + * @ref: percpu_ref to switch to managed mode + * + */ +int percpu_ref_switch_to_managed(struct percpu_ref *ref) +{ + unsigned long flags; + int ret; + + spin_lock_irqsave(&percpu_ref_switch_lock, flags); + ret =3D __percpu_ref_switch_to_managed(ref); + spin_unlock_irqrestore(&percpu_ref_switch_lock, flags); + return ret; +} +EXPORT_SYMBOL_GPL(percpu_ref_switch_to_managed); + static void __percpu_ref_exit(struct percpu_ref *ref) { unsigned long __percpu *percpu_count =3D percpu_count_ptr(ref); @@ -283,6 +355,27 @@ static void __percpu_ref_switch_mode(struct percpu_ref= *ref, __percpu_ref_switch_to_percpu(ref); } =20 +static bool __percpu_ref_switch_to_atomic_checked(struct percpu_ref *ref, + percpu_ref_func_t *confirm_switch, + bool check_managed) +{ + unsigned long flags; + + spin_lock_irqsave(&percpu_ref_switch_lock, flags); + if (check_managed && WARN_ONCE(percpu_ref_is_managed(ref), + "Percpu ref is managed, cannot switch to atomic mode")) { + spin_unlock_irqrestore(&percpu_ref_switch_lock, flags); + return false; + } + + ref->data->force_atomic =3D true; + __percpu_ref_switch_mode(ref, confirm_switch); + + spin_unlock_irqrestore(&percpu_ref_switch_lock, flags); + + return true; +} + /** * percpu_ref_switch_to_atomic - switch a percpu_ref to atomic mode * @ref: percpu_ref to switch to atomic mode @@ -306,17 +399,16 @@ static void __percpu_ref_switch_mode(struct percpu_re= f *ref, void percpu_ref_switch_to_atomic(struct percpu_ref *ref, percpu_ref_func_t *confirm_switch) { - unsigned long flags; - - spin_lock_irqsave(&percpu_ref_switch_lock, flags); - - ref->data->force_atomic =3D true; - __percpu_ref_switch_mode(ref, confirm_switch); - - spin_unlock_irqrestore(&percpu_ref_switch_lock, flags); + (void)__percpu_ref_switch_to_atomic_checked(ref, confirm_switch, true); } EXPORT_SYMBOL_GPL(percpu_ref_switch_to_atomic); =20 +static void __percpu_ref_switch_to_atomic_sync_checked(struct percpu_ref *= ref, bool check_managed) +{ + if (!__percpu_ref_switch_to_atomic_checked(ref, NULL, check_managed)) + return; + wait_event(percpu_ref_switch_waitq, !ref->data->confirm_switch); +} /** * percpu_ref_switch_to_atomic_sync - switch a percpu_ref to atomic mode * @ref: percpu_ref to switch to atomic mode @@ -327,11 +419,28 @@ EXPORT_SYMBOL_GPL(percpu_ref_switch_to_atomic); */ void percpu_ref_switch_to_atomic_sync(struct percpu_ref *ref) { - percpu_ref_switch_to_atomic(ref, NULL); - wait_event(percpu_ref_switch_waitq, !ref->data->confirm_switch); + __percpu_ref_switch_to_atomic_sync_checked(ref, true); } EXPORT_SYMBOL_GPL(percpu_ref_switch_to_atomic_sync); =20 +static void __percpu_ref_switch_to_percpu_checked(struct percpu_ref *ref, = bool check_managed) +{ + unsigned long flags; + + spin_lock_irqsave(&percpu_ref_switch_lock, flags); + + if (check_managed && WARN_ONCE(percpu_ref_is_managed(ref), + "Percpu ref is managed, cannot switch to percpu mode")) { + spin_unlock_irqrestore(&percpu_ref_switch_lock, flags); + return; + } + + ref->data->force_atomic =3D false; + __percpu_ref_switch_mode(ref, NULL); + + spin_unlock_irqrestore(&percpu_ref_switch_lock, flags); +} + /** * percpu_ref_switch_to_percpu - switch a percpu_ref to percpu mode * @ref: percpu_ref to switch to percpu mode @@ -352,14 +461,7 @@ EXPORT_SYMBOL_GPL(percpu_ref_switch_to_atomic_sync); */ void percpu_ref_switch_to_percpu(struct percpu_ref *ref) { - unsigned long flags; - - spin_lock_irqsave(&percpu_ref_switch_lock, flags); - - ref->data->force_atomic =3D false; - __percpu_ref_switch_mode(ref, NULL); - - 
spin_unlock_irqrestore(&percpu_ref_switch_lock, flags); + __percpu_ref_switch_to_percpu_checked(ref, true); } EXPORT_SYMBOL_GPL(percpu_ref_switch_to_percpu); =20 @@ -472,8 +574,226 @@ void percpu_ref_resurrect(struct percpu_ref *ref) =20 ref->percpu_count_ptr &=3D ~__PERCPU_REF_DEAD; percpu_ref_get(ref); - __percpu_ref_switch_mode(ref, NULL); + if (percpu_ref_is_managed(ref)) { + ref->data->aux_flags &=3D ~__PERCPU_REL_MANAGED; + __percpu_ref_switch_to_managed(ref); + } else { + __percpu_ref_switch_mode(ref, NULL); + } =20 spin_unlock_irqrestore(&percpu_ref_switch_lock, flags); } EXPORT_SYMBOL_GPL(percpu_ref_resurrect); + +#define DEFAULT_SCAN_INTERVAL_MS 5000 +/* Interval duration between two ref scans. */ +static ulong scan_interval =3D DEFAULT_SCAN_INTERVAL_MS; +module_param(scan_interval, ulong, 0444); + +#define DEFAULT_MAX_SCAN_COUNT 100 +/* Number of percpu refs scanned in one iteration of worker execution. */ +static int max_scan_count =3D DEFAULT_MAX_SCAN_COUNT; +module_param(max_scan_count, int, 0444); + +static void percpu_ref_release_work_fn(struct work_struct *work); + +/* + * Sentinel llist nodes for lockless list traveral and deletions by + * the pcpu ref release worker, while nodes are added from + * percpu_ref_init() and percpu_ref_switch_to_managed(). + * + * Sentinel node marks the head of list traversal for the current + * iteration of kworker execution. + */ +struct percpu_ref_sen_node { + bool inuse; + struct llist_node node; +}; + +/* + * We need two sentinel nodes for lockless list manipulations from release + * worker - first node will be used in current reclaim iteration. The seco= nd + * node will be used in next iteration. Next iteration marks the first node + * as free, for use in subsequent iteration. + */ +#define PERCPU_REF_SEN_NODES_COUNT 2 + +/* Track last processed percpu ref node */ +static struct llist_node *last_percpu_ref_node; + +static struct percpu_ref_sen_node + percpu_ref_sen_nodes[PERCPU_REF_SEN_NODES_COUNT]; + +static DECLARE_DELAYED_WORK(percpu_ref_release_work, percpu_ref_release_wo= rk_fn); + +static bool percpu_ref_is_sen_node(struct llist_node *node) +{ + return &percpu_ref_sen_nodes[0].node <=3D node && + node <=3D &percpu_ref_sen_nodes[PERCPU_REF_SEN_NODES_COUNT - 1].node; +} + +static struct llist_node *percpu_ref_get_sen_node(void) +{ + int i; + struct percpu_ref_sen_node *sn; + + for (i =3D 0; i < PERCPU_REF_SEN_NODES_COUNT; i++) { + sn =3D &percpu_ref_sen_nodes[i]; + if (!sn->inuse) { + sn->inuse =3D true; + return &sn->node; + } + } + + return NULL; +} + +static void percpu_ref_put_sen_node(struct llist_node *node) +{ + struct percpu_ref_sen_node *sn =3D container_of(node, struct percpu_ref_s= en_node, node); + + sn->inuse =3D false; + init_llist_node(node); +} + +static void percpu_ref_put_all_sen_nodes_except(struct llist_node *node) +{ + int i; + + for (i =3D 0; i < PERCPU_REF_SEN_NODES_COUNT; i++) { + if (&percpu_ref_sen_nodes[i].node =3D=3D node) + continue; + percpu_ref_sen_nodes[i].inuse =3D false; + init_llist_node(&percpu_ref_sen_nodes[i].node); + } +} + +static struct workqueue_struct *percpu_ref_release_wq; + +static void percpu_ref_release_work_fn(struct work_struct *work) +{ + struct llist_node *pos, *first, *head, *prev, *next; + struct llist_node *sen_node; + struct percpu_ref *ref; + int count =3D 0; + bool held; + + first =3D READ_ONCE(percpu_ref_manage_head.first); + if (!first) + goto queue_release_work; + + /* + * Enqueue a dummy node to mark the start of scan. 
This dummy + * node is used as start point of scan and ensures that + * there is no additional synchronization required with new + * label node additions to the llist. Any new labels will + * be processed in next run of the kworker. + * + * SCAN START PTR + * | + * v + * +----------+ +------+ +------+ +------+ + * | | | | | | | | + * | head ------> dummy|--->|label |--->| label|--->NULL + * | | | node | | | | | + * +----------+ +------+ +------+ +------+ + * + * + * New label addition: + * + * SCAN START PTR + * | + * v + * +----------+ +------+ +------+ +------+ +------+ + * | | | | | | | | | | + * | head |--> label|--> dummy|--->|label |--->| label|--->NULL + * | | | | | node | | | | | + * +----------+ +------+ +------+ +------+ +------+ + * + */ + if (last_percpu_ref_node =3D=3D NULL || last_percpu_ref_node->next =3D=3D= NULL) { +retry_sentinel_get: + sen_node =3D percpu_ref_get_sen_node(); + /* + * All sentinel nodes are in use? This should not happen, as we + * require only one sentinel for the start of list traversal and + * other sentinel node is freed during the traversal. + */ + if (WARN_ONCE(!sen_node, "All sentinel nodes are in use")) { + /* Use first node as the sentinel node */ + head =3D first->next; + if (!head) { + struct llist_node *ign_node =3D NULL; + /* + * We exhausted sentinel nodes. However, there aren't + * enough nodes in the llist. So, we have leaked + * sentinel nodes. Reclaim sentinels and retry. + */ + if (percpu_ref_is_sen_node(first)) + ign_node =3D first; + percpu_ref_put_all_sen_nodes_except(ign_node); + goto retry_sentinel_get; + } + prev =3D first; + } else { + llist_add(sen_node, &percpu_ref_manage_head); + prev =3D sen_node; + head =3D prev->next; + } + } else { + prev =3D last_percpu_ref_node; + head =3D prev->next; + } + + last_percpu_ref_node =3D NULL; + llist_for_each_safe(pos, next, head) { + /* Free sentinel node which is present in the list */ + if (percpu_ref_is_sen_node(pos)) { + prev->next =3D pos->next; + percpu_ref_put_sen_node(pos); + continue; + } + + ref =3D container_of(pos, struct percpu_ref_data, node)->ref; + __percpu_ref_switch_to_atomic_sync_checked(ref, false); + /* + * Drop the ref while in RCU read critical section to + * prevent obj free while we manipulating node. 
+		 */
+		rcu_read_lock();
+		percpu_ref_put(ref);
+		held = percpu_ref_tryget(ref);
+		if (!held) {
+			prev->next = pos->next;
+			init_llist_node(pos);
+			ref->percpu_count_ptr |= __PERCPU_REF_DEAD;
+		}
+		rcu_read_unlock();
+		if (!held)
+			continue;
+		__percpu_ref_switch_to_percpu_checked(ref, false);
+		count++;
+		if (count == max_scan_count) {
+			last_percpu_ref_node = pos;
+			break;
+		}
+		prev = pos;
+	}
+
+queue_release_work:
+	queue_delayed_work(percpu_ref_release_wq, &percpu_ref_release_work,
+			   scan_interval);
+}
+
+static __init int percpu_ref_setup(void)
+{
+	percpu_ref_release_wq = alloc_workqueue("percpu_ref_release_wq",
+			WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_FREEZABLE, 0);
+	if (!percpu_ref_release_wq)
+		return -ENOMEM;
+
+	queue_delayed_work(percpu_ref_release_wq, &percpu_ref_release_work,
+			   scan_interval);
+	return 0;
+}
+early_initcall(percpu_ref_setup);
--
2.34.1

From nobody Fri Nov 29 19:48:03 2024
From: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Subject: [RFC 2/6] percpu-refcount: Add torture test for percpu refcount
Date: Mon, 16 Sep 2024 10:38:07 +0530
Message-ID: <20240916050811.473556-3-Neeraj.Upadhyay@amd.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240916050811.473556-1-Neeraj.Upadhyay@amd.com>
References: <20240916050811.473556-1-Neeraj.Upadhyay@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Add a torture test to verify percpu ref managed mode operations. The
test checks that a percpu ref does not have a non-zero count once all
of its users have dropped their references, and that the ref is not
released early while users still hold references to it.
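For example, once the module is built, the test could be exercised with
something like "modprobe percpu-refcount-torture nusers=4 nrefs=2
niterations=1000 verbose=1"; the parameter names come from this patch,
while the values here are only illustrative.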
Signed-off-by: Neeraj Upadhyay --- .../admin-guide/kernel-parameters.txt | 57 +++ lib/Kconfig.debug | 9 + lib/Makefile | 1 + lib/percpu-refcount-torture.c | 367 ++++++++++++++++++ lib/percpu-refcount.c | 49 ++- lib/percpu-refcount.h | 6 + 6 files changed, 483 insertions(+), 6 deletions(-) create mode 100644 lib/percpu-refcount-torture.c create mode 100644 lib/percpu-refcount.h diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentatio= n/admin-guide/kernel-parameters.txt index 0f02a1b04fe9..225f2dac294d 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -4677,6 +4677,63 @@ =20 Default: 5000 ms =20 + percpu_refcount_torture.busted_early_ref_release=3D [KNL] + Enable testing buggy release of percpu ref while + there are active users. Used for testing failure + scenarios in the test. + + Default: 0 (disabled) + + percpu_refcount_torture.busted_late_ref_release=3D [KNL] + Enable testing buggy non-zero reference count after + all ref users have dropped their reference. Used for + testing failure scenarios in the test. + + Default: 0 (disabled) + + percpu_refcount_torture.delay_us=3D [KNL] + Delay (in us) between reference increment and decrement + operations of ref users. + + Default: 10 + + percpu_refcount_torture.niterations=3D [KNL] + Number of iterations of ref increment and decrement by + ref users. + + Default: 100 + + percpu_refcount_torture.nrefs=3D [KNL] + Number of percpu ref instances. + + Default: 2 + + percpu_refcount_torture.nusers=3D [KNL] + Number of percpu ref user threads which increment and + decrement a percpu ref. + + Default: 2 + + percpu_refcount_torture.onoff_holdoff=3D [KNL] + Set time (s) after boot for CPU-hotplug testing. + + Default: 0 + + percpu_refcount_torture.onoff_interval=3D [KNL] + Set time (jiffies) between CPU-hotplug operations, + or zero to disable CPU-hotplug testing. + + percpu_refcount_torture.stutter=3D [KNL] + Set wait time (jiffies) between two iterations of + percpu ref operations. + + Default: 0 + + percpu_refcount_torture.verbose=3D [KNL] + Enable additional printk() statements. + + Default: 0 (Disabled) + pirq=3D [SMP,APIC] Manual mp-table setup See Documentation/arch/x86/i386/IO-APIC.rst. =20 diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index a30c03a66172..7e0117e01f05 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1611,6 +1611,15 @@ config SCF_TORTURE_TEST module may be built after the fact on the running kernel to be tested, if desired. =20 +config PERCPU_REFCOUNT_TORTURE_TEST + tristate "torture tests for percpu refcount" + select TORTURE_TEST + help + This options provides a kernel module that runs percpu + refcount torture tests for managed percpu refs. The kernel + module may be built after the fact on the running kernel + to be tested, if desired. 
+ config CSD_LOCK_WAIT_DEBUG bool "Debugging for csd_lock_wait(), called from smp_call_function*()" depends on DEBUG_KERNEL diff --git a/lib/Makefile b/lib/Makefile index 322bb127b4dc..d0286f7dfb37 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -50,6 +50,7 @@ obj-y +=3D bcd.o sort.o parser.o debug_locks.o random32.o= \ once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \ generic-radix-tree.o bitmap-str.o obj-$(CONFIG_STRING_KUNIT_TEST) +=3D string_kunit.o +obj-$(CONFIG_PERCPU_REFCOUNT_TORTURE_TEST) +=3D percpu-refcount-torture.o obj-y +=3D string_helpers.o obj-$(CONFIG_STRING_HELPERS_KUNIT_TEST) +=3D string_helpers_kunit.o obj-y +=3D hexdump.o diff --git a/lib/percpu-refcount-torture.c b/lib/percpu-refcount-torture.c new file mode 100644 index 000000000000..686f5a228b40 --- /dev/null +++ b/lib/percpu-refcount-torture.c @@ -0,0 +1,367 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include +#include +#include +#include +#include +#include + +#include "percpu-refcount.h" + +static int busted_early_ref_release; +module_param(busted_early_ref_release, int, 0444); +MODULE_PARM_DESC(busted_early_ref_release, + "Enable busted premature release of ref (default =3D 0), 0 =3D disable"= ); + +static int busted_late_ref_release; +module_param(busted_late_ref_release, int, 0444); +MODULE_PARM_DESC(busted_late_ref_release, + "Enable busted late release of ref (default =3D 0), 0 =3D disable"); + +static long delay_us =3D 10; +module_param(delay_us, long, 0444); +MODULE_PARM_DESC(delay_us, + "delay between reader refcount operations in microseconds (default =3D = 10)"); + +static long nrefs =3D 2; +module_param(nrefs, long, 0444); +MODULE_PARM_DESC(nrefs, "Number of percpu refs (default =3D 2)"); + +static long niterations =3D 100; +module_param(niterations, long, 0444); +MODULE_PARM_DESC(niterations, + "Number of iterations of ref increment and decrement (default =3D 100)"= ); + +static long nusers =3D 2; +module_param(nusers, long, 0444); +MODULE_PARM_DESC(nusers, "Number of refcount users (default =3D 2)"); + +static int onoff_holdoff; +module_param(onoff_holdoff, int, 0444); +MODULE_PARM_DESC(onoff_holdoff, "Time after boot before CPU hotplugs (seco= nds)"); + +static int onoff_interval; +module_param(onoff_interval, int, 0444); +MODULE_PARM_DESC(onoff_interval, "Time between CPU hotplugs (jiffies), 0= =3Ddisable"); + +static int stutter; +module_param(stutter, int, 0444); +MODULE_PARM_DESC(stutter, "Stutter period in jiffies (default =3D 0), 0 = =3D disable"); + +static int verbose =3D 1; +module_param(verbose, int, 0444); +MODULE_PARM_DESC(verbose, "Enable verbose debugging printk()s"); + +static struct task_struct **ref_user_tasks; +static struct task_struct *ref_manager_task; +static struct task_struct **busted_early_release_tasks; +static struct task_struct **busted_late_release_tasks; + +static struct percpu_ref *refs; +static long *num_per_ref_users; + +static atomic_t running; +static atomic_t *ref_running; + +static char *torture_type =3D "percpu-refcount"; + +static int percpu_ref_manager_thread(void *data) +{ + int i; + + while (atomic_read(&running) !=3D 0) { + percpu_ref_test_flush_release_work(); + stutter_wait("percpu_ref_manager_thread"); + } + /* Ensure ordering with ref users */ + smp_mb(); + + percpu_ref_test_flush_release_work(); + + for (i =3D 0; i < nrefs; i++) { + WARN(percpu_ref_test_is_percpu(&refs[i]), + "!!! released ref %d should be in atomic mode", i); + WARN(!percpu_ref_is_zero(&refs[i]), + "!!! 
released ref %d should have 0 refcount", i); + } + + do { + stutter_wait("percpu_ref_manager_thread"); + } while (!torture_must_stop()); + + torture_kthread_stopping("percpu_ref_manager_thread"); + + return 0; +} + +static int percpu_ref_test_thread(void *data) +{ + struct percpu_ref *ref =3D (struct percpu_ref *)data; + int i =3D 0; + + percpu_ref_get(ref); + + do { + percpu_ref_get(ref); + udelay(delay_us); + percpu_ref_put(ref); + stutter_wait("percpu_ref_test_thread"); + i++; + } while (i < niterations); + + atomic_dec(&ref_running[ref - refs]); + /* Order ref release with ref_running[ref_idx] =3D=3D 0 */ + smp_mb(); + percpu_ref_put(ref); + /* Order ref decrement with running =3D=3D 0 */ + smp_mb(); + atomic_dec(&running); + + do { + stutter_wait("percpu_ref_test_thread"); + } while (!torture_must_stop()); + + torture_kthread_stopping("percpu_ref_test_thread"); + + return 0; +} + +static int percpu_ref_busted_early_thread(void *data) +{ + struct percpu_ref *ref =3D (struct percpu_ref *)data; + int ref_idx =3D ref - refs; + int i =3D 0, j; + + do { + /* Extra ref put momemtarily */ + for (j =3D 0; j < num_per_ref_users[ref_idx]; j++) + percpu_ref_put(ref); + stutter_wait("percpu_ref_busted_early_thread"); + for (j =3D 0; j < num_per_ref_users[ref_idx]; j++) + percpu_ref_get(ref); + i++; + stutter_wait("percpu_ref_busted_early_thread"); + } while (i < niterations * 10); + + do { + stutter_wait("percpu_ref_busted_early_thread"); + } while (!torture_must_stop()); + + torture_kthread_stopping("percpu_ref_busted_early_thread"); + + return 0; +} + +static int percpu_ref_busted_late_thread(void *data) +{ + struct percpu_ref *ref =3D (struct percpu_ref *)data; + int i =3D 0; + + do { + /* Extra ref get momemtarily */ + percpu_ref_get(ref); + stutter_wait("percpu_ref_busted_late_thread"); + percpu_ref_put(ref); + i++; + } while (i < niterations); + + do { + stutter_wait("percpu_ref_busted_late_thread"); + } while (!torture_must_stop()); + + torture_kthread_stopping("percpu_ref_busted_late_thread"); + + return 0; +} + +static void percpu_ref_test_cleanup(void) +{ + int i; + + if (torture_cleanup_begin()) + return; + + if (busted_late_release_tasks) { + for (i =3D 0; i < nrefs; i++) + torture_stop_kthread(busted_late_task, busted_late_release_tasks[i]); + kfree(busted_late_release_tasks); + busted_late_release_tasks =3D NULL; + } + + if (busted_early_release_tasks) { + for (i =3D 0; i < nrefs; i++) + torture_stop_kthread(busted_early_task, busted_early_release_tasks[i]); + kfree(busted_early_release_tasks); + busted_early_release_tasks =3D NULL; + } + + if (ref_manager_task) { + torture_stop_kthread(ref_manager, ref_manager_task); + ref_manager_task =3D NULL; + } + + if (ref_user_tasks) { + for (i =3D 0; i < nusers; i++) + torture_stop_kthread(ref_user, ref_user_tasks[i]); + kfree(ref_user_tasks); + ref_user_tasks =3D NULL; + } + + kfree(ref_running); + ref_running =3D NULL; + + kfree(num_per_ref_users); + num_per_ref_users =3D NULL; + + if (refs) { + for (i =3D 0; i < nrefs; i++) + percpu_ref_exit(&refs[i]); + kfree(refs); + refs =3D NULL; + } + + torture_cleanup_end(); +} + +static void percpu_ref_test_release(struct percpu_ref *ref) +{ + WARN(!!atomic_add_return(0, &ref_running[ref-refs]), "!!! 
Premature ref r= elease"); +} + +static int __init percpu_ref_torture_init(void) +{ + DEFINE_TORTURE_RANDOM(rand); + struct torture_random_state *trsp =3D &rand; + int flags; + int err; + int ref_idx; + int i; + + if (!torture_init_begin("percpu-refcount", verbose)) + return -EBUSY; + + atomic_set(&running, nusers); + /* Order @running with later increment and decrement operations */ + smp_mb(); + + refs =3D kcalloc(nrefs, sizeof(refs[0]), GFP_KERNEL); + if (!refs) { + TOROUT_ERRSTRING("out of memory"); + err =3D -ENOMEM; + goto init_err; + } + for (i =3D 0; i < nrefs; i++) { + flags =3D torture_random(trsp) & 1 ? PERCPU_REF_INIT_ATOMIC : PERCPU_REF= _REL_MANAGED; + err =3D percpu_ref_init(&refs[i], percpu_ref_test_release, + flags, GFP_KERNEL); + if (err) + goto init_err; + if (!(flags & PERCPU_REF_REL_MANAGED)) + percpu_ref_switch_to_managed(&refs[i]); + } + + num_per_ref_users =3D kcalloc(nrefs, sizeof(num_per_ref_users[0]), GFP_KE= RNEL); + if (!num_per_ref_users) { + TOROUT_ERRSTRING("out of memory"); + err =3D -ENOMEM; + goto init_err; + } + for (i =3D 0; i < nrefs; i++) + num_per_ref_users[i] =3D 0; + + ref_user_tasks =3D kcalloc(nusers, sizeof(ref_user_tasks[0]), GFP_KERNEL); + if (!ref_user_tasks) { + TOROUT_ERRSTRING("out of memory"); + err =3D -ENOMEM; + goto init_err; + } + + ref_running =3D kcalloc(nrefs, sizeof(ref_running[0]), GFP_KERNEL); + if (!ref_running) { + TOROUT_ERRSTRING("out of memory"); + err =3D -ENOMEM; + goto init_err; + } + + for (i =3D 0; i < nusers; i++) { + ref_idx =3D torture_random(trsp) % nrefs; + atomic_inc(&ref_running[ref_idx]); + num_per_ref_users[ref_idx]++; + /* Order increments with subquent reads */ + smp_mb(); + err =3D torture_create_kthread(percpu_ref_test_thread, + &refs[ref_idx], ref_user_tasks[i]); + if (torture_init_error(err)) + goto init_err; + } + + err =3D torture_create_kthread(percpu_ref_manager_thread, NULL, ref_manag= er_task); + if (torture_init_error(err)) + goto init_err; + + /* Drop initial reference, after test threads have started running */ + udelay(1); + for (i =3D 0; i < nrefs; i++) + percpu_ref_put(&refs[i]); + + + if (busted_early_ref_release) { + busted_early_release_tasks =3D kcalloc(nrefs, + sizeof(busted_early_release_tasks[0]), + GFP_KERNEL); + if (!busted_early_release_tasks) { + TOROUT_ERRSTRING("out of memory"); + err =3D -ENOMEM; + goto init_err; + } + for (i =3D 0; i < nrefs; i++) { + err =3D torture_create_kthread(percpu_ref_busted_early_thread, + &refs[i], busted_early_release_tasks[i]); + if (torture_init_error(err)) + goto init_err; + } + } + + if (busted_late_ref_release) { + busted_late_release_tasks =3D kcalloc(nrefs, sizeof(busted_late_release_= tasks[0]), + GFP_KERNEL); + if (!busted_late_release_tasks) { + TOROUT_ERRSTRING("out of memory"); + err =3D -ENOMEM; + goto init_err; + } + for (i =3D 0; i < nrefs; i++) { + err =3D torture_create_kthread(percpu_ref_busted_late_thread, + &refs[i], busted_late_release_tasks[i]); + if (torture_init_error(err)) + goto init_err; + } + } + if (stutter) { + err =3D torture_stutter_init(stutter, stutter); + if (torture_init_error(err)) + goto init_err; + } + + err =3D torture_onoff_init(onoff_holdoff * HZ, onoff_interval, NULL); + if (torture_init_error(err)) + goto init_err; + + torture_init_end(); + return 0; +init_err: + torture_init_end(); + percpu_ref_test_cleanup(); + return err; +} + +static void __exit percpu_ref_torture_exit(void) +{ + percpu_ref_test_cleanup(); +} + +module_init(percpu_ref_torture_init); +module_exit(percpu_ref_torture_exit); + 
+MODULE_LICENSE("GPL"); +MODULE_DESCRIPTION("percpu refcount torture test"); diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c index 7b97f9728c5b..7d0c85c7ce57 100644 --- a/lib/percpu-refcount.c +++ b/lib/percpu-refcount.c @@ -11,6 +11,8 @@ #include #include =20 +#include "percpu-refcount.h" + /* * Initially, a percpu refcount is just a set of percpu counters. Initiall= y, we * don't try to detect the ref hitting 0 - which means that get/put can ju= st @@ -677,6 +679,7 @@ static void percpu_ref_release_work_fn(struct work_stru= ct *work) struct percpu_ref *ref; int count =3D 0; bool held; + struct llist_node *last_node =3D READ_ONCE(last_percpu_ref_node); =20 first =3D READ_ONCE(percpu_ref_manage_head.first); if (!first) @@ -711,7 +714,7 @@ static void percpu_ref_release_work_fn(struct work_stru= ct *work) * +----------+ +------+ +------+ +------+ +------+ * */ - if (last_percpu_ref_node =3D=3D NULL || last_percpu_ref_node->next =3D=3D= NULL) { + if (last_node =3D=3D NULL || last_node->next =3D=3D NULL) { retry_sentinel_get: sen_node =3D percpu_ref_get_sen_node(); /* @@ -741,11 +744,10 @@ static void percpu_ref_release_work_fn(struct work_st= ruct *work) head =3D prev->next; } } else { - prev =3D last_percpu_ref_node; + prev =3D last_node; head =3D prev->next; } =20 - last_percpu_ref_node =3D NULL; llist_for_each_safe(pos, next, head) { /* Free sentinel node which is present in the list */ if (percpu_ref_is_sen_node(pos)) { @@ -773,18 +775,53 @@ static void percpu_ref_release_work_fn(struct work_st= ruct *work) continue; __percpu_ref_switch_to_percpu_checked(ref, false); count++; - if (count =3D=3D max_scan_count) { - last_percpu_ref_node =3D pos; - break; + if (count =3D=3D READ_ONCE(max_scan_count)) { + WRITE_ONCE(last_percpu_ref_node, pos); + goto queue_release_work; } prev =3D pos; } =20 + WRITE_ONCE(last_percpu_ref_node, NULL); queue_release_work: queue_delayed_work(percpu_ref_release_wq, &percpu_ref_release_work, scan_interval); } =20 +bool percpu_ref_test_is_percpu(struct percpu_ref *ref) +{ + unsigned long __percpu *percpu_count; + + return __ref_is_percpu(ref, &percpu_count); +} +EXPORT_SYMBOL_GPL(percpu_ref_test_is_percpu); + +void percpu_ref_test_flush_release_work(void) +{ + int max_flush =3D READ_ONCE(max_scan_count); + int max_count =3D 1000; + + /* Complete any executing release work */ + flush_delayed_work(&percpu_ref_release_work); + /* Scan till the end of the llist */ + WRITE_ONCE(max_scan_count, INT_MAX); + /* max scan count update visible to release work */ + smp_mb(); + flush_delayed_work(&percpu_ref_release_work); + /* max scan count update visible to release work */ + smp_mb(); + WRITE_ONCE(max_scan_count, 1); + /* max scan count update visible to work */ + smp_mb(); + flush_delayed_work(&percpu_ref_release_work); + while (READ_ONCE(last_percpu_ref_node) !=3D NULL && max_count--) + flush_delayed_work(&percpu_ref_release_work); + /* max scan count update visible to work */ + smp_mb(); + WRITE_ONCE(max_scan_count, max_flush); +} +EXPORT_SYMBOL_GPL(percpu_ref_test_flush_release_work); + static __init int percpu_ref_setup(void) { percpu_ref_release_wq =3D alloc_workqueue("percpu_ref_release_wq", diff --git a/lib/percpu-refcount.h b/lib/percpu-refcount.h new file mode 100644 index 000000000000..be2ac0411194 --- /dev/null +++ b/lib/percpu-refcount.h @@ -0,0 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ +#ifndef __LIB_REFCOUNT_H +#define __LIB_REFCOUNT_H +bool percpu_ref_test_is_percpu(struct percpu_ref *ref); +void percpu_ref_test_flush_release_work(void); 
+#endif
--
2.34.1

From nobody Fri Nov 29 19:48:03 2024
From: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Subject: [RFC 3/6] percpu-refcount: Extend managed mode to allow runtime switching
Date: Mon, 16 Sep 2024 10:38:08 +0530
Message-ID: <20240916050811.473556-4-Neeraj.Upadhyay@amd.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240916050811.473556-1-Neeraj.Upadhyay@amd.com>
References: <20240916050811.473556-1-Neeraj.Upadhyay@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Provide more flexibility in terms of runtime mode switching for a managed
percpu ref. This can be useful when a managed ref's object enters its
shutdown phase: instead of waiting for the manager thread to process the
ref, the user can directly invoke percpu_ref_kill() on it.

The init modes are the same as in the existing code. Runtime mode switching
allows a managed ref to be switched back to unmanaged mode, which in turn
allows transitions to all reinit-capable modes from managed mode.

  To -->    A    P    P(RI)   M    D    D(RI)   D(RI/M)   EX   REI   RES

  A         y    n     y      y    n     y        y       y     y     y
  P         n    n     n      n    y     n        n       y     n     n
  M         y*   n     y*     y    n     y*       y       y*    y     y
  P(RI)     y    n     y      y    n     y        y       y     y     y
  D(RI)     y    n     y      y    n     y        y       -     y     y
  D(RI/M)   y*   n     y*     y    n     y*       y       -     y     y

Modes:

  A       - Atomic
  P       - PerCPU
  M       - Managed
  P(RI)   - PerCPU with ReInit
  D(RI)   - Dead with ReInit
  D(RI/M) - Dead with ReInit and Managed

PerCPU Ref Ops:

  KLL - Kill
  REI - Reinit
  RES - Resurrect

(RI) marks the modes which are initialized with PERCPU_REF_ALLOW_REINIT.
The transitions shown above are the allowed transitions and can be reached
indirectly. For example, a managed ref switches to P(RI) mode when
percpu_ref_switch_to_unmanaged() is called for it; P(RI) mode can then be
switched directly to A mode using percpu_ref_switch_to_atomic().
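As an illustration (not part of the patch), below is a minimal sketch of
the intended call sequence for a ref that was initialized with
PERCPU_REF_REL_MANAGED, assuming the caller holds its own reference across
the switches; it mirrors the switching exercised by the torture test later
in this series:

	/* ref starts out in managed mode (PERCPU_REF_REL_MANAGED) */
	percpu_ref_get(ref);

	/* M -> P(RI): leave managed mode; ref is a plain percpu ref again */
	percpu_ref_switch_to_unmanaged(ref);

	/* P(RI) -> A and back to P(RI), per the transition table above */
	percpu_ref_switch_to_atomic_sync(ref);
	percpu_ref_switch_to_percpu(ref);

	/* P(RI) -> M: hand the ref back to the release worker */
	percpu_ref_switch_to_managed(ref);

	percpu_ref_put(ref);

Alternatively, once the object enters its shutdown phase, the ref can be
switched to unmanaged mode and then killed directly with percpu_ref_kill()
instead of waiting for the manager thread.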
Signed-off-by: Neeraj Upadhyay --- include/linux/percpu-refcount.h | 3 +- lib/percpu-refcount.c | 248 +++++++++++--------------------- 2 files changed, 88 insertions(+), 163 deletions(-) diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcoun= t.h index e6aea81b3d01..fe967db431a6 100644 --- a/include/linux/percpu-refcount.h +++ b/include/linux/percpu-refcount.h @@ -110,7 +110,7 @@ struct percpu_ref_data { struct rcu_head rcu; struct percpu_ref *ref; unsigned int aux_flags; - struct llist_node node; + struct list_head node; =20 }; =20 @@ -139,6 +139,7 @@ void percpu_ref_switch_to_atomic(struct percpu_ref *ref, void percpu_ref_switch_to_atomic_sync(struct percpu_ref *ref); void percpu_ref_switch_to_percpu(struct percpu_ref *ref); int percpu_ref_switch_to_managed(struct percpu_ref *ref); +void percpu_ref_switch_to_unmanaged(struct percpu_ref *ref); void percpu_ref_kill_and_confirm(struct percpu_ref *ref, percpu_ref_func_t *confirm_kill); void percpu_ref_resurrect(struct percpu_ref *ref); diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c index 7d0c85c7ce57..b79e36905aa4 100644 --- a/lib/percpu-refcount.c +++ b/lib/percpu-refcount.c @@ -5,7 +5,7 @@ #include #include #include -#include +#include #include #include #include @@ -43,7 +43,12 @@ =20 static DEFINE_SPINLOCK(percpu_ref_switch_lock); static DECLARE_WAIT_QUEUE_HEAD(percpu_ref_switch_waitq); -static LLIST_HEAD(percpu_ref_manage_head); +static struct list_head percpu_ref_manage_head =3D LIST_HEAD_INIT(percpu_r= ef_manage_head); +/* Spinlock protects node additions/deletions */ +static DEFINE_SPINLOCK(percpu_ref_manage_lock); +/* Mutex synchronizes node deletions with the node being scanned */ +static DEFINE_MUTEX(percpu_ref_active_switch_mutex); +static struct list_head *next_percpu_ref_node =3D &percpu_ref_manage_head; =20 static unsigned long __percpu *percpu_count_ptr(struct percpu_ref *ref) { @@ -112,7 +117,7 @@ int percpu_ref_init(struct percpu_ref *ref, percpu_ref_= func_t *release, data->confirm_switch =3D NULL; data->ref =3D ref; ref->data =3D data; - init_llist_node(&data->node); + INIT_LIST_HEAD(&data->node); =20 if (flags & PERCPU_REF_REL_MANAGED) percpu_ref_switch_to_managed(ref); @@ -150,9 +155,9 @@ static int __percpu_ref_switch_to_managed(struct percpu= _ref *ref) data->force_atomic =3D false; if (!__ref_is_percpu(ref, &percpu_count)) __percpu_ref_switch_mode(ref, NULL); - /* Ensure ordering of percpu mode switch and node scan */ - smp_mb(); - llist_add(&data->node, &percpu_ref_manage_head); + spin_lock(&percpu_ref_manage_lock); + list_add(&data->node, &percpu_ref_manage_head); + spin_unlock(&percpu_ref_manage_lock); =20 return 0; =20 @@ -162,7 +167,7 @@ static int __percpu_ref_switch_to_managed(struct percpu= _ref *ref) } =20 /** - * percpu_ref_switch_to_managed - Switch an unmanaged ref to percpu mode. + * percpu_ref_switch_to_managed - Switch an unmanaged ref to percpu manage= d mode. * * @ref: percpu_ref to switch to managed mode * @@ -179,6 +184,47 @@ int percpu_ref_switch_to_managed(struct percpu_ref *re= f) } EXPORT_SYMBOL_GPL(percpu_ref_switch_to_managed); =20 +/** + * percpu_ref_switch_to_unmanaged - Switch a managed ref to percpu mode. + * + * @ref: percpu_ref to switch back to unmanaged percpu mode + * + * Must only be called with elevated refcount. 
+ */ +void percpu_ref_switch_to_unmanaged(struct percpu_ref *ref) +{ + bool mutex_taken =3D false; + struct list_head *node; + unsigned long flags; + + might_sleep(); + + WARN_ONCE(!percpu_ref_is_managed(ref), "Percpu ref is not managed"); + + node =3D &ref->data->node; + spin_lock(&percpu_ref_manage_lock); + if (list_empty(node)) { + spin_unlock(&percpu_ref_manage_lock); + mutex_taken =3D true; + mutex_lock(&percpu_ref_active_switch_mutex); + spin_lock(&percpu_ref_manage_lock); + } + + if (next_percpu_ref_node =3D=3D node) + next_percpu_ref_node =3D next_percpu_ref_node->next; + list_del_init(node); + spin_unlock(&percpu_ref_manage_lock); + if (mutex_taken) + mutex_unlock(&percpu_ref_active_switch_mutex); + + /* Drop the pseudo-init reference */ + percpu_ref_put(ref); + spin_lock_irqsave(&percpu_ref_switch_lock, flags); + ref->data->aux_flags &=3D ~__PERCPU_REL_MANAGED; + spin_unlock_irqrestore(&percpu_ref_switch_lock, flags); +} +EXPORT_SYMBOL_GPL(percpu_ref_switch_to_unmanaged); + static void __percpu_ref_exit(struct percpu_ref *ref) { unsigned long __percpu *percpu_count =3D percpu_count_ptr(ref); @@ -599,164 +645,35 @@ module_param(max_scan_count, int, 0444); =20 static void percpu_ref_release_work_fn(struct work_struct *work); =20 -/* - * Sentinel llist nodes for lockless list traveral and deletions by - * the pcpu ref release worker, while nodes are added from - * percpu_ref_init() and percpu_ref_switch_to_managed(). - * - * Sentinel node marks the head of list traversal for the current - * iteration of kworker execution. - */ -struct percpu_ref_sen_node { - bool inuse; - struct llist_node node; -}; - -/* - * We need two sentinel nodes for lockless list manipulations from release - * worker - first node will be used in current reclaim iteration. The seco= nd - * node will be used in next iteration. Next iteration marks the first node - * as free, for use in subsequent iteration. 
- */ -#define PERCPU_REF_SEN_NODES_COUNT 2 - -/* Track last processed percpu ref node */ -static struct llist_node *last_percpu_ref_node; - -static struct percpu_ref_sen_node - percpu_ref_sen_nodes[PERCPU_REF_SEN_NODES_COUNT]; - static DECLARE_DELAYED_WORK(percpu_ref_release_work, percpu_ref_release_wo= rk_fn); =20 -static bool percpu_ref_is_sen_node(struct llist_node *node) -{ - return &percpu_ref_sen_nodes[0].node <=3D node && - node <=3D &percpu_ref_sen_nodes[PERCPU_REF_SEN_NODES_COUNT - 1].node; -} - -static struct llist_node *percpu_ref_get_sen_node(void) -{ - int i; - struct percpu_ref_sen_node *sn; - - for (i =3D 0; i < PERCPU_REF_SEN_NODES_COUNT; i++) { - sn =3D &percpu_ref_sen_nodes[i]; - if (!sn->inuse) { - sn->inuse =3D true; - return &sn->node; - } - } - - return NULL; -} - -static void percpu_ref_put_sen_node(struct llist_node *node) -{ - struct percpu_ref_sen_node *sn =3D container_of(node, struct percpu_ref_s= en_node, node); - - sn->inuse =3D false; - init_llist_node(node); -} - -static void percpu_ref_put_all_sen_nodes_except(struct llist_node *node) -{ - int i; - - for (i =3D 0; i < PERCPU_REF_SEN_NODES_COUNT; i++) { - if (&percpu_ref_sen_nodes[i].node =3D=3D node) - continue; - percpu_ref_sen_nodes[i].inuse =3D false; - init_llist_node(&percpu_ref_sen_nodes[i].node); - } -} - static struct workqueue_struct *percpu_ref_release_wq; =20 static void percpu_ref_release_work_fn(struct work_struct *work) { - struct llist_node *pos, *first, *head, *prev, *next; - struct llist_node *sen_node; + struct list_head *node; struct percpu_ref *ref; int count =3D 0; bool held; - struct llist_node *last_node =3D READ_ONCE(last_percpu_ref_node); =20 - first =3D READ_ONCE(percpu_ref_manage_head.first); - if (!first) + mutex_lock(&percpu_ref_active_switch_mutex); + spin_lock(&percpu_ref_manage_lock); + if (list_empty(&percpu_ref_manage_head)) { + next_percpu_ref_node =3D &percpu_ref_manage_head; + spin_unlock(&percpu_ref_manage_lock); + mutex_unlock(&percpu_ref_active_switch_mutex); goto queue_release_work; - - /* - * Enqueue a dummy node to mark the start of scan. This dummy - * node is used as start point of scan and ensures that - * there is no additional synchronization required with new - * label node additions to the llist. Any new labels will - * be processed in next run of the kworker. - * - * SCAN START PTR - * | - * v - * +----------+ +------+ +------+ +------+ - * | | | | | | | | - * | head ------> dummy|--->|label |--->| label|--->NULL - * | | | node | | | | | - * +----------+ +------+ +------+ +------+ - * - * - * New label addition: - * - * SCAN START PTR - * | - * v - * +----------+ +------+ +------+ +------+ +------+ - * | | | | | | | | | | - * | head |--> label|--> dummy|--->|label |--->| label|--->NULL - * | | | | | node | | | | | - * +----------+ +------+ +------+ +------+ +------+ - * - */ - if (last_node =3D=3D NULL || last_node->next =3D=3D NULL) { -retry_sentinel_get: - sen_node =3D percpu_ref_get_sen_node(); - /* - * All sentinel nodes are in use? This should not happen, as we - * require only one sentinel for the start of list traversal and - * other sentinel node is freed during the traversal. - */ - if (WARN_ONCE(!sen_node, "All sentinel nodes are in use")) { - /* Use first node as the sentinel node */ - head =3D first->next; - if (!head) { - struct llist_node *ign_node =3D NULL; - /* - * We exhausted sentinel nodes. However, there aren't - * enough nodes in the llist. So, we have leaked - * sentinel nodes. Reclaim sentinels and retry. 
- */ - if (percpu_ref_is_sen_node(first)) - ign_node =3D first; - percpu_ref_put_all_sen_nodes_except(ign_node); - goto retry_sentinel_get; - } - prev =3D first; - } else { - llist_add(sen_node, &percpu_ref_manage_head); - prev =3D sen_node; - head =3D prev->next; - } - } else { - prev =3D last_node; - head =3D prev->next; } + if (next_percpu_ref_node =3D=3D &percpu_ref_manage_head) + node =3D percpu_ref_manage_head.next; + else + node =3D next_percpu_ref_node; + next_percpu_ref_node =3D node->next; + list_del_init(node); + spin_unlock(&percpu_ref_manage_lock); =20 - llist_for_each_safe(pos, next, head) { - /* Free sentinel node which is present in the list */ - if (percpu_ref_is_sen_node(pos)) { - prev->next =3D pos->next; - percpu_ref_put_sen_node(pos); - continue; - } - - ref =3D container_of(pos, struct percpu_ref_data, node)->ref; + while (!list_is_head(node, &percpu_ref_manage_head)) { + ref =3D container_of(node, struct percpu_ref_data, node)->ref; __percpu_ref_switch_to_atomic_sync_checked(ref, false); /* * Drop the ref while in RCU read critical section to @@ -765,24 +682,31 @@ static void percpu_ref_release_work_fn(struct work_st= ruct *work) rcu_read_lock(); percpu_ref_put(ref); held =3D percpu_ref_tryget(ref); - if (!held) { - prev->next =3D pos->next; - init_llist_node(pos); + if (held) { + spin_lock(&percpu_ref_manage_lock); + list_add(node, &percpu_ref_manage_head); + spin_unlock(&percpu_ref_manage_lock); + __percpu_ref_switch_to_percpu_checked(ref, false); + } else { ref->percpu_count_ptr |=3D __PERCPU_REF_DEAD; } rcu_read_unlock(); - if (!held) - continue; - __percpu_ref_switch_to_percpu_checked(ref, false); + mutex_unlock(&percpu_ref_active_switch_mutex); count++; - if (count =3D=3D READ_ONCE(max_scan_count)) { - WRITE_ONCE(last_percpu_ref_node, pos); + if (count =3D=3D READ_ONCE(max_scan_count)) goto queue_release_work; + mutex_lock(&percpu_ref_active_switch_mutex); + spin_lock(&percpu_ref_manage_lock); + node =3D next_percpu_ref_node; + if (!list_is_head(next_percpu_ref_node, &percpu_ref_manage_head)) { + next_percpu_ref_node =3D next_percpu_ref_node->next; + list_del_init(node); } - prev =3D pos; + spin_unlock(&percpu_ref_manage_lock); } =20 - WRITE_ONCE(last_percpu_ref_node, NULL); + mutex_unlock(&percpu_ref_active_switch_mutex); + queue_release_work: queue_delayed_work(percpu_ref_release_wq, &percpu_ref_release_work, scan_interval); --=20 2.34.1 From nobody Fri Nov 29 19:48:03 2024 Received: from NAM12-DM6-obe.outbound.protection.outlook.com (mail-dm6nam12on2054.outbound.protection.outlook.com [40.107.243.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 327D02BD19; Mon, 16 Sep 2024 05:09:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.243.54 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726463400; cv=fail; b=pBdKDjAZWzIDi1/9kTQ9AZ+MeTVFb9TFxaCTkRecbnarI1JMzoQcP2Hfq+g6GaR6lwuMsidrXTy9/oUfNjd2ek6XNpWO6dPrTKTbd/OAKqegmOKP14apcMimjGwWIAPhstGENioHghOKT0HBZ/RSJm4L3YYnwuLyBnVOvVxqEbc= ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726463400; c=relaxed/simple; bh=9iWo5p6+3i8ZqjG/8dT4qlIGTch2mq8nMCtVnDnETyo=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; 
From: Neeraj Upadhyay
Subject: [RFC 4/6] percpu-refcount-torture: Extend test with runtime mode switches
Date: Mon, 16 Sep 2024 10:38:09 +0530
Message-ID: <20240916050811.473556-5-Neeraj.Upadhyay@amd.com>
In-Reply-To: <20240916050811.473556-1-Neeraj.Upadhyay@amd.com>
References: <20240916050811.473556-1-Neeraj.Upadhyay@amd.com>
MIME-Version: 1.0
CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230040)(82310400026)(1800799024)(36860700013)(7416014)(376014);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 16 Sep 2024 05:09:54.6758 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 8be1d72c-8332-45e0-36a6-08dcd60dcd60 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: SJ5PEPF000001C8.namprd05.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: SN7PR12MB6931 Content-Type: text/plain; charset="utf-8" Extend the test to exercise runtime switching from managed mode to other reinitable active modes. Signed-off-by: Neeraj Upadhyay --- lib/percpu-refcount-torture.c | 41 +++++++++++++++++++++++++++++++++-- lib/percpu-refcount.c | 12 +++++++++- 2 files changed, 50 insertions(+), 3 deletions(-) diff --git a/lib/percpu-refcount-torture.c b/lib/percpu-refcount-torture.c index 686f5a228b40..cb2700b16517 100644 --- a/lib/percpu-refcount-torture.c +++ b/lib/percpu-refcount-torture.c @@ -3,6 +3,7 @@ #include #include #include +#include #include #include =20 @@ -59,6 +60,7 @@ static struct task_struct **busted_late_release_tasks; =20 static struct percpu_ref *refs; static long *num_per_ref_users; +static struct mutex *ref_switch_mutexes; =20 static atomic_t running; static atomic_t *ref_running; @@ -97,19 +99,36 @@ static int percpu_ref_manager_thread(void *data) static int percpu_ref_test_thread(void *data) { struct percpu_ref *ref =3D (struct percpu_ref *)data; + DEFINE_TORTURE_RANDOM(rand); + int ref_idx =3D ref - refs; + int do_switch; int i =3D 0; =20 percpu_ref_get(ref); =20 do { percpu_ref_get(ref); + /* Perform checks once per 256 iterations */ + do_switch =3D (torture_random(&rand) & 0xff); udelay(delay_us); + if (do_switch) { + mutex_lock(&ref_switch_mutexes[ref_idx]); + percpu_ref_switch_to_unmanaged(ref); + udelay(delay_us); + percpu_ref_switch_to_atomic_sync(ref); + if (do_switch & 1) + percpu_ref_switch_to_percpu(ref); + udelay(delay_us); + percpu_ref_switch_to_managed(ref); + mutex_unlock(&ref_switch_mutexes[ref_idx]); + udelay(delay_us); + } percpu_ref_put(ref); stutter_wait("percpu_ref_test_thread"); i++; } while (i < niterations); =20 - atomic_dec(&ref_running[ref - refs]); + atomic_dec(&ref_running[ref_idx]); /* Order ref release with ref_running[ref_idx] =3D=3D 0 */ smp_mb(); percpu_ref_put(ref); @@ -213,6 +232,13 @@ static void percpu_ref_test_cleanup(void) kfree(num_per_ref_users); num_per_ref_users =3D NULL; =20 + if (ref_switch_mutexes) { + for (i =3D 0; i < nrefs; i++) + mutex_destroy(&ref_switch_mutexes[i]); + kfree(ref_switch_mutexes); + ref_switch_mutexes =3D NULL; + } + if (refs) { for (i =3D 0; i < nrefs; i++) percpu_ref_exit(&refs[i]); @@ -251,7 +277,8 @@ static int __init percpu_ref_torture_init(void) goto init_err; } for (i =3D 0; i < nrefs; i++) { - flags =3D torture_random(trsp) & 1 ? PERCPU_REF_INIT_ATOMIC : PERCPU_REF= _REL_MANAGED; + flags =3D (torture_random(trsp) & 1) ? 
PERCPU_REF_INIT_ATOMIC : + PERCPU_REF_REL_MANAGED; err =3D percpu_ref_init(&refs[i], percpu_ref_test_release, flags, GFP_KERNEL); if (err) @@ -269,6 +296,16 @@ static int __init percpu_ref_torture_init(void) for (i =3D 0; i < nrefs; i++) num_per_ref_users[i] =3D 0; =20 + ref_switch_mutexes =3D kcalloc(nrefs, sizeof(ref_switch_mutexes[0]), GFP_= KERNEL); + if (!ref_switch_mutexes) { + TOROUT_ERRSTRING("out of memory"); + err =3D -ENOMEM; + goto init_err; + } + + for (i =3D 0; i < nrefs; i++) + mutex_init(&ref_switch_mutexes[i]); + ref_user_tasks =3D kcalloc(nusers, sizeof(ref_user_tasks[0]), GFP_KERNEL); if (!ref_user_tasks) { TOROUT_ERRSTRING("out of memory"); diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c index b79e36905aa4..4e0a453bd51f 100644 --- a/lib/percpu-refcount.c +++ b/lib/percpu-refcount.c @@ -723,6 +723,7 @@ EXPORT_SYMBOL_GPL(percpu_ref_test_is_percpu); void percpu_ref_test_flush_release_work(void) { int max_flush =3D READ_ONCE(max_scan_count); + struct list_head *next; int max_count =3D 1000; =20 /* Complete any executing release work */ @@ -738,8 +739,17 @@ void percpu_ref_test_flush_release_work(void) /* max scan count update visible to work */ smp_mb(); flush_delayed_work(&percpu_ref_release_work); - while (READ_ONCE(last_percpu_ref_node) !=3D NULL && max_count--) + + while (true) { + if (!max_count--) + break; + spin_lock(&percpu_ref_manage_lock); + next =3D next_percpu_ref_node; + spin_unlock(&percpu_ref_manage_lock); + if (list_is_head(next, &percpu_ref_manage_head)) + break; flush_delayed_work(&percpu_ref_release_work); + } /* max scan count update visible to work */ smp_mb(); WRITE_ONCE(max_scan_count, max_flush); --=20 2.34.1 From nobody Fri Nov 29 19:48:03 2024 Received: from NAM11-DM6-obe.outbound.protection.outlook.com (mail-dm6nam11on2085.outbound.protection.outlook.com [40.107.223.85]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7F6BD3A8D0; Mon, 16 Sep 2024 05:10:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.223.85 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726463419; cv=fail; b=f/wyI6jMgOhJ39ophKJ73eODUvc26RdRk4pAoDVIM+dGJYIoPzmR/1WK5mlRWx1HF2OF1MyK4U06iX3wticiBmw6vKmfhtfN4jd4IuU2HvIP1fweLFkVTnZJyGRzUo/CSO4mf2QDymDod3t9cFvA/QBdTCaaYFrVVBikDOrN7/k= ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726463419; c=relaxed/simple; bh=ZG69J9fb/MH2pDP9DJ9M1WxKGt76RUByqunLIK7tl1w=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=PyAWoxVpsUJ4A8reyyp9QLLcL2UcGnT8ujWsW4yxmz79PSUGwFcbgcDfUyiFG19mQ44bbHT+O6mu9Pi3fJO9KkVNaVia6e/SA3gOtuZHCL+XfRoSH286dpR4xH7sfeVB6yzZdwoHSYdzTt/4dVLywGnokqbxN6JCfokTWtuBf7s= ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com; spf=fail smtp.mailfrom=amd.com; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b=KSONu474; arc=fail smtp.client-ip=40.107.223.85 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=amd.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b="KSONu474" ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; 
From: Neeraj Upadhyay
Subject: [RFC 5/6] apparmor: Switch labels to percpu refcount in atomic mode
Date: Mon, 16 Sep 2024 10:38:10 +0530
Message-ID: <20240916050811.473556-6-Neeraj.Upadhyay@amd.com>
In-Reply-To: <20240916050811.473556-1-Neeraj.Upadhyay@amd.com>
References: <20240916050811.473556-1-Neeraj.Upadhyay@amd.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

In preparation for using percpu refcount for labels,
replace label kref with percpu ref. The percpu ref is initialized to atomic mode, as using percpu mode requires tracking ref kill points. As the atomic counter is in a different cacheline now, rearrange some of the fields in aa_label struct - flags, proxy; to optimize some of the fast paths for unconfined labels. In addition to the requirement to cleanup the percpu ref using percpu_ref_exit() in label destruction path, other potential impact from this patch could be: - Increase in memory requirement (for per cpu counters) for each label. - Displacement of aa_label struct members to different cacheline, as percpu ref takes 2 pointers space. - Moving of the atomic counter outside of the cacheline of the aa_label struct. Signed-off-by: Neeraj Upadhyay --- security/apparmor/include/label.h | 16 ++++++++-------- security/apparmor/include/policy.h | 8 ++++---- security/apparmor/label.c | 11 ++++++++--- 3 files changed, 20 insertions(+), 15 deletions(-) diff --git a/security/apparmor/include/label.h b/security/apparmor/include/= label.h index 2a72e6b17d68..4b29a4679c74 100644 --- a/security/apparmor/include/label.h +++ b/security/apparmor/include/label.h @@ -121,12 +121,12 @@ struct label_it { * @ent: set of profiles for label, actual size determined by @size */ struct aa_label { - struct kref count; + struct percpu_ref count; + long flags; + struct aa_proxy *proxy; struct rb_node node; struct rcu_head rcu; - struct aa_proxy *proxy; __counted char *hname; - long flags; u32 secid; int size; struct aa_profile *vec[]; @@ -276,7 +276,7 @@ void __aa_labelset_update_subtree(struct aa_ns *ns); =20 void aa_label_destroy(struct aa_label *label); void aa_label_free(struct aa_label *label); -void aa_label_kref(struct kref *kref); +void aa_label_percpu_ref(struct percpu_ref *ref); bool aa_label_init(struct aa_label *label, int size, gfp_t gfp); struct aa_label *aa_label_alloc(int size, struct aa_proxy *proxy, gfp_t gf= p); =20 @@ -373,7 +373,7 @@ int aa_label_match(struct aa_profile *profile, struct a= a_ruleset *rules, */ static inline struct aa_label *__aa_get_label(struct aa_label *l) { - if (l && kref_get_unless_zero(&l->count)) + if (l && percpu_ref_tryget(&l->count)) return l; =20 return NULL; @@ -382,7 +382,7 @@ static inline struct aa_label *__aa_get_label(struct aa= _label *l) static inline struct aa_label *aa_get_label(struct aa_label *l) { if (l) - kref_get(&(l->count)); + percpu_ref_get(&(l->count)); =20 return l; } @@ -402,7 +402,7 @@ static inline struct aa_label *aa_get_label_rcu(struct = aa_label __rcu **l) rcu_read_lock(); do { c =3D rcu_dereference(*l); - } while (c && !kref_get_unless_zero(&c->count)); + } while (c && !percpu_ref_tryget(&c->count)); rcu_read_unlock(); =20 return c; @@ -442,7 +442,7 @@ static inline struct aa_label *aa_get_newest_label(stru= ct aa_label *l) static inline void aa_put_label(struct aa_label *l) { if (l) - kref_put(&l->count, aa_label_kref); + percpu_ref_put(&l->count); } =20 =20 diff --git a/security/apparmor/include/policy.h b/security/apparmor/include= /policy.h index 75088cc310b6..5849b6b94cea 100644 --- a/security/apparmor/include/policy.h +++ b/security/apparmor/include/policy.h @@ -329,7 +329,7 @@ static inline aa_state_t ANY_RULE_MEDIATES(struct list_= head *head, static inline struct aa_profile *aa_get_profile(struct aa_profile *p) { if (p) - kref_get(&(p->label.count)); + percpu_ref_get(&(p->label.count)); =20 return p; } @@ -343,7 +343,7 @@ static inline struct aa_profile *aa_get_profile(struct = aa_profile *p) */ static inline struct aa_profile 
*aa_get_profile_not0(struct aa_profile *p) { - if (p && kref_get_unless_zero(&p->label.count)) + if (p && percpu_ref_tryget(&p->label.count)) return p; =20 return NULL; @@ -363,7 +363,7 @@ static inline struct aa_profile *aa_get_profile_rcu(str= uct aa_profile __rcu **p) rcu_read_lock(); do { c =3D rcu_dereference(*p); - } while (c && !kref_get_unless_zero(&c->label.count)); + } while (c && !percpu_ref_tryget(&c->label.count)); rcu_read_unlock(); =20 return c; @@ -376,7 +376,7 @@ static inline struct aa_profile *aa_get_profile_rcu(str= uct aa_profile __rcu **p) static inline void aa_put_profile(struct aa_profile *p) { if (p) - kref_put(&p->label.count, aa_label_kref); + percpu_ref_put(&p->label.count); } =20 static inline int AUDIT_MODE(struct aa_profile *profile) diff --git a/security/apparmor/label.c b/security/apparmor/label.c index c71e4615dd46..aa9e6eac3ecc 100644 --- a/security/apparmor/label.c +++ b/security/apparmor/label.c @@ -336,6 +336,7 @@ void aa_label_destroy(struct aa_label *label) rcu_assign_pointer(label->proxy->label, NULL); aa_put_proxy(label->proxy); } + percpu_ref_exit(&label->count); aa_free_secid(label->secid); =20 label->proxy =3D (struct aa_proxy *) PROXY_POISON + 1; @@ -369,9 +370,9 @@ static void label_free_rcu(struct rcu_head *head) label_free_switch(label); } =20 -void aa_label_kref(struct kref *kref) +void aa_label_percpu_ref(struct percpu_ref *ref) { - struct aa_label *label =3D container_of(kref, struct aa_label, count); + struct aa_label *label =3D container_of(ref, struct aa_label, count); struct aa_ns *ns =3D labels_ns(label); =20 if (!ns) { @@ -408,7 +409,11 @@ bool aa_label_init(struct aa_label *label, int size, g= fp_t gfp) =20 label->size =3D size; /* doesn't include null */ label->vec[size] =3D NULL; /* null terminate */ - kref_init(&label->count); + if (percpu_ref_init(&label->count, aa_label_percpu_ref, PERCPU_REF_INIT_A= TOMIC, gfp)) { + aa_free_secid(label->secid); + return false; + } + RB_CLEAR_NODE(&label->node); =20 return true; --=20 2.34.1 From nobody Fri Nov 29 19:48:03 2024 Received: from NAM11-CO1-obe.outbound.protection.outlook.com (mail-co1nam11on2079.outbound.protection.outlook.com [40.107.220.79]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 31713208CA; Mon, 16 Sep 2024 05:10:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.220.79 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726463438; cv=fail; b=rcjMAE5nI3TE8b1YxHJBy9O8tnP1W2ltLHYNDuUHi/EbGPmWkStB3VGkiPXawq4KZb8BWZtyFINVeOqGlXZdDMfp+3UtYz6x5xykBXuZQbp2zMf5Fy+ZLcJU0d/5MvKBoG3WaJrBr9yJTy0xJLKmiCSzP+DCI8WgFnvi4gOOI8U= ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726463438; c=relaxed/simple; bh=wvttXrujNLJgzH/rrugvyM3zgnuLWUqavQEIi6zH5Hg=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Cr3zMpdjU+Mw4GFbhy9EkLOZAseBM72GaGDUPLKO3sC47u8FRurb0ur/lplUjSuae1Wdic7F9Gf8TCQhoiV8QTQHGiiv1hN2pBtfwOuYhli+rRQnYepvATqOy3eHpKhetuveqxTFuiR88VZh4zt9EJfgyvh8GJqMgxpWSNaD36Q= ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com; spf=fail smtp.mailfrom=amd.com; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b=muAX3U3D; arc=fail smtp.client-ip=40.107.220.79 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com 
From: Neeraj Upadhyay
Subject: [RFC 6/6] apparmor: Switch labels to percpu ref managed mode
Date: Mon, 16 Sep 2024 10:38:11 +0530
Message-ID: <20240916050811.473556-7-Neeraj.Upadhyay@amd.com>
In-Reply-To: <20240916050811.473556-1-Neeraj.Upadhyay@amd.com>
References: <20240916050811.473556-1-Neeraj.Upadhyay@amd.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Nginx performance testing with AppArmor enabled (with Nginx running in the
unconfined profile), on kernel versions 6.1 and 6.5, shows a significant
drop in throughput scalability when Nginx workers are scaled to use a
higher number of CPUs across various L3 cache domains. Below is one sample
of the throughput scalability loss, based on results on an AMD Zen4 system
with 96 cores and 2 SMT threads per core:

  Config    Cache Domains   apparmor=off      apparmor=on
                            scaling eff (%)   scaling eff (%)
  8C16T           1             100%              100%
  16C32T          2              95%               94%
  24C48T          3              94%               93%
  48C96T          6              92%               88%
  96C192T        12              85%               68%

There is a significant drop in scaling efficiency for 96 cores/192 SMT
threads. The perf tool shows most of the contention coming from the
following places:

  6.56%  nginx  [kernel.vmlinux]  [k] apparmor_current_getsecid_subj
  6.22%  nginx  [kernel.vmlinux]  [k] apparmor_file_open

The majority of the CPU cycles are found to be due to memory contention in
the atomic_fetch_add and atomic_fetch_sub operations from kref_get() and
kref_put() operations on AppArmor labels. A part of the contention was
fixed with commit 2516fde1fa00 ("apparmor: Optimize retrieving current
task secid"). After including this commit, the scaling efficiency improved
as shown below:

  Config    Cache Domains   apparmor=on       apparmor=on (patched)
                            scaling eff (%)   scaling eff (%)
  8C16T           1             100%              100%
  16C32T          2              97%               93%
  24C48T          3              94%               92%
  48C96T          6              88%               88%
  96C192T        12              65%               79%

However, the scaling efficiency impact is still significant even after
including the commit. The performance impact is even higher for >192 CPUs.
In addition, the memory contention impact would increase when there is a
high frequency of label update operations and labels are marked stale more
frequently.

Use the new percpu managed mode for tracking the release of all AppArmor
labels. Using a percpu refcount for AppArmor labels improves throughput
scalability for Nginx:

  Config    Cache Domains   apparmor=on (percpuref)
                            scaling eff (%)
  8C16T           1             100%
  16C32T          2              96%
  24C48T          3              94%
  48C96T          6              93%
  96C192T        12              90%

Signed-off-by: Neeraj Upadhyay
---
The apparmor_file_open() refcount contention has been resolved recently
with commit f4fee216df7d ("apparmor: try to avoid refing the label in
apparmor_file_open"). I have posted this series to get feedback on the
approach to improve refcount scalability within the apparmor subsystem.
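For reference, a simplified sketch of the resulting label refcount
lifecycle (an assumed flow combining patches 5/6 and 6/6, not a verbatim
excerpt of the diffs below):

	/* aa_label_init() (patch 5/6): the count starts in atomic mode */
	percpu_ref_init(&label->count, aa_label_percpu_ref,
			PERCPU_REF_INIT_ATOMIC, gfp);

	/*
	 * __label_insert() (this patch): once the label is visible in the
	 * labelset tree, hand its release over to managed mode.
	 */
	percpu_ref_switch_to_managed(&label->count);

	/* hot paths: per-CPU increments/decrements, no shared atomic RMW */
	aa_get_label(label);	/* percpu_ref_get(&label->count) */
	aa_put_label(label);	/* percpu_ref_put(&label->count) */

	/* aa_label_destroy(): after the release callback has run */
	percpu_ref_exit(&label->count);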
security/apparmor/label.c | 1 + security/apparmor/policy_ns.c | 2 ++ 2 files changed, 3 insertions(+) diff --git a/security/apparmor/label.c b/security/apparmor/label.c index aa9e6eac3ecc..016a45a180b1 100644 --- a/security/apparmor/label.c +++ b/security/apparmor/label.c @@ -710,6 +710,7 @@ static struct aa_label *__label_insert(struct aa_labels= et *ls, rb_link_node(&label->node, parent, new); rb_insert_color(&label->node, &ls->root); label->flags |=3D FLAG_IN_TREE; + percpu_ref_switch_to_managed(&label->count); =20 return aa_get_label(label); } diff --git a/security/apparmor/policy_ns.c b/security/apparmor/policy_ns.c index 1f02cfe1d974..18eb58b68a60 100644 --- a/security/apparmor/policy_ns.c +++ b/security/apparmor/policy_ns.c @@ -124,6 +124,7 @@ static struct aa_ns *alloc_ns(const char *prefix, const= char *name) goto fail_unconfined; /* ns and ns->unconfined share ns->unconfined refcount */ ns->unconfined->ns =3D ns; + percpu_ref_switch_to_managed(&ns->unconfined->label.count); =20 atomic_set(&ns->uniq_null, 0); =20 @@ -377,6 +378,7 @@ int __init aa_alloc_root_ns(void) } kernel_t =3D &kernel_p->label; root_ns->unconfined->ns =3D aa_get_ns(root_ns); + percpu_ref_switch_to_managed(&root_ns->unconfined->label.count); =20 return 0; } --=20 2.34.1