From nobody Tue Jun 30 01:42:28 2026
From: Bharata B Rao
CC: Wei Huang, Bharata B Rao
Subject: [RFC PATCH v0 1/3] sched/numa: Process based autonuma scan period framework
Date: Fri, 28 Jan 2022 10:58:49 +0530
Message-ID: <20220128052851.17162-2-bharata@amd.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220128052851.17162-1-bharata@amd.com>
References: <20220128052851.17162-1-bharata@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Disha Talreja

Add a new framework that calculates autonuma scan period
based on per-process NUMA fault stats.

NUMA faults can be classified into different categories, such
as local vs. remote, or private vs. shared. It is also important
to understand such behavior from the perspective of a process.
The per-process fault stats added here will be used for
calculating the scan period in the adaptive NUMA algorithm.

The actual scan period is still using the original value
p->numa_scan_period before the real implementation is added in
place in a later commit.
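As a rough illustration of the accounting described above, the two counter pairs could be sketched in userspace as follows. This is only a sketch using C11 atomics in place of the kernel's atomic_long_t; struct pan_stats, pan_record_fault() and pan_remote_pages() are hypothetical names and are not part of this patch. The index convention ([0] = remote/shared, [1] = local/private) is assumed from how the later patches in this series read the counters.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the per-process (per-mm) counters. */
struct pan_stats {
	atomic_long faults_locality[2];	/* [0] = remote, [1] = local */
	atomic_long faults_shared[2];	/* [0] = shared, [1] = private */
};

/* Account one NUMA hinting fault covering `pages` pages in both categories. */
static void pan_record_fault(struct pan_stats *st, long pages,
			     bool local, bool priv)
{
	atomic_fetch_add(&st->faults_locality[local ? 1 : 0], pages);
	atomic_fetch_add(&st->faults_shared[priv ? 1 : 0], pages);
}

/* Pages faulted remotely so far in the current window. */
static long pan_remote_pages(struct pan_stats *st)
{
	return atomic_load(&st->faults_locality[0]);
}
```

Every thread of the process would update the same pan_stats instance, which is why the counters are atomic.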

Co-developed-by: Wei Huang
Signed-off-by: Wei Huang
Signed-off-by: Disha Talreja
Signed-off-by: Bharata B Rao
---
 include/linux/mm_types.h |  7 +++++++
 kernel/sched/fair.c      | 40 ++++++++++++++++++++++++++++++++++++++--
 2 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 9db36dc5d4cf..4f978c09d3db 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -610,6 +610,13 @@ struct mm_struct {
 
 		/* numa_scan_seq prevents two threads setting pte_numa */
 		int numa_scan_seq;
+
+		/* Process-based Adaptive NUMA */
+		atomic_long_t faults_locality[2];
+		atomic_long_t faults_shared[2];
+
+		spinlock_t pan_numa_lock;
+		unsigned int numa_scan_period;
 #endif
 		/*
 		 * An operation with batched TLB flushing is going on. Anything
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 095b0aa378df..1d6404b2d42e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2099,6 +2099,20 @@ static void numa_group_count_active_nodes(struct numa_group *numa_group)
 	numa_group->active_nodes = active_nodes;
 }
 
+/**********************************************/
+/* Process-based Adaptive NUMA (PAN) Design */
+/**********************************************/
+/*
+ * Updates mm->numa_scan_period under mm->pan_numa_lock.
+ *
+ * Returns p->numa_scan_period now but updated to return
+ * p->mm->numa_scan_period in a later patch.
+ */
+static unsigned long pan_get_scan_period(struct task_struct *p)
+{
+	return p->numa_scan_period;
+}
+
 /*
  * When adapting the scan rate, the period is divided into NUMA_PERIOD_SLOTS
  * increments. The more local the fault statistics are, the higher the scan
@@ -2616,6 +2630,9 @@ void task_numa_fault(int last_cpupid, int mem_node, int pages, int flags)
 		task_numa_group(p, last_cpupid, flags, &priv);
 	}
 
+	atomic_long_add(pages, &(p->mm->faults_locality[local]));
+	atomic_long_add(pages, &(p->mm->faults_shared[priv]));
+
 	/*
 	 * If a workload spans multiple NUMA nodes, a shared fault that
 	 * occurs wholly within the set of nodes that the workload is
@@ -2702,12 +2719,20 @@ static void task_numa_work(struct callback_head *work)
 	if (time_before(now, migrate))
 		return;
 
-	if (p->numa_scan_period == 0) {
+	if (p->mm->numa_scan_period == 0) {
+		p->numa_scan_period_max = task_scan_max(p);
+		p->numa_scan_period = task_scan_start(p);
+		mm->numa_scan_period = p->numa_scan_period;
+	} else if (p->numa_scan_period == 0) {
 		p->numa_scan_period_max = task_scan_max(p);
 		p->numa_scan_period = task_scan_start(p);
 	}
 
-	next_scan = now + msecs_to_jiffies(p->numa_scan_period);
+	if (!spin_trylock(&p->mm->pan_numa_lock))
+		return;
+	next_scan = now + msecs_to_jiffies(pan_get_scan_period(p));
+	spin_unlock(&p->mm->pan_numa_lock);
+
 	if (cmpxchg(&mm->numa_next_scan, migrate, next_scan) != migrate)
 		return;
 
@@ -2807,6 +2832,16 @@ static void task_numa_work(struct callback_head *work)
 	}
 }
 
+/* Init Process-based Adaptive NUMA */
+static void pan_init_numa(struct task_struct *p)
+{
+	struct mm_struct *mm = p->mm;
+
+	spin_lock_init(&mm->pan_numa_lock);
+	mm->numa_scan_period = sysctl_numa_balancing_scan_delay;
+
+}
+
 void init_numa_balancing(unsigned long clone_flags, struct task_struct *p)
 {
 	int mm_users = 0;
@@ -2817,6 +2852,7 @@ void init_numa_balancing(unsigned long clone_flags, struct task_struct *p)
 		if (mm_users == 1) {
 			mm->numa_next_scan = jiffies + msecs_to_jiffies(sysctl_numa_balancing_scan_delay);
 			mm->numa_scan_seq = 0;
+			pan_init_numa(p);
 		}
 	}
 	p->node_stamp = 0;
-- 
2.25.1

From nobody Tue Jun 30 01:42:28 2026
From: Bharata B Rao
CC: Wei Huang, Bharata B Rao
Subject: [RFC PATCH v0 2/3] sched/numa: Add cumulative history of per-process fault stats
Date: Fri, 28 Jan 2022 10:58:50 +0530
Message-ID: <20220128052851.17162-3-bharata@amd.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220128052851.17162-1-bharata@amd.com>
References: <20220128052851.17162-1-bharata@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Disha Talreja

The cumulative history of local/remote (lr) and private/shared (ps)
will be used for calculating adaptive scan period.
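The history update used in the diff of this patch folds each window's counters into a decayed running total: history_new = history - history / divisor + sample, with a divisor of 4 for local/remote and 2 for private/shared. A minimal standalone sketch of that arithmetic follows; pan_decay_add() is a hypothetical helper name used only for illustration and does not exist in the patch.

```c
/*
 * Fold one window's sample into a running history:
 *   history_new = history - history / divisor + sample
 * With a constant sample the history settles at divisor * sample.
 */
static unsigned long pan_decay_add(unsigned long history,
				   unsigned long sample,
				   unsigned long divisor)
{
	/* Signed difference, as in the patch: it may be negative. */
	long diff = (long)sample - (long)(history / divisor);

	return history + diff;
}
```

For example, feeding a constant sample of 400 with divisor 4 gives 400, 700, 925 and so on, approaching 1600; with divisor 2 the history approaches 800, so the private/shared history forgets old windows faster than the local/remote history.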

Co-developed-by: Wei Huang
Signed-off-by: Wei Huang
Signed-off-by: Disha Talreja
Signed-off-by: Bharata B Rao
---
 include/linux/mm_types.h |  2 ++
 kernel/sched/fair.c      | 49 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 4f978c09d3db..2c6f119b947f 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -614,6 +614,8 @@ struct mm_struct {
 		/* Process-based Adaptive NUMA */
 		atomic_long_t faults_locality[2];
 		atomic_long_t faults_shared[2];
+		unsigned long faults_locality_history[2];
+		unsigned long faults_shared_history[2];
 
 		spinlock_t pan_numa_lock;
 		unsigned int numa_scan_period;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1d6404b2d42e..4911b3841d00 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2102,14 +2102,56 @@ static void numa_group_count_active_nodes(struct numa_group *numa_group)
 /**********************************************/
 /* Process-based Adaptive NUMA (PAN) Design */
 /**********************************************/
+/*
+ * Update the cumulative history of local/remote and private/shared
+ * statistics. If the numbers are too small to be worth updating,
+ * return FALSE, otherwise return TRUE.
+ */
+static bool pan_update_history(struct task_struct *p)
+{
+	unsigned long local, remote, shared, private;
+	long diff;
+	int i;
+
+	remote = atomic_long_read(&p->mm->faults_locality[0]);
+	local = atomic_long_read(&p->mm->faults_locality[1]);
+	shared = atomic_long_read(&p->mm->faults_shared[0]);
+	private = atomic_long_read(&p->mm->faults_shared[1]);
+
+	/* skip if the activities in this window are too small */
+	if (local + remote < 100)
+		return false;
+
+	/* decay over the time window by 1/4 */
+	diff = local - (long)(p->mm->faults_locality_history[1] / 4);
+	p->mm->faults_locality_history[1] += diff;
+	diff = remote - (long)(p->mm->faults_locality_history[0] / 4);
+	p->mm->faults_locality_history[0] += diff;
+
+	/* decay over the time window by 1/2 */
+	diff = shared - (long)(p->mm->faults_shared_history[0] / 2);
+	p->mm->faults_shared_history[0] += diff;
+	diff = private - (long)(p->mm->faults_shared_history[1] / 2);
+	p->mm->faults_shared_history[1] += diff;
+
+	/* clear the statistics for the next window */
+	for (i = 0; i < 2; i++) {
+		atomic_long_set(&(p->mm->faults_locality[i]), 0);
+		atomic_long_set(&(p->mm->faults_shared[i]), 0);
+	}
+
+	return true;
+}
+
 /*
  * Updates mm->numa_scan_period under mm->pan_numa_lock.
- *
  * Returns p->numa_scan_period now but updated to return
  * p->mm->numa_scan_period in a later patch.
  */
 static unsigned long pan_get_scan_period(struct task_struct *p)
 {
+	pan_update_history(p);
+
 	return p->numa_scan_period;
 }
 
@@ -2836,10 +2878,15 @@ static void task_numa_work(struct callback_head *work)
 static void pan_init_numa(struct task_struct *p)
 {
 	struct mm_struct *mm = p->mm;
+	int i;
 
 	spin_lock_init(&mm->pan_numa_lock);
 	mm->numa_scan_period = sysctl_numa_balancing_scan_delay;
 
+	for (i = 0; i < 2; i++) {
+		mm->faults_locality_history[i] = 0;
+		mm->faults_shared_history[i] = 0;
+	}
 }
 
 void init_numa_balancing(unsigned long clone_flags, struct task_struct *p)
-- 
2.25.1

From nobody Tue Jun 30 01:42:28 2026
From: Bharata B Rao
CC: Wei Huang, Bharata B Rao
Subject: [RFC PATCH v0 3/3] sched/numa: Add adaptive scan period calculation
Date: Fri, 28 Jan 2022 10:58:51 +0530
Message-ID: <20220128052851.17162-4-bharata@amd.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220128052851.17162-1-bharata@amd.com>
References: <20220128052851.17162-1-bharata@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Disha Talreja

This patch implements an adaptive algorithm for calculating
the autonuma scan period. In the existing mechanism of scan
period calculation,

- scan period is derived from the per-thread stats.
- static threshold (NUMA_PERIOD_THRESHOLD) is used for changing
  the scan rate.

In this new approach (Process Adaptive autoNUMA), we gather NUMA
fault stats at per-process level which allows for capturing the
application behaviour better. In addition, the algorithm learns
and adjusts the scan rate based on remote fault rate. By not
sticking to a static threshold, the algorithm can respond better
to different workload behaviours.

Since the threads of a process are already considered as a group,
we add a bunch of metrics to the task's mm to track the various
types of faults and derive the scan rate from them.

The new per-process fault stats contribute only to the per-process
scan period calculation, while the existing per-thread stats
continue to contribute towards the numa_group stats which
eventually determine the thresholds for migrating memory and
threads across nodes.

In this algorithm, the remote fault rates are maintained for
the previous two scan windows. These historical remote fault
rates along with the remote fault rate from the current window
are used to determine the intended trend of the scanning period.

An increase in the trend implies an increased period thereby
resulting in slower scanning. A decrease in the trend implies
decreased period and hence faster scanning.

The intended trends for the last two windows are tracked and
the actual trend is reversed (thereby increasing or decreasing
the scan period in that window) only if the same trend reversal
has been intended in the previous two windows.
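
The reversal rule described above amounts to a small hysteresis filter on a two-bit history. The following is a simplified userspace sketch of that idea only, not the code from this patch: struct trend_state and trend_update() are hypothetical names, and the slope recalculation and the shared-fault check that the patch performs are deliberately left out. It assumes bit0 holds the intended trend of the most recent window and bit1 that of the window before it.

```c
#include <stdbool.h>
#include <stdint.h>

struct trend_state {
	bool increasing;	/* active trend: true = scan period growing */
	uint8_t intended;	/* bit0 = latest window, bit1 = window before */
};

/*
 * Record the intended trend for this window and reverse the active
 * trend only when the two most recent windows agree on the reversal.
 * Returns true if the active trend was reversed.
 */
static bool trend_update(struct trend_state *s, bool want_increase)
{
	uint8_t last_two;

	s->intended = (uint8_t)((s->intended << 1) | (want_increase ? 1u : 0u));
	last_two = s->intended & 0x03;

	if (last_two == 0x03 && !s->increasing) {
		s->increasing = true;
		return true;
	}
	if (last_two == 0x00 && s->increasing) {
		s->increasing = false;
		return true;
	}
	return false;
}
```

A single window that disagrees with the active trend therefore changes nothing; it takes two consecutive windows with the same intent to flip it, which damps oscillation when the remote fault rate is noisy.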

While the remote fault rate metric is derived from the accumulated
remote and local faults from all the threads of the mm, the
per-mm private and shared faults also contribute in deciding
the trend of the scan period.

Co-developed-by: Wei Huang
Signed-off-by: Wei Huang
Signed-off-by: Disha Talreja
Signed-off-by: Bharata B Rao
---
 include/linux/mm_types.h |   5 +
 kernel/sched/debug.c     |   2 +
 kernel/sched/fair.c      | 265 ++++++++++++++++++++++++++++++++++++++-
 kernel/sched/sched.h     |   2 +
 4 files changed, 268 insertions(+), 6 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 2c6f119b947f..d57cd96d8df0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -619,6 +619,11 @@ struct mm_struct {
 
 		spinlock_t pan_numa_lock;
 		unsigned int numa_scan_period;
+		int remote_fault_rates[2]; /* histogram of remote fault rate */
+		long scanned_pages;
+		bool trend;
+		int slope;
+		u8 hist_trend;
 #endif
 		/*
 		 * An operation with batched TLB flushing is going on. Anything
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index aa29211de1bf..060bb46166a6 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -334,6 +334,8 @@ static __init int sched_init_debug(void)
 	debugfs_create_u32("scan_period_min_ms", 0644, numa, &sysctl_numa_balancing_scan_period_min);
 	debugfs_create_u32("scan_period_max_ms", 0644, numa, &sysctl_numa_balancing_scan_period_max);
 	debugfs_create_u32("scan_size_mb", 0644, numa, &sysctl_numa_balancing_scan_size);
+	debugfs_create_u32("pan_scan_period_min", 0644, numa, &sysctl_pan_scan_period_min);
+	debugfs_create_u32("pan_scan_period_max", 0644, numa, &sysctl_pan_scan_period_max);
 #endif
 
 	debugfs_create_file("debug", 0444, debugfs_sched, NULL, &sched_debug_fops);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4911b3841d00..5a9cacfbf9ec 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1026,6 +1026,10 @@ unsigned int sysctl_numa_balancing_scan_size = 256;
 /* Scan @scan_size MB every @scan_period after an initial @scan_delay in ms */
 unsigned int sysctl_numa_balancing_scan_delay = 1000;
 
+/* Clips of max and min scanning periods */
+unsigned int sysctl_pan_scan_period_min = 50;
+unsigned int sysctl_pan_scan_period_max = 5000;
+
 struct numa_group {
 	refcount_t refcount;
 
@@ -2102,6 +2106,242 @@ static void numa_group_count_active_nodes(struct numa_group *numa_group)
 /**********************************************/
 /* Process-based Adaptive NUMA (PAN) Design */
 /**********************************************/
+#define SLOPE(N, D) ((N)/(D))
+
+static unsigned int pan_scan_max(struct task_struct *p)
+{
+	unsigned long smax, nr_scan_pages;
+	unsigned long rss = 0;
+
+	smax = sysctl_pan_scan_period_max;
+	nr_scan_pages = sysctl_numa_balancing_scan_size << (20 - PAGE_SHIFT);
+
+	rss = get_mm_rss(p->mm);
+	if (!rss)
+		rss = nr_scan_pages;
+
+	if (READ_ONCE(p->mm->numa_scan_seq) == 0) {
+		smax = p->mm->scanned_pages * sysctl_pan_scan_period_max;
+		smax = smax / rss;
+		smax = max_t(unsigned long, sysctl_pan_scan_period_min, smax);
+	}
+
+	return smax;
+}
+
+/*
+ * Process-based Adaptive NUMA scan period update algorithm
+ *
+ * These are the important concepts behind the scan period update:
+ *
+ * - increase trend (of scan period)
+ *   scan period => up, memory coverage => down, overhead => down,
+ *   accuracy => down
+ * - decrease trend
+ *   scan period => down, memory coverage => up, overhead => up,
+ *   accuracy => up
+ * - trend: Reflects the current active trend
+ *   1 means increasing trend, 0 means decreasing trend
+ * - slope
+ *   it controls scan_period: new_scan_period = current_scan_period *
+ *   100 / slope
+ * - hist_trend: Reflects the intended trend in the last two
+ *   windows. Uses the last two bits (bit0 and bit1) for the same.
+ *   1 if increasing trend was intended, 0 if decreasing was intended.
+ */ + +/* + * Check if the scan period needs updation when the remote fault + * rate has changed (delta > 5) + * + * Returns TRUE if scan period needs updation, else FALSE. + */ +static bool pan_changed_rate_update(struct mm_struct *mm, int ps_ratio, + int oldest_remote_fault_rate, + int fault_rate_diff) +{ + u8 value; + + /* + * Set the intended trend for the current window. + * - If the remote fault rate has decreased, set the + * intended trend to increasing. + * - Otherwise leave the intended trend as decreasing. + */ + mm->hist_trend =3D mm->hist_trend << 1; + if (fault_rate_diff < 5) + mm->hist_trend |=3D 0x01; + + value =3D mm->hist_trend & 0x03; + + if (fault_rate_diff < -5 && value =3D=3D 3) { + /* + * The remote fault rate has decreased and the intended + * trend was set to increasing in the previous window. + * + * If on decreasing trend, reverse the trend and change + * the slope using the fault rates from (current-1) + * and (current-2) windows. + * + * If already on increasing trend, change the slope using + * the fault rates from (current) and (current-1) windows. + */ + if (!mm->trend) { + mm->trend =3D true; + mm->slope =3D SLOPE(mm->remote_fault_rates[0] * 100, + oldest_remote_fault_rate); + } else { + mm->slope =3D SLOPE(mm->remote_fault_rates[1] * 100, + mm->remote_fault_rates[0]); + } + } else if (fault_rate_diff > 5 && value =3D=3D 0) { + /* + * The remote fault rate has increased and the intended + * trend was set to decreasing in the previous window. + * + * If on increasing trend, + * - If shared fault ratio is more than 30%, don't yet + * reverse the trend, just mark the intended trend as + * increasing. + * - Otherwise reverse the trend. Change the slope using + * the fault rates from (current-1) and (current-2) windows. + * + * If on decreasing trend + * - Continue with a changed slope using the fault + * rates from (current) and (current-1) windows. 
+ */ + if (mm->trend) { + if (ps_ratio < 7) { + mm->hist_trend |=3D 0x01; + return true; + } + + mm->trend =3D false; + mm->slope =3D SLOPE(mm->remote_fault_rates[0] * 100, + oldest_remote_fault_rate); + } else { + mm->slope =3D SLOPE(mm->remote_fault_rates[1] * 100, + mm->remote_fault_rates[0]); + } + } else if (value =3D=3D 1 || value =3D=3D 2) { + /* + * The intended trend is oscillating + * + * If on decreasing trend and the shared fault ratio + * is more than 30%, reverse the trend and change the slope. + * + * If on increasing trend, continue as is. + */ + if (!mm->trend && ps_ratio < 7) { + mm->hist_trend |=3D 0x01; + mm->trend =3D true; + mm->slope =3D SLOPE(100 * 100, + 100 + ((7 - ps_ratio) * 10)); + } + return false; + } + return true; +} + +/* + * Check if the scan period needs updation when the remote fault + * rate has remained more or less the same (delta <=3D 5) + * + * Returns TRUE if scan period needs updation, else FALSE. + */ +static bool pan_const_rate_update(struct mm_struct *mm, int ps_ratio, + int oldest_remote_fault_rate) +{ + int diff1, diff2; + + mm->hist_trend =3D mm->hist_trend << 1; + + /* + * If we are in the increasing trend, don't change anything + * except the intended trend for this window that was reset + * to decreasing by default. + */ + if (mm->trend) + return false; + + /* We are in the decreasing trend, reverse under some condidtions. */ + diff1 =3D oldest_remote_fault_rate - mm->remote_fault_rates[0]; + diff2 =3D mm->remote_fault_rates[0] - mm->remote_fault_rates[1]; + + if (ps_ratio < 7) { + /* + * More than 30% of the pages are shared, so no point in + * further reducing the scan period. If increasing trend + * was intended in the previous window also, then reverse + * the trend to increasing. Else just record the increasing + * intended trend for this window and return. 
+ */ + mm->hist_trend |=3D 0x01; + if ((mm->hist_trend & 0x03) =3D=3D 3) { + mm->trend =3D true; + mm->slope =3D SLOPE(100 * 100, + (100 + ((7 - ps_ratio) * 10))); + } else + return false; + } else if (diff1 >=3D 0 && diff2 >=3D 0 && mm->numa_scan_seq > 1) { + /* + * Remote fault rate has reduced successively in the last + * two windows and address space has been scanned at least + * once. If increasing trend was intended in the previous + * window also, then reverse the trend to increasing. Else + * just record the increasing trend for this window and return. + */ + mm->hist_trend |=3D 0x01; + if ((mm->hist_trend & 0x03) =3D=3D 3) { + mm->trend =3D true; + mm->slope =3D SLOPE(100 * 100, 110); + mm->hist_trend |=3D 0x03; + } else + return false; + } + return true; +} + +static void pan_calculate_scan_period(struct task_struct *p) +{ + int remote_fault_rate, oldest_remote_fault_rate, ps_ratio, i, diff; + struct mm_struct *mm =3D p->mm; + unsigned long remote_hist =3D mm->faults_locality_history[0]; + unsigned long local_hist =3D mm->faults_locality_history[1]; + unsigned long shared_hist =3D mm->faults_shared_history[0]; + unsigned long priv_hist =3D mm->faults_shared_history[1]; + bool need_update; + + ps_ratio =3D (priv_hist * 10) / (priv_hist + shared_hist + 1); + remote_fault_rate =3D (remote_hist * 100) / (local_hist + remote_hist + 1= ); + + /* Keep the remote fault ratio at least 1% */ + remote_fault_rate =3D max(remote_fault_rate, 1); + for (i =3D 0; i < 2; i++) + if (mm->remote_fault_rates[i] =3D=3D 0) + mm->remote_fault_rates[i] =3D 1; + + /* Shift right in mm->remote_fault_rates[] to keep track of history */ + oldest_remote_fault_rate =3D mm->remote_fault_rates[0]; + mm->remote_fault_rates[0] =3D mm->remote_fault_rates[1]; + mm->remote_fault_rates[1] =3D remote_fault_rate; + diff =3D remote_fault_rate - oldest_remote_fault_rate; + + if (abs(diff) <=3D 5) + need_update =3D pan_const_rate_update(mm, ps_ratio, + oldest_remote_fault_rate); + else + 
need_update =3D pan_changed_rate_update(mm, ps_ratio, + oldest_remote_fault_rate, + diff); + + if (need_update) { + if (mm->slope =3D=3D 0) + mm->slope =3D 100; + mm->numa_scan_period =3D (100 * mm->numa_scan_period) / mm->slope; + } +} + /* * Update the cumulative history of local/remote and private/shared * statistics. If the numbers are too small worthy of updating, @@ -2145,14 +2385,17 @@ static bool pan_update_history(struct task_struct *= p) =20 /* * Updates mm->numa_scan_period under mm->pan_numa_lock. - * Returns p->numa_scan_period now but updated to return - * p->mm->numa_scan_period in a later patch. */ static unsigned long pan_get_scan_period(struct task_struct *p) { - pan_update_history(p); + if (pan_update_history(p)) + pan_calculate_scan_period(p); + + p->mm->numa_scan_period =3D clamp(p->mm->numa_scan_period, + READ_ONCE(sysctl_pan_scan_period_min), + pan_scan_max(p)); =20 - return p->numa_scan_period; + return p->mm->numa_scan_period; } =20 /* @@ -2860,6 +3103,7 @@ static void task_numa_work(struct callback_head *work) mm->numa_scan_offset =3D start; else reset_ptenuma_scan(p); + mm->scanned_pages +=3D ((sysctl_numa_balancing_scan_size << (20 - PAGE_SH= IFT)) - pages); mmap_read_unlock(mm); =20 /* @@ -2882,10 +3126,15 @@ static void pan_init_numa(struct task_struct *p) =20 spin_lock_init(&mm->pan_numa_lock); mm->numa_scan_period =3D sysctl_numa_balancing_scan_delay; + mm->scanned_pages =3D 0; + mm->trend =3D false; + mm->hist_trend =3D 0; + mm->slope =3D 100; =20 for (i =3D 0; i < 2; i++) { mm->faults_locality_history[i] =3D 0; mm->faults_shared_history[i] =3D 0; + mm->remote_fault_rates[i] =3D 1; } } =20 @@ -2948,6 +3197,9 @@ static void task_tick_numa(struct rq *rq, struct task= _struct *curr) if ((curr->flags & (PF_EXITING | PF_KTHREAD)) || work->next !=3D work) return; =20 + if (!spin_trylock(&curr->mm->pan_numa_lock)) + return; + /* * Using runtime rather than walltime has the dual advantage that * we (mostly) drive the selection from busy 
threads and that the @@ -2955,16 +3207,17 @@ static void task_tick_numa(struct rq *rq, struct ta= sk_struct *curr) * NUMA placement. */ now =3D curr->se.sum_exec_runtime; - period =3D (u64)curr->numa_scan_period * NSEC_PER_MSEC; + period =3D (u64)curr->mm->numa_scan_period * NSEC_PER_MSEC; =20 if (now > curr->node_stamp + period) { if (!curr->node_stamp) - curr->numa_scan_period =3D task_scan_start(curr); + curr->mm->numa_scan_period =3D task_scan_start(curr); curr->node_stamp +=3D period; =20 if (!time_before(jiffies, curr->mm->numa_next_scan)) task_work_add(curr, work, TWA_RESUME); } + spin_unlock(&curr->mm->pan_numa_lock); } =20 static void update_scan_period(struct task_struct *p, int new_cpu) diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index de53be905739..635f96bc989d 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2424,6 +2424,8 @@ extern unsigned int sysctl_numa_balancing_scan_delay; extern unsigned int sysctl_numa_balancing_scan_period_min; extern unsigned int sysctl_numa_balancing_scan_period_max; extern unsigned int sysctl_numa_balancing_scan_size; +extern unsigned int sysctl_pan_scan_period_min; +extern unsigned int sysctl_pan_scan_period_max; #endif =20 #ifdef CONFIG_SCHED_HRTICK --=20 2.25.1
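
For readers who want to check the slope arithmetic by hand, the period
update in pan_calculate_scan_period() and the clamping in
pan_get_scan_period() can be modelled roughly as below. This is a
user-space sketch, not code from the patch: pan_model_slope() and
pan_model_next_period() are hypothetical names, and the upper bound is
passed in as a plain parameter, whereas the patch derives it from
pan_scan_max().

```c
/*
 * Hypothetical model of SLOPE(newer * 100, older).
 * Rates are remote fault percentages; the patch keeps them at 1 or more.
 */
unsigned int pan_model_slope(unsigned int newer_rate, unsigned int older_rate)
{
	if (older_rate == 0)
		older_rate = 1;	/* assumption: mirror the 1% floor on stored rates */
	return (newer_rate * 100) / older_rate;
}

/*
 * Hypothetical model of: new_period = 100 * period / slope, then clamp
 * the result to [min_ms, max_ms].
 */
unsigned int pan_model_next_period(unsigned int period_ms, unsigned int slope,
				   unsigned int min_ms, unsigned int max_ms)
{
	unsigned long next;

	if (slope == 0)
		slope = 100;	/* same guard as the patch: 100 leaves the period unchanged */

	next = (100UL * period_ms) / slope;
	if (next < min_ms)
		next = min_ms;
	if (next > max_ms)
		next = max_ms;
	return (unsigned int)next;
}
```

With a remote fault rate that halves from 20% to 10%, the modelled slope
is 50 and a 1000 ms period becomes 2000 ms, i.e. scanning slows down. A
slope above 100 shortens the period, subject to the 50 ms and 5000 ms
defaults introduced for sysctl_pan_scan_period_min and
sysctl_pan_scan_period_max.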