From: Gang Li <gang.li@linux.dev>
To: David Hildenbrand, David Rientjes, Mike Kravetz, Muchun Song,
	Andrew Morton, Tim Chen
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	ligang.bdlg@bytedance.com, Gang Li
Subject: [PATCH v4 1/7] hugetlb: code clean for hugetlb_hstate_alloc_pages
Date: Thu, 18 Jan 2024 20:39:05 +0800
Message-Id: <20240118123911.88833-2-gang.li@linux.dev>
In-Reply-To: <20240118123911.88833-1-gang.li@linux.dev>
References: <20240118123911.88833-1-gang.li@linux.dev>

The readability of `hugetlb_hstate_alloc_pages` is poor. Cleaning up
the code improves its readability and makes future modifications
easier.

This patch extracts two functions from `hugetlb_hstate_alloc_pages` to
reduce its complexity. It has no functional changes.

- hugetlb_hstate_alloc_pages_specific_nodes() iterates through each
  online node and performs node-specific allocation where requested.
- hugetlb_hstate_alloc_pages_errcheck() reports an error if allocation
  falls short of the request and updates h->max_huge_pages accordingly.

Signed-off-by: Gang Li <gang.li@linux.dev>
Tested-by: David Rientjes
Reviewed-by: Muchun Song
Reviewed-by: Tim Chen
---
 mm/hugetlb.c | 46 +++++++++++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ed1581b670d4..b8e4a6adefd6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3482,6 +3482,33 @@ static void __init hugetlb_hstate_alloc_pages_onenode(struct hstate *h, int nid)
 	h->max_huge_pages_node[nid] = i;
 }
 
+static bool __init hugetlb_hstate_alloc_pages_specific_nodes(struct hstate *h)
+{
+	int i;
+	bool node_specific_alloc = false;
+
+	for_each_online_node(i) {
+		if (h->max_huge_pages_node[i] > 0) {
+			hugetlb_hstate_alloc_pages_onenode(h, i);
+			node_specific_alloc = true;
+		}
+	}
+
+	return node_specific_alloc;
+}
+
+static void __init hugetlb_hstate_alloc_pages_errcheck(unsigned long allocated, struct hstate *h)
+{
+	if (allocated < h->max_huge_pages) {
+		char buf[32];
+
+		string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
+		pr_warn("HugeTLB: allocating %lu of page size %s failed.  Only allocated %lu hugepages.\n",
+			h->max_huge_pages, buf, allocated);
+		h->max_huge_pages = allocated;
+	}
+}
+
 /*
  * NOTE: this routine is called in different contexts for gigantic and
  * non-gigantic pages.
@@ -3499,7 +3526,6 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 	struct folio *folio;
 	LIST_HEAD(folio_list);
 	nodemask_t *node_alloc_noretry;
-	bool node_specific_alloc = false;
 
 	/* skip gigantic hugepages allocation if hugetlb_cma enabled */
 	if (hstate_is_gigantic(h) && hugetlb_cma_size) {
@@ -3508,14 +3534,7 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 	}
 
 	/* do node specific alloc */
-	for_each_online_node(i) {
-		if (h->max_huge_pages_node[i] > 0) {
-			hugetlb_hstate_alloc_pages_onenode(h, i);
-			node_specific_alloc = true;
-		}
-	}
-
-	if (node_specific_alloc)
+	if (hugetlb_hstate_alloc_pages_specific_nodes(h))
 		return;
 
 	/* below will do all node balanced alloc */
@@ -3558,14 +3577,7 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 	/* list will be empty if hstate_is_gigantic */
 	prep_and_add_allocated_folios(h, &folio_list);
 
-	if (i < h->max_huge_pages) {
-		char buf[32];
-
-		string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
-		pr_warn("HugeTLB: allocating %lu of page size %s failed.  Only allocated %lu hugepages.\n",
-			h->max_huge_pages, buf, i);
-		h->max_huge_pages = i;
-	}
+	hugetlb_hstate_alloc_pages_errcheck(i, h);
 	kfree(node_alloc_noretry);
 }
 
-- 
2.20.1

From: Gang Li <gang.li@linux.dev>
To: David Hildenbrand, David Rientjes, Mike Kravetz, Muchun Song,
	Andrew Morton, Tim Chen
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	ligang.bdlg@bytedance.com, Gang Li
Subject: [PATCH v4 2/7] hugetlb: split hugetlb_hstate_alloc_pages
Date: Thu, 18 Jan 2024 20:39:06 +0800
Message-Id: <20240118123911.88833-3-gang.li@linux.dev>
In-Reply-To: <20240118123911.88833-1-gang.li@linux.dev>
References: <20240118123911.88833-1-gang.li@linux.dev>

1G and 2M huge pages have different allocation and initialization
logic, which leads to subtle differences in parallelization. It is
therefore appropriate to split hugetlb_hstate_alloc_pages into a
gigantic and a non-gigantic path. This patch has no functional changes.
Signed-off-by: Gang Li <gang.li@linux.dev>
Tested-by: David Rientjes
Reviewed-by: Tim Chen
Reviewed-by: Muchun Song
---
 mm/hugetlb.c | 87 ++++++++++++++++++++++++++--------------------------
 1 file changed, 43 insertions(+), 44 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b8e4a6adefd6..98ae108e1fac 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3509,6 +3509,43 @@ static void __init hugetlb_hstate_alloc_pages_errcheck(unsigned long allocated,
 	}
 }
 
+static unsigned long __init hugetlb_gigantic_pages_alloc_boot(struct hstate *h)
+{
+	unsigned long i;
+
+	for (i = 0; i < h->max_huge_pages; ++i) {
+		if (!alloc_bootmem_huge_page(h, NUMA_NO_NODE))
+			break;
+		cond_resched();
+	}
+
+	return i;
+}
+
+static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
+{
+	unsigned long i;
+	struct folio *folio;
+	LIST_HEAD(folio_list);
+	nodemask_t node_alloc_noretry;
+
+	/* Bit mask controlling how hard we retry per-node allocations.*/
+	nodes_clear(node_alloc_noretry);
+
+	for (i = 0; i < h->max_huge_pages; ++i) {
+		folio = alloc_pool_huge_folio(h, &node_states[N_MEMORY],
+						&node_alloc_noretry);
+		if (!folio)
+			break;
+		list_add(&folio->lru, &folio_list);
+		cond_resched();
+	}
+
+	prep_and_add_allocated_folios(h, &folio_list);
+
+	return i;
+}
+
 /*
  * NOTE: this routine is called in different contexts for gigantic and
  * non-gigantic pages.
@@ -3522,10 +3559,7 @@ static void __init hugetlb_hstate_alloc_pages_errcheck(unsigned long allocated,
  */
 static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 {
-	unsigned long i;
-	struct folio *folio;
-	LIST_HEAD(folio_list);
-	nodemask_t *node_alloc_noretry;
+	unsigned long allocated;
 
 	/* skip gigantic hugepages allocation if hugetlb_cma enabled */
 	if (hstate_is_gigantic(h) && hugetlb_cma_size) {
@@ -3538,47 +3572,12 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 		return;
 
 	/* below will do all node balanced alloc */
-	if (!hstate_is_gigantic(h)) {
-		/*
-		 * Bit mask controlling how hard we retry per-node allocations.
-		 * Ignore errors as lower level routines can deal with
-		 * node_alloc_noretry == NULL. If this kmalloc fails at boot
-		 * time, we are likely in bigger trouble.
-		 */
-		node_alloc_noretry = kmalloc(sizeof(*node_alloc_noretry),
-						GFP_KERNEL);
-	} else {
-		/* allocations done at boot time */
-		node_alloc_noretry = NULL;
-	}
-
-	/* bit mask controlling how hard we retry per-node allocations */
-	if (node_alloc_noretry)
-		nodes_clear(*node_alloc_noretry);
-
-	for (i = 0; i < h->max_huge_pages; ++i) {
-		if (hstate_is_gigantic(h)) {
-			/*
-			 * gigantic pages not added to list as they are not
-			 * added to pools now.
-			 */
-			if (!alloc_bootmem_huge_page(h, NUMA_NO_NODE))
-				break;
-		} else {
-			folio = alloc_pool_huge_folio(h, &node_states[N_MEMORY],
-							node_alloc_noretry);
-			if (!folio)
-				break;
-			list_add(&folio->lru, &folio_list);
-		}
-		cond_resched();
-	}
-
-	/* list will be empty if hstate_is_gigantic */
-	prep_and_add_allocated_folios(h, &folio_list);
+	if (hstate_is_gigantic(h))
+		allocated = hugetlb_gigantic_pages_alloc_boot(h);
+	else
+		allocated = hugetlb_pages_alloc_boot(h);
 
-	hugetlb_hstate_alloc_pages_errcheck(i, h);
-	kfree(node_alloc_noretry);
+	hugetlb_hstate_alloc_pages_errcheck(allocated, h);
 }
 
 static void __init hugetlb_init_hstates(void)
-- 
2.20.1

From: Gang Li <gang.li@linux.dev>
To: David Hildenbrand, David Rientjes, Mike Kravetz, Muchun Song,
	Andrew Morton, Tim Chen
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	ligang.bdlg@bytedance.com, Gang Li
Subject: [PATCH v4 3/7] padata: dispatch works on different nodes
Date: Thu, 18 Jan 2024 20:39:07 +0800
Message-Id: <20240118123911.88833-4-gang.li@linux.dev>
In-Reply-To: <20240118123911.88833-1-gang.li@linux.dev>
References: <20240118123911.88833-1-gang.li@linux.dev>

When a group of tasks that access different nodes are scheduled on the
same node, they may encounter bandwidth bottlenecks and increased
access latency. Thus, a numa_aware flag is introduced here, allowing
tasks to be distributed across different nodes to fully utilize the
advantage of multi-node systems.

Signed-off-by: Gang Li <gang.li@linux.dev>
Tested-by: David Rientjes
Reviewed-by: Muchun Song
---
 include/linux/padata.h |  3 +++
 kernel/padata.c        | 14 ++++++++++++--
 mm/mm_init.c           |  1 +
 3 files changed, 16 insertions(+), 2 deletions(-)
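For illustration, a minimal sketch of how a boot-time caller opts into
the new flag (this sketch is not part of the diff; the thread function
and the sizing values are hypothetical, only the padata_mt_job fields,
including the new .numa_aware, come from this patch):

	static void __init demo_thread_fn(unsigned long start,
					  unsigned long end, void *arg)
	{
		/* process items in [start, end); ranges are chosen by padata */
	}

	static void __init demo_parallel_init(unsigned long nr_items)
	{
		struct padata_mt_job job = {
			.thread_fn	= demo_thread_fn,
			.fn_arg		= NULL,
			.start		= 0,
			.size		= nr_items,
			.align		= 1,
			.min_chunk	= 1,
			.max_threads	= num_node_state(N_CPU),
			.numa_aware	= true,	/* new: spread workers across nodes */
		};

		padata_do_multithreaded(&job);
	}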
diff --git a/include/linux/padata.h b/include/linux/padata.h
index 495b16b6b4d7..f79ccd50e7f4 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -137,6 +137,8 @@ struct padata_shell {
  *             appropriate for one worker thread to do at once.
  * @max_threads: Max threads to use for the job, actual number may be less
  *               depending on task size and minimum chunk size.
+ * @numa_aware: Dispatch jobs to different nodes. If a node only has memory but
+ *              no CPU, dispatch its jobs to a random CPU.
  */
 struct padata_mt_job {
 	void (*thread_fn)(unsigned long start, unsigned long end, void *arg);
@@ -146,6 +148,7 @@ struct padata_mt_job {
 	unsigned long		align;
 	unsigned long		min_chunk;
 	int			max_threads;
+	bool			numa_aware;
 };
 
 /**
diff --git a/kernel/padata.c b/kernel/padata.c
index 179fb1518070..10eae3f59203 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -485,7 +485,8 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
 	struct padata_work my_work, *pw;
 	struct padata_mt_job_state ps;
 	LIST_HEAD(works);
-	int nworks;
+	int nworks, nid;
+	static atomic_t last_used_nid = ATOMIC_INIT(0);
 
 	if (job->size == 0)
 		return;
@@ -517,7 +518,16 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
 	ps.chunk_size = roundup(ps.chunk_size, job->align);
 
 	list_for_each_entry(pw, &works, pw_list)
-		queue_work(system_unbound_wq, &pw->pw_work);
+		if (job->numa_aware) {
+			int old_node = atomic_read(&last_used_nid);
+
+			do {
+				nid = next_node_in(old_node, node_states[N_CPU]);
+			} while (!atomic_try_cmpxchg(&last_used_nid, &old_node, nid));
+			queue_work_node(nid, system_unbound_wq, &pw->pw_work);
+		} else {
+			queue_work(system_unbound_wq, &pw->pw_work);
+		}
 
 	/* Use the current thread, which saves starting a workqueue worker. */
 	padata_work_init(&my_work, padata_mt_helper, &ps, PADATA_WORK_ONSTACK);
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 2c19f5515e36..549e76af8f82 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2231,6 +2231,7 @@ static int __init deferred_init_memmap(void *data)
 		.align = PAGES_PER_SECTION,
 		.min_chunk = PAGES_PER_SECTION,
 		.max_threads = max_threads,
+		.numa_aware = false,
 	};
 
 	padata_do_multithreaded(&job);
-- 
2.20.1

From: Gang Li <gang.li@linux.dev>
To: David Hildenbrand, David Rientjes, Mike Kravetz, Muchun Song,
	Andrew Morton, Tim Chen
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	ligang.bdlg@bytedance.com, Gang Li
Subject: [PATCH v4 4/7] hugetlb: pass *next_nid_to_alloc directly to
 for_each_node_mask_to_alloc
Date: Thu, 18 Jan 2024 20:39:08 +0800
Message-Id: <20240118123911.88833-5-gang.li@linux.dev>
In-Reply-To: <20240118123911.88833-1-gang.li@linux.dev>
References: <20240118123911.88833-1-gang.li@linux.dev>

With parallelization of hugetlb allocation across different threads,
each thread works on a different node to allocate pages from, instead
of all allocating from a common node h->next_nid_to_alloc. To address
this, it is necessary to assign a separate next_nid_to_alloc for each
thread. Consequently, hstate_next_node_to_alloc() and
for_each_node_mask_to_alloc() have been modified to take a
*next_nid_to_alloc parameter directly, ensuring thread-specific
allocation and avoiding concurrent access issues.

Signed-off-by: Gang Li <gang.li@linux.dev>
Tested-by: David Rientjes
Reviewed-by: Muchun Song
Reviewed-by: Tim Chen
---
 mm/hugetlb.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)
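To make the new calling convention concrete, a sketch (not part of the
diff; demo_worker is hypothetical, abbreviated from the real call sites
in this diff and in patch 6):

	/*
	 * Each parallel boot worker owns an on-stack cursor, so workers
	 * never race on h->next_nid_to_alloc; the serial resize path in
	 * set_max_huge_pages() simply passes &h->next_nid_to_alloc and
	 * keeps the old round-robin behavior.
	 */
	static void __init demo_worker(struct hstate *h, unsigned long nr,
				       nodemask_t *noretry)
	{
		int next_node = 0;	/* private per-worker cursor */

		while (nr--) {
			struct folio *folio = alloc_pool_huge_folio(h,
					&node_states[N_MEMORY], noretry,
					&next_node);
			if (!folio)
				break;
		}
	}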
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 98ae108e1fac..effe5539e545 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1464,15 +1464,15 @@ static int get_valid_node_allowed(int nid, nodemask_t *nodes_allowed)
  * next node from which to allocate, handling wrap at end of node
  * mask.
  */
-static int hstate_next_node_to_alloc(struct hstate *h,
+static int hstate_next_node_to_alloc(int *next_node,
 					nodemask_t *nodes_allowed)
 {
 	int nid;
 
 	VM_BUG_ON(!nodes_allowed);
 
-	nid = get_valid_node_allowed(h->next_nid_to_alloc, nodes_allowed);
-	h->next_nid_to_alloc = next_node_allowed(nid, nodes_allowed);
+	nid = get_valid_node_allowed(*next_node, nodes_allowed);
+	*next_node = next_node_allowed(nid, nodes_allowed);
 
 	return nid;
 }
@@ -1495,10 +1495,10 @@ static int hstate_next_node_to_free(struct hstate *h, nodemask_t *nodes_allowed)
 	return nid;
 }
 
-#define for_each_node_mask_to_alloc(hs, nr_nodes, node, mask)		\
+#define for_each_node_mask_to_alloc(next_node, nr_nodes, node, mask)	\
 	for (nr_nodes = nodes_weight(*mask);				\
 		nr_nodes > 0 &&						\
-		((node = hstate_next_node_to_alloc(hs, mask)) || 1);	\
+		((node = hstate_next_node_to_alloc(next_node, mask)) || 1);	\
 		nr_nodes--)
 
 #define for_each_node_mask_to_free(hs, nr_nodes, node, mask)		\
@@ -2350,12 +2350,13 @@ static void prep_and_add_allocated_folios(struct hstate *h,
  */
 static struct folio *alloc_pool_huge_folio(struct hstate *h,
 					nodemask_t *nodes_allowed,
-					nodemask_t *node_alloc_noretry)
+					nodemask_t *node_alloc_noretry,
+					int *next_node)
 {
 	gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
 	int nr_nodes, node;
 
-	for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
+	for_each_node_mask_to_alloc(next_node, nr_nodes, node, nodes_allowed) {
 		struct folio *folio;
 
 		folio = only_alloc_fresh_hugetlb_folio(h, gfp_mask, node,
@@ -3310,7 +3311,7 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 		goto found;
 	}
 	/* allocate from next node when distributing huge pages */
-	for_each_node_mask_to_alloc(h, nr_nodes, node, &node_states[N_MEMORY]) {
+	for_each_node_mask_to_alloc(&h->next_nid_to_alloc, nr_nodes, node, &node_states[N_MEMORY]) {
 		m = memblock_alloc_try_nid_raw(
 				huge_page_size(h), huge_page_size(h),
 				0, MEMBLOCK_ALLOC_ACCESSIBLE, node);
@@ -3679,7 +3680,7 @@ static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed,
 	VM_BUG_ON(delta != -1 && delta != 1);
 
 	if (delta < 0) {
-		for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
+		for_each_node_mask_to_alloc(&h->next_nid_to_alloc, nr_nodes, node, nodes_allowed) {
 			if (h->surplus_huge_pages_node[node])
 				goto found;
 		}
@@ -3794,7 +3795,8 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 		cond_resched();
 
 		folio = alloc_pool_huge_folio(h, nodes_allowed,
-						node_alloc_noretry);
+						node_alloc_noretry,
+						&h->next_nid_to_alloc);
 		if (!folio) {
 			prep_and_add_allocated_folios(h, &page_list);
 			spin_lock_irq(&hugetlb_lock);
-- 
2.20.1

From: Gang Li <gang.li@linux.dev>
To: David Hildenbrand, David Rientjes, Mike Kravetz, Muchun Song,
	Andrew Morton, Tim Chen
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	ligang.bdlg@bytedance.com, Gang Li
Subject: [PATCH v4 5/7] hugetlb: have CONFIG_HUGETLBFS select CONFIG_PADATA
Date: Thu, 18 Jan 2024 20:39:09 +0800
Message-Id: <20240118123911.88833-6-gang.li@linux.dev>
In-Reply-To: <20240118123911.88833-1-gang.li@linux.dev>
References: <20240118123911.88833-1-gang.li@linux.dev>

Allow hugetlb to use padata_do_multithreaded for parallel
initialization by selecting CONFIG_PADATA.

Signed-off-by: Gang Li <gang.li@linux.dev>
Tested-by: David Rientjes
Reviewed-by: Muchun Song
---
 fs/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/Kconfig b/fs/Kconfig
index 89fdbefd1075..a57d6e6c41e6 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -262,6 +262,7 @@ menuconfig HUGETLBFS
 	depends on X86 || SPARC64 || ARCH_SUPPORTS_HUGETLBFS || BROKEN
 	depends on (SYSFS || SYSCTL)
 	select MEMFD_CREATE
+	select PADATA
 	help
 	  hugetlbfs is a filesystem backing for HugeTLB pages, based on
 	  ramfs. For architectures that support it, say Y here and read
-- 
2.20.1

From: Gang Li <gang.li@linux.dev>
To: David Hildenbrand, David Rientjes, Mike Kravetz, Muchun Song,
	Andrew Morton, Tim Chen
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	ligang.bdlg@bytedance.com, Gang Li
Subject: [PATCH v4 6/7] hugetlb: parallelize 2M hugetlb allocation and
 initialization
Date: Thu, 18 Jan 2024 20:39:10 +0800
Message-Id: <20240118123911.88833-7-gang.li@linux.dev>
In-Reply-To: <20240118123911.88833-1-gang.li@linux.dev>
References: <20240118123911.88833-1-gang.li@linux.dev>

By distributing both the allocation and the initialization tasks across
multiple threads, the initialization of 2M hugetlb pages becomes
faster, thereby improving boot speed.

Here are some test results:

      test                no patch(ms)   patched(ms)   saved
 ------------------- -------------- ------------- --------
  256c2t(4 node) 2M       3336           1051        68.52%
  128c1t(2 node) 2M       1943            716        63.15%

Signed-off-by: Gang Li <gang.li@linux.dev>
Tested-by: David Rientjes
---
 mm/hugetlb.c | 70 ++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 52 insertions(+), 18 deletions(-)
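As a concrete sizing example (hypothetical numbers, not from the tests
above): with 4 memory nodes and hugepages=102400 on the command line,
the allocation job below runs with max_threads = 4 * 2 = 8 and
min_chunk = 102400 / 4 / 2 = 12800, i.e. up to eight workers each
receiving chunks of at least 12800 pages; the follow-up vmemmap pass
then runs one worker per node (size = 4, min_chunk = 1), each walking a
single per-node free list.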
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index effe5539e545..9b348ba418f5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -35,6 +35,7 @@
 #include <linux/delayacct.h>
 #include <linux/memory.h>
 #include <linux/mm_inline.h>
+#include <linux/padata.h>
 
 #include <asm/page.h>
 #include <asm/pgalloc.h>
@@ -3510,43 +3511,76 @@ static void __init hugetlb_hstate_alloc_pages_errcheck(unsigned long allocated,
 	}
 }
 
-static unsigned long __init hugetlb_gigantic_pages_alloc_boot(struct hstate *h)
+static void __init hugetlb_alloc_node(unsigned long start, unsigned long end, void *arg)
 {
-	unsigned long i;
+	struct hstate *h = (struct hstate *)arg;
+	int i, num = end - start;
+	nodemask_t node_alloc_noretry;
+	unsigned long flags;
+	int next_node = 0;
 
-	for (i = 0; i < h->max_huge_pages; ++i) {
-		if (!alloc_bootmem_huge_page(h, NUMA_NO_NODE))
+	/* Bit mask controlling how hard we retry per-node allocations.*/
+	nodes_clear(node_alloc_noretry);
+
+	for (i = 0; i < num; ++i) {
+		struct folio *folio = alloc_pool_huge_folio(h, &node_states[N_MEMORY],
+				&node_alloc_noretry, &next_node);
+		if (!folio)
 			break;
+		spin_lock_irqsave(&hugetlb_lock, flags);
+		__prep_account_new_huge_page(h, folio_nid(folio));
+		enqueue_hugetlb_folio(h, folio);
+		spin_unlock_irqrestore(&hugetlb_lock, flags);
 		cond_resched();
 	}
+}
 
-	return i;
+static void __init hugetlb_vmemmap_optimize_node(unsigned long start, unsigned long end, void *arg)
+{
+	struct hstate *h = (struct hstate *)arg;
+	int nid = start;
+
+	hugetlb_vmemmap_optimize_folios(h, &h->hugepage_freelists[nid]);
 }
 
-static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
+static unsigned long __init hugetlb_gigantic_pages_alloc_boot(struct hstate *h)
 {
 	unsigned long i;
-	struct folio *folio;
-	LIST_HEAD(folio_list);
-	nodemask_t node_alloc_noretry;
-
-	/* Bit mask controlling how hard we retry per-node allocations.*/
-	nodes_clear(node_alloc_noretry);
 
 	for (i = 0; i < h->max_huge_pages; ++i) {
-		folio = alloc_pool_huge_folio(h, &node_states[N_MEMORY],
-						&node_alloc_noretry);
-		if (!folio)
+		if (!alloc_bootmem_huge_page(h, NUMA_NO_NODE))
 			break;
-		list_add(&folio->lru, &folio_list);
 		cond_resched();
 	}
 
-	prep_and_add_allocated_folios(h, &folio_list);
-
 	return i;
 }
 
+static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
+{
+	struct padata_mt_job job = {
+		.fn_arg		= h,
+		.align		= 1,
+		.numa_aware	= true
+	};
+
+	job.thread_fn	= hugetlb_alloc_node;
+	job.start	= 0;
+	job.size	= h->max_huge_pages;
+	job.min_chunk	= h->max_huge_pages / num_node_state(N_MEMORY) / 2;
+	job.max_threads	= num_node_state(N_MEMORY) * 2;
+	padata_do_multithreaded(&job);
+
+	job.thread_fn	= hugetlb_vmemmap_optimize_node;
+	job.start	= 0;
+	job.size	= num_node_state(N_MEMORY);
+	job.min_chunk	= 1;
+	job.max_threads	= num_node_state(N_MEMORY);
+	padata_do_multithreaded(&job);
+
+	return h->nr_huge_pages;
+}
+
 /*
  * NOTE: this routine is called in different contexts for gigantic and
  * non-gigantic pages.
-- 
2.20.1

From: Gang Li <gang.li@linux.dev>
To: David Hildenbrand, David Rientjes, Mike Kravetz, Muchun Song,
	Andrew Morton, Tim Chen
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	ligang.bdlg@bytedance.com, Gang Li
Subject: [PATCH v4 7/7] hugetlb: parallelize 1G hugetlb initialization
Date: Thu, 18 Jan 2024 20:39:11 +0800
Message-Id: <20240118123911.88833-8-gang.li@linux.dev>
In-Reply-To: <20240118123911.88833-1-gang.li@linux.dev>
References: <20240118123911.88833-1-gang.li@linux.dev>

Optimize the initialization speed of 1G huge pages through
parallelization.

1G hugetlb pages are allocated from bootmem, a process that is already
very fast and does not currently require optimization. Therefore, we
focus on parallelizing only the initialization phase in
`gather_bootmem_prealloc`.

Here are some test results:

      test                no patch(ms)   patched(ms)   saved
 ------------------- -------------- ------------- --------
  256c2t(4 node) 1G       4745           2024        57.34%
  128c1t(2 node) 1G       3358           1712        49.02%
       12t 1G            77000          18300        76.23%

Signed-off-by: Gang Li <gang.li@linux.dev>
Tested-by: David Rientjes
---
 include/linux/hugetlb.h |  2 +-
 mm/hugetlb.c            | 42 +++++++++++++++++++++++++++++++++--------
 2 files changed, 35 insertions(+), 9 deletions(-)
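As context for __gather_bootmem_prealloc() below: padata hands each
worker a [start, end) range of node ids. The general shape of that
contract is sketched here (the demo names are hypothetical); the actual
thread function below relies on align = min_chunk = 1, so each
invocation covers the single node id passed in start:

	/* Hypothetical helper: drain one node's bootmem list into the pools. */
	static void __init demo_gather_one_node(int nid)
	{
		/* walk huge_boot_pages[nid]; no other worker touches this list */
	}

	/* Thread-function shape expected by padata_do_multithreaded(). */
	static void __init demo_gather_range(unsigned long start,
					     unsigned long end, void *arg)
	{
		unsigned long nid;

		for (nid = start; nid < end; nid++)
			demo_gather_one_node(nid);
	}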
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index c1ee640d87b1..77b30a8c6076 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -178,7 +178,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 struct address_space *hugetlb_page_mapping_lock_write(struct page *hpage);
 
 extern int sysctl_hugetlb_shm_group;
-extern struct list_head huge_boot_pages;
+extern struct list_head huge_boot_pages[MAX_NUMNODES];
 
 /* arch callbacks */
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9b348ba418f5..2f4b77630ada 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -69,7 +69,7 @@ static bool hugetlb_cma_folio(struct folio *folio, unsigned int order)
 #endif
 static unsigned long hugetlb_cma_size __initdata;
 
-__initdata LIST_HEAD(huge_boot_pages);
+__initdata struct list_head huge_boot_pages[MAX_NUMNODES];
 
 /* for command line parsing */
 static struct hstate * __initdata parsed_hstate;
@@ -3301,7 +3301,7 @@ int alloc_bootmem_huge_page(struct hstate *h, int nid)
 int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 {
 	struct huge_bootmem_page *m = NULL; /* initialize for clang */
-	int nr_nodes, node;
+	int nr_nodes, node = nid;
 
 	/* do node specific alloc */
 	if (nid != NUMA_NO_NODE) {
@@ -3339,7 +3339,7 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 			huge_page_size(h) - PAGE_SIZE);
 	/* Put them into a private list first because mem_map is not up yet */
 	INIT_LIST_HEAD(&m->list);
-	list_add(&m->list, &huge_boot_pages);
+	list_add(&m->list, &huge_boot_pages[node]);
 	m->hstate = h;
 	return 1;
 }
@@ -3390,8 +3390,6 @@ static void __init prep_and_add_bootmem_folios(struct hstate *h,
 	/* Send list for bulk vmemmap optimization processing */
 	hugetlb_vmemmap_optimize_folios(h, folio_list);
 
-	/* Add all new pool pages to free lists in one lock cycle */
-	spin_lock_irqsave(&hugetlb_lock, flags);
 	list_for_each_entry_safe(folio, tmp_f, folio_list, lru) {
 		if (!folio_test_hugetlb_vmemmap_optimized(folio)) {
 			/*
@@ -3404,23 +3402,27 @@ static void __init prep_and_add_bootmem_folios(struct hstate *h,
 				HUGETLB_VMEMMAP_RESERVE_PAGES,
 				pages_per_huge_page(h));
 		}
+		/* Subdivide locks to achieve better parallel performance */
+		spin_lock_irqsave(&hugetlb_lock, flags);
 		__prep_account_new_huge_page(h, folio_nid(folio));
 		enqueue_hugetlb_folio(h, folio);
+		spin_unlock_irqrestore(&hugetlb_lock, flags);
 	}
-	spin_unlock_irqrestore(&hugetlb_lock, flags);
 }
 
 /*
  * Put bootmem huge pages into the standard lists after mem_map is up.
  * Note: This only applies to gigantic (order > MAX_PAGE_ORDER) pages.
  */
-static void __init gather_bootmem_prealloc(void)
+static void __init __gather_bootmem_prealloc(unsigned long start, unsigned long end, void *arg)
 {
+	int nid = start;
 	LIST_HEAD(folio_list);
 	struct huge_bootmem_page *m;
 	struct hstate *h = NULL, *prev_h = NULL;
 
-	list_for_each_entry(m, &huge_boot_pages, list) {
+	list_for_each_entry(m, &huge_boot_pages[nid], list) {
 		struct page *page = virt_to_page(m);
 		struct folio *folio = (void *)page;
 
@@ -3453,6 +3455,22 @@ static void __init gather_bootmem_prealloc(void)
 	prep_and_add_bootmem_folios(h, &folio_list);
 }
 
+static void __init gather_bootmem_prealloc(void)
+{
+	struct padata_mt_job job = {
+		.thread_fn	= __gather_bootmem_prealloc,
+		.fn_arg		= NULL,
+		.start		= 0,
+		.size		= num_node_state(N_MEMORY),
+		.align		= 1,
+		.min_chunk	= 1,
+		.max_threads	= num_node_state(N_MEMORY),
+		.numa_aware	= true,
+	};
+
+	padata_do_multithreaded(&job);
+}
+
 static void __init hugetlb_hstate_alloc_pages_onenode(struct hstate *h, int nid)
 {
 	unsigned long i;
@@ -3602,6 +3620,14 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 		return;
 	}
 
+	/* hugetlb_hstate_alloc_pages will be called many times, init huge_boot_pages once */
+	if (huge_boot_pages[0].next == NULL) {
+		int i = 0;
+
+		for (i = 0; i < MAX_NUMNODES; i++)
+			INIT_LIST_HEAD(&huge_boot_pages[i]);
+	}
+
 	/* do node specific alloc */
 	if (hugetlb_hstate_alloc_pages_specific_nodes(h))
 		return;
-- 
2.20.1