From: Steve Wahl
To: Steve Wahl, Dimitri Sivanich, Russ Anderson, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, "H. Peter Anvin", linux-kernel@vger.kernel.org
Subject: [PATCH RESEND v5 6/8] x86/platform/uv: UV support for sub-NUMA clustering
Date: Fri, 19 May 2023 14:07:50 -0500
Message-Id: <20230519190752.3297140-7-steve.wahl@hpe.com>
In-Reply-To: <20230519190752.3297140-1-steve.wahl@hpe.com>
References: <20230519190752.3297140-1-steve.wahl@hpe.com>

Sub-NUMA clustering (SNC) invalidates previous assumptions of a 1:1
relationship between blades, sockets, and nodes.
Fix these assumptions and build tables correctly when SNC is enabled.

Signed-off-by: Steve Wahl
---
 arch/x86/include/asm/uv/uv_hub.h   |  22 ++--
 arch/x86/kernel/apic/x2apic_uv_x.c | 162 +++++++++++++++++------------
 2 files changed, 107 insertions(+), 77 deletions(-)

diff --git a/arch/x86/include/asm/uv/uv_hub.h b/arch/x86/include/asm/uv/uv_hub.h
index 0acfd1734c8b..5fa76c2ced51 100644
--- a/arch/x86/include/asm/uv/uv_hub.h
+++ b/arch/x86/include/asm/uv/uv_hub.h
@@ -177,6 +177,7 @@ struct uv_hub_info_s {
 	unsigned short nr_possible_cpus;
 	unsigned short nr_online_cpus;
 	short memory_nid;
+	unsigned short *node_to_socket;
 };
 
 /* CPU specific info with a pointer to the hub common info struct */
@@ -531,19 +532,18 @@ static inline void *uv_pnode_offset_to_vaddr(int pnode, unsigned long offset)
 {
 	unsigned int m_val = uv_hub_info->m_val;
 	unsigned long base;
-	unsigned short sockid, node;
+	unsigned short sockid;
 
 	if (m_val)
 		return __va(((unsigned long)pnode << m_val) | offset);
 
 	sockid = uv_pnode_to_socket(pnode);
-	node = uv_socket_to_node(sockid);
 
 	/* limit address of previous socket is our base, except node 0 is 0 */
-	if (!node)
+	if (sockid == 0)
 		return __va((unsigned long)offset);
 
-	base = (unsigned long)(uv_hub_info->gr_table[node - 1].limit);
+	base = (unsigned long)(uv_hub_info->gr_table[sockid - 1].limit);
 	return __va(base << UV_GAM_RANGE_SHFT | offset);
 }
 
@@ -650,7 +650,7 @@ static inline int uv_cpu_blade_processor_id(int cpu)
 /* Blade number to Node number (UV2..UV4 is 1:1) */
 static inline int uv_blade_to_node(int blade)
 {
-	return blade;
+	return uv_socket_to_node(blade);
 }
 
 /* Blade number of current cpu. Numnbered 0 .. <#blades -1> */
@@ -662,23 +662,27 @@ static inline int uv_numa_blade_id(void)
 /*
  * Convert linux node number to the UV blade number.
  * .. Currently for UV2 thru UV4 the node and the blade are identical.
- * .. If this changes then you MUST check references to this function!
+ * .. UV5 needs conversion when sub-numa clustering is enabled.
  */
 static inline int uv_node_to_blade_id(int nid)
 {
-	return nid;
+	unsigned short *n2s = uv_hub_info->node_to_socket;
+
+	return n2s ? n2s[nid] : nid;
 }
 
 /* Convert a CPU number to the UV blade number */
 static inline int uv_cpu_to_blade_id(int cpu)
 {
-	return uv_node_to_blade_id(cpu_to_node(cpu));
+	return uv_cpu_hub_info(cpu)->numa_blade_id;
 }
 
 /* Convert a blade id to the PNODE of the blade */
 static inline int uv_blade_to_pnode(int bid)
 {
-	return uv_hub_info_list(uv_blade_to_node(bid))->pnode;
+	unsigned short *s2p = uv_hub_info->socket_to_pnode;
+
+	return s2p ? s2p[bid] : bid;
 }
 
 /* Nid of memory node on blade. -1 if no blade-local memory */
diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c
index 1bd15b1f7712..10d3bdf874a0 100644
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -546,7 +546,6 @@ unsigned long sn_rtc_cycles_per_second;
 EXPORT_SYMBOL(sn_rtc_cycles_per_second);
 
 /* The following values are used for the per node hub info struct */
-static __initdata unsigned short *_node_to_pnode;
 static __initdata unsigned short _min_socket, _max_socket;
 static __initdata unsigned short _min_pnode, _max_pnode, _gr_table_len;
 static __initdata struct uv_gam_range_entry *uv_gre_table;
@@ -554,6 +553,7 @@ static __initdata struct uv_gam_parameters *uv_gp_table;
 static __initdata unsigned short *_socket_to_node;
 static __initdata unsigned short *_socket_to_pnode;
 static __initdata unsigned short *_pnode_to_socket;
+static __initdata unsigned short *_node_to_socket;
 
 static __initdata struct uv_gam_range_s *_gr_table;
 
@@ -1293,6 +1293,7 @@ static void __init uv_init_hub_info(struct uv_hub_info_s *hi)
 	hi->nasid_shift = uv_cpuid.nasid_shift;
 	hi->min_pnode = _min_pnode;
 	hi->min_socket = _min_socket;
+	hi->node_to_socket = _node_to_socket;
 	hi->pnode_to_socket = _pnode_to_socket;
 	hi->socket_to_node = _socket_to_node;
 	hi->socket_to_pnode = _socket_to_pnode;
@@ -1526,6 +1527,11 @@ static void __init free_1_to_1_table(unsigned short **tp, char *tname, int min,
 	pr_info("UV: %s is 1:1, conversion table removed\n", tname);
 }
 
+/*
+ * Build Socket Tables
+ * If the number of nodes is >1 per socket, socket to node table will
+ * contain lowest node number on that socket.
+ */
 static void __init build_socket_tables(void)
 {
 	struct uv_gam_range_entry *gre = uv_gre_table;
@@ -1552,27 +1558,25 @@ static void __init build_socket_tables(void)
 	/* Allocate and clear tables */
 	if ((alloc_conv_table(nump, &_pnode_to_socket) < 0)
 	    || (alloc_conv_table(nums, &_socket_to_pnode) < 0)
-	    || (alloc_conv_table(numn, &_node_to_pnode) < 0)
+	    || (alloc_conv_table(numn, &_node_to_socket) < 0)
 	    || (alloc_conv_table(nums, &_socket_to_node) < 0)) {
 		kfree(_pnode_to_socket);
 		kfree(_socket_to_pnode);
-		kfree(_node_to_pnode);
+		kfree(_node_to_socket);
 		return;
 	}
 
 	/* Fill in pnode/node/addr conversion list values: */
-	pr_info("UV: GAM Building socket/pnode conversion tables\n");
 	for (; gre->type != UV_GAM_RANGE_TYPE_UNUSED; gre++) {
 		if (gre->type == UV_GAM_RANGE_TYPE_HOLE)
 			continue;
 		i = gre->sockid - minsock;
-		/* Duplicate: */
-		if (_socket_to_pnode[i] != SOCK_EMPTY)
-			continue;
-		_socket_to_pnode[i] = gre->pnode;
+		if (_socket_to_pnode[i] == SOCK_EMPTY)
+			_socket_to_pnode[i] = gre->pnode;
 
 		i = gre->pnode - minpnode;
-		_pnode_to_socket[i] = gre->sockid;
+		if (_pnode_to_socket[i] == SOCK_EMPTY)
+			_pnode_to_socket[i] = gre->sockid;
 
 		pr_info("UV: sid:%02x type:%d nasid:%04x pn:%02x pn2s:%2x\n",
 			gre->sockid, gre->type, gre->nasid,
@@ -1582,34 +1586,29 @@ static void __init build_socket_tables(void)
 
 	/* Set socket -> node values: */
 	lnid = NUMA_NO_NODE;
-	for_each_present_cpu(cpu) {
+	for_each_possible_cpu(cpu) {
 		int nid = cpu_to_node(cpu);
 		int apicid, sockid;
 
 		if (lnid == nid)
 			continue;
 		lnid = nid;
+
 		apicid = per_cpu(x86_cpu_to_apicid, cpu);
 		sockid = apicid >> uv_cpuid.socketid_shift;
-		_socket_to_node[sockid - minsock] = nid;
-		pr_info("UV: sid:%02x: apicid:%04x node:%2d\n",
-			sockid, apicid, nid);
-	}
 
-	/* Set up physical blade to pnode translation from GAM Range Table: */
-	for (lnid = 0; lnid < num_possible_nodes(); lnid++) {
-		unsigned short sockid;
+		if (_socket_to_node[sockid - minsock] == SOCK_EMPTY)
+			_socket_to_node[sockid - minsock] = nid;
 
-		for (sockid = minsock; sockid <= maxsock; sockid++) {
-			if (lnid == _socket_to_node[sockid - minsock]) {
-				_node_to_pnode[lnid] = _socket_to_pnode[sockid - minsock];
-				break;
-			}
-		}
-		if (sockid > maxsock) {
-			pr_err("UV: socket for node %d not found!\n", lnid);
-			BUG();
-		}
+		if (_node_to_socket[nid] == SOCK_EMPTY)
+			_node_to_socket[nid] = sockid;
+
+		pr_info("UV: sid:%02x: apicid:%04x socket:%02d node:%03x s2n:%03x\n",
+			sockid,
+			apicid,
+			_node_to_socket[nid],
+			nid,
+			_socket_to_node[sockid - minsock]);
 	}
 
 	/*
@@ -1617,6 +1616,7 @@ static void __init build_socket_tables(void)
 	 * system runs faster by removing corresponding conversion table.
 	 */
 	FREE_1_TO_1_TABLE(_socket_to_node, _min_socket, nums, numn);
+	FREE_1_TO_1_TABLE(_node_to_socket, _min_socket, nums, numn);
 	FREE_1_TO_1_TABLE(_socket_to_pnode, _min_pnode, nums, nump);
 	FREE_1_TO_1_TABLE(_pnode_to_socket, _min_pnode, nums, nump);
 }
@@ -1702,12 +1702,13 @@ static __init int uv_system_init_hubless(void)
 static void __init uv_system_init_hub(void)
 {
 	struct uv_hub_info_s hub_info = {0};
-	int bytes, cpu, nodeid;
+	int bytes, cpu, nodeid, bid;
 	unsigned short min_pnode = USHRT_MAX, max_pnode = 0;
 	char *hub = is_uv5_hub() ? "UV500" :
 		    is_uv4_hub() ? "UV400" :
 		    is_uv3_hub() ? "UV300" :
 		    is_uv2_hub() ? "UV2000/3000" : NULL;
+	struct uv_hub_info_s **uv_hub_info_list_blade;
 
 	if (!hub) {
 		pr_err("UV: Unknown/unsupported UV hub\n");
@@ -1730,9 +1731,12 @@ static void __init uv_system_init_hub(void)
 	build_uv_gr_table();
 	set_block_size();
 	uv_init_hub_info(&hub_info);
-	uv_possible_blades = num_possible_nodes();
-	if (!_node_to_pnode)
+	/* If UV2 or UV3 may need to get # blades from HW */
+	if (is_uv(UV2|UV3) && !uv_gre_table)
 		boot_init_possible_blades(&hub_info);
+	else
+		/* min/max sockets set in decode_gam_rng_tbl */
+		uv_possible_blades = (_max_socket - _min_socket) + 1;
 
 	/* uv_num_possible_blades() is really the hub count: */
 	pr_info("UV: Found %d hubs, %d nodes, %d CPUs\n", uv_num_possible_blades(), num_possible_nodes(), num_possible_cpus());
@@ -1741,79 +1745,98 @@ static void __init uv_system_init_hub(void)
 	hub_info.coherency_domain_number = sn_coherency_id;
 	uv_rtc_init();
 
+	/*
+	 * __uv_hub_info_list[] is indexed by node, but there is only
+	 * one hub_info structure per blade.  First, allocate one
+	 * structure per blade.  Further down we create a per-node
+	 * table (__uv_hub_info_list[]) pointing to hub_info
+	 * structures for the correct blade.
+	 */
+
 	bytes = sizeof(void *) * uv_num_possible_blades();
-	__uv_hub_info_list = kzalloc(bytes, GFP_KERNEL);
-	BUG_ON(!__uv_hub_info_list);
+	uv_hub_info_list_blade = kzalloc(bytes, GFP_KERNEL);
+	if (WARN_ON_ONCE(!uv_hub_info_list_blade))
+		return;
 
 	bytes = sizeof(struct uv_hub_info_s);
-	for_each_node(nodeid) {
+	for_each_possible_blade(bid) {
 		struct uv_hub_info_s *new_hub;
 
-		if (__uv_hub_info_list[nodeid]) {
-			pr_err("UV: Node %d UV HUB already initialized!?\n", nodeid);
-			BUG();
+		/* Allocate & fill new per hub info list */
+		new_hub = (bid == 0) ? &uv_hub_info_node0
+			: kzalloc_node(bytes, GFP_KERNEL, uv_blade_to_node(bid));
+		if (WARN_ON_ONCE(!new_hub)) {
+			/* do not kfree() bid 0, which is statically allocated */
+			while (--bid > 0)
+				kfree(uv_hub_info_list_blade[bid]);
+			kfree(uv_hub_info_list_blade);
+			return;
 		}
 
-		/* Allocate new per hub info list */
-		new_hub = (nodeid == 0) ? &uv_hub_info_node0 : kzalloc_node(bytes, GFP_KERNEL, nodeid);
-		BUG_ON(!new_hub);
-		__uv_hub_info_list[nodeid] = new_hub;
-		new_hub = uv_hub_info_list(nodeid);
-		BUG_ON(!new_hub);
+		uv_hub_info_list_blade[bid] = new_hub;
 		*new_hub = hub_info;
 
 		/* Use information from GAM table if available: */
-		if (_node_to_pnode)
-			new_hub->pnode = _node_to_pnode[nodeid];
+		if (uv_gre_table)
+			new_hub->pnode = uv_blade_to_pnode(bid);
 		else /* Or fill in during CPU loop: */
 			new_hub->pnode = 0xffff;
 
-		new_hub->numa_blade_id = uv_node_to_blade_id(nodeid);
+		new_hub->numa_blade_id = bid;
 		new_hub->memory_nid = NUMA_NO_NODE;
 		new_hub->nr_possible_cpus = 0;
 		new_hub->nr_online_cpus = 0;
 	}
 
+	/*
+	 * Now populate __uv_hub_info_list[] for each node with the
+	 * pointer to the struct for the blade it resides on.
+	 */
+
+	bytes = sizeof(void *) * num_possible_nodes();
+	__uv_hub_info_list = kzalloc(bytes, GFP_KERNEL);
+	if (WARN_ON_ONCE(!__uv_hub_info_list)) {
+		for_each_possible_blade(bid)
+			/* bid 0 is statically allocated */
+			if (bid != 0)
+				kfree(uv_hub_info_list_blade[bid]);
+		kfree(uv_hub_info_list_blade);
+		return;
+	}
+
+	for_each_node(nodeid)
+		__uv_hub_info_list[nodeid] = uv_hub_info_list_blade[uv_node_to_blade_id(nodeid)];
+
 	/* Initialize per CPU info: */
 	for_each_possible_cpu(cpu) {
-		int apicid = per_cpu(x86_cpu_to_apicid, cpu);
-		int numa_node_id;
+		int apicid = early_per_cpu(x86_cpu_to_apicid, cpu);
+		unsigned short bid;
 		unsigned short pnode;
 
-		nodeid = cpu_to_node(cpu);
-		numa_node_id = numa_cpu_node(cpu);
 		pnode = uv_apicid_to_pnode(apicid);
+		bid = uv_pnode_to_socket(pnode) - _min_socket;
 
-		uv_cpu_info_per(cpu)->p_uv_hub_info = uv_hub_info_list(nodeid);
+		uv_cpu_info_per(cpu)->p_uv_hub_info = uv_hub_info_list_blade[bid];
 		uv_cpu_info_per(cpu)->blade_cpu_id = uv_cpu_hub_info(cpu)->nr_possible_cpus++;
 		if (uv_cpu_hub_info(cpu)->memory_nid == NUMA_NO_NODE)
 			uv_cpu_hub_info(cpu)->memory_nid = cpu_to_node(cpu);
 
-		/* Init memoryless node: */
-		if (nodeid != numa_node_id &&
-		    uv_hub_info_list(numa_node_id)->pnode == 0xffff)
-			uv_hub_info_list(numa_node_id)->pnode = pnode;
-		else if (uv_cpu_hub_info(cpu)->pnode == 0xffff)
+		if (uv_cpu_hub_info(cpu)->pnode == 0xffff)
 			uv_cpu_hub_info(cpu)->pnode = pnode;
 	}
 
-	for_each_node(nodeid) {
-		unsigned short pnode = uv_hub_info_list(nodeid)->pnode;
+	for_each_possible_blade(bid) {
+		unsigned short pnode = uv_hub_info_list_blade[bid]->pnode;
 
-		/* Add pnode info for pre-GAM list nodes without CPUs: */
-		if (pnode == 0xffff) {
-			unsigned long paddr;
+		if (pnode == 0xffff)
+			continue;
 
-			paddr = node_start_pfn(nodeid) << PAGE_SHIFT;
-			pnode = uv_gpa_to_pnode(uv_soc_phys_ram_to_gpa(paddr));
-			uv_hub_info_list(nodeid)->pnode = pnode;
-		}
 		min_pnode = min(pnode, min_pnode);
 		max_pnode = max(pnode, max_pnode);
-		pr_info("UV: UVHUB node:%2d pn:%02x nrcpus:%d\n",
-			nodeid,
-			uv_hub_info_list(nodeid)->pnode,
-			uv_hub_info_list(nodeid)->nr_possible_cpus);
+		pr_info("UV: HUB:%2d pn:%02x nrcpus:%d\n",
+			bid,
+			uv_hub_info_list_blade[bid]->pnode,
+			uv_hub_info_list_blade[bid]->nr_possible_cpus);
 	}
 
 	pr_info("UV: min_pnode:%02x max_pnode:%02x\n", min_pnode, max_pnode);
@@ -1821,6 +1844,9 @@ static void __init uv_system_init_hub(void)
 	map_mmr_high(max_pnode);
 	map_mmioh_high(min_pnode, max_pnode);
 
+	kfree(uv_hub_info_list_blade);
+	uv_hub_info_list_blade = NULL;
+
 	uv_nmi_setup();
 	uv_cpu_init();
 	uv_setup_proc_files(0);
-- 
2.26.2