From nobody Tue Nov 26 17:28:13 2024
From: Gregory Price
To: x86@kernel.org, linux-kernel@vger.kernel.org,
	linux-acpi@vger.kernel.org, linux-mm@kvack.org
Cc: dan.j.williams@intel.com, ira.weiny@intel.com, david@redhat.com,
	dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com,
	rafael@kernel.org, lenb@kernel.org, rppt@kernel.org,
	akpm@linux-foundation.org, alison.schofield@intel.com,
	Jonathan.Cameron@huawei.com, rrichter@amd.com, ytcoode@gmail.com,
	haibo1.xu@intel.com, dave.jiang@intel.com
Subject: [PATCH v2 1/3] mm/memblock: implement memblock_advise_size_order and probe functions
Date: Wed, 16 Oct 2024 15:24:43 -0400
Message-ID: <20241016192445.3118-2-gourry@gourry.net>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20241016192445.3118-1-gourry@gourry.net>
References: <20241016192445.3118-1-gourry@gourry.net>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Hotplug memory sources may have opinions on what the memblock size
should be - usually for alignment purposes. For example, CXL memory
extents can be as small as 256MB with a matching physical alignment.

Implement memblock_advise_size_order for use during early init, prior
to allocator and smp init, for software to advise the system as to what
the preferred block size should be.

The probe function is meant for arch_init code to fetch this value once
during memblock size calculation. Use of the advisement value is
arch-specific, and no guarantee is made that it will be used.

Calls to either function after probe result in -EBUSY to signal that
advisement is ignored or that memory_block_size_bytes should be used.

Suggested-by: Ira Weiny
Signed-off-by: Gregory Price
Suggested-by: Dan Williams
Suggested-by: David Hildenbrand
---
 include/linux/memblock.h |  2 ++
 mm/memblock.c            | 49 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index fc4d75c6cec3..efb1f7cfbd58 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -111,6 +111,8 @@ static inline void memblock_discard(void) {}
 #endif
 
 void memblock_allow_resize(void);
+int memblock_advise_size_order(int order);
+int memblock_probe_size_order(void);
 int memblock_add_node(phys_addr_t base, phys_addr_t size, int nid,
 		      enum memblock_flags flags);
 int memblock_add(phys_addr_t base, phys_addr_t size);
diff --git a/mm/memblock.c b/mm/memblock.c
index 3b9dc2d89b8a..e0bdba011564 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2009,6 +2009,55 @@ void __init memblock_allow_resize(void)
 	memblock_can_resize = 1;
 }
 
+/*
+ * @order: bit-order describing the preferred minimum block size
+ *
+ * Intended for use by early-boot software prior to smp and allocator init to
+ * advise the architecture what the minimum block size should be.  Should only
+ * be called during arch init before allocator and smp init.
+ *
+ * This value can only decrease after it has been initially set, the intention
+ * is to identify the smallest supported alignment across all opinions.
+ *
+ * Use of this advisement value is arch-specific.
+ *
+ * Returns: 0 on success, -EINVAL if order is <=0, and -EBUSY if already probed
+ */
+static int memblock_sz_order;
+#define MEMBLOCK_SZO_PROBED (-1)
+int memblock_advise_size_order(int order)
+{
+	if (order <= 0)
+		return -EINVAL;
+
+	if (memblock_sz_order == MEMBLOCK_SZO_PROBED)
+		return -EBUSY;
+
+	if (memblock_sz_order)
+		memblock_sz_order = min(order, memblock_sz_order);
+	else
+		memblock_sz_order = order;
+
+	return 0;
+}
+
+/*
+ * memblock_probe_size_order is intended for arch init code to probe one time,
+ * for a suggested memory block size.  After the first call, the result will
+ * always be -EBUSY. A late user should call memory_block_size_bytes instead to
+ * determine the actual block size in use.
+ *
+ * Should only be called during arch init prior to allocator and smp init.
+ *
+ * Returns: block size order, 0 if never set, or -EBUSY if previously probed.
+ */
+int memblock_probe_size_order(void)
+{
+	int rv = xchg(&memblock_sz_order, -1);
+
+	return (rv == -1) ? -EBUSY : rv;
+}
+
 static int __init early_memblock(char *p)
 {
 	if (p && strstr(p, "debug"))
-- 
2.43.0

From nobody Tue Nov 26 17:28:13 2024
From: Gregory Price
To: x86@kernel.org, linux-kernel@vger.kernel.org,
	linux-acpi@vger.kernel.org, linux-mm@kvack.org
Cc: dan.j.williams@intel.com, ira.weiny@intel.com, david@redhat.com,
	dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com,
	rafael@kernel.org, lenb@kernel.org, rppt@kernel.org,
	akpm@linux-foundation.org, alison.schofield@intel.com,
	Jonathan.Cameron@huawei.com, rrichter@amd.com, ytcoode@gmail.com,
	haibo1.xu@intel.com, dave.jiang@intel.com
Subject: [PATCH v2 2/3] x86: probe memblock size advisement value during mm init
Date: Wed, 16 Oct 2024 15:24:44 -0400
Message-ID: <20241016192445.3118-3-gourry@gourry.net>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20241016192445.3118-1-gourry@gourry.net>
References: <20241016192445.3118-1-gourry@gourry.net>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Systems with hotplug may provide an advisement value on what the
memblock size should be. Probe this value when the rest of the
configuration values are considered.

The new heuristic is as follows:

1) set_memory_block_size_order value if already set (cmdline param)
2) minimum block size if memory is less than large block limit
3) [new] hotplug advise: lesser of advise value or memory alignment
4) Max block size if system is bare-metal
5) Largest size that aligns to end of memory.

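For illustration only (not part of the patch), step 3 of the heuristic
above can be modeled in userspace C roughly as follows. The function
name advised_block_size and the MIN_BLOCK/MAX_BLOCK constants are
assumptions made for this sketch; they stand in for the kernel's
MIN_MEMORY_BLOCK_SIZE and MAX_BLOCK_SIZE, whose real values are
arch-specific.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed limits for this sketch: 128MB minimum, 2GB maximum. */
#define MIN_BLOCK (128ULL << 20)
#define MAX_BLOCK (2ULL << 30)

/*
 * Model of the advisement step: clamp the advised size into
 * [MIN_BLOCK, MAX_BLOCK], then halve it until the end of boot memory
 * is aligned to it. Falls back to MIN_BLOCK if nothing larger aligns.
 */
static uint64_t advised_block_size(int order, uint64_t boot_mem_end)
{
	uint64_t bz;

	/* The kernel code only takes this path when order > 0. */
	assert(order > 0 && order < 64);

	bz = 1ULL << order;
	if (bz > MAX_BLOCK)
		bz = MAX_BLOCK;
	if (bz < MIN_BLOCK)
		bz = MIN_BLOCK;

	for (; bz > MIN_BLOCK; bz >>= 1) {
		if ((boot_mem_end & (bz - 1)) == 0)
			break;
	}
	return bz;
}
```

With these assumed limits, an advised order of 28 (256MB) on a system
whose boot memory ends at 64GB keeps the 256MB block size, while an
end of memory that is only 128MB-aligned falls through to the minimum.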
Suggested-by: David Hildenbrand
Signed-off-by: Gregory Price
Suggested-by: Dan Williams
Suggested-by: Ira Weiny
---
 arch/x86/mm/init_64.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index ff253648706f..b72923b12d99 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1439,6 +1439,7 @@ static unsigned long probe_memory_block_size(void)
 {
 	unsigned long boot_mem_end = max_pfn << PAGE_SHIFT;
 	unsigned long bz;
+	int order;
 
 	/* If memory block size has been set, then use it */
 	bz = set_memory_block_size;
@@ -1451,6 +1452,21 @@ static unsigned long probe_memory_block_size(void)
 		goto done;
 	}
 
+	/* Consider hotplug advisement value (if set) */
+	order = memblock_probe_size_order();
+	bz = order > 0 ? (1UL << order) : 0;
+	if (bz) {
+		/* Align down to max and up to min supported */
+		bz = max(min(bz, MAX_BLOCK_SIZE), MIN_MEMORY_BLOCK_SIZE);
+		/* Use lesser of advisement and end of memory alignment */
+		for (; bz > MIN_MEMORY_BLOCK_SIZE; bz >>= 1) {
+			if (IS_ALIGNED(boot_mem_end, bz))
+				goto done;
+		}
+		/* Barring clean alignment, default to min block size */
+		goto done;
+	}
+
 	/*
 	 * Use max block size to minimize overhead on bare metal, where
 	 * alignment for memory hotplug isn't a concern.
-- 
2.43.0

From nobody Tue Nov 26 17:28:13 2024
From: Gregory Price
To: x86@kernel.org, linux-kernel@vger.kernel.org,
	linux-acpi@vger.kernel.org, linux-mm@kvack.org
Cc: dan.j.williams@intel.com, ira.weiny@intel.com, david@redhat.com,
	dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com,
	rafael@kernel.org, lenb@kernel.org, rppt@kernel.org,
	akpm@linux-foundation.org, alison.schofield@intel.com,
	Jonathan.Cameron@huawei.com, rrichter@amd.com, ytcoode@gmail.com,
	haibo1.xu@intel.com, dave.jiang@intel.com
Subject: [PATCH v2 3/3] acpi,srat: reduce memory block size if CFMWS has a smaller alignment
Date: Wed, 16 Oct 2024 15:24:45 -0400
Message-ID: <20241016192445.3118-4-gourry@gourry.net>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20241016192445.3118-1-gourry@gourry.net>
References: <20241016192445.3118-1-gourry@gourry.net>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

The CXL Fixed Memory Window allows for memory aligned down to the size
of 256MB. However, by default on x86, memory blocks increase in size as
total System RAM capacity increases. On x86, this caps out at 2G when
64GB of System RAM is reached.

When the CFMWS regions are not aligned to memory block size, this
results in lost capacity on either side of the alignment.

Parse all CFMWS to detect the largest common denominator among all
regions, and advise memblock to reduce the block size accordingly.

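For illustration only (not part of the patch), the capacity loss
described above can be worked through with a small userspace model.
The helper names window_alignment and usable_capacity are hypothetical,
and the SZ_* constants are redefined locally rather than taken from
kernel headers.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_256M (256ULL << 20)
#define SZ_64T  (64ULL << 40)

/*
 * Largest power-of-two size in [256MB, 64TB] to which both the window
 * base and the window size are aligned, or 0 if even 256MB does not fit.
 */
static uint64_t window_alignment(uint64_t base, uint64_t size)
{
	uint64_t bz;

	for (bz = SZ_64T; bz >= SZ_256M; bz >>= 1) {
		if (((base | size) & (bz - 1)) == 0)
			return bz;
	}
	return 0;
}

/*
 * Bytes of [base, base + size) that can be covered by whole memory
 * blocks of size bz: round the start up and the end down to bz.
 */
static uint64_t usable_capacity(uint64_t base, uint64_t size, uint64_t bz)
{
	uint64_t start, end;

	/* bz must be a non-zero power of two. */
	assert(bz && (bz & (bz - 1)) == 0);

	start = (base + bz - 1) & ~(bz - 1);
	end = (base + size) & ~(bz - 1);
	return end > start ? end - start : 0;
}
```

As a worked case, take a 4GB window based at 4GB + 256MB. With 2GB
memory blocks only the range from 6GB to 8GB is block-aligned, so 2GB
of the window is mappable. With 256MB blocks the whole 4GB is.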
Suggested-by: Dan Williams
Signed-off-by: Gregory Price
Suggested-by: David Hildenbrand
Suggested-by: Ira Weiny
---
 drivers/acpi/numa/srat.c | 42 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
index 44f91f2c6c5d..5fc03a99570e 100644
--- a/drivers/acpi/numa/srat.c
+++ b/drivers/acpi/numa/srat.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -333,6 +334,35 @@ acpi_parse_memory_affinity(union acpi_subtable_headers *header,
 	return 0;
 }
 
+/*
+ * CXL allows CFMW to be aligned along 256MB boundaries, but large memory
+ * systems default to larger alignments (2GB on x86). Misalignments can
+ * cause some capacity to become unreachable. Calculate the largest supported
+ * alignment for all CFMW to maximize the amount of mappable capacity.
+ */
+static int __init acpi_align_cfmws(union acpi_subtable_headers *header,
+				   void *arg, const unsigned long table_end)
+{
+	struct acpi_cedt_cfmws *cfmws = (struct acpi_cedt_cfmws *)header;
+	u64 start = cfmws->base_hpa;
+	u64 size = cfmws->window_size;
+	unsigned long *fin_bz = arg;
+	unsigned long bz;
+
+	for (bz = SZ_64T; bz >= SZ_256M; bz >>= 1) {
+		if (IS_ALIGNED(start, bz) && IS_ALIGNED(size, bz))
+			break;
+	}
+
+	/* Only adjust downward, we never want to increase block size */
+	if (bz < *fin_bz && bz >= SZ_256M)
+		*fin_bz = bz;
+	else if (bz < SZ_256M)
+		pr_err("CFMWS: [BIOS BUG] base/size alignment violates spec\n");
+
+	return 0;
+}
+
 static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
 				   void *arg, const unsigned long table_end)
 {
@@ -501,6 +531,7 @@ acpi_table_parse_srat(enum acpi_srat_type id,
 int __init acpi_numa_init(void)
 {
 	int i, fake_pxm, cnt = 0;
+	unsigned long bz = SZ_64T;
 
 	if (acpi_disabled)
 		return -EINVAL;
@@ -552,6 +583,17 @@ int __init acpi_numa_init(void)
 	}
 	last_real_pxm = fake_pxm;
 	fake_pxm++;
+
+	/* Calculate and set largest supported memory block size alignment */
+	acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, acpi_align_cfmws, &bz);
+	if (bz >= SZ_256M) {
+		if (memblock_advise_size_order(__ffs(bz)) < 0)
+			pr_warn("CFMWS: memblock size advise failed\n");
+		else
+			pr_info("CFMWS: memblock advised size(%ld)\n", bz);
+	}
+
+	/* Then parse and fill the numa nodes with the described memory */
 	acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, acpi_parse_cfmws,
 			      &fake_pxm);
 
-- 
2.43.0
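
For illustration only (not part of the series), the advise/probe
contract introduced in patch 1/3 can be modeled in userspace C as a
small latch. The names advise_size_order, probe_size_order and
SZO_PROBED are hypothetical stand-ins for the kernel symbols, and the
plain assignment replaces the kernel's xchg() on the assumption that
callers are single-threaded, as they are during early boot.

```c
#include <assert.h>
#include <errno.h>

#define SZO_PROBED (-1)	/* stand-in for MEMBLOCK_SZO_PROBED */

static int sz_order;	/* 0 means no advisement has been recorded */

/* Record the smallest order advised so far; refuse once probed. */
static int advise_size_order(int order)
{
	/* State is either latched (-1), unset (0), or a positive order. */
	assert(sz_order >= SZO_PROBED);

	if (order <= 0)
		return -EINVAL;
	if (sz_order == SZO_PROBED)
		return -EBUSY;
	if (sz_order == 0 || order < sz_order)
		sz_order = order;
	return 0;
}

/* Hand out the advised order exactly once, then latch the probed state. */
static int probe_size_order(void)
{
	int rv = sz_order;

	sz_order = SZO_PROBED;
	return rv == SZO_PROBED ? -EBUSY : rv;
}
```

In this model, advising orders 31, 28 and 30 in turn leaves 28
recorded, the first probe returns 28, and every later advise or probe
call returns -EBUSY.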