From nobody Mon Jun 8 06:38:24 2026
From: Gregory Price
To: linux-mm@kvack.org, nvdimm@lists.linux.dev
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com,
	linux-cxl@vger.kernel.org, linux-kselftest@vger.kernel.org,
	djbw@kernel.org, vishal.l.verma@intel.com, dave.jiang@intel.com,
	akpm@linux-foundation.org, david@kernel.org, ljs@kernel.org,
	liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
	surenb@google.com, mhocko@suse.com, osalvador@suse.de,
	shuah@kernel.org, gourry@gourry.net, alison.schofield@intel.com,
	Smita.KoralahalliChannabasappa@amd.com, ira.weiny@intel.com,
	apopple@nvidia.com
Subject: [PATCH v4 1/9] mm/memory: add memory_block_aligned_range() helper
Date: Fri, 5 Jun 2026 22:19:03 +0100
Message-ID: <20260605211911.2160954-2-gourry@gourry.net>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260605211911.2160954-1-gourry@gourry.net>
References: <20260605211911.2160954-1-gourry@gourry.net>

Memory hotplug operations require ranges aligned to memory block
boundaries. This is a generic operation for hotplug.

Add memory_block_aligned_range() as a common helper in
include/linux/memory.h that aligns the start address up and the end
address down to memory block boundaries.

Update dax/kmem to use this helper.

Signed-off-by: Gregory Price
---
 drivers/dax/kmem.c     |  4 +---
 include/linux/memory.h | 22 ++++++++++++++++++++++
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index a18e2b968e4d..592171ec10f4 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -33,9 +33,7 @@ static int dax_kmem_range(struct dev_dax *dev_dax, int i, struct range *r)
 	struct dev_dax_range *dax_range = &dev_dax->ranges[i];
 	struct range *range = &dax_range->range;
 
-	/* memory-block align the hotplug range */
-	r->start = ALIGN(range->start, memory_block_size_bytes());
-	r->end = ALIGN_DOWN(range->end + 1, memory_block_size_bytes()) - 1;
+	*r = memory_block_aligned_range(range);
 	if (r->start >= r->end) {
 		r->start = range->start;
 		r->end = range->end;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 463dc02f6cff..9f5ef0309f77 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 
 #define MIN_MEMORY_BLOCK_SIZE (1UL << SECTION_SIZE_BITS)
 
@@ -100,6 +101,27 @@ int arch_get_memory_phys_device(unsigned long start_pfn);
 unsigned long memory_block_size_bytes(void);
 int set_memory_block_size_order(unsigned int order);
 
+/**
+ * memory_block_aligned_range - align a physical address range to memory blocks
+ * @range: the input range to align
+ *
+ * Aligns the start address up and the end address down to memory block
+ * boundaries. This is required for memory hotplug operations which must
+ * operate on memory-block aligned ranges.
+ *
+ * Returns the aligned range. Callers should check that the returned
+ * range is valid (aligned.start < aligned.end) before using it.
+ */
+static inline struct range memory_block_aligned_range(const struct range *range)
+{
+	struct range aligned;
+
+	aligned.start = ALIGN(range->start, memory_block_size_bytes());
+	aligned.end = ALIGN_DOWN(range->end + 1, memory_block_size_bytes()) - 1;
+
+	return aligned;
+}
+
 struct memory_notify {
 	unsigned long start_pfn;
 	unsigned long nr_pages;
-- 
2.54.0

From nobody Mon Jun 8 06:38:24 2026
From: Gregory Price
To: linux-mm@kvack.org, nvdimm@lists.linux.dev
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com,
	linux-cxl@vger.kernel.org, linux-kselftest@vger.kernel.org,
	djbw@kernel.org, vishal.l.verma@intel.com, dave.jiang@intel.com,
	akpm@linux-foundation.org, david@kernel.org, ljs@kernel.org,
	liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
	surenb@google.com, mhocko@suse.com, osalvador@suse.de,
	shuah@kernel.org, gourry@gourry.net, alison.schofield@intel.com,
	Smita.KoralahalliChannabasappa@amd.com, ira.weiny@intel.com,
	apopple@nvidia.com
Subject: [PATCH v4 2/9] mm/memory_hotplug: pass online_type to online_memory_block() via arg
Date: Fri, 5 Jun 2026 22:19:04 +0100
Message-ID: <20260605211911.2160954-3-gourry@gourry.net>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260605211911.2160954-1-gourry@gourry.net>
References: <20260605211911.2160954-1-gourry@gourry.net>

Modify online_memory_block() to accept the online type through its arg
parameter rather than calling mhp_get_default_online_type() internally.
This prepares for allowing callers to specify explicit online types.

Update the caller in add_memory_resource() to pass the default online
type via a local variable.

No functional change.

Cc: Oscar Salvador
Cc: Andrew Morton
Acked-by: David Hildenbrand (Red Hat)
Signed-off-by: Gregory Price
---
 mm/memory_hotplug.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7ac19fab2263..6833208cc17c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1337,7 +1337,9 @@ static int check_hotplug_memory_range(u64 start, u64 size)
 
 static int online_memory_block(struct memory_block *mem, void *arg)
 {
-	mem->online_type = mhp_get_default_online_type();
+	enum mmop *online_type = arg;
+
+	mem->online_type = *online_type;
 	return device_online(&mem->dev);
 }
 
@@ -1494,6 +1496,7 @@ static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
 int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 {
 	struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
+	enum mmop online_type = mhp_get_default_online_type();
 	enum memblock_flags memblock_flags = MEMBLOCK_NONE;
 	struct memory_group *group = NULL;
 	u64 start, size;
@@ -1582,7 +1585,8 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 
 	/* online pages if requested */
 	if (mhp_get_default_online_type() != MMOP_OFFLINE)
-		walk_memory_blocks(start, size, NULL, online_memory_block);
+		walk_memory_blocks(start, size, &online_type,
+				   online_memory_block);
 
 	return ret;
 error:
-- 
2.54.0

From nobody Mon Jun 8 06:38:24 2026
From: Gregory Price
To: linux-mm@kvack.org, nvdimm@lists.linux.dev
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com,
	linux-cxl@vger.kernel.org, linux-kselftest@vger.kernel.org,
	djbw@kernel.org, vishal.l.verma@intel.com, dave.jiang@intel.com,
	akpm@linux-foundation.org, david@kernel.org, ljs@kernel.org,
	liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
	surenb@google.com, mhocko@suse.com, osalvador@suse.de,
	shuah@kernel.org, gourry@gourry.net, alison.schofield@intel.com,
	Smita.KoralahalliChannabasappa@amd.com, ira.weiny@intel.com,
	apopple@nvidia.com
Subject: [PATCH v4 3/9] mm/memory_hotplug: export mhp_get_default_online_type
Date: Fri, 5 Jun 2026 22:19:05 +0100
Message-ID: <20260605211911.2160954-4-gourry@gourry.net>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260605211911.2160954-1-gourry@gourry.net>
References: <20260605211911.2160954-1-gourry@gourry.net>

Drivers which may pass hotplug policy down to DAX need MMOP_ symbols
and the mhp_get_default_online_type function for hotplug use cases.

Some drivers (cxl) co-mingle their hotplug and devdax use-cases into
the same driver code, and chose the dax_kmem path as the default driver
path - making it difficult to require hotplug as a predicate to building
the overall driver (it may break other non-hotplug use-cases).

Export mhp_get_default_online_type function to allow these drivers to
build when hotplug is disabled and still use the DAX use case.

In the built-out case we simply return MMOP_OFFLINE as it's
non-destructive. The internal function can never return -1 either, so
we choose this to allow for defining the function with 'enum mmop'.

Signed-off-by: Gregory Price
---
 include/linux/memory_hotplug.h | 2 ++
 mm/memory_hotplug.c            | 1 +
 2 files changed, 3 insertions(+)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 7c9d66729c60..f059025f8f8b 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -316,6 +316,8 @@ extern struct zone *zone_for_pfn_range(enum mmop online_type,
 extern int arch_create_linear_mapping(int nid, u64 start, u64 size,
 				      struct mhp_params *params);
 void arch_remove_linear_mapping(u64 start, u64 size);
+#else
+static inline enum mmop mhp_get_default_online_type(void) { return MMOP_OFFLINE; }
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 #endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6833208cc17c..494257054095 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -239,6 +239,7 @@ enum mmop mhp_get_default_online_type(void)
 
 	return mhp_default_online_type;
 }
+EXPORT_SYMBOL_GPL(mhp_get_default_online_type);
 
 void mhp_set_default_online_type(enum mmop online_type)
 {
-- 
2.54.0

From nobody Mon Jun 8 06:38:24 2026
From: Gregory Price
To: linux-mm@kvack.org, nvdimm@lists.linux.dev
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com,
	linux-cxl@vger.kernel.org, linux-kselftest@vger.kernel.org,
	djbw@kernel.org, vishal.l.verma@intel.com, dave.jiang@intel.com,
	akpm@linux-foundation.org, david@kernel.org, ljs@kernel.org,
	liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
	surenb@google.com, mhocko@suse.com, osalvador@suse.de,
	shuah@kernel.org, gourry@gourry.net, alison.schofield@intel.com,
	Smita.KoralahalliChannabasappa@amd.com, ira.weiny@intel.com,
	apopple@nvidia.com
Subject: [PATCH v4 4/9] mm/memory_hotplug: add __add_memory_driver_managed() with online_type arg
Date: Fri, 5 Jun 2026 22:19:06 +0100
Message-ID: <20260605211911.2160954-5-gourry@gourry.net>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260605211911.2160954-1-gourry@gourry.net>
References: <20260605211911.2160954-1-gourry@gourry.net>

Existing callers of add_memory_driver_managed cannot select the
preferred online type (ZONE_NORMAL vs ZONE_MOVABLE), requiring them to
hot-add memory as offline blocks, and then follow up by onlining each
memory block individually.

Most drivers prefer the system default, but the CXL driver wants to
plumb a preferred policy through the dax kmem driver.

Refactor APIs to add a new interface which allows the dax kmem module
to select a preferred policy.

Overriding the configured auto-online policy is only safe for known
in-tree modules, where we know the override reflects a different,
user-requested policy. We do not want arbitrary out-of-tree drivers
silently overriding the system-wide onlining policy, so restrict the
new interface to the kmem module using EXPORT_SYMBOL_FOR_MODULES()
rather than a plain EXPORT_SYMBOL_GPL(). Other in-tree modules (e.g.
cxl_core) can be added to the allowed list as the need arises.

Refactor add_memory_driver_managed, extract __add_memory_driver_managed
- Add proper kernel-doc for add_memory_driver_managed while refactoring
- New helper accepts an explicit online_type.
- New helper validates online_type is between OFFLINE and ONLINE_MOVABLE

Refactor: add_memory_resource, extract __add_memory_resource
- new helper accepts an explicit online_type

Original APIs now explicitly pass the system-default to new helpers.

No functional change for existing users.

Cc: Oscar Salvador
Cc: Andrew Morton
Acked-by: David Hildenbrand (Arm)
Signed-off-by: Gregory Price
---
 include/linux/memory_hotplug.h |  3 ++
 mm/memory_hotplug.c            | 61 +++++++++++++++++++++++++++++-----
 2 files changed, 56 insertions(+), 8 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index f059025f8f8b..d3edeb80aadb 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -294,6 +294,9 @@ extern int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
 extern int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
 extern int add_memory_resource(int nid, struct resource *resource,
 			       mhp_t mhp_flags);
+int __add_memory_driver_managed(int nid, u64 start, u64 size,
+				const char *resource_name, mhp_t mhp_flags,
+				enum mmop online_type);
 extern int add_memory_driver_managed(int nid, u64 start, u64 size,
 				     const char *resource_name,
 				     mhp_t mhp_flags);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 494257054095..7d145217adc6 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1494,10 +1494,10 @@ static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
  *
  * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
  */
-int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
+static int __add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags,
+				 enum mmop online_type)
 {
 	struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
-	enum mmop online_type = mhp_get_default_online_type();
 	enum memblock_flags memblock_flags = MEMBLOCK_NONE;
 	struct memory_group *group = NULL;
 	u64 start, size;
@@ -1585,7 +1585,7 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 		merge_system_ram_resource(res);
 
 	/* online pages if requested */
-	if (mhp_get_default_online_type() != MMOP_OFFLINE)
+	if (online_type != MMOP_OFFLINE)
 		walk_memory_blocks(start, size, &online_type,
 				   online_memory_block);
 
@@ -1603,7 +1603,13 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 	return ret;
 }
 
-/* requires device_hotplug_lock, see add_memory_resource() */
+int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
+{
+	return __add_memory_resource(nid, res, mhp_flags,
+				     mhp_get_default_online_type());
+}
+
+/* requires device_hotplug_lock, see __add_memory_resource() */
 int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags)
 {
 	struct resource *res;
@@ -1631,7 +1637,15 @@ int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags)
 }
 EXPORT_SYMBOL_GPL(add_memory);
 
-/*
+/**
+ * __add_memory_driver_managed - add driver-managed memory with explicit online_type
+ * @nid: NUMA node ID where the memory will be added
+ * @start: Start physical address of the memory range
+ * @size: Size of the memory range in bytes
+ * @resource_name: Resource name in format "System RAM ($DRIVER)"
+ * @mhp_flags: Memory hotplug flags
+ * @online_type: Auto-Online behavior (offline, online, kernel, movable)
+ *
  * Add special, driver-managed memory to the system as system RAM. Such
  * memory is not exposed via the raw firmware-provided memmap as system
  * RAM, instead, it is detected and added by a driver - during cold boot,
@@ -1639,6 +1653,7 @@ EXPORT_SYMBOL_GPL(add_memory);
  *
  * Reasons why this memory should not be used for the initial memmap of a
  * kexec kernel or for placing kexec images:
+ *
  * - The booting kernel is in charge of determining how this memory will be
  *   used (e.g., use persistent memory as system RAM)
  * - Coordination with a hypervisor is required before this memory
@@ -1651,9 +1666,12 @@ EXPORT_SYMBOL_GPL(add_memory);
  *
  * The resource_name (visible via /proc/iomem) has to have the format
  * "System RAM ($DRIVER)".
+ *
+ * Return: 0 on success, negative error code on failure.
  */
-int add_memory_driver_managed(int nid, u64 start, u64 size,
-			      const char *resource_name, mhp_t mhp_flags)
+int __add_memory_driver_managed(int nid, u64 start, u64 size,
+				const char *resource_name, mhp_t mhp_flags,
+				enum mmop online_type)
 {
 	struct resource *res;
 	int rc;
@@ -1663,6 +1681,9 @@ int add_memory_driver_managed(int nid, u64 start, u64 size,
 	    resource_name[strlen(resource_name) - 1] != ')')
 		return -EINVAL;
 
+	if (online_type < MMOP_OFFLINE || online_type > MMOP_ONLINE_MOVABLE)
+		return -EINVAL;
+
 	lock_device_hotplug();
 
 	res = register_memory_resource(start, size, resource_name);
@@ -1671,7 +1692,7 @@ int add_memory_driver_managed(int nid, u64 start, u64 size,
 		goto out_unlock;
 	}
 
-	rc = add_memory_resource(nid, res, mhp_flags);
+	rc = __add_memory_resource(nid, res, mhp_flags, online_type);
 	if (rc < 0)
 		release_memory_resource(res);
 
@@ -1679,6 +1700,30 @@ int add_memory_driver_managed(int nid, u64 start, u64 size,
 	unlock_device_hotplug();
 	return rc;
 }
+EXPORT_SYMBOL_FOR_MODULES(__add_memory_driver_managed, "kmem");
+
+/**
+ * add_memory_driver_managed - add driver-managed memory
+ * @nid: NUMA node ID where the memory will be added
+ * @start: Start physical address of the memory range
+ * @size: Size of the memory range in bytes
+ * @resource_name: Resource name in format "System RAM ($DRIVER)"
+ * @mhp_flags: Memory hotplug flags
+ *
+ * Add driver-managed memory with the system default online type set by
+ * build config or kernel boot parameter.
+ *
+ * See __add_memory_driver_managed for more details.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int add_memory_driver_managed(int nid, u64 start, u64 size,
+			      const char *resource_name, mhp_t mhp_flags)
+{
+	return __add_memory_driver_managed(nid, start, size, resource_name,
+					   mhp_flags,
+					   mhp_get_default_online_type());
+}
 EXPORT_SYMBOL_GPL(add_memory_driver_managed);
 
 /*
-- 
2.54.0

From nobody Mon Jun 8 06:38:24 2026 Received: from mail-qv1-f46.google.com (mail-qv1-f46.google.com [209.85.219.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2110234A799 for ; Fri, 5 Jun 2026 21:19:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.46 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780694368; cv=none; b=F/fCTf1uP3kdK5X5enQNNe6YmvgSP9yTp2KqqK7dxFQSayoBNIOxLh4Yqkuzw/tk1LBuH55fiBwGYtpcSssv1lcd29UHRBes/4mqYmfnIvqGrCEqeUPogasMzkyAGKHXRqDxa4o7OaQtaatmY4lTOxIK2YMPrkCQlbx1x+ngPzg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780694368; c=relaxed/simple; bh=8XSA39MWql4+HD8uM30BGvMeffgmWZcxL5DHKZ0ylW0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=CHKDsUbRqPOPP3MM3qbsW9kaBcMCg45SnRXabqogLJDEFk5JSE3LJn/C3GUjG5oxQyJ4tE9/o3pBsZgAU3dYcbAvQf1fWo6XoVE1/pCaLL7vHDQ7arsGay/rO8jUE2HjFQ2oC5iqpZRmYo+1ia8B47JgpU/N7d50yofgDOrrFew= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net; spf=pass smtp.mailfrom=gourry.net; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b=Gj0fcZsD; 
arc=none smtp.client-ip=209.85.219.46 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gourry.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b="Gj0fcZsD" Received: by mail-qv1-f46.google.com with SMTP id 6a1803df08f44-8ce65629acaso27119316d6.3 for ; Fri, 05 Jun 2026 14:19:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gourry.net; s=google; t=1780694366; x=1781299166; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=JknNzkAaj+qc6G+hjBE1CQp9oF9WvErr7R4W+FLnrFQ=; b=Gj0fcZsDiDxYJ9+4rWV4Cob6ElkJ5jbEO3soIb8/DM3qp2ssQC2+2r0lYbMGxHVfN7 C0jFB7YOECtUtSQSiKYTXG5rpj8rbjAHOeMR6BJmi3ad3LMMB79A+ZBOykao5x8V+l+g MavV1k/YHmzSO99eW2/Nn97OD7N+9usPpRSQeVdWH9v2tiueKl8mt6oFT4TVaFGQ61CI bS7YDjLZTmgTGTrRrMCXfqbQO8jHPuKvY5yyGDWNLUbDgR4ngkq6glZ1UNpV3v58CNgX +yFh2LSyjaPNkKmI4OGIk23GjQfsWSi5Qqu8ph7beRdjpEu05/w9BgTtyOEOrCQP1cCW M6RA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1780694366; x=1781299166; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=JknNzkAaj+qc6G+hjBE1CQp9oF9WvErr7R4W+FLnrFQ=; b=EiarmjjFsleW+fMHOeKm/8pbnvM5joyKGyJRlEYsiCmodgRm1UwAIWt3R7nYOctHUZ oUqRB9WYjH14z/ZmxonnqyLFBCQmSNgvty0nxapjr92D12OtmliVmwaLxgJcoYeaVMLt Emm2zP3p48sDk+zER/eLpNqJ9xWMIwQGeixw0QCKjkXilGVeu+yR2+GzXN9gxEM+j8Fx y0XoL/AT0loyUzYqF0ukSuTKq4fBireCaMZEtJ3qVny/eTxhKyQY0deOgRU82Al4X0LO GneDf98MKurvZo+uh0P1gd6G9v3bSRbhNLNA9o6gLRd/6cIpUqA9adWbnssaY/SOOH56 WtBw== X-Gm-Message-State: AOJu0YwGecQwHZqYtmv58EbXAgKC/5fktRR7dtH+ERyPLHv9F5gm6zee 
LX41IN9YH6qGTHsv8gOq166cV0at9lFBzK2w08HcZ8FZc6y9phwXP+0AS7jbZoR3hBY= X-Gm-Gg: Acq92OFYXTB2z+bznfcbDZX8yiS44sytrH0gtJHY5HzV61kGjjSgWdTVNlmceD0EGSD 2GlM9tW0C1Gh0T/rL0mw9WcIx+IFh1M34Vm/IC2+YZyaYM8D49At1ZZ+syVVYaqIgyuwqWlhfxw c75tweeFESfjWe5gUoUXFrCkId1U+4y+kQpLMSSLPE3rcTJlw5wNslQATwrNGaCE91jDUSutI60 9yflix8wBBrrDxe6Nui5pUpBjLIedgrju7mftNdYX/PjIJvl54llcdjPbm1kZBTvZWY9rA/Kwk5 U38FJdhWJXzjnlen/G3SNTQRroQRiIq7eEWoHl7yRG21Y/LUFKJGx1QGDmoU8mdwi8LEnt0wLOH P3NGJCw2qMMpv7sK90e9tKCTd6sEtBiog2gYIULzqO8bn7lxyVvV+F7gmvqYe6hGcNGaGlT4EAE lTgS4e6pz7ihJIUT/zAJXqd/xNV/sWPCzN235yvgKdXKSBj6lEQLXguyl+YLwrqOMom7PVI68aE 89KhO5kRbj87Yy362CkUxQ= X-Received: by 2002:a05:6214:4ed0:b0:8cc:dd20:7a46 with SMTP id 6a1803df08f44-8cee6137878mr86899206d6.37.1780694366285; Fri, 05 Jun 2026 14:19:26 -0700 (PDT) Received: from gourry-fedora-PF4VCD3F.lan (pool-173-79-60-52.washdc.fios.verizon.net. [173.79.60.52]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-8cecd277bbcsm90518196d6.49.2026.06.05.14.19.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Jun 2026 14:19:25 -0700 (PDT) From: Gregory Price To: linux-mm@kvack.org, nvdimm@lists.linux.dev Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, linux-cxl@vger.kernel.org, linux-kselftest@vger.kernel.org, djbw@kernel.org, vishal.l.verma@intel.com, dave.jiang@intel.com, akpm@linux-foundation.org, david@kernel.org, ljs@kernel.org, liam@infradead.org, vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com, osalvador@suse.de, shuah@kernel.org, gourry@gourry.net, alison.schofield@intel.com, Smita.KoralahalliChannabasappa@amd.com, ira.weiny@intel.com, apopple@nvidia.com Subject: [PATCH v4 5/9] mm/memory_hotplug: add multi-range hotunplug Date: Fri, 5 Jun 2026 22:19:07 +0100 Message-ID: <20260605211911.2160954-6-gourry@gourry.net> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260605211911.2160954-1-gourry@gourry.net> References: <20260605211911.2160954-1-gourry@gourry.net> Precedence: bulk X-Mailing-List: 
linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" offline_and_remove_memory() handles a single contiguous range. Callers that manage a device composed of several ranges (e.g. dax/kmem) currently have to call it in a loop, which gives up atomicity. This creates a race condition where another daemon can online a block that was just offlined while other blocks are being offlined, causing the eventual (original) unplug operation to fail. Add offline_and_remove_memory_ranges(), which takes an array of ranges and processes them as one operation under a single lock_device_hotplug(): - Phase 1 offlines every block of every range, remembering each block's previous online type. - Phase 2 removes the ranges only once all of them are offline. - If any offline fails, the offlining done so far is reverted and nothing is removed. This gives callers all-or-nothing semantics for the offline step, so a failed or interrupted unplug leaves every range online as before rather than in an inconsistent partially-removed state. 
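As a rough illustration only, the two-phase flow described above can be sketched as a small userspace model. This is not the kernel code in this patch: every name below (blk, blk_range, model_offline_and_remove, the busy flag) is hypothetical, and a "busy" block simply stands in for any block whose offlining fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum blk_state { BLK_OFFLINE = 0, BLK_ONLINE_KERNEL, BLK_ONLINE_MOVABLE, BLK_REMOVED };

struct blk {
	enum blk_state state;
	bool busy;		/* a busy block refuses to go offline */
};

struct blk_range {
	struct blk *blks;
	size_t nr;
};

/* Returns 0 if every range was offlined and removed, -1 otherwise. */
static int model_offline_and_remove(struct blk_range *ranges, size_t nr_ranges)
{
	size_t total = 0, done = 0, i, j, k;
	unsigned char *saved;
	int rc = 0;

	assert(ranges != NULL);

	for (i = 0; i < nr_ranges; i++)
		total += ranges[i].nr;
	if (!total)
		return -1;

	saved = calloc(total, sizeof(*saved));
	if (!saved)
		return -1;

	/* Phase 1: offline every block, remembering its previous state. */
	for (i = 0; i < nr_ranges && !rc; i++) {
		for (j = 0; j < ranges[i].nr; j++) {
			struct blk *b = &ranges[i].blks[j];

			if (b->busy) {
				rc = -1;
				break;
			}
			saved[done++] = (unsigned char)b->state;
			b->state = BLK_OFFLINE;
		}
	}

	if (rc) {
		/* Roll back: restore only the blocks phase 1 actually touched. */
		k = 0;
		for (i = 0; i < nr_ranges && k < done; i++)
			for (j = 0; j < ranges[i].nr && k < done; j++)
				ranges[i].blks[j].state = (enum blk_state)saved[k++];
	} else {
		/* Phase 2: everything is offline, so remove all of it. */
		for (i = 0; i < nr_ranges; i++)
			for (j = 0; j < ranges[i].nr; j++)
				ranges[i].blks[j].state = BLK_REMOVED;
	}

	free(saved);
	return rc;
}
```

In this model a single busy block in the last range makes the whole call fail, and every block that had already been offlined returns to its earlier state; nothing is removed unless phase 1 completes for all ranges.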
Suggested-by: David Hildenbrand (Arm) Signed-off-by: Gregory Price --- include/linux/memory_hotplug.h | 7 +++ mm/memory_hotplug.c | 95 ++++++++++++++++++++++++++++++++++ 2 files changed, 102 insertions(+) diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index d3edeb80aadb..7f1da7c428dc 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -267,6 +267,7 @@ extern int offline_pages(unsigned long start_pfn, unsig= ned long nr_pages, extern int remove_memory(u64 start, u64 size); extern void __remove_memory(u64 start, u64 size); extern int offline_and_remove_memory(u64 start, u64 size); +int offline_and_remove_memory_ranges(const struct range *ranges, int nr_ra= nges); =20 #else static inline void try_offline_node(int nid) {} @@ -283,6 +284,12 @@ static inline int remove_memory(u64 start, u64 size) } =20 static inline void __remove_memory(u64 start, u64 size) {} + +static inline int offline_and_remove_memory_ranges(const struct range *ran= ges, + int nr_ranges) +{ + return -EBUSY; +} #endif /* CONFIG_MEMORY_HOTREMOVE */ =20 #ifdef CONFIG_MEMORY_HOTPLUG diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 7d145217adc6..e486d35c22b2 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -2483,4 +2483,99 @@ int offline_and_remove_memory(u64 start, u64 size) return rc; } EXPORT_SYMBOL_GPL(offline_and_remove_memory); + +/** + * offline_and_remove_memory_ranges - offline and remove multiple memory r= anges + * @ranges: array of physical address ranges to offline and remove + * @nr_ranges: number of entries in @ranges + * + * Offline and remove several memory ranges as one operation, serialized + * against other hotplug operations by a single lock_device_hotplug(). + * + * Unlike calling offline_and_remove_memory() in a loop, this offlines *al= l* + * ranges before removing any of them. 
If offlining any range fails, the + * offlining of the ranges processed so far is reverted and nothing is + * removed, leaving every range online as it was before the call. This gi= ves + * callers all-or-nothing semantics for the offline step, so a failed unpl= ug + * does not leave a device split between online and removed ranges. + * + * Each range must be memory-block aligned in start and size. + * + * Return: 0 on success, negative errno otherwise. On failure no range has + * been removed. + */ +int offline_and_remove_memory_ranges(const struct range *ranges, int nr_ra= nges) +{ + unsigned long mb_total =3D 0; + uint8_t *online_types, *tmp; + int i, rc =3D 0; + + if (!ranges || nr_ranges <=3D 0) + return -EINVAL; + + for (i =3D 0; i < nr_ranges; i++) { + u64 start =3D ranges[i].start; + u64 size =3D range_len(&ranges[i]); + + if (!IS_ALIGNED(start, memory_block_size_bytes()) || + !IS_ALIGNED(size, memory_block_size_bytes()) || !size) + return -EINVAL; + mb_total +=3D size / memory_block_size_bytes(); + } + + /* + * Remember the old online type of every memory block across all ranges, + * so we can revert if offlining a later block fails. All entries start + * as MMOP_OFFLINE so blocks we never touched are skipped on rollback. + */ + online_types =3D kmalloc_array(mb_total, sizeof(*online_types), + GFP_KERNEL); + if (!online_types) + return -ENOMEM; + memset(online_types, MMOP_OFFLINE, mb_total); + + lock_device_hotplug(); + + /* Phase 1: offline every block in every range. */ + tmp =3D online_types; + for (i =3D 0; i < nr_ranges; i++) { + rc =3D walk_memory_blocks(ranges[i].start, range_len(&ranges[i]), + &tmp, try_offline_memory_block); + if (rc) + break; + } + + /* + * Phase 2: only once everything is offline, remove it. This cannot + * fail as the memory can no longer be onlined in the meantime. 
+ */ + if (!rc) { + for (i =3D 0; i < nr_ranges; i++) { + rc =3D try_remove_memory(ranges[i].start, + range_len(&ranges[i])); + if (rc) { + pr_err("%s: Failed to remove memory: %d\n", + __func__, rc); + break; + } + } + } + + /* + * Roll back the offlining if anything failed. Blocks we never offlined + * are marked MMOP_OFFLINE and skipped by try_reonline_memory_block(). + */ + if (rc) { + tmp =3D online_types; + for (i =3D 0; i < nr_ranges; i++) + walk_memory_blocks(ranges[i].start, + range_len(&ranges[i]), &tmp, + try_reonline_memory_block); + } + unlock_device_hotplug(); + + kfree(online_types); + return rc; +} +EXPORT_SYMBOL_GPL(offline_and_remove_memory_ranges); #endif /* CONFIG_MEMORY_HOTREMOVE */ --=20 2.54.0 From nobody Mon Jun 8 06:38:24 2026 Received: from mail-qt1-f169.google.com (mail-qt1-f169.google.com [209.85.160.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 01BB63CBE80 for ; Fri, 5 Jun 2026 21:19:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.169 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780694370; cv=none; b=ketOmcLOUisFtmLoWjzUpGkJp+yBzgDW0qk0O2XfVvSEl57y0ZoBfJPROpcEPBNnF7MEI2iahaOdzoVWq/RnnB12QVRyZnOYkeS7ZcvXhWGWYTfatXvzGi5Kc71U+yfmRDjkOdUGpr1LFdfIeK+ycTeAENcCoSuiTLUeFkHfj50= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780694370; c=relaxed/simple; bh=1zTws9fmQmhgRciJvlP2AWzY+uWAfxRF9YfrcryFsNY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=MA9y4eqJL5ZhYseoYPwSQnCtLIKeyz0kxwxq69yij6BBlqI/IcFxLcMgoHSr8P++eWqL7NQbSwGGDv1ZfnxwwBpQ6fwyreAsCw5VbK5ID+nEwe9nbuhLAZkU8fbhAxBOS2x5UBWVyhQlDLkVO2gY7BFed6Vp1KM0nO0aaqNdvHE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net; spf=pass smtp.mailfrom=gourry.net; dkim=pass (2048-bit key)
header.d=gourry.net header.i=@gourry.net header.b=gqFaGE0K; arc=none smtp.client-ip=209.85.160.169 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gourry.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b="gqFaGE0K" Received: by mail-qt1-f169.google.com with SMTP id d75a77b69052e-5176b9c476aso18250381cf.3 for ; Fri, 05 Jun 2026 14:19:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gourry.net; s=google; t=1780694368; x=1781299168; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=P2vGSHo9hdkHiErFpk6jdcKlA83TnjxiiQnZVrYQBSY=; b=gqFaGE0K9vKJMVXyKCgcKJk8t+bfz18rdbV9SZqyxylippjH89EXDmZoioIHwAOTf/ jL9GS0UPHGtDknh1gzcNX76TNl5JIVQuyPoSBfHR/YGkc+yWZTVH6u39DKjAnekHdYo+ IOpLWg4KwHwo1m/qWoArd81Su5scesilDhxxPwZTWyEK3Uk34Wvs97A/Ud2YIdUqgZwB UB3J19z0c8v3L659nIzUMgW9vZXSEUDO9+el+Cqvkth2vgmVvVTZBmcGB53b0bPlVD7E njUnQMfnCIADmD3TEpnfcx/K+mqrNf5LxU97cCTPNRN7jM5ckywzpo+vvWoL9amzStry MqYQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1780694368; x=1781299168; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=P2vGSHo9hdkHiErFpk6jdcKlA83TnjxiiQnZVrYQBSY=; b=Awfi5TughyPPVIbF9kZ0U0Mm/RxcxB3mNreM5R26YMk09cinkVuZuuIgIKVwqGUwpa /tmS81OQhahRC22iL9vuapjxJKz1D6jBBSLlZJguNpTfNAnQI+vVa0tqsjWBln05DJ0H ZZ6yH8nNXCQvnQwXZ8iljgWF6bxX7GV7aPR5upWKrhCyKIbnbCNnwWmlWAUHCKJRt8TV Bx1FbRMcwoBdCC35dBGwrBGHnUhZMqYa74JxORAeuBG8eZznYUgcMHERRvXRn5xqBV9y Tf5ZZ/WPB7eqZcLrLaQ4qHEiN0IkUnMgwp7gb8H5Qkug+hUqbtOuFLnSc7VWXUnt8WnY J4cQ== X-Gm-Message-State: AOJu0YwqeDhuIJDyQ2ACNW+3Oazp3mcnwlWQFKNnjLgPBIctfwRN6mH2 
pgnARN0nRb1H2FYLbTlCNvJOJC5ZSKgwebTxiwx685FotMGPcByJ1vK/liksLHyVcYU= X-Gm-Gg: Acq92OEt5kMaSnNP7jVj0ZR/vyx5NO+EN0wT/MGmorMkv6SbSObPf6xtqc4LrAMgOgc lHjF9RxRhfJEv+z3Th4xGcVYcGHCwzjmeaVhzJFG0f3y4EQrg8lPpHgqL7GRS8nj+T4Q5UIncTd XKyJYS85O9yxddPRnOXA5bplnrcfcaJxAfXqD3vOP4gwhJqfsLyffbAI01CrolVfRMfxyBW6DyF lLByl+BbANCNLh7EGR/aujbokFnTGo1kfqUE0jAfh1vBbvfWd+hOkK5Qz7uCNBfbVFeAVYIvPFy 8VLEDGxp0gkRlh/zfCIfLSu0LoFcNIM5ZlZDD1R6lcx9ufnd9G8EjCTEnxghV4+pcmgdLqCID/3 /XsDn0e3H+9u56hucCQz0k3SevPBzPsOk2urN5VhGASbzk/hRZXRdg0x0rBe5ktlZaQ4u8tCAXv ld6Ke4Dr1sB9lhc8OHlJKgsPoTKkOpVZBIxJCVWpUTUUB6/kAAZcjWJU6qBZgsgRW3zeLBrFHHi dB0zNoJvQleA6JMT1b0jJee1KXKiPPqNA== X-Received: by 2002:a05:622a:514:b0:50f:6415:1eb4 with SMTP id d75a77b69052e-51795bbf29bmr79531471cf.49.1780694367941; Fri, 05 Jun 2026 14:19:27 -0700 (PDT) Received: from gourry-fedora-PF4VCD3F.lan (pool-173-79-60-52.washdc.fios.verizon.net. [173.79.60.52]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-8cecd277bbcsm90518196d6.49.2026.06.05.14.19.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Jun 2026 14:19:27 -0700 (PDT) From: Gregory Price To: linux-mm@kvack.org, nvdimm@lists.linux.dev Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, linux-cxl@vger.kernel.org, linux-kselftest@vger.kernel.org, djbw@kernel.org, vishal.l.verma@intel.com, dave.jiang@intel.com, akpm@linux-foundation.org, david@kernel.org, ljs@kernel.org, liam@infradead.org, vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com, osalvador@suse.de, shuah@kernel.org, gourry@gourry.net, alison.schofield@intel.com, Smita.KoralahalliChannabasappa@amd.com, ira.weiny@intel.com, apopple@nvidia.com Subject: [PATCH v4 6/9] dax: plumb hotplug online_type through dax Date: Fri, 5 Jun 2026 22:19:08 +0100 Message-ID: <20260605211911.2160954-7-gourry@gourry.net> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260605211911.2160954-1-gourry@gourry.net> References: <20260605211911.2160954-1-gourry@gourry.net> Precedence: bulk X-Mailing-List: 
linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" There is no way for drivers leveraging dax_kmem to plumb through a preferred auto-online policy - the system default policy is forced. Add an 'enum mmop' field to the DAX device creation path to allow drivers to specify an auto-online policy when using the kmem driver. Current callers initialize online_type to mhp_get_default_online_type() to retain backward compatibility and to make explicit to the drivers what is actually happening underneath. No functional changes to existing callers. Signed-off-by: Gregory Price --- drivers/dax/bus.c | 3 +++ drivers/dax/bus.h | 2 ++ drivers/dax/cxl.c | 1 + drivers/dax/dax-private.h | 3 +++ drivers/dax/hmem/hmem.c | 1 + drivers/dax/kmem.c | 5 +++-- drivers/dax/pmem.c | 1 + 7 files changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 492573b47f66..6611fe399f59 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright(c) 2017-2018 Intel Corporation. All rights reserved.
*/ #include +#include #include #include #include @@ -394,6 +395,7 @@ static ssize_t create_store(struct device *dev, struct = device_attribute *attr, .size =3D 0, .id =3D -1, .memmap_on_memory =3D false, + .online_type =3D mhp_get_default_online_type(), }; struct dev_dax *dev_dax =3D __devm_create_dev_dax(&data); =20 @@ -1527,6 +1529,7 @@ static struct dev_dax *__devm_create_dev_dax(struct d= ev_dax_data *data) ida_init(&dev_dax->ida); =20 dev_dax->memmap_on_memory =3D data->memmap_on_memory; + dev_dax->online_type =3D data->online_type; =20 inode =3D dax_inode(dax_dev); dev->devt =3D inode->i_rdev; diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h index 5909171a4428..c53a9427f8e4 100644 --- a/drivers/dax/bus.h +++ b/drivers/dax/bus.h @@ -3,6 +3,7 @@ #ifndef __DAX_BUS_H__ #define __DAX_BUS_H__ #include +#include #include #include #include @@ -26,6 +27,7 @@ struct dev_dax_data { resource_size_t size; int id; bool memmap_on_memory; + enum mmop online_type; }; =20 struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data); diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c index 3ab39b77843d..0eaef700bb2a 100644 --- a/drivers/dax/cxl.c +++ b/drivers/dax/cxl.c @@ -27,6 +27,7 @@ static int cxl_dax_region_probe(struct device *dev) .id =3D -1, .size =3D range_len(&cxlr_dax->hpa_range), .memmap_on_memory =3D true, + .online_type =3D mhp_get_default_online_type(), }; =20 return PTR_ERR_OR_ZERO(devm_create_dev_dax(&data)); diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h index 81e4af49e39c..0787325bc8dd 100644 --- a/drivers/dax/dax-private.h +++ b/drivers/dax/dax-private.h @@ -8,6 +8,7 @@ #include #include #include +#include =20 /* private routines between core files */ struct dax_device; @@ -79,6 +80,7 @@ struct dev_dax_range { * @dev: device core * @pgmap: pgmap for memmap setup / lifetime (driver owned) * @memmap_on_memory: allow kmem to put the memmap in the memory + * @online_type: MMOP_* online type for memory hotplug * @nr_range: size of @ranges 
* @ranges: range tuples of memory used */ @@ -95,6 +97,7 @@ struct dev_dax { struct device dev; struct dev_pagemap *pgmap; bool memmap_on_memory; + enum mmop online_type; int nr_range; struct dev_dax_range *ranges; }; diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c index af21f66bf872..0ef6e9ae660d 100644 --- a/drivers/dax/hmem/hmem.c +++ b/drivers/dax/hmem/hmem.c @@ -37,6 +37,7 @@ static int dax_hmem_probe(struct platform_device *pdev) .id =3D -1, .size =3D region_idle ? 0 : range_len(&mri->range), .memmap_on_memory =3D false, + .online_type =3D mhp_get_default_online_type(), }; =20 return PTR_ERR_OR_ZERO(devm_create_dev_dax(&data)); diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c index 592171ec10f4..41ccb618a146 100644 --- a/drivers/dax/kmem.c +++ b/drivers/dax/kmem.c @@ -172,8 +172,9 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax) * Ensure that future kexec'd kernels will not treat * this as RAM automatically. */ - rc =3D add_memory_driver_managed(data->mgid, range.start, - range_len(&range), kmem_name, mhp_flags); + rc =3D __add_memory_driver_managed(data->mgid, range.start, + range_len(&range), kmem_name, mhp_flags, + dev_dax->online_type); =20 if (rc) { dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n", diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c index bee93066a849..a5f987814da5 100644 --- a/drivers/dax/pmem.c +++ b/drivers/dax/pmem.c @@ -63,6 +63,7 @@ static struct dev_dax *__dax_pmem_probe(struct device *de= v) .pgmap =3D &pgmap, .size =3D range_len(&range), .memmap_on_memory =3D false, + .online_type =3D mhp_get_default_online_type(), }; =20 return devm_create_dev_dax(&data); --=20 2.54.0 From nobody Mon Jun 8 06:38:24 2026 Received: from mail-qv1-f44.google.com (mail-qv1-f44.google.com [209.85.219.44]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E1AF23C5838 for ; Fri, 5 Jun 2026 21:19:30 +0000 
(UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.44 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780694373; cv=none; b=TIi7Q2SF1Wd07NuL3khr7LXc4auq2RZxtMrq4YzD6NnQc7SLwGEkt07R3kcjMyk9TN4bcZXGoa+fAZB8UDZ90SCS8ETAMNJv2J+UwnhaOY6SL0ru0cmZxr6GIT99N+gKfuSX32F9PRTBa9QrVDZcg0PGKloUXXnl/y97njFxkIo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780694373; c=relaxed/simple; bh=c8EfnShEsA2QY6sUET2ohyxfsCseiWwTIpKTlBZ8T0I=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Ersfm0iCZuiAhtKm9M9KJrX71Hd0Q/h16rofHH1XJwZhVsk9NImZpcHfRIEpNoSajvOiih9o6r2huN3yQMHZsi3BSwes7f35vpgby8izY3ja5tpyzDs5pKuUb7KTJThXsGJJat6BFoEQogHfm3sZl/iDDZsSDXpLWE71CWfnT+c= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net; spf=pass smtp.mailfrom=gourry.net; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b=R+xmGtzm; arc=none smtp.client-ip=209.85.219.44 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gourry.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b="R+xmGtzm" Received: by mail-qv1-f44.google.com with SMTP id 6a1803df08f44-8cceaa6f75bso34564906d6.0 for ; Fri, 05 Jun 2026 14:19:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gourry.net; s=google; t=1780694370; x=1781299170; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=1H+lnA3hHkYKrmCU4qT2yVGlOrCCjEQC0F6y9K5Vd5I=; b=R+xmGtzmEbiPGSIwDeQvYT8cYQ1x7jZ/2kGiuxwDMjb30TG60ccR6KlSMdAaZGnUTI 1k9ppTQilPSxu6ygzg2N0/GcLxsnqplYU2m6tIGJm5OOSWdKbeDlWHXFAJrko4xb1dAw 
1SvcVPvreJziOF14YEuwVj4TG7IoEF1yy7ilfQ7/X7UjKZeq1ipoHRb+bBsFCBEvl8Gv mGpNPTQPtLR6+q/Zaq9S/YDuw9ehu05PHJ0CnoNNQ5DuUjfNYaWeA2tk7agEIQEmECSZ iB/CSqQG/WTYVMv8yppr29BWJ0cU5mAxU3YlBU3uC5nDiQBc3Vo00n27agWIhtHIRTza 0vsA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1780694370; x=1781299170; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=1H+lnA3hHkYKrmCU4qT2yVGlOrCCjEQC0F6y9K5Vd5I=; b=cGsQ3cJixR4r14WzleiwHqEglLOAGouE6b3PKfjlltNpKXzKm5J8Hfe/wFeGLX5Ocs i+ngYxMazuqy2A1mCa3q42tEfHaVOsETzoJ3OJ5DSMEy2dZ0zE2NlgWMgygL4KWDWJBT VvMD4wTzo9EEjze5klPxAq/nROFm3knUg3eiKzR/UUinJvYSeJHgLaPPI5ZvxwC/pZwK uy4Dd+Yor0KJVwrvJERPKXV7xMiSEW3C3fdOzrylyL56dASjdQxVx3cXN1TeoGE+n45/ XDqOyI2ShVEaewkmTxqf25wgqxLXuV+PuR8I+fBz8RsGgi4FRshrXh0Z59gCDORAawQx +c5A== X-Gm-Message-State: AOJu0YxSHlISvAbAiSP1CtL5GxwX01OGTtdwpAfi8Ng2B+VOkv97FTH2 bE/RbJZBJivGZmDOHw6W/VS+G6PtwYnwHeUu74m1skXFB0+4M14gWJyJ6tXrAJsB4qw= X-Gm-Gg: Acq92OFLkPPiLrZzTW23W5izVAEnhdPIYOJlkmrVKSBx1hTGaLmyIkZ6LaLzHK9LQ9e ewH+QN3wUnOLR/tSL7ZCh5ZIEAncUsWpfBtuz5OkipEa9YeIC86rxVgnny2+F0r49nvpHh6qxPL KS6+3gXlETCT30RlI/ehGDzwvsqS+0CYMu8GTxQuImA0BdMCtez4nKaxuvJzqXfO5Uh5Kg27548 vG9PC+jzRDGn6r5PsZR0vHVn+OPzLopN6Ez7nOcy4t+SW38aIRUYdvf5DNX3zak4iTeso67Ttwt hsB3YVrG0jQ5HRLioDOzw2sNipZ0rJ2d//B8qs/9D3g/jtLV0FgesCgAYyD/5naXE35/060GCqL dD0bCYa5QRRtV0ZKTxXBjivYCrY83KWsbPYIrJShe9u10k5/Qt9F2es0Bgsoa7dgUwWc3djiUrn Rfj5WMt0PTUQ9OOywQ5idbL45xxP73UxybvslHrYmy4AAMjoX/4AHzPmA84JWltBZBnSeesT5RP behr4X/iMHfX6TNpH4nOXs7aGbPl9Kzlg== X-Received: by 2002:a05:6214:55ca:b0:8cc:f3e7:7c1d with SMTP id 6a1803df08f44-8cee61473ebmr73558816d6.32.1780694369678; Fri, 05 Jun 2026 14:19:29 -0700 (PDT) Received: from gourry-fedora-PF4VCD3F.lan (pool-173-79-60-52.washdc.fios.verizon.net. 
[173.79.60.52]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-8cecd277bbcsm90518196d6.49.2026.06.05.14.19.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Jun 2026 14:19:29 -0700 (PDT) From: Gregory Price To: linux-mm@kvack.org, nvdimm@lists.linux.dev Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, linux-cxl@vger.kernel.org, linux-kselftest@vger.kernel.org, djbw@kernel.org, vishal.l.verma@intel.com, dave.jiang@intel.com, akpm@linux-foundation.org, david@kernel.org, ljs@kernel.org, liam@infradead.org, vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com, osalvador@suse.de, shuah@kernel.org, gourry@gourry.net, alison.schofield@intel.com, Smita.KoralahalliChannabasappa@amd.com, ira.weiny@intel.com, apopple@nvidia.com Subject: [PATCH v4 7/9] dax/kmem: extract hotplug/hotremove helper functions Date: Fri, 5 Jun 2026 22:19:09 +0100 Message-ID: <20260605211911.2160954-8-gourry@gourry.net> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260605211911.2160954-1-gourry@gourry.net> References: <20260605211911.2160954-1-gourry@gourry.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Refactor kmem _probe() and _remove() by extracting init, cleanup, hotplug, and hot-remove logic into separate helper functions: - dax_kmem_init_resources: initializes the I/O resources with request_mem_region() - dax_kmem_cleanup_resources: releases the initialized I/O resources - dax_kmem_do_hotplug: handles memory region reservation and adding - dax_kmem_do_hotremove: handles memory removal and resource cleanup This is a pure refactoring with no functional change. The helpers will enable future extensions to support more granular control over memory hotplug operations. We need to split hotplug/remove and init/cleanup in order to have the resources available for hot-add.
Otherwise, when probe occurs, the dax devices are never added to sysfs because the resources are never registered. Signed-off-by: Gregory Price --- drivers/dax/kmem.c | 315 +++++++++++++++++++++++++++++++-------------- 1 file changed, 215 insertions(+), 100 deletions(-) diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c index 41ccb618a146..5bf36ab73f86 100644 --- a/drivers/dax/kmem.c +++ b/drivers/dax/kmem.c @@ -63,14 +63,195 @@ static void kmem_put_memory_types(void) mt_put_memory_types(&kmem_memory_types); } =20 +/** + * dax_kmem_do_hotplug - hotplug memory for dax kmem device + * @dev_dax: the dev_dax instance + * @data: the dax_kmem_data structure with resource tracking + * + * Hotplugs all ranges in the dev_dax region as system memory. + * + * Returns the number of successfully mapped ranges, or negative error. + */ +static int dax_kmem_do_hotplug(struct dev_dax *dev_dax, + struct dax_kmem_data *data, + int online_type) +{ + struct device *dev =3D &dev_dax->dev; + int i, rc, onlined =3D 0; + mhp_t mhp_flags; + + for (i =3D 0; i < dev_dax->nr_range; i++) { + struct range range; + + rc =3D dax_kmem_range(dev_dax, i, &range); + if (rc) + continue; + + mhp_flags =3D MHP_NID_IS_MGID; + if (dev_dax->memmap_on_memory) + mhp_flags |=3D MHP_MEMMAP_ON_MEMORY; + + /* + * Ensure that future kexec'd kernels will not treat + * this as RAM automatically. + */ + rc =3D __add_memory_driver_managed(data->mgid, range.start, + range_len(&range), kmem_name, mhp_flags, + online_type); + + if (rc) { + dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n", + i, range.start, range.end); + /* + * Release the reservation for the range that failed to + * add so a later hotremove does not try to remove memory + * that was never added. 
+ */ + if (data->res[i]) { + remove_resource(data->res[i]); + kfree(data->res[i]); + data->res[i] =3D NULL; + } + if (onlined) + continue; + return rc; + } + onlined++; + } + + return onlined; +} + +/** + * dax_kmem_init_resources - create memory regions for dax kmem + * @dev_dax: the dev_dax instance + * @data: the dax_kmem_data structure with resource tracking + * + * Initializes all the resources for the DAX device. + * + * Returns the number of successfully mapped ranges, or negative error. + */ +static int dax_kmem_init_resources(struct dev_dax *dev_dax, + struct dax_kmem_data *data) +{ + struct device *dev =3D &dev_dax->dev; + int i, rc, mapped =3D 0; + + for (i =3D 0; i < dev_dax->nr_range; i++) { + struct resource *res; + struct range range; + + rc =3D dax_kmem_range(dev_dax, i, &range); + if (rc) + continue; + + /* Skip ranges already added */ + if (data->res[i]) + continue; + + /* Region is permanently reserved if hotremove fails. */ + res =3D request_mem_region(range.start, range_len(&range), + data->res_name); + if (!res) { + dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve region\n", + i, range.start, range.end); + /* + * Once some memory has been onlined we can't + * assume that it can be un-onlined safely. + */ + if (mapped) + continue; + return -EBUSY; + } + data->res[i] =3D res; + /* + * Set flags appropriate for System RAM. Leave ..._BUSY clear + * so that add_memory() can add a child resource. Do not + * inherit flags from the parent since it may set new flags + * unknown to us that will break add_memory() below. + */ + res->flags =3D IORESOURCE_SYSTEM_RAM; + mapped++; + } + return mapped; +} + +#ifdef CONFIG_MEMORY_HOTREMOVE +/** + * dax_kmem_do_hotremove - hot-remove memory for dax kmem device + * @dev_dax: the dev_dax instance + * @data: the dax_kmem_data structure with resource tracking + * + * Removes all ranges in the dev_dax region. + * + * Returns the number of successfully removed ranges.
+ */ +static int dax_kmem_do_hotremove(struct dev_dax *dev_dax, + struct dax_kmem_data *data) +{ + struct device *dev =3D &dev_dax->dev; + int i, success =3D 0; + + for (i =3D 0; i < dev_dax->nr_range; i++) { + struct range range; + int rc; + + rc =3D dax_kmem_range(dev_dax, i, &range); + if (rc) + continue; + + /* range was never added during probe, count as removed */ + if (!data->res[i]) { + success++; + continue; + } + + rc =3D remove_memory(range.start, range_len(&range)); + if (rc =3D=3D 0) { + /* Release the resource for the successfully removed range */ + remove_resource(data->res[i]); + kfree(data->res[i]); + data->res[i] =3D NULL; + success++; + continue; + } + any_hotremove_failed =3D true; + dev_err(dev, "mapping%d: %#llx-%#llx hotremove failed\n", + i, range.start, range.end); + } + + return success; +} +#endif /* CONFIG_MEMORY_HOTREMOVE */ + +/** + * dax_kmem_cleanup_resources - remove the dax memory resources + * @dev_dax: the dev_dax instance + * @data: the dax_kmem_data structure with resource tracking + * + * Removes all resources in the dev_dax region. 
+ */ +static void dax_kmem_cleanup_resources(struct dev_dax *dev_dax, + struct dax_kmem_data *data) +{ + int i; + + for (i =3D 0; i < dev_dax->nr_range; i++) { + if (!data->res[i]) + continue; + remove_resource(data->res[i]); + kfree(data->res[i]); + data->res[i] =3D NULL; + } +} + static int dev_dax_kmem_probe(struct dev_dax *dev_dax) { struct device *dev =3D &dev_dax->dev; unsigned long total_len =3D 0, orig_len =3D 0; struct dax_kmem_data *data; struct memory_dev_type *mtype; - int i, rc, mapped =3D 0; - mhp_t mhp_flags; + int i, rc; int numa_node; int adist =3D MEMTIER_DEFAULT_DAX_ADISTANCE; =20 @@ -132,68 +313,25 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax) goto err_reg_mgid; data->mgid =3D rc; =20 - for (i =3D 0; i < dev_dax->nr_range; i++) { - struct resource *res; - struct range range; - - rc =3D dax_kmem_range(dev_dax, i, &range); - if (rc) - continue; - - /* Region is permanently reserved if hotremove fails. */ - res =3D request_mem_region(range.start, range_len(&range), data->res_nam= e); - if (!res) { - dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve region\n", - i, range.start, range.end); - /* - * Once some memory has been onlined we can't - * assume that it can be un-onlined safely. - */ - if (mapped) - continue; - rc =3D -EBUSY; - goto err_request_mem; - } - data->res[i] =3D res; - - /* - * Set flags appropriate for System RAM. Leave ..._BUSY clear - * so that add_memory() can add a child resource. Do not - * inherit flags from the parent since it may set new flags - * unknown to us that will break add_memory() below. - */ - res->flags =3D IORESOURCE_SYSTEM_RAM; - - mhp_flags =3D MHP_NID_IS_MGID; - if (dev_dax->memmap_on_memory) - mhp_flags |=3D MHP_MEMMAP_ON_MEMORY; - - /* - * Ensure that future kexec'd kernels will not treat - * this as RAM automatically. 
- */ - rc =3D __add_memory_driver_managed(data->mgid, range.start, - range_len(&range), kmem_name, mhp_flags, - dev_dax->online_type); + dev_set_drvdata(dev, data); =20 - if (rc) { - dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n", - i, range.start, range.end); - remove_resource(res); - kfree(res); - data->res[i] =3D NULL; - if (mapped) - continue; - goto err_request_mem; - } - mapped++; - } + rc =3D dax_kmem_init_resources(dev_dax, data); + if (rc < 0) + goto err_resources; =20 - dev_set_drvdata(dev, data); + /* + * Hotplug using the configured online type for this device. + */ + rc =3D dax_kmem_do_hotplug(dev_dax, data, dev_dax->online_type); + if (rc < 0) + goto err_hotplug; =20 return 0; =20 -err_request_mem: +err_hotplug: + dax_kmem_cleanup_resources(dev_dax, data); +err_resources: + dev_set_drvdata(dev, NULL); memory_group_unregister(data->mgid); err_reg_mgid: kfree(data->res_name); @@ -207,7 +345,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax) #ifdef CONFIG_MEMORY_HOTREMOVE static void dev_dax_kmem_remove(struct dev_dax *dev_dax) { - int i, success =3D 0; + int success; int node =3D dev_dax->target_node; struct device *dev =3D &dev_dax->dev; struct dax_kmem_data *data =3D dev_get_drvdata(dev); @@ -218,48 +356,25 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_d= ax) * there is no way to hotremove this memory until reboot because device * unbind will succeed even if we return failure. 
*/ - for (i =3D 0; i < dev_dax->nr_range; i++) { - struct range range; - int rc; - - rc =3D dax_kmem_range(dev_dax, i, &range); - if (rc) - continue; - - /* range was never added during probe */ - if (!data->res[i]) { - success++; - continue; - } - - rc =3D remove_memory(range.start, range_len(&range)); - if (rc =3D=3D 0) { - remove_resource(data->res[i]); - kfree(data->res[i]); - data->res[i] =3D NULL; - success++; - continue; - } - any_hotremove_failed =3D true; - dev_err(dev, - "mapping%d: %#llx-%#llx cannot be hotremoved until the next reboot\n", - i, range.start, range.end); + success =3D dax_kmem_do_hotremove(dev_dax, data); + if (success < dev_dax->nr_range) { + dev_err(dev, "Hotplug regions stuck online until reboot\n"); + return; } =20 - if (success >=3D dev_dax->nr_range) { - memory_group_unregister(data->mgid); - kfree(data->res_name); - kfree(data); - dev_set_drvdata(dev, NULL); - /* - * Clear the memtype association on successful unplug. - * If not, we have memory blocks left which can be - * offlined/onlined later. We need to keep memory_dev_type - * for that. This implies this reference will be around - * till next reboot. - */ - clear_node_memory_type(node, NULL); - } + dax_kmem_cleanup_resources(dev_dax, data); + memory_group_unregister(data->mgid); + kfree(data->res_name); + kfree(data); + dev_set_drvdata(dev, NULL); + /* + * Clear the memtype association on successful unplug. + * If not, we have memory blocks left which can be + * offlined/onlined later. We need to keep memory_dev_type + * for that. This implies this reference will be around + * till next reboot. 
+ */ + clear_node_memory_type(node, NULL); } #else static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
-- 
2.54.0

From nobody Mon Jun 8 06:38:24 2026
From: Gregory Price
To: linux-mm@kvack.org, nvdimm@lists.linux.dev
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, linux-cxl@vger.kernel.org, linux-kselftest@vger.kernel.org, djbw@kernel.org, vishal.l.verma@intel.com, dave.jiang@intel.com, akpm@linux-foundation.org, david@kernel.org, ljs@kernel.org, liam@infradead.org, vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com, osalvador@suse.de, shuah@kernel.org, gourry@gourry.net, alison.schofield@intel.com, Smita.KoralahalliChannabasappa@amd.com, ira.weiny@intel.com, apopple@nvidia.com, Hannes Reinecke
Subject: [PATCH v4 8/9] dax/kmem: add sysfs interface for atomic hotplug
Date: Fri, 5 Jun 2026 22:19:10 +0100
Message-ID: <20260605211911.2160954-9-gourry@gourry.net>
In-Reply-To: <20260605211911.2160954-1-gourry@gourry.net>
References: <20260605211911.2160954-1-gourry@gourry.net>

The dax kmem driver currently onlines memory automatically during probe using the system's default online policy but provides no way to control or query the entire region state at runtime. Additionally, there is no atomic mechanism to offline and remove the entire set of memory blocks together.
Instead, this is presently done in two steps: (offline all, remove all). This creates a race condition where external entities can operate directly on the blocks and cause hot-unplug to fail.

Add a new 'hotplug' sysfs attribute that allows userspace to control and query the entire memory region state. The writable states mirror the per-block /sys/devices/system/memory/memoryX/state ABI:

  - "unplugged": memory blocks are not present
  - "online": memory is online, zone chosen by the kernel
  - "online_kernel": memory is online in ZONE_NORMAL
  - "online_movable": memory is online in ZONE_MOVABLE

The "unplugged" state is new and only applies to kmem/hotplug.

Valid transitions:

  - unplugged -> online[_kernel|_movable]
  - online | online_kernel | online_movable -> unplugged
  - offline -> unplugged

A device can only be onlined from "unplugged", so it must be returned there before being onlined into a different state.

For backwards compatibility the memory blocks are always created at probe: existing tools expect them to be present once the kmem driver binds. When the configured policy (mhp_get_default_online_type()) selects an online state the blocks are onlined into that policy's zone; when the policy is offline the blocks are created but left offline and the device reports the state "offline".

"offline" is therefore a reportable state but is not writable: it only arises from the legacy auto_online_blocks=offline policy. Onlining such a device through this attribute requires unplugging it first.

The "offline" state may be deprecated later if the memory block ABI changes and userland migrates to using the region-wide hotplug.

Unplug is atomic across the whole device: dax_kmem_do_hotremove() collects every added range and offlines/removes them in one operation via offline_and_remove_memory_ranges(). Either all ranges are removed and the device becomes "unplugged", or offlining is rolled back and the device is left fully online, so the reported 'hotplug' state always matches reality.

Unbind Note: We used to call remove_memory() during unbind, which would fire a BUG() if any of the memory blocks were online at that time. We lift this into a WARN in the cleanup routine and don't attempt hotremove if ->state is not DAX_KMEM_UNPLUGGED or MMOP_OFFLINE. Memory that is merely offline (the legacy "offline" state) is removed on unbind as before; only online memory is left pinned. The resources are still leaked but this prevents deadlock on unbind if a memory region happens to be impossible to hotremove.

Inconsistency Note: Since memory blocks can still be modified individually, the hotplug attribute can become out of sync with the state of the system if userland software mixes and matches the use of memory_block ABI and kmem/hotplug ABI. It is suggested to use one or the other.

Suggested-by: Hannes Reinecke
Suggested-by: David Hildenbrand
Signed-off-by: Gregory Price
---
 Documentation/ABI/testing/sysfs-bus-dax | 25 +++
 drivers/dax/kmem.c | 254 ++++++++++++++++++++----
 2 files changed, 238 insertions(+), 41 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-dax b/Documentation/ABI/testing/sysfs-bus-dax
index b34266bfae49..931eb4e20358 100644
--- a/Documentation/ABI/testing/sysfs-bus-dax
+++ b/Documentation/ABI/testing/sysfs-bus-dax
@@ -151,3 +151,28 @@ Description: memmap_on_memory parameter for memory_hotplug. This is typically set on the kernel command line - memory_hotplug.memmap_on_memory set to 'true' or 'force'."
+
+What: /sys/bus/dax/devices/daxX.Y/hotplug
+Date: January, 2026
+KernelVersion: v6.21
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RW) Controls the hotplug state of the memory region.
+ Applies to all memory blocks associated with the device.
+ Only applies to dax_kmem devices.
+ + Reading returns the current state; the writable states mirror + the per-block /sys/devices/system/memory/memoryX/state ABI: + "unplugged": memory blocks are not present + "online": memory is online, zone chosen by the kernel + "online_kernel": memory is online in ZONE_NORMAL + "online_movable": memory is online in ZONE_MOVABLE + + "offline" (memory blocks are present but offline) may also be + reported - this happens when the device is bound while the + auto_online_blocks policy is offline. It cannot be written and + is deprecated; it may be removed in the future. + + A device can only be onlined from the "unplugged" state, so a + device must be returned to "unplugged" before it can be onlined + into a different state. diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c index 5bf36ab73f86..46ee06d9f56b 100644 --- a/drivers/dax/kmem.c +++ b/drivers/dax/kmem.c @@ -42,9 +42,15 @@ static int dax_kmem_range(struct dev_dax *dev_dax, int i= , struct range *r) return 0; } =20 +#define DAX_KMEM_UNPLUGGED (-1) + struct dax_kmem_data { const char *res_name; int mgid; + int numa_node; + struct dev_dax *dev_dax; + int state; + struct mutex lock; /* protects hotplug state transitions */ struct resource *res[]; }; =20 @@ -63,23 +69,41 @@ static void kmem_put_memory_types(void) mt_put_memory_types(&kmem_memory_types); } =20 +/* True for the online states a kmem dax device can hold. */ +static bool dax_kmem_state_is_online(int state) +{ + return state =3D=3D MMOP_ONLINE || + state =3D=3D MMOP_ONLINE_KERNEL || + state =3D=3D MMOP_ONLINE_MOVABLE; +} + /** - * dax_kmem_do_hotplug - hotplug memory for dax kmem device + * dax_kmem_do_hotplug - add the dev_dax memory ranges as system memory * @dev_dax: the dev_dax instance * @data: the dax_kmem_data structure with resource tracking + * @online_type: MMOP_OFFLINE to add the blocks offline, otherwise the onl= ine + * state (MMOP_ONLINE, MMOP_ONLINE_KERNEL, MMOP_ONLINE_MOVABLE) + * to bring them online in. 
* - * Hotplugs all ranges in the dev_dax region as system memory. + * Adds all ranges in the dev_dax region as system memory, onlining them in + * the requested zone unless @online_type is MMOP_OFFLINE. * - * Returns the number of successfully mapped ranges, or negative error. + * Returns the number of successfully added ranges, or negative error. */ static int dax_kmem_do_hotplug(struct dev_dax *dev_dax, struct dax_kmem_data *data, int online_type) { struct device *dev =3D &dev_dax->dev; - int i, rc, onlined =3D 0; + int i, rc, added =3D 0; mhp_t mhp_flags; =20 + if (dax_kmem_state_is_online(data->state)) + return -EINVAL; + + if (online_type < MMOP_OFFLINE || online_type > MMOP_ONLINE_MOVABLE) + return -EINVAL; + for (i =3D 0; i < dev_dax->nr_range; i++) { struct range range; =20 @@ -112,14 +136,14 @@ static int dax_kmem_do_hotplug(struct dev_dax *dev_da= x, kfree(data->res[i]); data->res[i] =3D NULL; } - if (onlined) + if (added) continue; return rc; } - onlined++; + added++; } =20 - return onlined; + return added; } =20 /** @@ -182,45 +206,65 @@ static int dax_kmem_init_resources(struct dev_dax *de= v_dax, * @dev_dax: the dev_dax instance * @data: the dax_kmem_data structure with resource tracking * - * Removes all ranges in the dev_dax region. + * Offlines and removes every currently-added range in the dev_dax region + * atomically: either all ranges are offlined and removed, or none are and + * the device is left fully online (see offline_and_remove_memory_ranges()= ). * - * Returns the number of successfully removed ranges. + * Returns 0 on success, or a negative errno if the device could not be + * fully unplugged (in which case nothing was removed). 
*/ static int dax_kmem_do_hotremove(struct dev_dax *dev_dax, struct dax_kmem_data *data) { struct device *dev =3D &dev_dax->dev; - int i, success =3D 0; + struct range *ranges; + int i, nr_ranges =3D 0, rc; =20 + ranges =3D kmalloc_array(dev_dax->nr_range, sizeof(*ranges), GFP_KERNEL); + if (!ranges) + return -ENOMEM; + + /* Collect the ranges that were actually added during probe. */ for (i =3D 0; i < dev_dax->nr_range; i++) { struct range range; - int rc; =20 - rc =3D dax_kmem_range(dev_dax, i, &range); - if (rc) + if (!data->res[i]) continue; - - /* range was never added during probe, count as removed */ - if (!data->res[i]) { - success++; + if (dax_kmem_range(dev_dax, i, &range)) continue; - } + ranges[nr_ranges++] =3D range; + } =20 - rc =3D remove_memory(range.start, range_len(&range)); - if (rc =3D=3D 0) { - /* Release the resource for the successfully removed range */ - remove_resource(data->res[i]); - kfree(data->res[i]); - data->res[i] =3D NULL; - success++; - continue; - } + /* Nothing added means nothing to remove. */ + if (!nr_ranges) { + kfree(ranges); + return 0; + } + + rc =3D offline_and_remove_memory_ranges(ranges, nr_ranges); + kfree(ranges); + if (rc) { any_hotremove_failed =3D true; - dev_err(dev, "mapping%d: %#llx-%#llx hotremove failed\n", - i, range.start, range.end); + dev_err(dev, "hotremove failed, device left online: %d\n", rc); + return rc; } =20 - return success; + /* All ranges removed; release the reserved resources. 
*/ + for (i =3D 0; i < dev_dax->nr_range; i++) { + if (!data->res[i]) + continue; + remove_resource(data->res[i]); + kfree(data->res[i]); + data->res[i] =3D NULL; + } + + return 0; +} +#else +static int dax_kmem_do_hotremove(struct dev_dax *dev_dax, + struct dax_kmem_data *data) +{ + return -EBUSY; } #endif /* CONFIG_MEMORY_HOTREMOVE */ =20 @@ -236,6 +280,20 @@ static void dax_kmem_cleanup_resources(struct dev_dax = *dev_dax, { int i; =20 + /* + * If the device unbind occurs before memory is hotremoved, we can never + * remove the memory (requires reboot). Attempting an offline operation + * here may cause deadlock and a failure to finish the unbind. + * + * This WARN used to be a BUG called by remove_memory(). + * + * Note: This leaks the resources. + */ + if (WARN(((data->state !=3D DAX_KMEM_UNPLUGGED) && + (data->state !=3D MMOP_OFFLINE)), + "Hotplug memory regions stuck online until reboot")) + return; + for (i =3D 0; i < dev_dax->nr_range; i++) { if (!data->res[i]) continue; @@ -245,6 +303,107 @@ static void dax_kmem_cleanup_resources(struct dev_dax= *dev_dax, } } =20 +static int dax_kmem_parse_state(const char *buf) +{ + if (sysfs_streq(buf, "unplugged")) + return DAX_KMEM_UNPLUGGED; + if (sysfs_streq(buf, "online")) + return MMOP_ONLINE; + if (sysfs_streq(buf, "online_kernel")) + return MMOP_ONLINE_KERNEL; + if (sysfs_streq(buf, "online_movable")) + return MMOP_ONLINE_MOVABLE; + return -EINVAL; +} + +static ssize_t hotplug_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct dax_kmem_data *data =3D dev_get_drvdata(dev); + const char *state_str; + + if (!data) + return -ENXIO; + + switch (data->state) { + case DAX_KMEM_UNPLUGGED: + state_str =3D "unplugged"; + break; + case MMOP_OFFLINE: + state_str =3D "offline"; + break; + case MMOP_ONLINE: + state_str =3D "online"; + break; + case MMOP_ONLINE_KERNEL: + state_str =3D "online_kernel"; + break; + case MMOP_ONLINE_MOVABLE: + state_str =3D "online_movable"; + break; + default: + 
state_str =3D "unknown"; + break; + } + + return sysfs_emit(buf, "%s\n", state_str); +} + +static ssize_t hotplug_store(struct device *dev, struct device_attribute *= attr, + const char *buf, size_t len) +{ + struct dev_dax *dev_dax =3D to_dev_dax(dev); + struct dax_kmem_data *data =3D dev_get_drvdata(dev); + int online_type; + int rc; + + if (!data) + return -ENXIO; + + online_type =3D dax_kmem_parse_state(buf); + if (online_type < DAX_KMEM_UNPLUGGED) + return online_type; + + guard(mutex)(&data->lock); + + /* Already in requested state */ + if (data->state =3D=3D online_type) + return len; + + if (online_type =3D=3D DAX_KMEM_UNPLUGGED) { + rc =3D dax_kmem_do_hotremove(dev_dax, data); + if (rc) + return rc; + data->state =3D DAX_KMEM_UNPLUGGED; + return len; + } + + /* + * Onlining is only allowed from the unplugged state. An already-online + * device (or one left in the legacy offline state) must be unplugged + * first. + */ + if (data->state !=3D DAX_KMEM_UNPLUGGED) + return -EBUSY; + + /* + * A previous unplug releases the per-range resources, so re-acquire + * them here (mirroring probe). This is a no-op for ranges that are + * still reserved (e.g. transitioning from the offline state). 
+ */ + rc =3D dax_kmem_init_resources(dev_dax, data); + if (rc < 0) + return rc; + + rc =3D dax_kmem_do_hotplug(dev_dax, data, online_type); + if (rc < 0) + return rc; + + data->state =3D online_type; + return len; +} +static DEVICE_ATTR_RW(hotplug); + static int dev_dax_kmem_probe(struct dev_dax *dev_dax) { struct device *dev =3D &dev_dax->dev; @@ -312,6 +471,10 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax) if (rc < 0) goto err_reg_mgid; data->mgid =3D rc; + data->numa_node =3D numa_node; + data->dev_dax =3D dev_dax; + data->state =3D DAX_KMEM_UNPLUGGED; + mutex_init(&data->lock); =20 dev_set_drvdata(dev, data); =20 @@ -320,11 +483,19 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax) goto err_resources; =20 /* - * Hotplug using the configured online type for this device. + * Always create the memory blocks for backwards compatibility: existing + * tools expect them to be present after the kmem driver binds. Under + * the offline policy they are added but left offline (state + * MMOP_OFFLINE); otherwise they are onlined per the configured policy. */ rc =3D dax_kmem_do_hotplug(dev_dax, data, dev_dax->online_type); if (rc < 0) goto err_hotplug; + data->state =3D dev_dax->online_type; + + rc =3D device_create_file(dev, &dev_attr_hotplug); + if (rc) + dev_warn(dev, "failed to create hotplug sysfs entry\n"); =20 return 0; =20 @@ -345,23 +516,20 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax) #ifdef CONFIG_MEMORY_HOTREMOVE static void dev_dax_kmem_remove(struct dev_dax *dev_dax) { - int success; int node =3D dev_dax->target_node; struct device *dev =3D &dev_dax->dev; struct dax_kmem_data *data =3D dev_get_drvdata(dev); =20 + device_remove_file(dev, &dev_attr_hotplug); /* - * We have one shot for removing memory, if some memory blocks were not - * offline prior to calling this function remove_memory() will fail, and - * there is no way to hotremove this memory until reboot because device - * unbind will succeed even if we return failure. 
+ * Blocks added under the legacy offline policy are present but offline; + * remove them on unbind as the driver always has. If removal fails, + * leak the resources rather than freeing state that still backs present + * memory. Online memory is left alone (dax_kmem_cleanup_resources() + * warns and leaks it) since offlining it here could deadlock the unbind. */ - success =3D dax_kmem_do_hotremove(dev_dax, data); - if (success < dev_dax->nr_range) { - dev_err(dev, "Hotplug regions stuck online until reboot\n"); + if (data->state =3D=3D MMOP_OFFLINE && dax_kmem_do_hotremove(dev_dax, dat= a)) return; - } - dax_kmem_cleanup_resources(dev_dax, data); memory_group_unregister(data->mgid); kfree(data->res_name); @@ -379,6 +547,10 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_da= x) #else static void dev_dax_kmem_remove(struct dev_dax *dev_dax) { + struct device *dev =3D &dev_dax->dev; + + device_remove_file(dev, &dev_attr_hotplug); + /* * Without hotremove purposely leak the request_mem_region() for the * device-dax range and return '0' to ->remove() attempts. 
The removal
-- 
2.54.0

From nobody Mon Jun 8 06:38:24 2026
From: Gregory Price
To: linux-mm@kvack.org, nvdimm@lists.linux.dev
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, linux-cxl@vger.kernel.org, linux-kselftest@vger.kernel.org, djbw@kernel.org, vishal.l.verma@intel.com, dave.jiang@intel.com, akpm@linux-foundation.org, david@kernel.org, ljs@kernel.org, liam@infradead.org, vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com, osalvador@suse.de, shuah@kernel.org, gourry@gourry.net, alison.schofield@intel.com, Smita.KoralahalliChannabasappa@amd.com, ira.weiny@intel.com, apopple@nvidia.com
Subject: [PATCH v4 9/9] selftests/dax: add dax/kmem hotplug sysfs regression test
Date: Fri, 5 Jun 2026 22:19:11 +0100
Message-ID: <20260605211911.2160954-10-gourry@gourry.net>
In-Reply-To: <20260605211911.2160954-1-gourry@gourry.net>
References: <20260605211911.2160954-1-gourry@gourry.net>

Add a kselftest for the dax/kmem whole-device "hotplug" sysfs attribute (/sys/bus/dax/devices/daxX.Y/hotplug), which transitions a kmem-backed dax device between "unplugged", "online" and "online_movable".
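
For orientation, the transition rules the attribute is expected to enforce can be modelled in plain userspace C. The sketch below is an illustration only and is not code from this series: the kmem_* names are hypothetical, the numeric values assume DAX_KMEM_UNPLUGGED is -1 and MMOP_OFFLINE through MMOP_ONLINE_MOVABLE are 0 through 3, strcmp() stands in for sysfs_streq() (so a trailing newline is not tolerated here), and the unplug step is assumed to always succeed, which the real driver does not guarantee.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical userspace model of the "hotplug" attribute states.
 * The numeric values are an assumption for illustration only.
 */
enum kmem_state {
	KMEM_UNPLUGGED = -1,
	KMEM_OFFLINE = 0,	/* reportable only, never writable */
	KMEM_ONLINE = 1,
	KMEM_ONLINE_KERNEL = 2,
	KMEM_ONLINE_MOVABLE = 3,
};

/* Map a written string to a state; "offline" is rejected on purpose. */
static int kmem_parse_state(const char *buf, enum kmem_state *out)
{
	if (!strcmp(buf, "unplugged"))
		*out = KMEM_UNPLUGGED;
	else if (!strcmp(buf, "online"))
		*out = KMEM_ONLINE;
	else if (!strcmp(buf, "online_kernel"))
		*out = KMEM_ONLINE_KERNEL;
	else if (!strcmp(buf, "online_movable"))
		*out = KMEM_ONLINE_MOVABLE;
	else
		return -EINVAL;
	return 0;
}

/*
 * Model of a write to the attribute. Returns 0 on success or a
 * negative errno. Assumes the unplug itself cannot fail.
 */
static int kmem_store(enum kmem_state *cur, const char *buf)
{
	enum kmem_state want;
	int rc;

	assert(cur && buf);

	rc = kmem_parse_state(buf, &want);
	if (rc)
		return rc;

	/* Already in the requested state: writing it again is a no-op. */
	if (*cur == want)
		return 0;

	/* Any online state, or the legacy offline state, may be unplugged. */
	if (want == KMEM_UNPLUGGED) {
		*cur = KMEM_UNPLUGGED;
		return 0;
	}

	/* Onlining is only allowed from the unplugged state. */
	if (*cur != KMEM_UNPLUGGED)
		return -EBUSY;

	*cur = want;
	return 0;
}
```

With this model, a device that starts in KMEM_ONLINE rejects kmem_store(&s, "online_movable") with -EBUSY until "unplugged" has been written first, which is the switch-without-unplug case the selftest is described as checking.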
Provisioning a devdax device and binding it to kmem needs daxctl/ndctl (or the tools/testing/nvdimm emulation) and is out of scope for an in-tree selftest, so the test discovers an already kmem-bound dax device and SKIPs (KSFT_SKIP) when none is present or when the memory cannot be freed to reach a known baseline. When a device is available it validates the interface contract: - online / online_movable actually add memory (MemTotal grows), - online is idempotent, - switching between online types without an intervening unplug is rejected, - unplug removes the memory and the reported state matches reality, - invalid input is rejected. In particular it covers the online -> unplug -> online_movable -> unplug cycle: a re-online must re-reserve the per-range resources so that a subsequent unplug actually offlines and removes the memory instead of silently reporting success while the memory stays online. Signed-off-by: Gregory Price --- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/dax/Makefile | 6 + tools/testing/selftests/dax/config | 4 + .../testing/selftests/dax/dax-kmem-hotplug.sh | 145 ++++++++++++++++++ tools/testing/selftests/dax/settings | 1 + 5 files changed, 157 insertions(+) create mode 100644 tools/testing/selftests/dax/Makefile create mode 100644 tools/testing/selftests/dax/config create mode 100755 tools/testing/selftests/dax/dax-kmem-hotplug.sh create mode 100644 tools/testing/selftests/dax/settings diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Mak= efile index 6e59b8f63e41..8c2b4f97619c 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -14,6 +14,7 @@ TARGETS +=3D core TARGETS +=3D cpufreq TARGETS +=3D cpu-hotplug TARGETS +=3D damon +TARGETS +=3D dax TARGETS +=3D devices/error_logs TARGETS +=3D devices/probe TARGETS +=3D dmabuf-heaps diff --git a/tools/testing/selftests/dax/Makefile b/tools/testing/selftests= /dax/Makefile new file mode 100644 index 000000000000..25a4f3d73a5b 
--- /dev/null +++ b/tools/testing/selftests/dax/Makefile @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: GPL-2.0 +all: + +TEST_PROGS :=3D dax-kmem-hotplug.sh + +include ../lib.mk diff --git a/tools/testing/selftests/dax/config b/tools/testing/selftests/d= ax/config new file mode 100644 index 000000000000..4c9aaeb6ceb4 --- /dev/null +++ b/tools/testing/selftests/dax/config @@ -0,0 +1,4 @@ +CONFIG_DEV_DAX=3Dm +CONFIG_DEV_DAX_KMEM=3Dm +CONFIG_MEMORY_HOTPLUG=3Dy +CONFIG_MEMORY_HOTREMOVE=3Dy diff --git a/tools/testing/selftests/dax/dax-kmem-hotplug.sh b/tools/testin= g/selftests/dax/dax-kmem-hotplug.sh new file mode 100755 index 000000000000..705a34cc3c6d --- /dev/null +++ b/tools/testing/selftests/dax/dax-kmem-hotplug.sh @@ -0,0 +1,145 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Exercise the dax/kmem whole-device "hotplug" sysfs attribute: +# /sys/bus/dax/devices/daxX.Y/hotplug -> unplugged | online | online_m= ovable +# +# The test needs a dax device already bound to the kmem driver (so the +# 'hotplug' attribute exists). Provisioning a devdax device and binding i= t to +# kmem requires daxctl/ndctl (or the tools/testing/nvdimm emulation) and i= s out +# of scope here; if no suitable device is found the test SKIPs. +# +# To actually run it, provision a kmem-backed dax device first. For examp= le, +# carve a chunk of RAM into an emulated pmem region via the kernel command= line +# (the region must be at least one memory block, e.g. 128MiB on x86): +# +# memmap=3D2G!4G +# +# then, in the booted system: +# +# ndctl create-namespace -m devdax -e namespace0.0 -f +# daxctl reconfigure-device -N -m system-ram dax0.0 # binds the kmem d= river +# ./dax-kmem-hotplug.sh + +DIR=3D"$(dirname "$(readlink -f "$0")")" +. 
"$DIR"/../kselftest/ktap_helpers.sh + +DAX_BASE=3D/sys/bus/dax/devices + +memtotal_kb() { awk '/^MemTotal:/ {print $2}' /proc/meminfo; } +get_state() { cat "$HP" 2>/dev/null; } +# set_state STATE -- write a state to the hotplug attribute; returns the +# write's exit status (0 =3D accepted by the kernel) +set_state() { echo "$1" > "$HP" 2>/dev/null; } + +find_kmem_dax() { + local d drv + for d in "$DAX_BASE"/dax*; do + [ -e "$d/hotplug" ] || continue + drv=3D$(readlink "$d/driver" 2>/dev/null) + [ "$(basename "${drv:-}")" =3D kmem ] || continue + basename "$d" + return 0 + done + return 1 +} + +ktap_print_header + +if [ "$UID" !=3D 0 ]; then + ktap_skip_all "must be run as root" + exit "$KSFT_SKIP" +fi + +DAX=3D$(find_kmem_dax) +if [ -z "$DAX" ]; then + ktap_skip_all "no kmem-bound dax device with a hotplug attribute" + exit "$KSFT_SKIP" +fi +HP=3D$DAX_BASE/$DAX/hotplug +ORIG=3D$(get_state) + +# A failure to reach the baseline is environmental (memory in use), not an +# interface failure, so skip rather than fail. +set_state unplugged; rc=3D$? +if [ "$rc" !=3D 0 ] || [ "$(get_state)" !=3D unplugged ]; then + ktap_skip_all "$DAX: cannot reach 'unplugged' baseline (memory in use?)" + [ -n "$ORIG" ] && set_state "$ORIG" + exit "$KSFT_SKIP" +fi +mt_unplugged=3D$(memtotal_kb) + +ktap_print_msg "using $DAX (initial state was: $ORIG)" +ktap_set_plan 8 + +set_state online; rc=3D$? +mt_online=3D$(memtotal_kb) +if [ "$rc" =3D 0 ] && [ "$(get_state)" =3D online ] && [ "$mt_online" -gt = "$mt_unplugged" ]; then + ktap_test_pass "online: state=3Donline, MemTotal $mt_unplugged -> $mt_onl= ine kB" +else + ktap_test_fail "online: rc=3D$rc state=3D$(get_state) MemTotal $mt_unplug= ged -> $mt_online" +fi + +set_state online; rc=3D$? +if [ "$rc" =3D 0 ] && [ "$(get_state)" =3D online ]; then + ktap_test_pass "online idempotent" +else + ktap_test_fail "online idempotent: rc=3D$rc state=3D$(get_state)" +fi + +set_state online_movable; rc=3D$? 
+if [ "$rc" !=3D 0 ] && [ "$(get_state)" =3D online ]; then + ktap_test_pass "reject online_movable without intervening unplug" +else + ktap_test_fail "online->online_movable not rejected: rc=3D$rc state=3D$(g= et_state)" +fi + +set_state unplugged; rc=3D$? +mt=3D$(memtotal_kb) +if [ "$rc" =3D 0 ] && [ "$(get_state)" =3D unplugged ] && [ "$mt" -lt "$mt= _online" ]; then + ktap_test_pass "unplug from online: MemTotal $mt_online -> $mt kB" +else + ktap_test_fail "unplug from online: rc=3D$rc state=3D$(get_state) MemTota= l $mt_online -> $mt" +fi + +set_state online_movable; rc=3D$? +mt_mov=3D$(memtotal_kb) +if [ "$rc" =3D 0 ] && [ "$(get_state)" =3D online_movable ] && [ "$mt_mov"= -gt "$mt_unplugged" ]; then + ktap_test_pass "online_movable after unplug: MemTotal $mt_unplugged -> $m= t_mov kB" +else + ktap_test_fail "online_movable after unplug: rc=3D$rc state=3D$(get_state= ) MemTotal=3D$mt_mov" +fi + +# The online -> unplug -> online_movable -> unplug cycle once regressed: a +# re-online failed to re-reserve the per-range resources, so this final un= plug +# reported success while leaving the memory online. Assert it is really f= reed. +set_state unplugged; rc=3D$? +mt=3D$(memtotal_kb) +if [ "$rc" !=3D 0 ]; then + ktap_test_skip "unplug from movable not accepted (memory in use?) rc=3D$r= c" +elif [ "$(get_state)" =3D unplugged ] && [ "$mt" -lt "$mt_mov" ]; then + ktap_test_pass "unplug from online_movable removed memory: $mt_mov -> $mt= kB" +else + ktap_test_fail "unplug success but memory remained: $(get_state) $mt_mov = -> $mt kB" +fi + +set_state online_kernel; rc=3D$? +mt=3D$(memtotal_kb) +if [ "$rc" =3D 0 ] && [ "$(get_state)" =3D online_kernel ] && [ "$mt" -gt = "$mt_unplugged" ]; then + ktap_test_pass "online_kernel: MemTotal $mt_unplugged -> $mt kB" +else + ktap_test_fail "online_kernel: rc=3D$rc state=3D$(get_state) MemTotal=3D$= mt" +fi +set_state unplugged + +before=3D$(get_state) +set_state bogus_state; rc=3D$? 
+if [ "$rc" !=3D 0 ] && [ "$(get_state)" =3D "$before" ]; then + ktap_test_pass "reject invalid state string" +else + ktap_test_fail "invalid state not rejected: rc=3D$rc state=3D$(get_state)" +fi + +[ -n "$ORIG" ] && set_state "$ORIG" + +ktap_finished diff --git a/tools/testing/selftests/dax/settings b/tools/testing/selftests= /dax/settings new file mode 100644 index 000000000000..ba4d85f74cd6 --- /dev/null +++ b/tools/testing/selftests/dax/settings @@ -0,0 +1 @@ +timeout=3D90 --=20 2.54.0
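
The device discovery step in the script can be tried without real hardware. The following is only a sketch and not part of the patch: find_kmem_dax_in and make_fake_dax_tree are hypothetical names, and the temporary directory layout merely imitates what the test assumes about /sys/bus/dax/devices (a 'driver' symlink whose basename is the driver name, plus a 'hotplug' file on kmem-bound devices). It exercises the lookup logic only, not the kernel interface.

```shell
#!/bin/bash
# Hypothetical variant of the test's find_kmem_dax: same logic, but the base
# directory is an argument so it can be pointed at a fake tree.
find_kmem_dax_in() {
	local base=$1 d drv
	for d in "$base"/dax*; do
		# Only kmem-bound devices are assumed to expose 'hotplug'.
		[ -e "$d/hotplug" ] || continue
		drv=$(readlink "$d/driver" 2>/dev/null)
		[ "$(basename "${drv:-}")" = kmem ] || continue
		basename "$d"
		return 0
	done
	return 1
}

# Hypothetical helper: build a throwaway tree imitating the assumed sysfs
# layout and print its root directory.
make_fake_dax_tree() {
	local root
	root=$(mktemp -d) || return 1
	mkdir -p "$root/drivers/kmem" "$root/drivers/device_dax" \
		"$root/devices/dax0.0" "$root/devices/dax1.0"
	# dax0.0: bound to device_dax, no hotplug attribute, so it is skipped.
	ln -s ../../drivers/device_dax "$root/devices/dax0.0/driver"
	# dax1.0: bound to kmem and exposing 'hotplug', so it is selected.
	ln -s ../../drivers/kmem "$root/devices/dax1.0/driver"
	echo unplugged > "$root/devices/dax1.0/hotplug"
	echo "$root"
}
```

Running find_kmem_dax_in "$(make_fake_dax_tree)/devices" should select dax1.0 and pass over dax0.0; remove the temporary directory afterwards. Passing the base directory as a parameter is a choice made for this sketch; the script itself reads the fixed DAX_BASE.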