Add a memory notifier to prevent external operations from changing the
online/offline state of memory blocks managed by dax_kmem. This ensures
state changes only occur through the driver's hotplug sysfs interface,
providing consistent state tracking and preventing races with auto-online
policies or direct memory block sysfs manipulation.
The goal is to keep `daxN.M/hotplug` consistent with the state of the
memory blocks it owns.
The notifier uses a transition protocol with release/acquire ordering:
- Before initiating a state change, set target_state, then in_transition
- in_transition is written with smp_store_release() so that target_state
  is visible before in_transition
- The notifier reads in_transition with smp_load_acquire() before reading
  target_state, which ensures proper ordering on weakly-ordered
  architectures
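The ordering described above can be sketched in userspace with C11 atomics. This is only an illustration, not the driver code: `struct transition`, `start_transition()`, `end_transition()` and `read_transition()` are hypothetical names, and `memory_order_release`/`memory_order_acquire` stand in for the kernel's smp_store_release()/smp_load_acquire().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the fields added to dax_kmem_data. */
struct transition {
	int target_state;          /* plain field, published by the flag below */
	atomic_bool in_transition; /* release/acquire synchronization point */
};

/* Writer: store target_state first, then publish it with a release store. */
static void start_transition(struct transition *t, int target)
{
	assert(t);
	t->target_state = target;
	atomic_store_explicit(&t->in_transition, true, memory_order_release);
}

/* Writer: clear the flag once the state change completes or aborts. */
static void end_transition(struct transition *t)
{
	assert(t);
	atomic_store_explicit(&t->in_transition, false, memory_order_relaxed);
}

/*
 * Reader: the acquire load pairs with the release store above, so a reader
 * that observes in_transition == true also observes the target_state that
 * was written before it. Returns false when no transition is active.
 */
static bool read_transition(struct transition *t, int *target)
{
	assert(t && target);
	if (!atomic_load_explicit(&t->in_transition, memory_order_acquire))
		return false;
	*target = t->target_state;
	return true;
}
```

A reader that sees the flag clear never looks at target_state at all, which is why only the store that sets the flag needs release semantics in this sketch.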
The notifier callback:
- Returns NOTIFY_DONE for non-overlapping memory (not our concern)
- Returns NOTIFY_BAD if in_transition is false (block external ops)
- Validates the memory event matches target_state (MEM_GOING_ONLINE
  for online operations, MEM_GOING_OFFLINE for offline/unplug)
- Returns NOTIFY_OK only for driver-initiated operations with matching
  target_state
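The decision rules listed above reduce to a small pure function. The sketch below assumes stand-in enums (`enum verdict`, `enum event`, `enum target`) whose values are illustrative and not the kernel's NOTIFY_*, MEM_* or MMOP_* constants; `notifier_verdict()` is a hypothetical name used only to show the order of the checks.

```c
#include <stdbool.h>

/* Stand-in constants; the numeric values are illustrative, not the kernel's. */
enum verdict { VERDICT_DONE, VERDICT_OK, VERDICT_BAD };
enum event { EV_GOING_ONLINE, EV_GOING_OFFLINE, EV_OTHER };
enum target { T_OFFLINE, T_ONLINE, T_ONLINE_MOVABLE };

/*
 * Mirror of the checks in the commit message, in the same order:
 * unrelated events and non-overlapping ranges are ignored, anything
 * outside a driver-initiated transition is refused, and inside a
 * transition the event must match the target state.
 */
static enum verdict notifier_verdict(enum event ev, bool overlaps,
				     bool in_transition, enum target target)
{
	if (ev != EV_GOING_ONLINE && ev != EV_GOING_OFFLINE)
		return VERDICT_DONE;
	if (!overlaps)
		return VERDICT_DONE;
	if (!in_transition)
		return VERDICT_BAD;
	if (ev == EV_GOING_ONLINE &&
	    (target == T_ONLINE || target == T_ONLINE_MOVABLE))
		return VERDICT_OK;
	if (ev == EV_GOING_OFFLINE && target == T_OFFLINE)
		return VERDICT_OK;
	return VERDICT_BAD;
}
```

Note that a mismatched event during a transition (for example an offline request while the driver is onlining) is refused in the same way as an external request.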
This prevents scenarios where:
- Users manually change memory state via /sys/devices/system/memory/
- Other kernel subsystems interfere with driver-managed memory state
  (may be important for regions trying to preserve hot-unpluggability)
Signed-off-by: Gregory Price <gourry@gourry.net>
---
drivers/dax/kmem.c | 157 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 154 insertions(+), 3 deletions(-)
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index c222ae9d675d..f3562f65376c 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -53,6 +53,9 @@ struct dax_kmem_data {
struct dev_dax *dev_dax;
int state;
struct mutex lock; /* protects hotplug state transitions */
+ bool in_transition;
+ int target_state;
+ struct notifier_block mem_nb;
struct resource *res[];
};
@@ -71,6 +74,116 @@ static void kmem_put_memory_types(void)
mt_put_memory_types(&kmem_memory_types);
}
+/**
+ * dax_kmem_start_transition - begin a driver-initiated state transition
+ * @data: the dax_kmem_data structure
+ * @target: the target state (MMOP_ONLINE, MMOP_ONLINE_MOVABLE, or MMOP_OFFLINE)
+ *
+ * Sets up state for a driver-initiated memory operation. The memory notifier
+ * will only allow operations that match this target state while in transition.
+ * Uses store-release to ensure target_state is visible before in_transition.
+ */
+static void dax_kmem_start_transition(struct dax_kmem_data *data, int target)
+{
+ data->target_state = target;
+ smp_store_release(&data->in_transition, true);
+}
+
+/**
+ * dax_kmem_end_transition - end a driver-initiated state transition
+ * @data: the dax_kmem_data structure
+ *
+ * Clears the in_transition flag after a state change completes or aborts.
+ */
+static void dax_kmem_end_transition(struct dax_kmem_data *data)
+{
+ WRITE_ONCE(data->in_transition, false);
+}
+
+/**
+ * dax_kmem_overlaps_range - check if a memory range overlaps with this device
+ * @data: the dax_kmem_data structure
+ * @start: start physical address of the range to check
+ * @size: size of the range to check
+ *
+ * Returns true if the range overlaps with any of the device's memory ranges.
+ */
+static bool dax_kmem_overlaps_range(struct dax_kmem_data *data,
+ u64 start, u64 size)
+{
+ struct dev_dax *dev_dax = data->dev_dax;
+ int i;
+
+ for (i = 0; i < dev_dax->nr_range; i++) {
+ struct range range;
+ struct range check = DEFINE_RANGE(start, start + size - 1);
+
+ if (dax_kmem_range(dev_dax, i, &range))
+ continue;
+
+ if (!data->res[i])
+ continue;
+
+ if (range_overlaps(&range, &check))
+ return true;
+ }
+ return false;
+}
+
+/**
+ * dax_kmem_memory_notifier_cb - memory notifier callback for dax kmem
+ * @nb: the notifier block (embedded in dax_kmem_data)
+ * @action: the memory event (MEM_GOING_ONLINE, MEM_GOING_OFFLINE, etc.)
+ * @arg: pointer to memory_notify structure
+ *
+ * This callback prevents external operations (e.g., from sysfs or auto-online
+ * policies) on memory blocks managed by dax_kmem. Only operations initiated
+ * by the driver itself (via the hotplug sysfs interface) are allowed.
+ *
+ * Returns NOTIFY_OK to allow the operation, NOTIFY_BAD to block it,
+ * or NOTIFY_DONE if the memory doesn't belong to this device.
+ */
+static int dax_kmem_memory_notifier_cb(struct notifier_block *nb,
+ unsigned long action, void *arg)
+{
+ struct dax_kmem_data *data = container_of(nb, struct dax_kmem_data,
+ mem_nb);
+ struct memory_notify *mhp = arg;
+ const u64 start = PFN_PHYS(mhp->start_pfn);
+ const u64 size = PFN_PHYS(mhp->nr_pages);
+
+ /* Only interested in going online/offline events */
+ if (action != MEM_GOING_ONLINE && action != MEM_GOING_OFFLINE)
+ return NOTIFY_DONE;
+
+ /* Check if this memory belongs to our device */
+ if (!dax_kmem_overlaps_range(data, start, size))
+ return NOTIFY_DONE;
+
+ /*
+ * Block all operations unless we're in a driver-initiated transition.
+ * When in_transition is set, only allow operations that match our
+ * target_state to prevent races with external operations.
+ *
+ * Use load-acquire to pair with the store-release in
+ * dax_kmem_start_transition(), ensuring target_state is visible.
+ */
+ if (!smp_load_acquire(&data->in_transition))
+ return NOTIFY_BAD;
+
+ /* Online operations expect MEM_GOING_ONLINE */
+ if (action == MEM_GOING_ONLINE &&
+ (data->target_state == MMOP_ONLINE ||
+ data->target_state == MMOP_ONLINE_MOVABLE))
+ return NOTIFY_OK;
+
+ /* Offline/hotremove operations expect MEM_GOING_OFFLINE */
+ if (action == MEM_GOING_OFFLINE && data->target_state == MMOP_OFFLINE)
+ return NOTIFY_OK;
+
+ return NOTIFY_BAD;
+}
+
/**
* dax_kmem_do_hotplug - hotplug memory for dax kmem device
* @dev_dax: the dev_dax instance
@@ -325,11 +438,27 @@ static ssize_t hotplug_store(struct device *dev, struct device_attribute *attr,
if (data->state == online_type)
return len;
+ /*
+ * Start transition with target_state for the notifier.
+ * For unplug, use MMOP_OFFLINE since memory goes offline before removal.
+ */
+ if (online_type == DAX_KMEM_UNPLUGGED || online_type == MMOP_OFFLINE)
+ dax_kmem_start_transition(data, MMOP_OFFLINE);
+ else
+ dax_kmem_start_transition(data, online_type);
+
if (online_type == DAX_KMEM_UNPLUGGED) {
+ int expected = 0;
+
+ for (rc = 0; rc < dev_dax->nr_range; rc++)
+ if (data->res[rc])
+ expected++;
+
rc = dax_kmem_do_hotremove(dev_dax, data);
- if (rc < 0) {
+ dax_kmem_end_transition(data);
+ if (rc < expected) {
dev_warn(dev, "hotplug state is inconsistent\n");
- return rc;
+ return rc == 0 ? -EBUSY : -EIO;
}
data->state = DAX_KMEM_UNPLUGGED;
return len;
@@ -339,10 +468,14 @@ static ssize_t hotplug_store(struct device *dev, struct device_attribute *attr,
* online_type is MMOP_ONLINE or MMOP_ONLINE_MOVABLE
* Cannot switch between online types without unplugging first
*/
- if (data->state == MMOP_ONLINE || data->state == MMOP_ONLINE_MOVABLE)
+ if (data->state == MMOP_ONLINE || data->state == MMOP_ONLINE_MOVABLE) {
+ dax_kmem_end_transition(data);
return -EBUSY;
+ }
rc = dax_kmem_do_hotplug(dev_dax, data, online_type);
+ dax_kmem_end_transition(data);
+
if (rc < 0)
return rc;
@@ -430,13 +563,26 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
if (rc < 0)
goto err_resources;
+ /* Register memory notifier to block external operations */
+ data->mem_nb.notifier_call = dax_kmem_memory_notifier_cb;
+ rc = register_memory_notifier(&data->mem_nb);
+ if (rc) {
+ dev_warn(dev, "failed to register memory notifier\n");
+ goto err_notifier;
+ }
+
/*
* Hotplug using the system default policy - this preserves backwards
* for existing users who rely on the default auto-online behavior.
+ *
+ * Start transition with resolved system default since the notifier
+ * validates the operation type matches.
*/
online_type = mhp_get_default_online_type();
if (online_type != MMOP_OFFLINE) {
+ dax_kmem_start_transition(data, online_type);
rc = dax_kmem_do_hotplug(dev_dax, data, online_type);
+ dax_kmem_end_transition(data);
if (rc < 0)
goto err_hotplug;
data->state = online_type;
@@ -449,6 +595,8 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
return 0;
err_hotplug:
+ unregister_memory_notifier(&data->mem_nb);
+err_notifier:
dax_kmem_cleanup_resources(dev_dax, data);
err_resources:
dev_set_drvdata(dev, NULL);
@@ -471,6 +619,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
device_remove_file(dev, &dev_attr_hotplug);
dax_kmem_cleanup_resources(dev_dax, data);
+ unregister_memory_notifier(&data->mem_nb);
memory_group_unregister(data->mgid);
kfree(data->res_name);
kfree(data);
@@ -488,8 +637,10 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
{
struct device *dev = &dev_dax->dev;
+ struct dax_kmem_data *data = dev_get_drvdata(dev);
device_remove_file(dev, &dev_attr_hotplug);
+ unregister_memory_notifier(&data->mem_nb);
/*
* Without hotremove purposely leak the request_mem_region() for the
--
2.52.0
Since this protection may break userspace tools, it should
be an opt-in until those tools have time to update to the
new daxN.M/hotplug interface instead of memory blocks.
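The opt-in can be pictured as a compile-time switch that turns the register/unregister helpers into no-ops when disabled. The following is a userspace sketch under that assumption: `DEMO_KMEM_PROTECTED`, `struct demo_device` and the `demo_*` helpers are hypothetical stand-ins for the Kconfig symbol and the driver's wrappers, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the Kconfig symbol; build with
 * -DDEMO_KMEM_PROTECTED=1 to opt in. Defaults to off, like "default n". */
#ifndef DEMO_KMEM_PROTECTED
#define DEMO_KMEM_PROTECTED 0
#endif

/* Hypothetical device state: tracks whether the notifier is installed. */
struct demo_device {
	bool notifier_registered;
};

/* With protection disabled, registration succeeds without doing anything,
 * so callers do not need their own #ifdef. */
static int demo_register_notifier(struct demo_device *dev)
{
	assert(dev);
	if (!DEMO_KMEM_PROTECTED)
		return 0;
	dev->notifier_registered = true;
	return 0;
}

/* Symmetric teardown: a no-op when nothing was registered. */
static void demo_unregister_notifier(struct demo_device *dev)
{
	assert(dev);
	if (!DEMO_KMEM_PROTECTED)
		return;
	dev->notifier_registered = false;
}
```

Keeping the check inside both helpers means the probe and remove paths stay identical whether or not the option is set.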
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Gregory Price <gourry@gourry.net>
---
drivers/dax/Kconfig | 18 ++++++++++++++++++
drivers/dax/kmem.c | 29 ++++++++++++++++++++---------
2 files changed, 38 insertions(+), 9 deletions(-)
diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index d656e4c0eb84..cc13c22eb8f8 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -78,4 +78,22 @@ config DEV_DAX_KMEM
Say N if unsure.
+config DEV_DAX_KMEM_PROTECTED
+ bool "Protect DAX_KMEM memory blocks from being changed"
+ depends on DEV_DAX_KMEM
+ default n
+ help
+ Prevents actions from outside the KMEM DAX driver from changing
+ DAX KMEM memory block states. For example, the memory block
+ sysfs functions (online, state) will return -EBUSY, and normal
+ calls to memory_hotplug functions from other drivers and kernel
+ sources will fail.
+
+ This may break existing memory block management patterns that
+ depend on offlining DAX KMEM blocks from userland before unbinding
+ the driver. Use this only if your tools have been updated to use
+ the daxN.M/hotplug interface.
+
+ Say N if unsure.
+
endif
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index f3562f65376c..094b8a51099e 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -184,6 +184,21 @@ static int dax_kmem_memory_notifier_cb(struct notifier_block *nb,
return NOTIFY_BAD;
}
+static int dax_kmem_register_notifier(struct dax_kmem_data *data)
+{
+ if (!IS_ENABLED(CONFIG_DEV_DAX_KMEM_PROTECTED))
+ return 0;
+ data->mem_nb.notifier_call = dax_kmem_memory_notifier_cb;
+ return register_memory_notifier(&data->mem_nb);
+}
+
+static void dax_kmem_unregister_notifier(struct dax_kmem_data *data)
+{
+ if (!IS_ENABLED(CONFIG_DEV_DAX_KMEM_PROTECTED))
+ return;
+ unregister_memory_notifier(&data->mem_nb);
+}
+
/**
* dax_kmem_do_hotplug - hotplug memory for dax kmem device
* @dev_dax: the dev_dax instance
@@ -563,13 +578,9 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
if (rc < 0)
goto err_resources;
- /* Register memory notifier to block external operations */
- data->mem_nb.notifier_call = dax_kmem_memory_notifier_cb;
- rc = register_memory_notifier(&data->mem_nb);
- if (rc) {
- dev_warn(dev, "failed to register memory notifier\n");
+ rc = dax_kmem_register_notifier(data);
+ if (rc)
goto err_notifier;
- }
/*
* Hotplug using the system default policy - this preserves backwards
@@ -595,7 +606,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
return 0;
err_hotplug:
- unregister_memory_notifier(&data->mem_nb);
+ dax_kmem_unregister_notifier(data);
err_notifier:
dax_kmem_cleanup_resources(dev_dax, data);
err_resources:
@@ -619,7 +630,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
device_remove_file(dev, &dev_attr_hotplug);
dax_kmem_cleanup_resources(dev_dax, data);
- unregister_memory_notifier(&data->mem_nb);
+ dax_kmem_unregister_notifier(data);
memory_group_unregister(data->mgid);
kfree(data->res_name);
kfree(data);
@@ -640,7 +651,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
struct dax_kmem_data *data = dev_get_drvdata(dev);
device_remove_file(dev, &dev_attr_hotplug);
- unregister_memory_notifier(&data->mem_nb);
+ dax_kmem_unregister_notifier(data);
/*
* Without hotremove purposely leak the request_mem_region() for the
--
2.52.0
On Wed, 14 Jan 2026 21:42:22 -0500 Gregory Price <gourry@gourry.net> wrote:

> Since this protection may break userspace tools, it should
> be an opt-in until those tools have time to update to the
> new daxN.M/hotplug interface instead of memory blocks.
>
> --- a/drivers/dax/Kconfig
> +++ b/drivers/dax/Kconfig
> @@ -78,4 +78,22 @@ config DEV_DAX_KMEM
>
> Say N if unsure.
>
> +config DEV_DAX_KMEM_PROTECTED

Users must rebuild and redeploy kernels after having updated a
userspace tool. They won't thank us for this ;)

Isn't there something we can do to make this feature
backward-compatible?

On Tue, Jan 27, 2026 at 01:34:31PM -0800, Andrew Morton wrote:
> On Wed, 14 Jan 2026 21:42:22 -0500 Gregory Price <gourry@gourry.net> wrote:
>
> > Since this protection may break userspace tools, it should
> > be an opt-in until those tools have time to update to the
> > new daxN.M/hotplug interface instead of memory blocks.
> >
> > --- a/drivers/dax/Kconfig
> > +++ b/drivers/dax/Kconfig
> > @@ -78,4 +78,22 @@ config DEV_DAX_KMEM
> >
> > Say N if unsure.
> >
> > +config DEV_DAX_KMEM_PROTECTED
>
> Users must rebuild and redeploy kernels after having updated a
> userspace tool. They won't thank us for this ;)
>
> Isn't there something we can do to make this feature
> backward-compatible?
>

This feature is likely getting dropped in favor of pushing such policy
to a driver if it cares that much to prevent users toggling memory
blocks.

I will likely re-spin this series in a week or so when other non-mm
changes flesh out a little clearer. This will be removed and some of
the mm/memory-hotplug.c changes will be changed to prevent the
modification of an already extern'd function.

~Gregory