From: John Groves
To: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert, Alison Schofield
Cc: John Groves, Jonathan Corbet, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro, David Hildenbrand, Christian Brauner, "Darrick J. Wong", Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya, Chen Linxuan, James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg, Ackerley Tng, Gregory Price, Aravind Ramesh, Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, John Groves
Subject: [PATCH V3 02/21] dax: add fsdev.c driver for fs-dax on character dax
Date: Wed, 7 Jan 2026 09:33:11 -0600
Message-ID: <20260107153332.64727-3-john@groves.net>
X-Mailer: git-send-email 2.50.1
In-Reply-To: <20260107153332.64727-1-john@groves.net>
References: <20260107153244.64703-1-john@groves.net> <20260107153332.64727-1-john@groves.net>
X-Mailing-List: linux-kernel@vger.kernel.org

The new fsdev driver provides pages/folios initialized compatibly with fsdax -
normal rather than devdax-style refcounting, and starting out with order-0
folios.

When fsdev binds to a daxdev, it is usually (always?) switching from the
devdax mode (device.c), which pre-initializes compound folios according to
its alignment. Fsdev uses fsdev_clear_folio_state() to switch the folios
into an fsdax-compatible state.

A side effect of this is that raw mmap doesn't (can't?) work on an fsdev
dax instance. Accordingly, the fsdev driver does not provide raw mmap -
devices must be put in 'devdax' mode (drivers/dax/device.c) to get raw
mmap capability.

This commit contains just the framework, which remaps pages/folios
compatibly with fsdax.

Enabling dax changes:

* bus.h: add DAXDRV_FSDEV_TYPE driver type
* bus.c: allow DAXDRV_FSDEV_TYPE drivers to bind to daxdevs
* dax.h: prototype inode_dax(), which fsdev needs

Suggested-by: Dan Williams
Suggested-by: Gregory Price
Signed-off-by: John Groves
---
 MAINTAINERS          |   8 ++
 drivers/dax/Kconfig  |  17 +++
 drivers/dax/Makefile |   2 +
 drivers/dax/bus.c    |   4 +
 drivers/dax/bus.h    |   1 +
 drivers/dax/fsdev.c  | 276 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/dax.h  |   4 +
 7 files changed, 312 insertions(+)
 create mode 100644 drivers/dax/fsdev.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 765ad2daa218..90429cb06090 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7184,6 +7184,14 @@ L:	linux-cxl@vger.kernel.org
 S:	Supported
 F:	drivers/dax/
 
+DEVICE DIRECT ACCESS (DAX) [fsdev_dax]
+M:	John Groves
+M:	John Groves
+L:	nvdimm@lists.linux.dev
+L:	linux-cxl@vger.kernel.org
+S:	Supported
+F:	drivers/dax/fsdev.c
+
 DEVICE FREQUENCY (DEVFREQ)
 M:	MyungJoo Ham
 M:	Kyungmin Park
diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index d656e4c0eb84..491325d914a8 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -78,4 +78,21 @@ config DEV_DAX_KMEM
 
 	  Say N if unsure.
 
+config DEV_DAX_FS
+	tristate "FSDEV DAX: fs-dax compatible device driver"
+	depends on DEV_DAX
+	default DEV_DAX
+	help
+	  Support a device-dax driver mode that is compatible with fs-dax
+	  filesystems. Unlike the standard device-dax driver which
+	  pre-initializes compound folios based on device alignment, this
+	  driver leaves folios uninitialized (similar to pmem) allowing
+	  fs-dax to manage folio lifecycles dynamically.
+
+	  This driver uses MEMORY_DEVICE_FS_DAX type and does not set
+	  vmemmap_shift, making it compatible with filesystems like famfs
+	  that use the iomap-based fs-dax infrastructure.
+
+	  Say M if you plan to use fs-dax filesystems on /dev/dax devices.
+	  Say N if you only need raw character device access to DAX memory.
 endif
diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile
index 5ed5c39857c8..77aa3df3285c 100644
--- a/drivers/dax/Makefile
+++ b/drivers/dax/Makefile
@@ -4,11 +4,13 @@ obj-$(CONFIG_DEV_DAX) += device_dax.o
 obj-$(CONFIG_DEV_DAX_KMEM) += kmem.o
 obj-$(CONFIG_DEV_DAX_PMEM) += dax_pmem.o
 obj-$(CONFIG_DEV_DAX_CXL) += dax_cxl.o
+obj-$(CONFIG_DEV_DAX_FS) += fsdev_dax.o
 
 dax-y := super.o
 dax-y += bus.o
 device_dax-y := device.o
 dax_pmem-y := pmem.o
 dax_cxl-y := cxl.o
+fsdev_dax-y := fsdev.o
 
 obj-y += hmem/
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index a2f9a3cc30a5..0d7228acb913 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -84,6 +84,10 @@ static int dax_match_type(const struct dax_device_driver *dax_drv, struct device
 	    !IS_ENABLED(CONFIG_DEV_DAX_KMEM))
 		return 1;
 
+	/* fsdev driver can also bind to device-type dax devices */
+	if (dax_drv->type == DAXDRV_FSDEV_TYPE && type == DAXDRV_DEVICE_TYPE)
+		return 1;
+
 	return 0;
 }
 
diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index cbbf64443098..880bdf7e72d7 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -31,6 +31,7 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data);
 enum dax_driver_type {
 	DAXDRV_KMEM_TYPE,
 	DAXDRV_DEVICE_TYPE,
+	DAXDRV_FSDEV_TYPE,
 };
 
 struct dax_device_driver {
diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
new file mode 100644
index 000000000000..2a3249d1529c
--- /dev/null
+++ b/drivers/dax/fsdev.c
@@ -0,0 +1,276 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2026 Micron Technology, Inc. */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include "dax-private.h"
+#include "bus.h"
+
+/*
+ * FS-DAX compatible devdax driver
+ *
+ * Unlike drivers/dax/device.c which pre-initializes compound folios based
+ * on device alignment (via vmemmap_shift), this driver leaves folios
+ * uninitialized similar to pmem. This allows fs-dax filesystems like famfs
+ * to work without needing special handling for pre-initialized folios.
+ *
+ * Key differences from device.c:
+ * - pgmap type is MEMORY_DEVICE_FS_DAX (not MEMORY_DEVICE_GENERIC)
+ * - vmemmap_shift is NOT set (folios remain order-0)
+ * - fs-dax can dynamically create compound folios as needed
+ * - No mmap support - all access is through fs-dax/iomap
+ */
+
+
+static void fsdev_cdev_del(void *cdev)
+{
+	cdev_del(cdev);
+}
+
+static void fsdev_kill(void *dev_dax)
+{
+	kill_dev_dax(dev_dax);
+}
+
+/*
+ * Page map operations for FS-DAX mode
+ * Similar to fsdax_pagemap_ops in drivers/nvdimm/pmem.c
+ *
+ * Note: folio_free callback is not needed for MEMORY_DEVICE_FS_DAX.
+ * The core mm code in free_zone_device_folio() handles the wake_up_var()
+ * directly for this memory type.
+ */
+static int fsdev_pagemap_memory_failure(struct dev_pagemap *pgmap,
+		unsigned long pfn, unsigned long nr_pages, int mf_flags)
+{
+	struct dev_dax *dev_dax = pgmap->owner;
+	u64 offset = PFN_PHYS(pfn) - dev_dax->ranges[0].range.start;
+	u64 len = nr_pages << PAGE_SHIFT;
+
+	return dax_holder_notify_failure(dev_dax->dax_dev, offset,
+					 len, mf_flags);
+}
+
+static const struct dev_pagemap_ops fsdev_pagemap_ops = {
+	.memory_failure = fsdev_pagemap_memory_failure,
+};
+
+/*
+ * Clear any stale folio state from pages in the given range.
+ * This is necessary because device_dax pre-initializes compound folios
+ * based on vmemmap_shift, and that state may persist after driver unbind.
+ * Since fsdev_dax uses MEMORY_DEVICE_FS_DAX without vmemmap_shift, fs-dax
+ * expects to find clean order-0 folios that it can build into compound
+ * folios on demand.
+ *
+ * At probe time, no filesystem should be mounted yet, so all mappings
+ * are stale and must be cleared along with compound state.
+ */
+static void fsdev_clear_folio_state(struct dev_dax *dev_dax)
+{
+	int i;
+
+	for (i = 0; i < dev_dax->nr_range; i++) {
+		struct range *range = &dev_dax->ranges[i].range;
+		unsigned long pfn, end_pfn;
+
+		pfn = PHYS_PFN(range->start);
+		end_pfn = PHYS_PFN(range->end) + 1;
+
+		while (pfn < end_pfn) {
+			struct page *page = pfn_to_page(pfn);
+			struct folio *folio = (struct folio *)page;
+			struct dev_pagemap *pgmap = page_pgmap(page);
+			int order = folio_order(folio);
+
+			/*
+			 * Clear any stale mapping pointer. At probe time,
+			 * no filesystem is mounted, so any mapping is stale.
+			 */
+			folio->mapping = NULL;
+			folio->share = 0;
+
+			if (order > 0) {
+				int j;
+
+				folio_reset_order(folio);
+				for (j = 0; j < (1UL << order); j++) {
+					struct page *p = page + j;
+
+					ClearPageHead(p);
+					clear_compound_head(p);
+					((struct folio *)p)->mapping = NULL;
+					((struct folio *)p)->share = 0;
+					((struct folio *)p)->pgmap = pgmap;
+				}
+				pfn += (1UL << order);
+			} else {
+				folio->pgmap = pgmap;
+				pfn++;
+			}
+		}
+	}
+}
+
+static int fsdev_open(struct inode *inode, struct file *filp)
+{
+	struct dax_device *dax_dev = inode_dax(inode);
+	struct dev_dax *dev_dax = dax_get_private(dax_dev);
+
+	dev_dbg(&dev_dax->dev, "trace\n");
+	filp->private_data = dev_dax;
+
+	return 0;
+}
+
+static int fsdev_release(struct inode *inode, struct file *filp)
+{
+	struct dev_dax *dev_dax = filp->private_data;
+
+	dev_dbg(&dev_dax->dev, "trace\n");
+	return 0;
+}
+
+static const struct file_operations fsdev_fops = {
+	.llseek = noop_llseek,
+	.owner = THIS_MODULE,
+	.open = fsdev_open,
+	.release = fsdev_release,
+};
+
+static int fsdev_dax_probe(struct dev_dax *dev_dax)
+{
+	struct dax_device *dax_dev = dev_dax->dax_dev;
+	struct device *dev = &dev_dax->dev;
+	struct dev_pagemap *pgmap;
+	u64 data_offset = 0;
+	struct inode *inode;
+	struct cdev *cdev;
+	void *addr;
+	int rc, i;
+
+	if (static_dev_dax(dev_dax)) {
+		if (dev_dax->nr_range > 1) {
+			dev_warn(dev,
+				 "static pgmap / multi-range device conflict\n");
+			return -EINVAL;
+		}
+
+		pgmap = dev_dax->pgmap;
+	} else {
+		if (dev_dax->pgmap) {
+			dev_warn(dev,
+				 "dynamic-dax with pre-populated page map\n");
+			return -EINVAL;
+		}
+
+		pgmap = devm_kzalloc(dev,
+			struct_size(pgmap, ranges, dev_dax->nr_range - 1),
+			GFP_KERNEL);
+		if (!pgmap)
+			return -ENOMEM;
+
+		pgmap->nr_range = dev_dax->nr_range;
+		dev_dax->pgmap = pgmap;
+
+		for (i = 0; i < dev_dax->nr_range; i++) {
+			struct range *range = &dev_dax->ranges[i].range;
+
+			pgmap->ranges[i] = *range;
+		}
+	}
+
+	for (i = 0; i < dev_dax->nr_range; i++) {
+		struct range *range = &dev_dax->ranges[i].range;
+
+		if (!devm_request_mem_region(dev, range->start,
+					range_len(range), dev_name(dev))) {
+			dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve range\n",
+				 i, range->start, range->end);
+			return -EBUSY;
+		}
+	}
+
+	/*
+	 * FS-DAX compatible mode: Use MEMORY_DEVICE_FS_DAX type and
+	 * do NOT set vmemmap_shift. This leaves folios at order-0,
+	 * allowing fs-dax to dynamically create compound folios as needed
+	 * (similar to pmem behavior).
+	 */
+	pgmap->type = MEMORY_DEVICE_FS_DAX;
+	pgmap->ops = &fsdev_pagemap_ops;
+	pgmap->owner = dev_dax;
+
+	/*
+	 * CRITICAL DIFFERENCE from device.c:
+	 * We do NOT set vmemmap_shift here, even if align > PAGE_SIZE.
+	 * This ensures folios remain order-0 and are compatible with
+	 * fs-dax's folio management.
+	 */
+
+	addr = devm_memremap_pages(dev, pgmap);
+	if (IS_ERR(addr))
+		return PTR_ERR(addr);
+
+	/*
+	 * Clear any stale compound folio state left over from a previous
+	 * driver (e.g., device_dax with vmemmap_shift).
+	 */
+	fsdev_clear_folio_state(dev_dax);
+
+	/* Detect whether the data is at a non-zero offset into the memory */
+	if (pgmap->range.start != dev_dax->ranges[0].range.start) {
+		u64 phys = dev_dax->ranges[0].range.start;
+		u64 pgmap_phys = dev_dax->pgmap[0].range.start;
+
+		if (!WARN_ON(pgmap_phys > phys))
+			data_offset = phys - pgmap_phys;
+
+		pr_debug("%s: offset detected phys=%llx pgmap_phys=%llx offset=%llx\n",
+		       __func__, phys, pgmap_phys, data_offset);
+	}
+
+	inode = dax_inode(dax_dev);
+	cdev = inode->i_cdev;
+	cdev_init(cdev, &fsdev_fops);
+	cdev->owner = dev->driver->owner;
+	cdev_set_parent(cdev, &dev->kobj);
+	rc = cdev_add(cdev, dev->devt, 1);
+	if (rc)
+		return rc;
+
+	rc = devm_add_action_or_reset(dev, fsdev_cdev_del, cdev);
+	if (rc)
+		return rc;
+
+	run_dax(dax_dev);
+	return devm_add_action_or_reset(dev, fsdev_kill, dev_dax);
+}
+
+static struct dax_device_driver fsdev_dax_driver = {
+	.probe = fsdev_dax_probe,
+	.type = DAXDRV_FSDEV_TYPE,
+};
+
+static int __init dax_init(void)
+{
+	return dax_driver_register(&fsdev_dax_driver);
+}
+
+static void __exit dax_exit(void)
+{
+	dax_driver_unregister(&fsdev_dax_driver);
+}
+
+MODULE_AUTHOR("John Groves");
+MODULE_DESCRIPTION("FS-DAX Device: fs-dax compatible devdax driver");
+MODULE_LICENSE("GPL");
+module_init(dax_init);
+module_exit(dax_exit);
+MODULE_ALIAS_DAX_DEVICE(0);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 9d624f4d9df6..74e098010016 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -51,6 +51,10 @@ struct dax_holder_operations {
 
 #if IS_ENABLED(CONFIG_DAX)
 struct dax_device *alloc_dax(void *private, const struct dax_operations *ops);
+
+#if IS_ENABLED(CONFIG_DEV_DAX_FS)
+struct dax_device *inode_dax(struct inode *inode);
+#endif
 void *dax_holder(struct dax_device *dax_dev);
 void put_dax(struct dax_device *dax_dev);
 void kill_dax(struct dax_device *dax_dev);
-- 
2.49.0
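
Illustration only, not part of the patch: the stride-by-order walk in
fsdev_clear_folio_state() may be easier to follow as a small user-space
model. The sketch below is an assumption-laden stand-in. struct mock_page
and mock_clear_folio_state() are hypothetical names invented here; they do
not mirror the kernel's struct page/struct folio layout and call no kernel
helpers. The model also clamps the stride at the end of the array, which
the patch itself does not do.

```c
#include <stddef.h>

/*
 * Hypothetical user-space stand-in for the per-page state the patch
 * resets. This is NOT the kernel's struct page / struct folio layout.
 */
struct mock_page {
	unsigned int order;	/* meaningful on a head page only */
	int is_head;		/* models the PageHead flag */
	int is_tail;		/* models a set compound_head pointer */
	void *mapping;		/* models folio->mapping */
	unsigned long share;	/* models folio->share */
};

/*
 * Walk pages[0..nr) the way the patch walks a pfn range: read the order
 * at the current page, reset every page that folio covers, then advance
 * by the folio size. Returns how many compound (order > 0) folios were
 * broken up. Orders are assumed to be small enough that 1 << order fits
 * in a size_t.
 */
static size_t mock_clear_folio_state(struct mock_page *pages, size_t nr)
{
	size_t pfn = 0, split = 0;

	while (pfn < nr) {
		unsigned int order = pages[pfn].order;
		size_t span = (size_t)1 << order;
		size_t j;

		/* Model-only safety net: never step past the range end. */
		if (span > nr - pfn)
			span = nr - pfn;

		for (j = 0; j < span; j++) {
			pages[pfn + j].order = 0;
			pages[pfn + j].is_head = 0;
			pages[pfn + j].is_tail = 0;
			pages[pfn + j].mapping = NULL;
			pages[pfn + j].share = 0;
		}

		if (order > 0)
			split++;
		pfn += span;
	}

	return split;
}
```

Given an eight-entry array holding one order-2 folio followed by four
order-0 pages, one of which carries a stale mapping, the walk visits the
head page once, resets all four pages it covers, and then steps through
the remaining pages one at a time. Every entry ends up order-0 with no
mapping.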