From: "Kalyazin, Nikita"
To: "akpm@linux-foundation.org", "david@redhat.com", "pbonzini@redhat.com",
    "seanjc@google.com", "viro@zeniv.linux.org.uk", "brauner@kernel.org"
CC: "peterx@redhat.com", "lorenzo.stoakes@oracle.com", "Liam.Howlett@oracle.com",
    "willy@infradead.org", "vbabka@suse.cz", "rppt@kernel.org", "surenb@google.com",
    "mhocko@suse.com", "jack@suse.cz", "linux-mm@kvack.org", "kvm@vger.kernel.org",
    "linux-kernel@vger.kernel.org", "linux-fsdevel@vger.kernel.org",
    "jthoughton@google.com", "tabba@google.com", "vannapurve@google.com",
    "Roy, Patrick", "Thomson, Jack", "Manwaring, Derek", "Cali, Marco",
    "Kalyazin, Nikita"
Subject: [RFC PATCH v6 2/2] userfaultfd: add minor mode for guestmem
Date: Mon, 15 Sep 2025 16:18:39 +0000
Message-ID: <20250915161815.40729-3-kalyazin@amazon.com>
References: <20250915161815.40729-1-kalyazin@amazon.com>
In-Reply-To: <20250915161815.40729-1-kalyazin@amazon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Nikita Kalyazin

UserfaultFD support in guestmem enables use cases such as restoring a
guest_memfd-backed VM from a memory snapshot in Firecracker [1], where an
external process is responsible for supplying the contents of guest
memory, and live migration of guest_memfd-backed VMs.
[1] https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/handling-page-faults-on-snapshot-resume.md

Signed-off-by: Nikita Kalyazin
---
 Documentation/admin-guide/mm/userfaultfd.rst |  4 +++-
 fs/userfaultfd.c                             |  3 ++-
 include/linux/userfaultfd_k.h                |  8 +++++---
 include/uapi/linux/userfaultfd.h             |  8 +++++++-
 mm/userfaultfd.c                             | 14 +++++++++++---
 5 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index e5cc8848dcb3..ca8c5954ffdb 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -111,7 +111,9 @@ events, except page fault notifications, may be generated:
 - ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that the kernel supports
   ``UFFDIO_REGISTER_MODE_MINOR`` registration for hugetlbfs virtual memory
   areas. ``UFFD_FEATURE_MINOR_SHMEM`` is the analogous feature indicating
-  support for shmem virtual memory areas.
+  support for shmem virtual memory areas. ``UFFD_FEATURE_MINOR_GUESTMEM``
+  is the analogous feature indicating support for guestmem-backed memory
+  areas.
 
 - ``UFFD_FEATURE_MOVE`` indicates that the kernel supports moving an
   existing page contents from userspace.
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 54c6cc7fe9c6..e4e80f1072a6 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1978,7 +1978,8 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
 	uffdio_api.features = UFFD_API_FEATURES;
 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
 	uffdio_api.features &=
-		~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM);
+		~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM |
+		  UFFD_FEATURE_MINOR_GUESTMEM);
 #endif
 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP
 	uffdio_api.features &= ~UFFD_FEATURE_PAGEFAULT_FLAG_WP;
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index c0e716aec26a..37bd4e71b611 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -14,6 +14,7 @@
 #include /* linux/include/uapi/linux/userfaultfd.h */
 
 #include
+#include
 #include
 #include
 #include
@@ -218,7 +219,8 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
 		return false;
 
 	if ((vm_flags & VM_UFFD_MINOR) &&
-	    (!is_vm_hugetlb_page(vma) && !vma_is_shmem(vma)))
+	    (!is_vm_hugetlb_page(vma) && !vma_is_shmem(vma) &&
+	     !guestmem_vma_is_guestmem(vma)))
 		return false;
 
 	/*
@@ -238,9 +240,9 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
 		return false;
 #endif
 
-	/* By default, allow any of anon|shmem|hugetlb */
+	/* By default, allow any of anon|shmem|hugetlb|guestmem */
 	return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
-	    vma_is_shmem(vma);
+	    vma_is_shmem(vma) || guestmem_vma_is_guestmem(vma);
 }
 
 static inline bool vma_has_uffd_without_event_remap(struct vm_area_struct *vma)
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 2841e4ea8f2c..0fe9fbd29772 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -42,7 +42,8 @@
 			   UFFD_FEATURE_WP_UNPOPULATED |	\
 			   UFFD_FEATURE_POISON |		\
 			   UFFD_FEATURE_WP_ASYNC |		\
-			   UFFD_FEATURE_MOVE)
+			   UFFD_FEATURE_MOVE |			\
+			   UFFD_FEATURE_MINOR_GUESTMEM)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -230,6 +231,10 @@ struct uffdio_api {
 	 *
 	 * UFFD_FEATURE_MOVE indicates that the kernel supports moving an
 	 * existing page contents from userspace.
+	 *
+	 * UFFD_FEATURE_MINOR_GUESTMEM indicates the same support as
+	 * UFFD_FEATURE_MINOR_HUGETLBFS, but for guestmem-backed pages
+	 * instead.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -248,6 +253,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_POISON			(1<<14)
 #define UFFD_FEATURE_WP_ASYNC			(1<<15)
 #define UFFD_FEATURE_MOVE			(1<<16)
+#define UFFD_FEATURE_MINOR_GUESTMEM		(1<<17)
 	__u64 features;
 
 	__u64 ioctls;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 45e6290e2e8b..304e5d7dbb70 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -388,7 +388,14 @@ static int mfill_atomic_pte_continue(pmd_t *dst_pmd,
 	struct page *page;
 	int ret;
 
-	ret = shmem_get_folio(inode, pgoff, 0, &folio, SGP_NOALLOC);
+	if (guestmem_vma_is_guestmem(dst_vma)) {
+		ret = 0;
+		folio = guestmem_grab_folio(inode->i_mapping, pgoff);
+		if (IS_ERR(folio))
+			ret = PTR_ERR(folio);
+	} else {
+		ret = shmem_get_folio(inode, pgoff, 0, &folio, SGP_NOALLOC);
+	}
 	/* Our caller expects us to return -EFAULT if we failed to find folio */
 	if (ret == -ENOENT)
 		ret = -EFAULT;
@@ -766,9 +773,10 @@ static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx,
 		return mfill_atomic_hugetlb(ctx, dst_vma, dst_start,
 					    src_start, len, flags);
 
-	if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
+	if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma)
+	    && !guestmem_vma_is_guestmem(dst_vma))
 		goto out_unlock;
-	if (!vma_is_shmem(dst_vma) &&
+	if (!vma_is_shmem(dst_vma) && !guestmem_vma_is_guestmem(dst_vma) &&
 	    uffd_flags_mode_is(flags, MFILL_ATOMIC_CONTINUE))
 		goto out_unlock;
 
-- 
2.50.1