From nobody Thu Oct 9 02:12:55 2025
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Nikita Kalyazin, peterx@redhat.com, Hugh Dickins, Oscar Salvador,
	Michal Hocko, David Hildenbrand, Muchun Song, Andrea Arcangeli,
	Ujwal Kundur, Suren Baghdasaryan, Andrew Morton, Vlastimil Babka,
	"Liam R. Howlett", James Houghton, Mike Rapoport,
	Lorenzo Stoakes, Axel Rasmussen
Subject: [PATCH 1/4] mm: Introduce vm_uffd_ops API
Date: Fri, 20 Jun 2025 15:03:39 -0400
Message-ID: <20250620190342.1780170-2-peterx@redhat.com>
In-Reply-To: <20250620190342.1780170-1-peterx@redhat.com>
References: <20250620190342.1780170-1-peterx@redhat.com>

Introduce a generic userfaultfd API for vm_operations_struct, so that a
VMA provider, especially one built as a module, can support userfaults
without modifying core mm files.  This matters even more when the module
can be compiled out of the kernel: instead of having core mm reference
modules that may not even exist, modules opt in to core mm hooks.

With this API in place, a module that wants to support userfaultfd only
needs to touch its own file and properly define vm_uffd_ops, without
changing anything in core mm.

Note that this API does not cover anonymous memory.  Core mm keeps
processing anonymous memory separately for userfault operations, as
before.

This patch only introduces the API itself, so that existing users can be
moved over without breaking them.  The uffd_copy() hook is deliberately
kept minimal so that the conversion requires the fewest possible mm
changes.

Signed-off-by: Peter Xu
---
 include/linux/mm.h            | 71 +++++++++++++++++++++++++++++++++++
 include/linux/userfaultfd_k.h | 12 ------
 2 files changed, 71 insertions(+), 12 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 98a606908307..8dfd83f01d3d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -576,6 +576,70 @@ struct vm_fault {
 	 */
 };
 
+#ifdef CONFIG_USERFAULTFD
+/* A combined operation mode + behavior flags. */
+typedef unsigned int __bitwise uffd_flags_t;
+
+enum mfill_atomic_mode {
+	MFILL_ATOMIC_COPY,
+	MFILL_ATOMIC_ZEROPAGE,
+	MFILL_ATOMIC_CONTINUE,
+	MFILL_ATOMIC_POISON,
+	NR_MFILL_ATOMIC_MODES,
+};
+
+/* VMA userfaultfd operations */
+typedef struct {
+	/**
+	 * @uffd_features: features supported in bitmask.
+	 *
+	 * When the ops is defined, the driver must set non-zero features
+	 * to be a subset (or all) of: VM_UFFD_MISSING|WP|MINOR.
+	 */
+	unsigned long uffd_features;
+	/**
+	 * @uffd_ioctls: ioctls supported in bitmask.
+	 *
+	 * Userfaultfd ioctls supported by the module.  Below will always
+	 * be supported by default whenever a module provides vm_uffd_ops:
+	 *
+	 *   _UFFDIO_API, _UFFDIO_REGISTER, _UFFDIO_UNREGISTER, _UFFDIO_WAKE
+	 *
+	 * The module needs to provide all the rest optionally supported
+	 * ioctls.  For example, when VM_UFFD_MISSING was supported,
+	 * _UFFDIO_COPY must be supported as ioctl, while _UFFDIO_ZEROPAGE
+	 * is optional.
+	 */
+	unsigned long uffd_ioctls;
+	/**
+	 * uffd_get_folio: Handler to resolve UFFDIO_CONTINUE request.
+	 *
+	 * @inode: the inode for folio lookup
+	 * @pgoff: the pgoff of the folio
+	 * @folio: returned folio pointer
+	 *
+	 * Return: zero if succeeded, negative for errors.
+	 */
+	int (*uffd_get_folio)(struct inode *inode, pgoff_t pgoff,
+			      struct folio **folio);
+	/**
+	 * uffd_copy: Handler to resolve UFFDIO_COPY|ZEROPAGE request.
+	 *
+	 * @dst_pmd: target pmd to resolve page fault
+	 * @dst_vma: target vma
+	 * @dst_addr: target virtual address
+	 * @src_addr: source address to copy from
+	 * @flags: userfaultfd request flags
+	 * @foliop: previously allocated folio
+	 *
+	 * Return: zero if succeeded, negative for errors.
+	 */
+	int (*uffd_copy)(pmd_t *dst_pmd, struct vm_area_struct *dst_vma,
+			 unsigned long dst_addr, unsigned long src_addr,
+			 uffd_flags_t flags, struct folio **foliop);
+} vm_uffd_ops;
+#endif
+
 /*
  * These are the virtual MM functions - opening of an area, closing and
  * unmapping it (needed to keep files on disk up-to-date etc), pointer
@@ -653,6 +717,13 @@ struct vm_operations_struct {
 	 */
 	struct page *(*find_special_page)(struct vm_area_struct *vma,
 					  unsigned long addr);
+#ifdef CONFIG_USERFAULTFD
+	/*
+	 * Userfaultfd related ops.  Modules need to define this to support
+	 * userfaultfd.
+	 */
+	const vm_uffd_ops *userfaultfd_ops;
+#endif
 };
 
 #ifdef CONFIG_NUMA_BALANCING
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index ccad58602846..e79c724b3b95 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -80,18 +80,6 @@ struct userfaultfd_ctx {
 
 extern vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason);
 
-/* A combined operation mode + behavior flags. */
-typedef unsigned int __bitwise uffd_flags_t;
-
-/* Mutually exclusive modes of operation. */
-enum mfill_atomic_mode {
-	MFILL_ATOMIC_COPY,
-	MFILL_ATOMIC_ZEROPAGE,
-	MFILL_ATOMIC_CONTINUE,
-	MFILL_ATOMIC_POISON,
-	NR_MFILL_ATOMIC_MODES,
-};
-
 #define MFILL_ATOMIC_MODE_BITS (const_ilog2(NR_MFILL_ATOMIC_MODES - 1) + 1)
 #define MFILL_ATOMIC_BIT(nr) BIT(MFILL_ATOMIC_MODE_BITS + (nr))
 #define MFILL_ATOMIC_FLAG(nr) ((__force uffd_flags_t) MFILL_ATOMIC_BIT(nr))
-- 
2.49.0
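
For illustration, here is a minimal sketch of how a driver could opt in
to the API above.  The foo_* names are hypothetical placeholders; only
vm_uffd_ops, its fields, and vm_operations_struct.userfaultfd_ops come
from this patch, and the feature/ioctl selection below is arbitrary:

#ifdef CONFIG_USERFAULTFD
static int foo_uffd_get_folio(struct inode *inode, pgoff_t pgoff,
			      struct folio **folio)
{
	/* Return the folio already present in foo's page cache, no alloc */
	return foo_lookup_folio(inode, pgoff, folio);	/* hypothetical helper */
}

static const vm_uffd_ops foo_uffd_ops = {
	/* Must be a subset of VM_UFFD_MISSING|WP|MINOR, as documented */
	.uffd_features	= VM_UFFD_MISSING | VM_UFFD_MINOR,
	/* Optional ioctls implemented on top of the always-on defaults */
	.uffd_ioctls	= BIT(_UFFDIO_COPY) | BIT(_UFFDIO_CONTINUE),
	.uffd_get_folio	= foo_uffd_get_folio,
	.uffd_copy	= foo_mfill_atomic_pte,		/* hypothetical */
};
#endif

static const struct vm_operations_struct foo_vm_ops = {
	.fault		 = foo_fault,			/* hypothetical */
#ifdef CONFIG_USERFAULTFD
	.userfaultfd_ops = &foo_uffd_ops,
#endif
};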

From nobody Thu Oct 9 02:12:55 2025
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Nikita Kalyazin, peterx@redhat.com, Hugh Dickins, Oscar Salvador,
	Michal Hocko, David Hildenbrand, Muchun Song, Andrea Arcangeli,
	Ujwal Kundur, Suren Baghdasaryan, Andrew Morton, Vlastimil Babka,
	"Liam R. Howlett", James Houghton, Mike Rapoport,
	Lorenzo Stoakes, Axel Rasmussen
Subject: [PATCH 2/4] mm/shmem: Support vm_uffd_ops API
Date: Fri, 20 Jun 2025 15:03:40 -0400
Message-ID: <20250620190342.1780170-3-peterx@redhat.com>
In-Reply-To: <20250620190342.1780170-1-peterx@redhat.com>
References: <20250620190342.1780170-1-peterx@redhat.com>

Add support for the new vm_uffd_ops API to shmem.  Note that this only
introduces the support; the API is not yet used by core mm.

Thanks to the tailored uffd_copy() API, shmem can support it trivially by
reusing the existing mfill function.  It only needs a separate
uffd_get_folio() definition, and that is a one-liner.
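
To see what the uffd_ioctls bitmask means for applications, here is a
hedged userspace sketch (assuming a kernel with this series applied and
userfaultfd usable by the caller, e.g. vm.unprivileged_userfaultfd=1 or
CAP_SYS_PTRACE): it registers a shmem-backed mapping and prints the
ioctl mask reported back, which should mirror shmem_uffd_ops.uffd_ioctls:

#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	size_t len = 2UL << 20;
	struct uffdio_api api = { .api = UFFD_API };
	struct uffdio_register reg = { 0 };
	long uffd;
	void *area;

	/* Open a userfaultfd and handshake the API version */
	uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	if (uffd < 0 || ioctl(uffd, UFFDIO_API, &api))
		return 1;

	/* MAP_SHARED|MAP_ANONYMOUS memory is shmem-backed internally */
	area = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED)
		return 1;

	reg.range.start = (unsigned long)area;
	reg.range.len = len;
	reg.mode = UFFDIO_REGISTER_MODE_MISSING;
	if (ioctl(uffd, UFFDIO_REGISTER, &reg))
		return 1;

	/* Bits such as 1ULL << _UFFDIO_COPY show up in reg.ioctls */
	printf("supported uffd ioctls: 0x%llx\n",
	       (unsigned long long)reg.ioctls);
	return 0;
}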
Cc: Hugh Dickins
Signed-off-by: Peter Xu
---
 mm/shmem.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index 0bc30dafad90..bd0a29000318 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3151,6 +3151,13 @@ static inline struct inode *shmem_get_inode(struct mnt_idmap *idmap,
 #endif /* CONFIG_TMPFS_QUOTA */
 
 #ifdef CONFIG_USERFAULTFD
+
+static int shmem_uffd_get_folio(struct inode *inode, pgoff_t pgoff,
+				struct folio **folio)
+{
+	return shmem_get_folio(inode, pgoff, 0, folio, SGP_NOALLOC);
+}
+
 int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
 			   struct vm_area_struct *dst_vma,
 			   unsigned long dst_addr,
@@ -5194,6 +5201,19 @@ static int shmem_error_remove_folio(struct address_space *mapping,
 	return 0;
 }
 
+#ifdef CONFIG_USERFAULTFD
+static const vm_uffd_ops shmem_uffd_ops = {
+	.uffd_features = __VM_UFFD_FLAGS,
+	.uffd_ioctls = BIT(_UFFDIO_COPY) |
+		       BIT(_UFFDIO_ZEROPAGE) |
+		       BIT(_UFFDIO_WRITEPROTECT) |
+		       BIT(_UFFDIO_CONTINUE) |
+		       BIT(_UFFDIO_POISON),
+	.uffd_get_folio = shmem_uffd_get_folio,
+	.uffd_copy = shmem_mfill_atomic_pte,
+};
+#endif
+
 static const struct address_space_operations shmem_aops = {
 	.dirty_folio = noop_dirty_folio,
 #ifdef CONFIG_TMPFS
@@ -5296,6 +5316,9 @@ static const struct vm_operations_struct shmem_vm_ops = {
 	.set_policy = shmem_set_policy,
 	.get_policy = shmem_get_policy,
 #endif
+#ifdef CONFIG_USERFAULTFD
+	.userfaultfd_ops = &shmem_uffd_ops,
+#endif
 };
 
 static const struct vm_operations_struct shmem_anon_vm_ops = {
@@ -5305,6 +5328,9 @@ static const struct vm_operations_struct shmem_anon_vm_ops = {
 	.set_policy = shmem_set_policy,
 	.get_policy = shmem_get_policy,
 #endif
+#ifdef CONFIG_USERFAULTFD
+	.userfaultfd_ops = &shmem_uffd_ops,
+#endif
 };
 
 int shmem_init_fs_context(struct fs_context *fc)
-- 
2.49.0

From nobody Thu Oct 9 02:12:55 2025
header.b="P65FvqkY" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1750446239; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cZDbbtIAa6EPMWRofMi/kXSaO7jdlCdJOMqHnRdeIWA=; b=P65FvqkYntdZusaKBWXMydPRQ5ZuBy6C+PKBaBLNjup+gcd+K9vX8liErnQD8kqCWgXUeS 9RLqWmtjDmC7qJnAgu3dj9l7rcNaPbBpMOoi14tyTZGqKU+bWBJTRdXE7Wnm3UHtreiiMG 1DVlYsYloo8dWmaea4y5YsqXdES9urE= Received: from mail-pl1-f199.google.com (mail-pl1-f199.google.com [209.85.214.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-370-eNVSKvCqPril4U1uqMb4iA-1; Fri, 20 Jun 2025 15:03:57 -0400 X-MC-Unique: eNVSKvCqPril4U1uqMb4iA-1 X-Mimecast-MFC-AGG-ID: eNVSKvCqPril4U1uqMb4iA_1750446236 Received: by mail-pl1-f199.google.com with SMTP id d9443c01a7336-235f77f86f6so22539795ad.2 for ; Fri, 20 Jun 2025 12:03:57 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1750446236; x=1751051036; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=cZDbbtIAa6EPMWRofMi/kXSaO7jdlCdJOMqHnRdeIWA=; b=qlbtDiQzxTs3oRsak3MsjKWHf1LXoMGMveB0e1eQIrQ3EJ6Yt8AEQMSjmYzlgQhUVY /oHeM5bFiet9ZXeL+Lu+98O9qn9UgB2B25853ppWMRr4SaD9HzC22mvalBisYEte8xAU 3XlUPdFUI+aIF8ZpZeZQq7eOpNA6hqbRS3lwS6er89z7d54WGE2jOaq62nV5d+i/iZr8 zH3ggqhfkufAToqFtXsK5NmwaELgWBfJlJyIzMTvcjTAVLCrV0/xdZ3r3RVas79l3z5x 6bBuCwkFd18PV5pqym8YEno4yN2cKQhnW2uTEWChb2ghlU1okX7CzesUeUFBWPwtHrzT 5e1w== X-Forwarded-Encrypted: i=1; AJvYcCVtqxS3axqJWR4oGzYP6nBqN2e2jQ0gz3dwycJwPhqU6KjrQwicajursNHFPTmGHs15jDd+zFwgCfj/z7c=@vger.kernel.org X-Gm-Message-State: AOJu0YwHt5+pqtvIk6T2WvHDQRxRa52J+Ul4j/3DtNnM+9vk2Myrso6P thLwL+CYd1FZsrOo6ib20hzB8Y8WlW/MBCdrs/WU7YPiQh3kRGbhA9BnbsG/emTniGrSgW04NFl zcjiztwLPieI56n+pCfzCBcLW8VsDnhk5uCD+b1h6gol7i9nUsaPtFOp7GMtQwdmJRw== X-Gm-Gg: ASbGnctA4SHjQfvFPn/IEXLROFCA1NueC6nWrmH35MRgEYdpxkC239iNC6NZQunHGe9 rmjR3/PZmT4fpvhEGHAtKzVO09y/YPMJGK5C8CQfqnKwQf2KXwHamwuH6h80OKnIWwZHaLcpGfe SZSy/qdvThdNzllBvKkZqcV2XGmIpC3OsQKFGJB4n4Qh1kEelCwUDByNk0cEIRdxJ8lel+meIPQ kuJKUUROOG6dlusudjNDwHieTkN/8q4QJX36WLA5P0PC1wX4wfwAOh/7v2C5aioZDHlin/mcEA8 zP/S6MC2rus= X-Received: by 2002:a17:902:d542:b0:234:d399:f948 with SMTP id d9443c01a7336-237d997fda8mr56957855ad.33.1750446236391; Fri, 20 Jun 2025 12:03:56 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEGWRVo9lU6CN1+qf1IBzFazCaFwiIFN8m5Wc6/OgMNQb0aQQjMN6hDbpeBJmQP3cfZ3Z1D5Q== X-Received: by 2002:a17:902:d542:b0:234:d399:f948 with SMTP id d9443c01a7336-237d997fda8mr56957375ad.33.1750446235934; Fri, 20 Jun 2025 12:03:55 -0700 (PDT) Received: from x1.com ([85.131.185.92]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-237d8609968sm24235535ad.136.2025.06.20.12.03.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 20 Jun 2025 12:03:55 -0700 (PDT) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Nikita Kalyazin , peterx@redhat.com, Hugh Dickins , Oscar Salvador , Michal Hocko , David Hildenbrand , Muchun Song , Andrea Arcangeli , Ujwal Kundur , Suren Baghdasaryan , Andrew Morton , Vlastimil Babka , "Liam R . 
Howlett" , James Houghton , Mike Rapoport , Lorenzo Stoakes , Axel Rasmussen Subject: [PATCH 3/4] mm/hugetlb: Support vm_uffd_ops API Date: Fri, 20 Jun 2025 15:03:41 -0400 Message-ID: <20250620190342.1780170-4-peterx@redhat.com> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250620190342.1780170-1-peterx@redhat.com> References: <20250620190342.1780170-1-peterx@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add support for the new vm_uffd_ops API for hugetlb. Note that this only introduces the support, the API is not yet used by core mm. Due to legacy reasons, it's still not trivial to move hugetlb completely to the API (like shmem). But it will still use uffd_features and uffd_ioctls properly on the API because that's pretty general. Cc: Muchun Song Cc: Oscar Salvador Signed-off-by: Peter Xu --- mm/hugetlb.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 3d61ec17c15a..b9e473fab871 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5459,6 +5459,22 @@ static vm_fault_t hugetlb_vm_op_fault(struct vm_faul= t *vmf) return 0; } =20 +#ifdef CONFIG_USERFAULTFD +static const vm_uffd_ops hugetlb_uffd_ops =3D { + .uffd_features =3D __VM_UFFD_FLAGS, + /* _UFFDIO_ZEROPAGE not supported */ + .uffd_ioctls =3D BIT(_UFFDIO_COPY) | + BIT(_UFFDIO_WRITEPROTECT) | + BIT(_UFFDIO_CONTINUE) | + BIT(_UFFDIO_POISON), + /* + * Hugetlbfs still has its own hard-coded handler in userfaultfd, + * due to limitations similar to vm_operations_struct.fault(). + * TODO: generalize it to use the API functions. + */ +}; +#endif + /* * When a new function is introduced to vm_operations_struct and added * to hugetlb_vm_ops, please consider adding the function to shm_vm_ops. 
@@ -5472,6 +5488,9 @@ const struct vm_operations_struct hugetlb_vm_ops = {
 	.close = hugetlb_vm_op_close,
 	.may_split = hugetlb_vm_op_split,
 	.pagesize = hugetlb_vm_op_pagesize,
+#ifdef CONFIG_USERFAULTFD
+	.userfaultfd_ops = &hugetlb_uffd_ops,
+#endif
 };
 
 static pte_t make_huge_pte(struct vm_area_struct *vma, struct folio *folio,
-- 
2.49.0

From nobody Thu Oct 9 02:12:55 2025
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Nikita Kalyazin, peterx@redhat.com, Hugh Dickins, Oscar Salvador,
	Michal Hocko, David Hildenbrand, Muchun Song, Andrea Arcangeli,
	Ujwal Kundur, Suren Baghdasaryan, Andrew Morton, Vlastimil Babka,
	"Liam R. Howlett", James Houghton, Mike Rapoport,
	Lorenzo Stoakes, Axel Rasmussen
Subject: [PATCH 4/4] mm: Apply vm_uffd_ops API to core mm
Date: Fri, 20 Jun 2025 15:03:42 -0400
Message-ID: <20250620190342.1780170-5-peterx@redhat.com>
In-Reply-To: <20250620190342.1780170-1-peterx@redhat.com>
References: <20250620190342.1780170-1-peterx@redhat.com>

This patch moves the old userfaultfd core entirely over to the new
vm_uffd_ops API.  After this change, existing file systems start using
the new API for userfault operations.

While at it, move vma_can_userfault() into mm/userfaultfd.c, because it
has grown too big to stay inline.  It is only used in slow paths, so
that should not be an issue.

This also removes quite a few hard-coded checks for shmem or hugetlbfs.
All the old checks keep working, but now go through vm_uffd_ops.

Note that anonymous memory still needs to be handled separately, because
it has no vm_ops at all.
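
As a hedged recap of the dispatch pattern core mm is converted to
(condensed from the mfill_atomic_pte_continue() change below, with error
handling trimmed and a made-up helper name):

static int resolve_continue(struct vm_area_struct *dst_vma, pgoff_t pgoff,
			    struct folio **foliop)
{
	const vm_uffd_ops *uffd_ops = vma_get_uffd_ops(dst_vma);

	/* No ops or no CONTINUE handler: this VMA cannot resolve minor faults */
	if (!uffd_ops || !uffd_ops->uffd_get_folio)
		return -EINVAL;

	/* e.g. shmem_uffd_get_folio() when the VMA is tmpfs-backed */
	return uffd_ops->uffd_get_folio(file_inode(dst_vma->vm_file),
					pgoff, foliop);
}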
Signed-off-by: Peter Xu
Reviewed-by: James Houghton
---
 include/linux/shmem_fs.h      |  14 -----
 include/linux/userfaultfd_k.h |  46 ++++----------
 mm/shmem.c                    |   2 +-
 mm/userfaultfd.c              | 115 +++++++++++++++++++++++++---------
 4 files changed, 101 insertions(+), 76 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 6d0f9c599ff7..2f5b7b295cf6 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -195,20 +195,6 @@ static inline pgoff_t shmem_fallocend(struct inode *inode, pgoff_t eof)
 extern bool shmem_charge(struct inode *inode, long pages);
 extern void shmem_uncharge(struct inode *inode, long pages);
 
-#ifdef CONFIG_USERFAULTFD
-#ifdef CONFIG_SHMEM
-extern int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
-				  struct vm_area_struct *dst_vma,
-				  unsigned long dst_addr,
-				  unsigned long src_addr,
-				  uffd_flags_t flags,
-				  struct folio **foliop);
-#else /* !CONFIG_SHMEM */
-#define shmem_mfill_atomic_pte(dst_pmd, dst_vma, dst_addr, \
-			       src_addr, flags, foliop) ({ BUG(); 0; })
-#endif /* CONFIG_SHMEM */
-#endif /* CONFIG_USERFAULTFD */
-
 /*
  * Used space is stored as unsigned 64-bit value in bytes but
  * quota core supports only signed 64-bit values so use that
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index e79c724b3b95..4e56ad423a4a 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -85,9 +85,14 @@ extern vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason);
 #define MFILL_ATOMIC_FLAG(nr) ((__force uffd_flags_t) MFILL_ATOMIC_BIT(nr))
 #define MFILL_ATOMIC_MODE_MASK ((__force uffd_flags_t) (MFILL_ATOMIC_BIT(0) - 1))
 
+static inline enum mfill_atomic_mode uffd_flags_get_mode(uffd_flags_t flags)
+{
+	return (enum mfill_atomic_mode)(flags & MFILL_ATOMIC_MODE_MASK);
+}
+
 static inline bool uffd_flags_mode_is(uffd_flags_t flags, enum mfill_atomic_mode expected)
 {
-	return (flags & MFILL_ATOMIC_MODE_MASK) == ((__force uffd_flags_t) expected);
+	return uffd_flags_get_mode(flags) == expected;
 }
 
 static inline uffd_flags_t uffd_flags_set_mode(uffd_flags_t flags, enum mfill_atomic_mode mode)
@@ -196,41 +201,16 @@ static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 	return vma->vm_flags & __VM_UFFD_FLAGS;
 }
 
-static inline bool vma_can_userfault(struct vm_area_struct *vma,
-				     unsigned long vm_flags,
-				     bool wp_async)
+static inline const vm_uffd_ops *vma_get_uffd_ops(struct vm_area_struct *vma)
 {
-	vm_flags &= __VM_UFFD_FLAGS;
-
-	if (vma->vm_flags & VM_DROPPABLE)
-		return false;
-
-	if ((vm_flags & VM_UFFD_MINOR) &&
-	    (!is_vm_hugetlb_page(vma) && !vma_is_shmem(vma)))
-		return false;
-
-	/*
-	 * If wp async enabled, and WP is the only mode enabled, allow any
-	 * memory type.
-	 */
-	if (wp_async && (vm_flags == VM_UFFD_WP))
-		return true;
-
-#ifndef CONFIG_PTE_MARKER_UFFD_WP
-	/*
-	 * If user requested uffd-wp but not enabled pte markers for
-	 * uffd-wp, then shmem & hugetlbfs are not supported but only
-	 * anonymous.
-	 */
-	if ((vm_flags & VM_UFFD_WP) && !vma_is_anonymous(vma))
-		return false;
-#endif
-
-	/* By default, allow any of anon|shmem|hugetlb */
-	return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
-	       vma_is_shmem(vma);
+	if (vma->vm_ops && vma->vm_ops->userfaultfd_ops)
+		return vma->vm_ops->userfaultfd_ops;
+	return NULL;
 }
 
+bool vma_can_userfault(struct vm_area_struct *vma,
+		       unsigned long vm_flags, bool wp_async);
+
 static inline bool vma_has_uffd_without_event_remap(struct vm_area_struct *vma)
 {
 	struct userfaultfd_ctx *uffd_ctx = vma->vm_userfaultfd_ctx.ctx;
diff --git a/mm/shmem.c b/mm/shmem.c
index bd0a29000318..4d71fc7be358 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3158,7 +3158,7 @@ static int shmem_uffd_get_folio(struct inode *inode, pgoff_t pgoff,
 	return shmem_get_folio(inode, pgoff, 0, folio, SGP_NOALLOC);
 }
 
-int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
+static int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
 			   struct vm_area_struct *dst_vma,
 			   unsigned long dst_addr,
 			   unsigned long src_addr,
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 879505c6996f..61783ff2d335 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -14,12 +14,48 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include "internal.h"
 #include "swap.h"
 
+bool vma_can_userfault(struct vm_area_struct *vma,
+		       unsigned long vm_flags, bool wp_async)
+{
+	unsigned long supported;
+
+	if (vma->vm_flags & VM_DROPPABLE)
+		return false;
+
+	vm_flags &= __VM_UFFD_FLAGS;
+
+#ifndef CONFIG_PTE_MARKER_UFFD_WP
+	/*
+	 * If user requested uffd-wp but not enabled pte markers for
+	 * uffd-wp, then any file system (like shmem or hugetlbfs) are not
+	 * supported but only anonymous.
+	 */
+	if ((vm_flags & VM_UFFD_WP) && !vma_is_anonymous(vma))
+		return false;
+#endif
+	/*
+	 * If wp async enabled, and WP is the only mode enabled, allow any
+	 * memory type.
+	 */
+	if (wp_async && (vm_flags == VM_UFFD_WP))
+		return true;
+
+	if (vma_is_anonymous(vma))
+		/* Anonymous has no page cache, MINOR not supported */
+		supported = VM_UFFD_MISSING | VM_UFFD_WP;
+	else if (vma_get_uffd_ops(vma))
+		supported = vma_get_uffd_ops(vma)->uffd_features;
+	else
+		return false;
+
+	return !(vm_flags & (~supported));
+}
+
 static __always_inline
 bool validate_dst_vma(struct vm_area_struct *dst_vma, unsigned long dst_end)
 {
@@ -384,11 +420,15 @@ static int mfill_atomic_pte_continue(pmd_t *dst_pmd,
 {
 	struct inode *inode = file_inode(dst_vma->vm_file);
 	pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);
+	const vm_uffd_ops *uffd_ops = vma_get_uffd_ops(dst_vma);
 	struct folio *folio;
 	struct page *page;
 	int ret;
 
-	ret = shmem_get_folio(inode, pgoff, 0, &folio, SGP_NOALLOC);
+	if (WARN_ON_ONCE(!uffd_ops || !uffd_ops->uffd_get_folio))
+		return -EINVAL;
+
+	ret = uffd_ops->uffd_get_folio(inode, pgoff, &folio);
 	/* Our caller expects us to return -EFAULT if we failed to find folio */
 	if (ret == -ENOENT)
 		ret = -EFAULT;
@@ -504,18 +544,6 @@ static __always_inline ssize_t mfill_atomic_hugetlb(
 	u32 hash;
 	struct address_space *mapping;
 
-	/*
-	 * There is no default zero huge page for all huge page sizes as
-	 * supported by hugetlb.  A PMD_SIZE huge pages may exist as used
-	 * by THP.  Since we can not reliably insert a zero page, this
-	 * feature is not supported.
-	 */
-	if (uffd_flags_mode_is(flags, MFILL_ATOMIC_ZEROPAGE)) {
-		up_read(&ctx->map_changing_lock);
-		uffd_mfill_unlock(dst_vma);
-		return -EINVAL;
-	}
-
 	src_addr = src_start;
 	dst_addr = dst_start;
 	copied = 0;
@@ -686,14 +714,55 @@ static __always_inline ssize_t mfill_atomic_pte(pmd_t *dst_pmd,
 		err = mfill_atomic_pte_zeropage(dst_pmd, dst_vma, dst_addr);
 	} else {
-		err = shmem_mfill_atomic_pte(dst_pmd, dst_vma,
-					     dst_addr, src_addr,
-					     flags, foliop);
+		const vm_uffd_ops *uffd_ops = vma_get_uffd_ops(dst_vma);
+
+		if (WARN_ON_ONCE(!uffd_ops || !uffd_ops->uffd_copy)) {
+			err = -EINVAL;
+		} else {
+			err = uffd_ops->uffd_copy(dst_pmd, dst_vma,
+						  dst_addr, src_addr,
+						  flags, foliop);
+		}
 	}
 
 	return err;
 }
 
+static inline bool
+vma_uffd_ops_supported(struct vm_area_struct *vma, uffd_flags_t flags)
+{
+	enum mfill_atomic_mode mode = uffd_flags_get_mode(flags);
+	const vm_uffd_ops *uffd_ops;
+	unsigned long uffd_ioctls;
+
+	if ((flags & MFILL_ATOMIC_WP) && !(vma->vm_flags & VM_UFFD_WP))
+		return false;
+
+	/* Anonymous supports everything except CONTINUE */
+	if (vma_is_anonymous(vma))
+		return mode != MFILL_ATOMIC_CONTINUE;
+
+	uffd_ops = vma_get_uffd_ops(vma);
+	if (!uffd_ops)
+		return false;
+
+	uffd_ioctls = uffd_ops->uffd_ioctls;
+	switch (mode) {
+	case MFILL_ATOMIC_COPY:
+		return uffd_ioctls & BIT(_UFFDIO_COPY);
+	case MFILL_ATOMIC_ZEROPAGE:
+		return uffd_ioctls & BIT(_UFFDIO_ZEROPAGE);
+	case MFILL_ATOMIC_CONTINUE:
+		if (!(vma->vm_flags & VM_SHARED))
+			return false;
+		return uffd_ioctls & BIT(_UFFDIO_CONTINUE);
+	case MFILL_ATOMIC_POISON:
+		return uffd_ioctls & BIT(_UFFDIO_POISON);
+	default:
+		return false;
+	}
+}
+
 static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx,
 					    unsigned long dst_start,
 					    unsigned long src_start,
@@ -752,11 +821,7 @@ static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx,
 		      dst_vma->vm_flags & VM_SHARED))
 		goto out_unlock;
 
-	/*
-	 * validate 'mode' now that we know the dst_vma: don't allow
-	 * a wrprotect copy if the userfaultfd didn't register as WP.
-	 */
-	if ((flags & MFILL_ATOMIC_WP) && !(dst_vma->vm_flags & VM_UFFD_WP))
+	if (!vma_uffd_ops_supported(dst_vma, flags))
 		goto out_unlock;
 
 	/*
@@ -766,12 +831,6 @@ static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx,
 		return mfill_atomic_hugetlb(ctx, dst_vma, dst_start,
 					    src_start, len, flags);
 
-	if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
-		goto out_unlock;
-	if (!vma_is_shmem(dst_vma) &&
-	    uffd_flags_mode_is(flags, MFILL_ATOMIC_CONTINUE))
-		goto out_unlock;
-
 	while (src_addr < src_start + len) {
 		pmd_t dst_pmdval;
 
-- 
2.49.0