From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Vlastimil Babka, Suren Baghdasaryan, Muchun Song, Mike Rapoport,
    Lorenzo Stoakes, Hugh Dickins, Andrew Morton, James Houghton,
    peterx@redhat.com, "Liam R. Howlett", Nikita Kalyazin, Michal Hocko,
    David Hildenbrand, Andrea Arcangeli, Oscar Salvador, Axel Rasmussen,
    Ujwal Kundur
Subject: [PATCH v2 1/4] mm: Introduce vm_uffd_ops API
Date: Fri, 27 Jun 2025 11:46:52 -0400
Message-ID: <20250627154655.2085903-2-peterx@redhat.com>
In-Reply-To: <20250627154655.2085903-1-peterx@redhat.com>
References: <20250627154655.2085903-1-peterx@redhat.com>

Introduce a generic userfaultfd API for vm_operations_struct, so that a
VMA, especially one backed by a module, can support userfaults without
modifying the core files. More importantly, this works even when the
module is compiled out of the kernel.
So, instead of having core mm reference modules that may not even exist,
have the modules opt in to core mm hooks instead.

After this API is applied, a module that wants to support userfaultfd
only needs to touch its own file and properly define vm_uffd_ops,
instead of changing anything in core mm.

Note that this API will not work for anonymous memory. Core mm will
process anonymous memory separately for userfault operations, as before.

This patch only introduces the API itself, so that we can start to move
existing users over without breaking them.

Currently, the uffd_copy() API is deliberately kept simplistic, so that
moving over to the API requires minimal mm changes.

Signed-off-by: Peter Xu
---
 include/linux/mm.h            |  9 ++++++
 include/linux/userfaultfd_k.h | 52 +++++++++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ef40f68c1183..6a5447bd43fd 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -576,6 +576,8 @@ struct vm_fault {
	 */
 };
 
+struct vm_uffd_ops;
+
 /*
  * These are the virtual MM functions - opening of an area, closing and
  * unmapping it (needed to keep files on disk up-to-date etc), pointer
@@ -653,6 +655,13 @@ struct vm_operations_struct {
	 */
	struct page *(*find_special_page)(struct vm_area_struct *vma,
					  unsigned long addr);
+#ifdef CONFIG_USERFAULTFD
+	/*
+	 * Userfaultfd related ops.  Modules need to define this to support
+	 * userfaultfd.
+	 */
+	const struct vm_uffd_ops *userfaultfd_ops;
+#endif
 };
 
 #ifdef CONFIG_NUMA_BALANCING
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index df85330bcfa6..c9a093c4502b 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -92,6 +92,58 @@ enum mfill_atomic_mode {
	NR_MFILL_ATOMIC_MODES,
 };
 
+/* VMA userfaultfd operations */
+struct vm_uffd_ops {
+	/**
+	 * @uffd_features: features supported in bitmask.
+	 *
+	 * When the ops is defined, the driver must set non-zero features
+	 * to be a subset (or all) of: VM_UFFD_MISSING|WP|MINOR.
+	 */
+	unsigned long uffd_features;
+	/**
+	 * @uffd_ioctls: ioctls supported in bitmask.
+	 *
+	 * Userfaultfd ioctls supported by the module.  Below will always
+	 * be supported by default whenever a module provides vm_uffd_ops:
+	 *
+	 *   _UFFDIO_API, _UFFDIO_REGISTER, _UFFDIO_UNREGISTER, _UFFDIO_WAKE
+	 *
+	 * The module needs to provide all the rest optionally supported
+	 * ioctls.  For example, when VM_UFFD_MISSING was supported,
+	 * _UFFDIO_COPY must be supported as ioctl, while _UFFDIO_ZEROPAGE
+	 * is optional.
+	 */
+	unsigned long uffd_ioctls;
+	/**
+	 * uffd_get_folio: Handler to resolve UFFDIO_CONTINUE request.
+	 *
+	 * @inode: the inode for folio lookup
+	 * @pgoff: the pgoff of the folio
+	 * @folio: returned folio pointer
+	 *
+	 * Return: zero if succeeded, negative for errors.
+	 */
+	int (*uffd_get_folio)(struct inode *inode, pgoff_t pgoff,
+			      struct folio **folio);
+	/**
+	 * uffd_copy: Handler to resolve UFFDIO_COPY|ZEROPAGE request.
+	 *
+	 * @dst_pmd: target pmd to resolve page fault
+	 * @dst_vma: target vma
+	 * @dst_addr: target virtual address
+	 * @src_addr: source address to copy from
+	 * @flags: userfaultfd request flags
+	 * @foliop: previously allocated folio
+	 *
+	 * Return: zero if succeeded, negative for errors.
+	 */
+	int (*uffd_copy)(pmd_t *dst_pmd, struct vm_area_struct *dst_vma,
+			 unsigned long dst_addr, unsigned long src_addr,
+			 uffd_flags_t flags, struct folio **foliop);
+};
+typedef struct vm_uffd_ops vm_uffd_ops;
+
 #define MFILL_ATOMIC_MODE_BITS	(const_ilog2(NR_MFILL_ATOMIC_MODES - 1) + 1)
 #define MFILL_ATOMIC_BIT(nr)	BIT(MFILL_ATOMIC_MODE_BITS + (nr))
 #define MFILL_ATOMIC_FLAG(nr)	((__force uffd_flags_t) MFILL_ATOMIC_BIT(nr))
-- 
2.49.0
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Vlastimil Babka, Suren Baghdasaryan, Muchun Song, Mike Rapoport,
    Lorenzo Stoakes, Hugh Dickins, Andrew Morton, James Houghton,
    peterx@redhat.com, "Liam R. Howlett", Nikita Kalyazin, Michal Hocko,
    David Hildenbrand, Andrea Arcangeli, Oscar Salvador, Axel Rasmussen,
    Ujwal Kundur
Subject: [PATCH v2 2/4] mm/shmem: Support vm_uffd_ops API
Date: Fri, 27 Jun 2025 11:46:53 -0400
Message-ID: <20250627154655.2085903-3-peterx@redhat.com>
In-Reply-To: <20250627154655.2085903-1-peterx@redhat.com>
References: <20250627154655.2085903-1-peterx@redhat.com>

Add support for the new vm_uffd_ops API for shmem. Note that this only
introduces the support; the API is not yet used by core mm.

Due to the tailored uffd_copy() API, shmem can support it easily by
reusing the existing mfill function. It only needs a separate
uffd_get_folio() definition, which is a one-liner.

Cc: Hugh Dickins
Signed-off-by: Peter Xu
Acked-by: Mike Rapoport (Microsoft)
---
 mm/shmem.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index 2b19965d27df..9a8b8dd4709b 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3151,6 +3151,13 @@ static inline struct inode *shmem_get_inode(struct mnt_idmap *idmap,
 #endif /* CONFIG_TMPFS_QUOTA */
 
 #ifdef CONFIG_USERFAULTFD
+
+static int shmem_uffd_get_folio(struct inode *inode, pgoff_t pgoff,
+				struct folio **folio)
+{
+	return shmem_get_folio(inode, pgoff, 0, folio, SGP_NOALLOC);
+}
+
 int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
			   struct vm_area_struct *dst_vma,
			   unsigned long dst_addr,
@@ -5194,6 +5201,19 @@ static int shmem_error_remove_folio(struct address_space *mapping,
	return 0;
 }
 
+#ifdef CONFIG_USERFAULTFD
+static const vm_uffd_ops shmem_uffd_ops = {
+	.uffd_features = __VM_UFFD_FLAGS,
+	.uffd_ioctls = BIT(_UFFDIO_COPY) |
+		       BIT(_UFFDIO_ZEROPAGE) |
+		       BIT(_UFFDIO_WRITEPROTECT) |
+		       BIT(_UFFDIO_CONTINUE) |
+		       BIT(_UFFDIO_POISON),
+	.uffd_get_folio = shmem_uffd_get_folio,
+	.uffd_copy = shmem_mfill_atomic_pte,
+};
+#endif
+
 static const struct address_space_operations shmem_aops = {
	.dirty_folio = noop_dirty_folio,
 #ifdef CONFIG_TMPFS
@@ -5296,6 +5316,9 @@ static const struct vm_operations_struct shmem_vm_ops = {
	.set_policy = shmem_set_policy,
	.get_policy = shmem_get_policy,
 #endif
+#ifdef CONFIG_USERFAULTFD
+	.userfaultfd_ops = &shmem_uffd_ops,
+#endif
 };
 
 static const struct vm_operations_struct shmem_anon_vm_ops = {
@@ -5305,6 +5328,9 @@ static const struct vm_operations_struct shmem_anon_vm_ops = {
	.set_policy = shmem_set_policy,
	.get_policy = shmem_get_policy,
 #endif
+#ifdef CONFIG_USERFAULTFD
+	.userfaultfd_ops = &shmem_uffd_ops,
+#endif
 };
 
 int shmem_init_fs_context(struct fs_context *fc)
-- 
2.49.0
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Vlastimil Babka, Suren Baghdasaryan, Muchun Song, Mike Rapoport,
    Lorenzo Stoakes, Hugh Dickins, Andrew Morton, James Houghton,
    peterx@redhat.com, "Liam R. Howlett", Nikita Kalyazin, Michal Hocko,
    David Hildenbrand, Andrea Arcangeli, Oscar Salvador, Axel Rasmussen,
    Ujwal Kundur
Subject: [PATCH v2 3/4] mm/hugetlb: Support vm_uffd_ops API
Date: Fri, 27 Jun 2025 11:46:54 -0400
Message-ID: <20250627154655.2085903-4-peterx@redhat.com>
In-Reply-To: <20250627154655.2085903-1-peterx@redhat.com>
References: <20250627154655.2085903-1-peterx@redhat.com>

Add support for the new vm_uffd_ops API for hugetlb. Note that this only
introduces the support; the API is not yet used by core mm.

For legacy reasons, it is still not trivial to move hugetlb completely
over to the API (as was done for shmem). It will, however, still provide
uffd_features and uffd_ioctls properly through the API, because those
parts are generic.

Cc: Muchun Song
Cc: Oscar Salvador
Signed-off-by: Peter Xu
Acked-by: Mike Rapoport (Microsoft)
---
 mm/hugetlb.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 11d5668ff6e7..ccd2be152d36 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5457,6 +5457,22 @@ static vm_fault_t hugetlb_vm_op_fault(struct vm_fault *vmf)
	return 0;
 }
 
+#ifdef CONFIG_USERFAULTFD
+static const vm_uffd_ops hugetlb_uffd_ops = {
+	.uffd_features = __VM_UFFD_FLAGS,
+	/* _UFFDIO_ZEROPAGE not supported */
+	.uffd_ioctls = BIT(_UFFDIO_COPY) |
+		       BIT(_UFFDIO_WRITEPROTECT) |
+		       BIT(_UFFDIO_CONTINUE) |
+		       BIT(_UFFDIO_POISON),
+	/*
+	 * Hugetlbfs still has its own hard-coded handler in userfaultfd,
+	 * due to limitations similar to vm_operations_struct.fault().
+	 * TODO: generalize it to use the API functions.
+	 */
+};
+#endif
+
 /*
  * When a new function is introduced to vm_operations_struct and added
  * to hugetlb_vm_ops, please consider adding the function to shm_vm_ops.
@@ -5470,6 +5486,9 @@ const struct vm_operations_struct hugetlb_vm_ops = {
	.close = hugetlb_vm_op_close,
	.may_split = hugetlb_vm_op_split,
	.pagesize = hugetlb_vm_op_pagesize,
+#ifdef CONFIG_USERFAULTFD
+	.userfaultfd_ops = &hugetlb_uffd_ops,
+#endif
 };
 
 static pte_t make_huge_pte(struct vm_area_struct *vma, struct folio *folio,
-- 
2.49.0
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Vlastimil Babka, Suren Baghdasaryan, Muchun Song, Mike Rapoport,
    Lorenzo Stoakes, Hugh Dickins, Andrew Morton, James Houghton,
    peterx@redhat.com, "Liam R. Howlett", Nikita Kalyazin, Michal Hocko,
    David Hildenbrand, Andrea Arcangeli, Oscar Salvador, Axel Rasmussen,
    Ujwal Kundur
Subject: [PATCH v2 4/4] mm: Apply vm_uffd_ops API to core mm
Date: Fri, 27 Jun 2025 11:46:55 -0400
Message-ID: <20250627154655.2085903-5-peterx@redhat.com>
In-Reply-To: <20250627154655.2085903-1-peterx@redhat.com>
References: <20250627154655.2085903-1-peterx@redhat.com>

This patch completely moves the old userfaultfd core over to the new
vm_uffd_ops API. After this change, existing file systems will start to
use the new API for userfault operations.

While at it, move vma_can_userfault() into mm/userfaultfd.c, because it
is getting too big. It is only used in slow paths, so this should not be
an issue.

Move the pte marker check before wp_async, which might be more intuitive
because wp_async depends on pte markers. That should not cause any
functional change, though, because only one of the two checks takes
effect, depending on whether pte markers were selected in the config.

This also removes quite a few hard-coded checks for shmem and
hugetlbfs. All the old checks should still work, now going through
vm_uffd_ops.

Note that anonymous memory will still need to be processed separately,
because it does not have vm_ops at all.
Reviewed-by: James Houghton
Signed-off-by: Peter Xu
Acked-by: Mike Rapoport (Microsoft)
---
 include/linux/shmem_fs.h      |  14 -----
 include/linux/userfaultfd_k.h |  46 ++++----------
 mm/shmem.c                    |   2 +-
 mm/userfaultfd.c              | 115 +++++++++++++++++++++++++---------
 4 files changed, 101 insertions(+), 76 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 6d0f9c599ff7..2f5b7b295cf6 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -195,20 +195,6 @@ static inline pgoff_t shmem_fallocend(struct inode *inode, pgoff_t eof)
 extern bool shmem_charge(struct inode *inode, long pages);
 extern void shmem_uncharge(struct inode *inode, long pages);
 
-#ifdef CONFIG_USERFAULTFD
-#ifdef CONFIG_SHMEM
-extern int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
-				  struct vm_area_struct *dst_vma,
-				  unsigned long dst_addr,
-				  unsigned long src_addr,
-				  uffd_flags_t flags,
-				  struct folio **foliop);
-#else /* !CONFIG_SHMEM */
-#define shmem_mfill_atomic_pte(dst_pmd, dst_vma, dst_addr, \
-			       src_addr, flags, foliop) ({ BUG(); 0; })
-#endif /* CONFIG_SHMEM */
-#endif /* CONFIG_USERFAULTFD */
-
 /*
  * Used space is stored as unsigned 64-bit value in bytes but
  * quota core supports only signed 64-bit values so use that
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index c9a093c4502b..1aa9b246fb84 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -149,9 +149,14 @@ typedef struct vm_uffd_ops vm_uffd_ops;
 #define MFILL_ATOMIC_FLAG(nr)	((__force uffd_flags_t) MFILL_ATOMIC_BIT(nr))
 #define MFILL_ATOMIC_MODE_MASK	((__force uffd_flags_t) (MFILL_ATOMIC_BIT(0) - 1))
 
+static inline enum mfill_atomic_mode uffd_flags_get_mode(uffd_flags_t flags)
+{
+	return (__force enum mfill_atomic_mode)(flags & MFILL_ATOMIC_MODE_MASK);
+}
+
 static inline bool uffd_flags_mode_is(uffd_flags_t flags, enum mfill_atomic_mode expected)
 {
-	return (flags & MFILL_ATOMIC_MODE_MASK) == ((__force uffd_flags_t) expected);
+	return uffd_flags_get_mode(flags) == expected;
 }
 
 static inline uffd_flags_t uffd_flags_set_mode(uffd_flags_t flags, enum mfill_atomic_mode mode)
@@ -260,41 +265,16 @@ static inline bool userfaultfd_armed(struct vm_area_struct *vma)
	return vma->vm_flags & __VM_UFFD_FLAGS;
 }
 
-static inline bool vma_can_userfault(struct vm_area_struct *vma,
-				     vm_flags_t vm_flags,
-				     bool wp_async)
+static inline const vm_uffd_ops *vma_get_uffd_ops(struct vm_area_struct *vma)
 {
-	vm_flags &= __VM_UFFD_FLAGS;
-
-	if (vma->vm_flags & VM_DROPPABLE)
-		return false;
-
-	if ((vm_flags & VM_UFFD_MINOR) &&
-	    (!is_vm_hugetlb_page(vma) && !vma_is_shmem(vma)))
-		return false;
-
-	/*
-	 * If wp async enabled, and WP is the only mode enabled, allow any
-	 * memory type.
-	 */
-	if (wp_async && (vm_flags == VM_UFFD_WP))
-		return true;
-
-#ifndef CONFIG_PTE_MARKER_UFFD_WP
-	/*
-	 * If user requested uffd-wp but not enabled pte markers for
-	 * uffd-wp, then shmem & hugetlbfs are not supported but only
-	 * anonymous.
-	 */
-	if ((vm_flags & VM_UFFD_WP) && !vma_is_anonymous(vma))
-		return false;
-#endif
-
-	/* By default, allow any of anon|shmem|hugetlb */
-	return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
-	    vma_is_shmem(vma);
+	if (vma->vm_ops && vma->vm_ops->userfaultfd_ops)
+		return vma->vm_ops->userfaultfd_ops;
+	return NULL;
 }
 
+bool vma_can_userfault(struct vm_area_struct *vma,
+		       unsigned long vm_flags, bool wp_async);
+
 static inline bool vma_has_uffd_without_event_remap(struct vm_area_struct *vma)
 {
	struct userfaultfd_ctx *uffd_ctx = vma->vm_userfaultfd_ctx.ctx;
diff --git a/mm/shmem.c b/mm/shmem.c
index 9a8b8dd4709b..fc85002dd8af 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3158,7 +3158,7 @@ static int shmem_uffd_get_folio(struct inode *inode, pgoff_t pgoff,
	return shmem_get_folio(inode, pgoff, 0, folio, SGP_NOALLOC);
 }
 
-int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
+static int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
			   struct vm_area_struct *dst_vma,
			   unsigned long dst_addr,
			   unsigned long src_addr,
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index cbed91b09640..52d7d5f144b8 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -14,12 +14,48 @@
 #include
 #include
 #include
-#include
 #include
 #include "internal.h"
 #include "swap.h"
 
+bool vma_can_userfault(struct vm_area_struct *vma, vm_flags_t vm_flags,
+		       bool wp_async)
+{
+	unsigned long supported;
+
+	if (vma->vm_flags & VM_DROPPABLE)
+		return false;
+
+	vm_flags &= __VM_UFFD_FLAGS;
+
+#ifndef CONFIG_PTE_MARKER_UFFD_WP
+	/*
+	 * If user requested uffd-wp but not enabled pte markers for
+	 * uffd-wp, then any file system (like shmem or hugetlbfs) are not
+	 * supported but only anonymous.
+	 */
+	if ((vm_flags & VM_UFFD_WP) && !vma_is_anonymous(vma))
+		return false;
+#endif
+	/*
+	 * If wp async enabled, and WP is the only mode enabled, allow any
+	 * memory type.
+ */ + if (wp_async && (vm_flags =3D=3D VM_UFFD_WP)) + return true; + + if (vma_is_anonymous(vma)) + /* Anonymous has no page cache, MINOR not supported */ + supported =3D VM_UFFD_MISSING | VM_UFFD_WP; + else if (vma_get_uffd_ops(vma)) + supported =3D vma_get_uffd_ops(vma)->uffd_features; + else + return false; + + return !(vm_flags & (~supported)); +} + static __always_inline bool validate_dst_vma(struct vm_area_struct *dst_vma, unsigned long dst_en= d) { @@ -384,11 +420,15 @@ static int mfill_atomic_pte_continue(pmd_t *dst_pmd, { struct inode *inode =3D file_inode(dst_vma->vm_file); pgoff_t pgoff =3D linear_page_index(dst_vma, dst_addr); + const vm_uffd_ops *uffd_ops =3D vma_get_uffd_ops(dst_vma); struct folio *folio; struct page *page; int ret; =20 - ret =3D shmem_get_folio(inode, pgoff, 0, &folio, SGP_NOALLOC); + if (WARN_ON_ONCE(!uffd_ops || !uffd_ops->uffd_get_folio)) + return -EINVAL; + + ret =3D uffd_ops->uffd_get_folio(inode, pgoff, &folio); /* Our caller expects us to return -EFAULT if we failed to find folio */ if (ret =3D=3D -ENOENT) ret =3D -EFAULT; @@ -504,18 +544,6 @@ static __always_inline ssize_t mfill_atomic_hugetlb( u32 hash; struct address_space *mapping; =20 - /* - * There is no default zero huge page for all huge page sizes as - * supported by hugetlb. A PMD_SIZE huge pages may exist as used - * by THP. Since we can not reliably insert a zero page, this - * feature is not supported. 
- */ - if (uffd_flags_mode_is(flags, MFILL_ATOMIC_ZEROPAGE)) { - up_read(&ctx->map_changing_lock); - uffd_mfill_unlock(dst_vma); - return -EINVAL; - } - src_addr =3D src_start; dst_addr =3D dst_start; copied =3D 0; @@ -686,14 +714,55 @@ static __always_inline ssize_t mfill_atomic_pte(pmd_t= *dst_pmd, err =3D mfill_atomic_pte_zeropage(dst_pmd, dst_vma, dst_addr); } else { - err =3D shmem_mfill_atomic_pte(dst_pmd, dst_vma, - dst_addr, src_addr, - flags, foliop); + const vm_uffd_ops *uffd_ops =3D vma_get_uffd_ops(dst_vma); + + if (WARN_ON_ONCE(!uffd_ops || !uffd_ops->uffd_copy)) { + err =3D -EINVAL; + } else { + err =3D uffd_ops->uffd_copy(dst_pmd, dst_vma, + dst_addr, src_addr, + flags, foliop); + } } =20 return err; } =20 +static inline bool +vma_uffd_ops_supported(struct vm_area_struct *vma, uffd_flags_t flags) +{ + enum mfill_atomic_mode mode =3D uffd_flags_get_mode(flags); + const vm_uffd_ops *uffd_ops; + unsigned long uffd_ioctls; + + if ((flags & MFILL_ATOMIC_WP) && !(vma->vm_flags & VM_UFFD_WP)) + return false; + + /* Anonymous supports everything except CONTINUE */ + if (vma_is_anonymous(vma)) + return mode !=3D MFILL_ATOMIC_CONTINUE; + + uffd_ops =3D vma_get_uffd_ops(vma); + if (!uffd_ops) + return false; + + uffd_ioctls =3D uffd_ops->uffd_ioctls; + switch (mode) { + case MFILL_ATOMIC_COPY: + return uffd_ioctls & BIT(_UFFDIO_COPY); + case MFILL_ATOMIC_ZEROPAGE: + return uffd_ioctls & BIT(_UFFDIO_ZEROPAGE); + case MFILL_ATOMIC_CONTINUE: + if (!(vma->vm_flags & VM_SHARED)) + return false; + return uffd_ioctls & BIT(_UFFDIO_CONTINUE); + case MFILL_ATOMIC_POISON: + return uffd_ioctls & BIT(_UFFDIO_POISON); + default: + return false; + } +} + static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx, unsigned long dst_start, unsigned long src_start, @@ -752,11 +821,7 @@ static __always_inline ssize_t mfill_atomic(struct use= rfaultfd_ctx *ctx, dst_vma->vm_flags & VM_SHARED)) goto out_unlock; =20 - /* - * validate 'mode' now that we know the dst_vma: 
don't allow - * a wrprotect copy if the userfaultfd didn't register as WP. - */ - if ((flags & MFILL_ATOMIC_WP) && !(dst_vma->vm_flags & VM_UFFD_WP)) + if (!vma_uffd_ops_supported(dst_vma, flags)) goto out_unlock; =20 /* @@ -766,12 +831,6 @@ static __always_inline ssize_t mfill_atomic(struct use= rfaultfd_ctx *ctx, return mfill_atomic_hugetlb(ctx, dst_vma, dst_start, src_start, len, flags); =20 - if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma)) - goto out_unlock; - if (!vma_is_shmem(dst_vma) && - uffd_flags_mode_is(flags, MFILL_ATOMIC_CONTINUE)) - goto out_unlock; - while (src_addr < src_start + len) { pmd_t dst_pmdval; =20 --=20 2.49.0