From: Peter Xu
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Jason Gunthorpe, Nico Pache, Zi Yan, Alex Mastro, David Hildenbrand,
 Alex Williamson, Zhi Wang, David Laight, Yi Liu, Ankit Agrawal,
 peterx@redhat.com, Kevin Tian, Andrew Morton
Subject: [PATCH v2 4/4] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings
Date: Thu, 4 Dec 2025 10:10:03 -0500
Message-ID: <20251204151003.171039-5-peterx@redhat.com>
X-Mailer: git-send-email 2.50.1
In-Reply-To: <20251204151003.171039-1-peterx@redhat.com>
References: <20251204151003.171039-1-peterx@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This patch enables best-effort mmap() for vfio-pci BARs even without
MAP_FIXED, so as to utilize huge pfnmaps as much as possible.  It should
also avoid userspace changes (switching to MAP_FIXED with pre-aligned VA
addresses) to start enabling huge pfnmaps on VFIO BARs.

Here the trick is making sure the MMIO PFNs will be aligned with the VAs
allocated from mmap() when !MAP_FIXED, so that whatever is returned from
mmap(!MAP_FIXED) of vfio-pci MMIO regions will be automatically suitable
for huge pfnmaps as much as possible.

To achieve that, a custom vfio_device get_mapping_order() hook for
vfio-pci devices is needed.

Note that a BAR's MMIO physical address should normally be guaranteed to
be BAR-size aligned.  It means the MMIO address will also always be
aligned within vfio-pci's file offset address space, per
VFIO_PCI_OFFSET_SHIFT.

With that guaranteed, the VA allocator can calculate the alignment from
pgoff, which will in turn be aligned with the MMIO physical addresses to
be mapped in the VMA later.

So far, stick with the simple plan of relying on the hardware assumption,
which should always be true.  Adjusting pgoff when calculating the
alignment is left for later, if a real demand for it shows up.

For discussion on the requirement of this feature, see:

https://lore.kernel.org/linux-pci/20250529214414.1508155-1-amastro@fb.com/

Signed-off-by: Peter Xu
---
 drivers/vfio/pci/vfio_pci.c      |  1 +
 drivers/vfio/pci/vfio_pci_core.c | 49 ++++++++++++++++++++++++++++++++
 include/linux/vfio_pci_core.h    |  2 ++
 3 files changed, 52 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index ac10f14417f2f..8f29037cee6eb 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -145,6 +145,7 @@ static const struct vfio_device_ops vfio_pci_ops = {
 	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
 	.pasid_attach_ioas	= vfio_iommufd_physical_pasid_attach_ioas,
 	.pasid_detach_ioas	= vfio_iommufd_physical_pasid_detach_ioas,
+	.get_mapping_order	= vfio_pci_core_get_mapping_order,
 };
 
 static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 7dcf5439dedc9..28ab37715acc0 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1640,6 +1640,55 @@ static unsigned long vma_to_pfn(struct vm_area_struct *vma)
 	return (pci_resource_start(vdev->pdev, index) >> PAGE_SHIFT) + pgoff;
 }
 
+/*
+ * Hint function for mmap() about the size of mapping to be carried out.
+ * This helps to enable huge pfnmaps as much as possible on BAR mappings.
+ *
+ * This function performs only the minimal checks on mmap() parameters
+ * needed to make the hint valid.  Most of the mmap() sanity checks are
+ * done later in mmap().
+ */
+int vfio_pci_core_get_mapping_order(struct vfio_device *device,
+				    unsigned long pgoff, size_t len)
+{
+	struct vfio_pci_core_device *vdev =
+		container_of(device, struct vfio_pci_core_device, vdev);
+	struct pci_dev *pdev = vdev->pdev;
+	unsigned int index = pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
+	unsigned long req_start;
+	size_t phys_len;
+
+	/* Currently, only BARs 0-5 support huge pfnmap */
+	if (index >= VFIO_PCI_ROM_REGION_INDEX)
+		return 0;
+
+	/*
+	 * NOTE: we're keeping things simple as of now, assuming the
+	 * physical address of BARs (aka, pci_resource_start(pdev, index))
+	 * should always be aligned with pgoff in vfio-pci's address space.
+	 */
+	req_start = (pgoff << PAGE_SHIFT) & ((1UL << VFIO_PCI_OFFSET_SHIFT) - 1);
+	phys_len = PAGE_ALIGN(pci_resource_len(pdev, index));
+
+	/*
+	 * If this happens, mmap() will probably fail later, so the mapping
+	 * hint isn't important anymore.
+	 */
+	if (req_start >= phys_len)
+		return 0;
+
+	phys_len = MIN(phys_len - req_start, len);
+
+	if (IS_ENABLED(CONFIG_ARCH_SUPPORTS_PUD_PFNMAP) && phys_len >= PUD_SIZE)
+		return PUD_ORDER;
+
+	if (IS_ENABLED(CONFIG_ARCH_SUPPORTS_PMD_PFNMAP) && phys_len >= PMD_SIZE)
+		return PMD_ORDER;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_core_get_mapping_order);
+
 static vm_fault_t vfio_pci_mmap_huge_fault(struct vm_fault *vmf,
 					   unsigned int order)
 {
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index f541044e42a2a..d320dfacc5681 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -119,6 +119,8 @@ ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
 		size_t count, loff_t *ppos);
 ssize_t vfio_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
 		size_t count, loff_t *ppos);
+int vfio_pci_core_get_mapping_order(struct vfio_device *device,
+				    unsigned long pgoff, size_t len);
 int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma);
 void vfio_pci_core_request(struct vfio_device *core_vdev, unsigned int count);
 int vfio_pci_core_match(struct vfio_device *core_vdev, char *buf);
-- 
2.50.1
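
For readers who want to check the order selection by hand, the following is a minimal userspace sketch of the arithmetic in vfio_pci_core_get_mapping_order().  It is not kernel code and not part of the patch.  Everything prefixed MODEL_ or model_ is hypothetical: the constants assume 4 KiB pages with 2 MiB PMDs and 1 GiB PUDs (as on x86-64), the model assumes both PMD and PUD pfnmaps are supported, and the BAR length is passed in directly instead of being read through pci_resource_len().

```c
#include <assert.h>

/* Assumed geometry: 4 KiB pages, 2 MiB PMD, 1 GiB PUD (x86-64-like). */
#define MODEL_PAGE_SHIFT	12
#define MODEL_PMD_SHIFT		21
#define MODEL_PUD_SHIFT		30
#define MODEL_PMD_ORDER		(MODEL_PMD_SHIFT - MODEL_PAGE_SHIFT)
#define MODEL_PUD_ORDER		(MODEL_PUD_SHIFT - MODEL_PAGE_SHIFT)

/* Mirrors of VFIO_PCI_OFFSET_SHIFT and VFIO_PCI_ROM_REGION_INDEX. */
#define MODEL_OFFSET_SHIFT	40
#define MODEL_ROM_INDEX		6

/* Round a byte length up to the next page boundary. */
static unsigned long long model_page_align(unsigned long long len)
{
	const unsigned long long mask = (1ULL << MODEL_PAGE_SHIFT) - 1;

	return (len + mask) & ~mask;
}

/*
 * Hypothetical model of the hint: bar_len is the byte length of the BAR
 * that pgoff's region index selects, pgoff is the mmap() page offset and
 * len is the requested mapping length in bytes.
 */
static int model_mapping_order(unsigned long long bar_len,
			       unsigned long long pgoff,
			       unsigned long long len)
{
	unsigned long long index, req_start, phys_len;

	/* The shifts below only stay in range for sane page offsets. */
	assert(pgoff < (1ULL << (64 - MODEL_PAGE_SHIFT)));

	/* The region index lives in the bits above the per-region offset. */
	index = pgoff >> (MODEL_OFFSET_SHIFT - MODEL_PAGE_SHIFT);
	if (index >= MODEL_ROM_INDEX)
		return 0;

	/* Byte offset of the request inside the selected BAR. */
	req_start = (pgoff << MODEL_PAGE_SHIFT) &
		    ((1ULL << MODEL_OFFSET_SHIFT) - 1);
	phys_len = model_page_align(bar_len);
	if (req_start >= phys_len)
		return 0;

	/* Only the part of the BAR that the request covers matters. */
	phys_len -= req_start;
	if (len < phys_len)
		phys_len = len;

	if (phys_len >= (1ULL << MODEL_PUD_SHIFT))
		return MODEL_PUD_ORDER;
	if (phys_len >= (1ULL << MODEL_PMD_SHIFT))
		return MODEL_PMD_ORDER;
	return 0;
}
```

Under these assumptions, mapping all of a 16 MiB BAR 2, i.e. model_mapping_order(1ULL << 24, 2ULL << 28, 1ULL << 24), yields the PMD order, while a request that covers only the last 1 MiB of the same BAR falls back to order 0 because the covered range is smaller than a PMD.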