From nobody Sat Feb 7 10:16:10 2026
From: Peter Xu
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Jason Gunthorpe, Nico Pache, Zi Yan, Alex Mastro, David Hildenbrand, Alex Williamson, Zhi Wang, David Laight, Yi Liu, Ankit Agrawal, peterx@redhat.com, Kevin Tian, Andrew Morton
Subject: [PATCH v2 1/4] mm/thp: Allow thp_get_unmapped_area_vmflags() to take alignment
Date: Thu, 4 Dec 2025 10:10:00 -0500
Message-ID: <20251204151003.171039-2-peterx@redhat.com>
In-Reply-To: <20251204151003.171039-1-peterx@redhat.com>
References: <20251204151003.171039-1-peterx@redhat.com>

Add an "align" parameter to thp_get_unmapped_area_vmflags() so that it can
get an unmapped area with any alignment. There are two existing callers;
use PMD_SIZE explicitly for them.

No functional change intended.

Signed-off-by: Peter Xu
Tested-by: Alex Mastro
---
 include/linux/huge_mm.h | 5 +++--
 mm/huge_memory.c        | 7 ++++---
 mm/mmap.c               | 3 ++-
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 71ac78b9f834f..1c221550362d7 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -362,7 +362,7 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
 		unsigned long len, unsigned long pgoff, unsigned long flags);
 unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
 		unsigned long len, unsigned long pgoff, unsigned long flags,
-		vm_flags_t vm_flags);
+		unsigned long align, vm_flags_t vm_flags);
 
 bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins);
 int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
@@ -559,7 +559,8 @@ static inline unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
 static inline unsigned long
 thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
 			      unsigned long len, unsigned long pgoff,
-			      unsigned long flags, vm_flags_t vm_flags)
+			      unsigned long flags, unsigned long align,
+			      vm_flags_t vm_flags)
 {
 	return 0;
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 6cba1cb14b23a..ab2450b985171 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1155,12 +1155,12 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
 
 unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
 		unsigned long len, unsigned long pgoff, unsigned long flags,
-		vm_flags_t vm_flags)
+		unsigned long align, vm_flags_t vm_flags)
 {
 	unsigned long ret;
 	loff_t off = (loff_t)pgoff << PAGE_SHIFT;
 
-	ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE, vm_flags);
+	ret = __thp_get_unmapped_area(filp, addr, len, off, flags, align, vm_flags);
 	if (ret)
 		return ret;
 
@@ -1171,7 +1171,8 @@ unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long add
 unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
 		unsigned long len, unsigned long pgoff, unsigned long flags)
 {
-	return thp_get_unmapped_area_vmflags(filp, addr, len, pgoff, flags, 0);
+	return thp_get_unmapped_area_vmflags(filp, addr, len, pgoff, flags,
+					     PMD_SIZE, 0);
 }
 EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
 
diff --git a/mm/mmap.c b/mm/mmap.c
index 5fd3b80fda1d5..8fa397a18252e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -846,7 +846,8 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
 		   && IS_ALIGNED(len, PMD_SIZE)) {
 		/* Ensures that larger anonymous mappings are THP aligned. */
 		addr = thp_get_unmapped_area_vmflags(file, addr, len,
-						     pgoff, flags, vm_flags);
+						     pgoff, flags, PMD_SIZE,
+						     vm_flags);
 	} else {
 		addr = mm_get_unmapped_area_vmflags(current->mm, file, addr, len,
 						    pgoff, flags, vm_flags);
-- 
2.50.1

From nobody Sat Feb 7 10:16:10 2026
From: Peter Xu
To:
 kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Jason Gunthorpe, Nico Pache, Zi Yan, Alex Mastro, David Hildenbrand, Alex Williamson, Zhi Wang, David Laight, Yi Liu, Ankit Agrawal, peterx@redhat.com, Kevin Tian, Andrew Morton
Subject: [PATCH v2 2/4] mm: Add file_operations.get_mapping_order()
Date: Thu, 4 Dec 2025 10:10:01 -0500
Message-ID: <20251204151003.171039-3-peterx@redhat.com>
In-Reply-To: <20251204151003.171039-1-peterx@redhat.com>
References: <20251204151003.171039-1-peterx@redhat.com>

Add one new file operation, get_mapping_order(). File backends can use it
to report mapping order hints.

By default, Linux assumes that mappings are made in PAGE_SIZE chunks. With
this hint, a driver can report that it may map chunks larger than
PAGE_SIZE, and the VA allocator will then try to use that size as the
alignment when allocating VA ranges.

This is useful because when the chunks to be mapped are larger than
PAGE_SIZE, VA alignment matters: the VA needs to be aligned to the size of
the chunk to be mapped.

That said, whatever alignment is used for the VA allocation, the driver
still decides what size to use when mapping the chunks. It is also not an
issue if it keeps mapping in PAGE_SIZE.

get_mapping_order() takes three parameters: the file object pointer,
followed by the pgoff and the size of the mmap() request. Its return value
is the order, which must be non-negative to enable the alignment. When
zero is returned, it behaves as if the hint was not provided, i.e. the
alignment is still PAGE_SIZE. When the order is too big, the hint is
ignored. Drivers are normally trusted, so this is more of an extra layer
of safety.

Suggested-by: Jason Gunthorpe
Signed-off-by: Peter Xu
Tested-by: Alex Mastro
---
 Documentation/filesystems/vfs.rst |  4 +++
 include/linux/fs.h                |  1 +
 mm/mmap.c                         | 59 +++++++++++++++++++++++++++----
 3 files changed, 57 insertions(+), 7 deletions(-)

diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 4f13b01e42eb5..b707ddbebbf52 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1069,6 +1069,7 @@ This describes how the VFS can manipulate an open file. As of kernel
 		int (*fasync) (int, struct file *, int);
 		int (*lock) (struct file *, int, struct file_lock *);
 		unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
+		int (*get_mapping_order)(struct file *, unsigned long, size_t);
 		int (*check_flags)(int);
 		int (*flock) (struct file *, int, struct file_lock *);
 		ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
@@ -1165,6 +1166,9 @@ otherwise noted.
 ``get_unmapped_area``
 	called by the mmap(2) system call
 
+``get_mapping_order``
+	called by the mmap(2) system call to get mapping order hint
+
 ``check_flags``
 	called by the fcntl(2) system call for F_SETFL command
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index dd3b57cfadeeb..5ba373576bfe5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2287,6 +2287,7 @@ struct file_operations {
 	int (*fasync) (int, struct file *, int);
 	int (*lock) (struct file *, int, struct file_lock *);
 	unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
+	int (*get_mapping_order)(struct file *file, unsigned long pgoff, size_t len);
 	int (*check_flags)(int);
 	int (*flock) (struct file *, int, struct file_lock *);
 	ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
diff --git a/mm/mmap.c b/mm/mmap.c
index 8fa397a18252e..be3dd0623f00c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -808,6 +808,33 @@ unsigned long mm_get_unmapped_area_vmflags(struct mm_struct *mm, struct file *fi
 	return arch_get_unmapped_area(filp, addr, len, pgoff, flags, vm_flags);
 }
 
+static inline bool file_has_mmap_order_hint(struct file *file)
+{
+	return file && file->f_op && file->f_op->get_mapping_order;
+}
+
+static inline bool
+mmap_should_align(struct file *file, unsigned long addr, unsigned long len)
+{
+	/* When THP not enabled at all, skip */
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+		return false;
+
+	/* Never try any alignment if the mmap() address hint is provided */
+	if (addr)
+		return false;
+
+	/* Anonymous THP could use some better alignment when len aligned */
+	if (!file)
+		return IS_ALIGNED(len, PMD_SIZE);
+
+	/*
+	 * It's a file mapping, no address hint provided by caller, try any
+	 * alignment if the file backend would provide a hint
+	 */
+	return file_has_mmap_order_hint(file);
+}
+
 unsigned long
 __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
 		unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
@@ -815,8 +842,9 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
 	unsigned long (*get_area)(struct file *, unsigned long,
 				  unsigned long, unsigned long, unsigned long)
 				  = NULL;
-
 	unsigned long error = arch_mmap_check(addr, len, flags);
+	unsigned long align;
+
 	if (error)
 		return error;
 
@@ -841,13 +869,30 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
 
 	if (get_area) {
 		addr = get_area(file, addr, len, pgoff, flags);
-	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && !file
-		   && !addr /* no hint */
-		   && IS_ALIGNED(len, PMD_SIZE)) {
-		/* Ensures that larger anonymous mappings are THP aligned. */
+	} else if (mmap_should_align(file, addr, len)) {
+		if (file_has_mmap_order_hint(file)) {
+			int order;
+			/*
+			 * Allow driver to opt-in on the order hint.
+			 *
+			 * Sanity check on the order returned. Treating
+			 * either negative or too big order to be invalid,
+			 * where alignment will be skipped.
+			 */
+			order = file->f_op->get_mapping_order(file, pgoff, len);
+			if (order < 0)
+				order = 0;
+			if (check_shl_overflow(PAGE_SIZE, order, &align))
+				/* No alignment applied */
+				align = PAGE_SIZE;
+		} else {
+			/* Default alignment for anonymous THPs */
+			align = PMD_SIZE;
+		}
+
 		addr = thp_get_unmapped_area_vmflags(file, addr, len,
-						     pgoff, flags, PMD_SIZE,
-						     vm_flags);
+						     pgoff, flags,
+						     align, vm_flags);
 	} else {
 		addr = mm_get_unmapped_area_vmflags(current->mm, file, addr, len,
 						    pgoff, flags, vm_flags);
-- 
2.50.1

From nobody Sat Feb 7 10:16:10 2026
From: Peter Xu
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Jason Gunthorpe, Nico Pache, Zi Yan, Alex Mastro, David Hildenbrand, Alex Williamson, Zhi Wang, David Laight, Yi Liu, Ankit Agrawal, peterx@redhat.com, Kevin Tian, Andrew Morton
Subject: [PATCH v2 3/4] vfio: Introduce vfio_device_ops.get_mapping_order hook
Date: Thu, 4 Dec 2025 10:10:02 -0500
Message-ID: <20251204151003.171039-4-peterx@redhat.com>
In-Reply-To: <20251204151003.171039-1-peterx@redhat.com>
References: <20251204151003.171039-1-peterx@redhat.com>

Add a hook to vfio_device_ops to allow sub-modules to provide a mapping
order hint for an mmap() request. When the hook is not available, use the
default value (0).

Note that this patch changes the code path that vfio takes on mmap() when
allocating the virtual address range to be mapped. However, it should not
change the resulting VA allocation, because the default value (0) should
preserve the old behavior.
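
As an illustration only (not part of this patch), the fallback described
above can be modelled in userspace C. This is a minimal sketch assuming
4 KiB base pages. struct sketch_device, sketch_get_mapping_order(),
sketch_order_to_align() and sketch_hook_2m() are hypothetical names made up
for the example, and the overflow test only approximates the
check_shl_overflow() sanity check that the mm side applies to the order.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: 4 KiB base pages. */
#define SKETCH_PAGE_SIZE 4096UL

struct sketch_device;

/* Hypothetical stand-in for the relevant part of vfio_device_ops. */
struct sketch_device_ops {
	/* Optional hook; NULL means "no hint". */
	int (*get_mapping_order)(struct sketch_device *dev,
				 unsigned long pgoff, size_t len);
};

/* Hypothetical stand-in for struct vfio_device. */
struct sketch_device {
	const struct sketch_device_ops *ops;
};

/* Dispatch to the optional hook, falling back to order 0 without one. */
static int sketch_get_mapping_order(struct sketch_device *dev,
				    unsigned long pgoff, size_t len)
{
	assert(dev && dev->ops);
	if (dev->ops->get_mapping_order)
		return dev->ops->get_mapping_order(dev, pgoff, len);
	return 0;
}

/*
 * Turn an order into an alignment in bytes. Negative orders and orders
 * whose shift would overflow fall back to the base page size, i.e. no
 * extra alignment is applied.
 */
static unsigned long sketch_order_to_align(int order)
{
	if (order < 0)
		return SKETCH_PAGE_SIZE;
	/* Shifting by the full width or more is undefined, so reject it first. */
	if ((size_t)order >= sizeof(unsigned long) * 8)
		return SKETCH_PAGE_SIZE;
	/* If the shift does not round-trip, bits were lost to overflow. */
	if (((SKETCH_PAGE_SIZE << order) >> order) != SKETCH_PAGE_SIZE)
		return SKETCH_PAGE_SIZE;
	return SKETCH_PAGE_SIZE << order;
}

/*
 * Hypothetical driver hook: ask for order 9 (2 MiB with 4 KiB pages) when
 * the request is at least that large, otherwise give no hint.
 */
static int sketch_hook_2m(struct sketch_device *dev,
			  unsigned long pgoff, size_t len)
{
	(void)dev;
	(void)pgoff;
	return len >= (SKETCH_PAGE_SIZE << 9) ? 9 : 0;
}
```

With no hook installed the order is 0 and the alignment stays at the base
page size. A driver returning order 9 would be asking for 2 MiB alignment
under the 4 KiB page assumption.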
Signed-off-by: Peter Xu Tested-by: Alex Mastro --- drivers/vfio/vfio_main.c | 14 ++++++++++++++ include/linux/vfio.h | 5 +++++ 2 files changed, 19 insertions(+) diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index 38c8e9350a60e..3f2107ff93e5d 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -1372,6 +1372,19 @@ static void vfio_device_show_fdinfo(struct seq_file = *m, struct file *filep) } #endif =20 +static int vfio_device_get_mapping_order(struct file *file, + unsigned long pgoff, + size_t len) +{ + struct vfio_device_file *df =3D file->private_data; + struct vfio_device *device =3D df->device; + + if (device->ops->get_mapping_order) + return device->ops->get_mapping_order(device, pgoff, len); + + return 0; +} + const struct file_operations vfio_device_fops =3D { .owner =3D THIS_MODULE, .open =3D vfio_device_fops_cdev_open, @@ -1384,6 +1397,7 @@ const struct file_operations vfio_device_fops =3D { #ifdef CONFIG_PROC_FS .show_fdinfo =3D vfio_device_show_fdinfo, #endif + .get_mapping_order =3D vfio_device_get_mapping_order, }; =20 static struct vfio_device *vfio_device_from_file(struct file *file) diff --git a/include/linux/vfio.h b/include/linux/vfio.h index eb563f538dee5..46a4d85fc4953 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -111,6 +111,8 @@ struct vfio_device { * @dma_unmap: Called when userspace unmaps IOVA from the container * this device is attached to. * @device_feature: Optional, fill in the VFIO_DEVICE_FEATURE ioctl + * @get_mapping_order: Optional, provide mapping order hints for mmap(). + * When unavailable, use the default order (zero). 
*/ struct vfio_device_ops { char *name; @@ -139,6 +141,9 @@ struct vfio_device_ops { void (*dma_unmap)(struct vfio_device *vdev, u64 iova, u64 length); int (*device_feature)(struct vfio_device *device, u32 flags, void __user *arg, size_t argsz); + int (*get_mapping_order)(struct vfio_device *device, + unsigned long pgoff, + size_t len); }; =20 #if IS_ENABLED(CONFIG_IOMMUFD) --=20 2.50.1 From nobody Sat Feb 7 10:16:10 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4F22A2D73A7 for ; Thu, 4 Dec 2025 15:10:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1764861019; cv=none; b=IsovOW/WZ5NQ8lEgnzPygvhLk0rmZJhLBkH7IUW1CnXGJkU+CQwEciY2r4aDkoZP7GSJm6z351dhQ/l4wdvfYVdx9CkCpVr6ZO0z/iLqHjBNP28kzUuwEaZWIWIH0QwizwKMQsEuGLE7rtwVEYq0C5SY2bw//wD9CG6G3gaeaLc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1764861019; c=relaxed/simple; bh=YZrU39LZ+1N2e7UrW746DAf5wMDP77P1FpRzJ7y1WH4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=sCJfD7SjWb8U5rDUwNkxEbhLdxPZ76VQKr6LQS6xieYRze929XoL+dhPFV0HtHF4+6WMEAytnw0eDRtWZYGjpL8w20L+nFN+zEqYltlLJhCaAKYiS1eUGSprcFlT6aq2vt5ZGf6AkO8kfSlHsPXNGKM6NRTX+oY7t0X8wXLq5z4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=XXSynCSk; dkim=pass (2048-bit key) header.d=redhat.com header.i=@redhat.com header.b=SPALj42U; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com 
From: Peter Xu
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Jason Gunthorpe, Nico Pache, Zi Yan, Alex Mastro, David Hildenbrand,
 Alex Williamson, Zhi Wang, David Laight, Yi Liu, Ankit Agrawal,
 peterx@redhat.com, Kevin Tian, Andrew Morton
Subject: [PATCH v2 4/4] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings
Date: Thu, 4 Dec 2025 10:10:03 -0500
Message-ID: <20251204151003.171039-5-peterx@redhat.com>
X-Mailer: git-send-email 2.50.1
In-Reply-To: <20251204151003.171039-1-peterx@redhat.com>
References: <20251204151003.171039-1-peterx@redhat.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

This patch enables best-effort huge pfnmaps for mmap() of vfio-pci BARs
even without MAP_FIXED, so that huge pfnmaps are used as much as possible.
It also avoids the need for userspace changes (switching to MAP_FIXED with
pre-aligned virtual addresses) to start using huge pfnmaps on VFIO BARs.

The trick is to make sure the VAs allocated by mmap() without MAP_FIXED
are aligned with the MMIO PFNs, so that whatever mmap(!MAP_FIXED) returns
for a vfio-pci MMIO region is automatically suitable for huge pfnmaps
wherever possible.

To achieve that, vfio-pci devices need a custom implementation of the
vfio_device get_mapping_order() hook.

Note that a BAR's MMIO physical address should normally be guaranteed to
be BAR-size aligned.  This means the MMIO address is also always aligned
with vfio-pci's file offset address space, per VFIO_PCI_OFFSET_SHIFT.

With that guaranteed, the VA allocator can calculate the alignment from
pgoff, which in turn is aligned with the MMIO physical addresses to be
mapped into the VMA later.
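
As a rough illustration of the alignment argument above (not part of the
patch), the sketch below checks in plain userspace C that a file offset and
the physical address it maps to share the same offset within a huge page.
The sketch_* names are hypothetical, and the constants (an offset shift of
40 and a 2 MiB PMD) are assumptions made for the example, not values taken
from this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only; the real ones live in kernel headers. */
#define SKETCH_OFFSET_SHIFT 40                   /* assumed VFIO_PCI_OFFSET_SHIFT */
#define SKETCH_PMD_SIZE     (UINT64_C(1) << 21)  /* assumed 2 MiB PMD size */

/* File offset of byte `off` inside region `index` (hypothetical helper). */
static uint64_t sketch_file_offset(unsigned int index, uint64_t off)
{
	return ((uint64_t)index << SKETCH_OFFSET_SHIFT) + off;
}

/* True when a and b have the same offset within a power-of-two `size` block. */
static bool sketch_congruent(uint64_t a, uint64_t b, uint64_t size)
{
	return ((a ^ b) & (size - 1)) == 0;
}

/*
 * True when the file offset of `off` in BAR `index` is congruent, modulo
 * `size`, with the physical address bar_phys + off.  This holds whenever
 * bar_phys itself is aligned to `size`.
 */
static bool sketch_pgoff_tracks_phys(uint64_t bar_phys, unsigned int index,
				     uint64_t off, uint64_t size)
{
	return sketch_congruent(bar_phys + off,
				sketch_file_offset(index, off), size);
}
```

With a BAR base that is 2 MiB aligned, the check passes for any offset
inside the BAR, so a VA aligned against pgoff is also aligned against the
physical address.  With a base that is only 1 MiB aligned, the check fails,
which is the case the simple plan does not try to handle.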

For now, stick with the simple plan of relying on this hardware
assumption, which should always hold.  Adjusting pgoff when calculating
the alignment is left for later, if a real need arises.

For discussion of the need for this feature, see:

https://lore.kernel.org/linux-pci/20250529214414.1508155-1-amastro@fb.com/

Signed-off-by: Peter Xu
Tested-by: Alex Mastro
---
 drivers/vfio/pci/vfio_pci.c      |  1 +
 drivers/vfio/pci/vfio_pci_core.c | 49 ++++++++++++++++++++++++++++++++
 include/linux/vfio_pci_core.h    |  2 ++
 3 files changed, 52 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index ac10f14417f2f..8f29037cee6eb 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -145,6 +145,7 @@ static const struct vfio_device_ops vfio_pci_ops = {
 	.detach_ioas		= vfio_iommufd_physical_detach_ioas,
 	.pasid_attach_ioas	= vfio_iommufd_physical_pasid_attach_ioas,
 	.pasid_detach_ioas	= vfio_iommufd_physical_pasid_detach_ioas,
+	.get_mapping_order	= vfio_pci_core_get_mapping_order,
 };
 
 static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 7dcf5439dedc9..28ab37715acc0 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1640,6 +1640,55 @@ static unsigned long vma_to_pfn(struct vm_area_struct *vma)
 	return (pci_resource_start(vdev->pdev, index) >> PAGE_SHIFT) + pgoff;
 }
 
+/*
+ * Hint function for mmap() about the size of mapping to be carried out.
+ * This helps to enable huge pfnmaps as much as possible on BAR mappings.
+ *
+ * This function only does the minimum checks on mmap() parameters needed
+ * to make the hint valid.  The majority of mmap() sanity checks are done
+ * later in mmap().
+ */
+int vfio_pci_core_get_mapping_order(struct vfio_device *device,
+				    unsigned long pgoff, size_t len)
+{
+	struct vfio_pci_core_device *vdev =
+		container_of(device, struct vfio_pci_core_device, vdev);
+	struct pci_dev *pdev = vdev->pdev;
+	unsigned int index = pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
+	unsigned long req_start;
+	size_t phys_len;
+
+	/* Currently, only BARs 0-5 support huge pfnmap */
+	if (index >= VFIO_PCI_ROM_REGION_INDEX)
+		return 0;
+
+	/*
+	 * NOTE: we're keeping things simple for now, assuming the
+	 * physical address of a BAR (i.e. pci_resource_start(pdev, index))
+	 * is always aligned with pgoff in vfio-pci's address space.
+	 */
+	req_start = (pgoff << PAGE_SHIFT) & ((1UL << VFIO_PCI_OFFSET_SHIFT) - 1);
+	phys_len = PAGE_ALIGN(pci_resource_len(pdev, index));
+
+	/*
+	 * If this happens, mmap() will probably fail later, so the
+	 * mapping hint no longer matters.
+	 */
+	if (req_start >= phys_len)
+		return 0;
+
+	phys_len = MIN(phys_len - req_start, len);
+
+	if (IS_ENABLED(CONFIG_ARCH_SUPPORTS_PUD_PFNMAP) && phys_len >= PUD_SIZE)
+		return PUD_ORDER;
+
+	if (IS_ENABLED(CONFIG_ARCH_SUPPORTS_PMD_PFNMAP) && phys_len >= PMD_SIZE)
+		return PMD_ORDER;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_core_get_mapping_order);
+
 static vm_fault_t vfio_pci_mmap_huge_fault(struct vm_fault *vmf,
 					   unsigned int order)
 {
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index f541044e42a2a..d320dfacc5681 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -119,6 +119,8 @@ ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
 		size_t count, loff_t *ppos);
 ssize_t vfio_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
 		size_t count, loff_t *ppos);
+int vfio_pci_core_get_mapping_order(struct vfio_device *device,
+				    unsigned long pgoff, size_t len);
 int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct
*vma);
 void vfio_pci_core_request(struct vfio_device *core_vdev, unsigned int count);
 int vfio_pci_core_match(struct vfio_device *core_vdev, char *buf);
-- 
2.50.1
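
For readers who want to experiment with the order selection outside the
kernel, here is a minimal userspace model of the logic in
vfio_pci_core_get_mapping_order().  It is a sketch under stated
assumptions, not the kernel code: the sk_* names are hypothetical, BAR
lengths are passed in as an array instead of being read with
pci_resource_len(), and the constants assume 4 KiB pages, a 2 MiB PMD, a
1 GiB PUD, an offset shift of 40, a ROM region index of 6, and both
PMD and PUD pfnmap support enabled.

```c
#include <stdint.h>

/* Assumed geometry (4 KiB pages); none of these values come from the patch. */
#define SK_PAGE_SHIFT   12
#define SK_PAGE_SIZE    (UINT64_C(1) << SK_PAGE_SHIFT)
#define SK_PMD_ORDER    9   /* 2 MiB / 4 KiB */
#define SK_PUD_ORDER    18  /* 1 GiB / 4 KiB */
#define SK_PMD_SIZE     (SK_PAGE_SIZE << SK_PMD_ORDER)
#define SK_PUD_SIZE     (SK_PAGE_SIZE << SK_PUD_ORDER)
#define SK_OFFSET_SHIFT 40  /* assumed VFIO_PCI_OFFSET_SHIFT */
#define SK_ROM_INDEX    6   /* assumed VFIO_PCI_ROM_REGION_INDEX */

/*
 * Model of the mapping order hint: bar_len[] holds the byte length of
 * BARs 0-5, pgoff is the mmap() page offset, len the requested length.
 */
static int sk_mapping_order(const uint64_t bar_len[SK_ROM_INDEX],
			    uint64_t pgoff, uint64_t len)
{
	uint64_t index = pgoff >> (SK_OFFSET_SHIFT - SK_PAGE_SHIFT);
	uint64_t req_start, phys_len;

	/* Only BARs 0-5 are candidates for huge mappings. */
	if (index >= SK_ROM_INDEX)
		return 0;

	/* Byte offset of the request inside the BAR. */
	req_start = (pgoff << SK_PAGE_SHIFT) &
		    ((UINT64_C(1) << SK_OFFSET_SHIFT) - 1);
	/* BAR length rounded up to a whole page. */
	phys_len = (bar_len[index] + SK_PAGE_SIZE - 1) & ~(SK_PAGE_SIZE - 1);

	/* Request starts past the end of the BAR: no useful hint. */
	if (req_start >= phys_len)
		return 0;

	/* Bytes that can actually be mapped by this request. */
	phys_len -= req_start;
	if (len < phys_len)
		phys_len = len;

	if (phys_len >= SK_PUD_SIZE)
		return SK_PUD_ORDER;
	if (phys_len >= SK_PMD_SIZE)
		return SK_PMD_ORDER;
	return 0;
}
```

Under these assumptions, a full mapping of a 16 MiB BAR gets the PMD order,
a full mapping of a 2 GiB BAR gets the PUD order, and small or out-of-range
requests fall back to order zero.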