From: Peter Xu
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Jason Gunthorpe, Nico Pache, Zi Yan, Alex Mastro, David Hildenbrand,
 Alex Williamson, Zhi Wang, David Laight, Yi Liu, Ankit Agrawal,
 peterx@redhat.com, Kevin Tian, Andrew Morton
Subject: [PATCH v2 2/4] mm: Add
 file_operations.get_mapping_order()
Date: Thu, 4 Dec 2025 10:10:01 -0500
Message-ID: <20251204151003.171039-3-peterx@redhat.com>
X-Mailer: git-send-email 2.50.1
In-Reply-To: <20251204151003.171039-1-peterx@redhat.com>
References: <20251204151003.171039-1-peterx@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Add one new file operation, get_mapping_order().  File backends can use
it to report mapping order hints.

By default, Linux assumes that mappings are established in PAGE_SIZE
chunks.  With this hint, a driver can report that it may map chunks
larger than PAGE_SIZE.  The VA allocator will then try to use that size
as the alignment when allocating VA ranges.

This is useful because when the chunks to be mapped are larger than
PAGE_SIZE, VA alignment matters: the VA needs to be aligned to the size
of the chunk to be mapped.

That said, no matter which alignment is used for the VA allocation, the
driver still decides the size at which it maps the chunks.  It is also
not an issue if it keeps mapping in PAGE_SIZE.

get_mapping_order() takes three parameters.  The 1st is the file object
pointer; the 2nd and 3rd are the pgoff and size of the mmap() request.

Its return value is the order, which must be non-negative to enable the
alignment.  When zero is returned, it behaves as if no hint was
provided, IOW, the alignment will still be PAGE_SIZE.

When the order is too big, the hint is ignored.  Normally drivers are
trusted, so this is more of an extra layer of safety.

Suggested-by: Jason Gunthorpe
Signed-off-by: Peter Xu
---
 Documentation/filesystems/vfs.rst |  4 +++
 include/linux/fs.h                |  1 +
 mm/mmap.c                         | 59 +++++++++++++++++++++++++++----
 3 files changed, 57 insertions(+), 7 deletions(-)

diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 4f13b01e42eb5..b707ddbebbf52 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1069,6 +1069,7 @@ This describes how the VFS can manipulate an open file. As of kernel
 		int (*fasync) (int, struct file *, int);
 		int (*lock) (struct file *, int, struct file_lock *);
 		unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
+		int (*get_mapping_order)(struct file *, unsigned long, size_t);
 		int (*check_flags)(int);
 		int (*flock) (struct file *, int, struct file_lock *);
 		ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
@@ -1165,6 +1166,9 @@ otherwise noted.
 ``get_unmapped_area``
 	called by the mmap(2) system call
 
+``get_mapping_order``
+	called by the mmap(2) system call to get mapping order hint
+
 ``check_flags``
 	called by the fcntl(2) system call for F_SETFL command
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index dd3b57cfadeeb..5ba373576bfe5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2287,6 +2287,7 @@ struct file_operations {
 	int (*fasync) (int, struct file *, int);
 	int (*lock) (struct file *, int, struct file_lock *);
 	unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
+	int (*get_mapping_order)(struct file *file, unsigned long pgoff, size_t len);
 	int (*check_flags)(int);
 	int (*flock) (struct file *, int, struct file_lock *);
 	ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
diff --git a/mm/mmap.c b/mm/mmap.c
index 8fa397a18252e..be3dd0623f00c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -808,6 +808,33 @@ unsigned long mm_get_unmapped_area_vmflags(struct mm_struct *mm, struct file *fi
 	return arch_get_unmapped_area(filp, addr, len, pgoff, flags, vm_flags);
 }
 
+static inline bool file_has_mmap_order_hint(struct file *file)
+{
+	return file && file->f_op && file->f_op->get_mapping_order;
+}
+
+static inline bool
+mmap_should_align(struct file *file, unsigned long addr, unsigned long len)
+{
+	/* When THP not enabled at all, skip */
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+		return false;
+
+	/* Never try any alignment if the mmap() address hint is provided */
+	if (addr)
+		return false;
+
+	/* Anonymous THP could use some better alignment when len aligned */
+	if (!file)
+		return IS_ALIGNED(len, PMD_SIZE);
+
+	/*
+	 * It's a file mapping, no address hint provided by caller, try any
+	 * alignment if the file backend would provide a hint
+	 */
+	return file_has_mmap_order_hint(file);
+}
+
 unsigned long
 __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
 		unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
@@ -815,8 +842,9 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
 	unsigned long (*get_area)(struct file *, unsigned long,
 				  unsigned long, unsigned long, unsigned long)
 				  = NULL;
-
 	unsigned long error = arch_mmap_check(addr, len, flags);
+	unsigned long align;
+
 	if (error)
 		return error;
 
@@ -841,13 +869,30 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
 
 	if (get_area) {
 		addr = get_area(file, addr, len, pgoff, flags);
-	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && !file
-		   && !addr /* no hint */
-		   && IS_ALIGNED(len, PMD_SIZE)) {
-		/* Ensures that larger anonymous mappings are THP aligned. */
+	} else if (mmap_should_align(file, addr, len)) {
+		if (file_has_mmap_order_hint(file)) {
+			int order;
+			/*
+			 * Allow driver to opt-in on the order hint.
+			 *
+			 * Sanity check on the order returned. Treating
+			 * either negative or too big order to be invalid,
+			 * where alignment will be skipped.
+			 */
+			order = file->f_op->get_mapping_order(file, pgoff, len);
+			if (order < 0)
+				order = 0;
+			if (check_shl_overflow(PAGE_SIZE, order, &align))
+				/* No alignment applied */
+				align = PAGE_SIZE;
+		} else {
+			/* Default alignment for anonymous THPs */
+			align = PMD_SIZE;
+		}
+
 		addr = thp_get_unmapped_area_vmflags(file, addr, len,
-						     pgoff, flags, PMD_SIZE,
-						     vm_flags);
+						     pgoff, flags,
+						     align, vm_flags);
 	} else {
 		addr = mm_get_unmapped_area_vmflags(current->mm, file, addr, len,
 				pgoff, flags, vm_flags);
-- 
2.50.1
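
For illustration only (not part of the patch): a userspace C sketch of how
a driver-supplied order hint could turn into a VA alignment, following the
sanity checks in the mm/mmap.c hunk above.  Everything prefixed demo_ or
model_ is hypothetical, including the 4 KiB page size, the 2 MiB chunk
order and the two-region pgoff layout.  The manual shift check is a
stand-in for the kernel's check_shl_overflow(), and the file pointer
argument is left out.  The sketch does not model the VA search done by
thp_get_unmapped_area_vmflags().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumption: 4 KiB base pages. The kernel's PAGE_SIZE is arch-dependent. */
#define MODEL_PAGE_SHIFT	12
#define MODEL_PAGE_SIZE		(1UL << MODEL_PAGE_SHIFT)

/* Hypothetical device layout: region 1 starts at this page offset. */
#define DEMO_REGION1_PGOFF	(1UL << 16)
/* Hypothetical chunk size for region 1: 2^9 pages = 2 MiB. */
#define DEMO_CHUNK_ORDER	9

/*
 * Hypothetical driver callback, shaped like the proposed
 * get_mapping_order(file, pgoff, len) but without the file pointer.
 * Returns 0 ("no hint") unless the request targets region 1 and is
 * long enough to hold at least one whole chunk.
 */
static int demo_get_mapping_order(unsigned long pgoff, size_t len)
{
	if (pgoff < DEMO_REGION1_PGOFF)
		return 0;
	if (len < (MODEL_PAGE_SIZE << DEMO_CHUNK_ORDER))
		return 0;
	return DEMO_CHUNK_ORDER;
}

/*
 * Model of the core-side sanity check: a negative order is clamped to 0,
 * and an order whose shift would not fit in unsigned long falls back to
 * page-size alignment, i.e. the hint is ignored.
 */
static unsigned long model_order_to_align(int order)
{
	const int max_order =
		(int)(sizeof(unsigned long) * CHAR_BIT) - MODEL_PAGE_SHIFT;
	unsigned long align;

	if (order < 0)
		order = 0;
	/* Stand-in for check_shl_overflow(PAGE_SIZE, order, &align). */
	if (order >= max_order)
		return MODEL_PAGE_SIZE;

	align = MODEL_PAGE_SIZE << order;
	assert(align >= MODEL_PAGE_SIZE);	/* the shift cannot have wrapped */
	return align;
}

/* Alignment the VA allocator would be asked for in this model. */
static unsigned long model_mmap_alignment(unsigned long pgoff, size_t len)
{
	return model_order_to_align(demo_get_mapping_order(pgoff, len));
}
```

Under these assumptions, a 4 MiB request at DEMO_REGION1_PGOFF asks for a
2 MiB alignment, while a request into region 0, or one shorter than a
single chunk, stays at page-size alignment.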