From nobody Tue Feb 10 17:08:43 2026 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D62221F890B; Fri, 24 Jan 2025 23:56:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.165.32 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762972; cv=none; b=Pp97rJCNPepMkHckPti6ljlaT4Sr7BfpjJv5WIws/SsYej5cg2mHZugiAlPEoxvM6LoP+6ZVNUzqNklxK9QJ4LeydTvAwoD6RMB+ghsQLosJzd6CQHQ8YXWFU62BBHxQYD7B1YlmA3w5gI83qsMjhdexIC1NsTnYXU582+nr+e0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762972; c=relaxed/simple; bh=kFEgH5mCfrIFBrw/Jyzmbzc/gWtoLzw19ckTMatStHM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=dfu8fGlEcpoeSb/f267kNB44DH9ziD8z2OK9TZvs3pfaVRg1SZzsxqReTw5GteH85TBBdR6fiJ3++9Q60HZw9XORxbUemfuBOwEodLZYIG4Ohbsefk2sZVzIU6CfV8vNhEFXZheAzm3o6c7QmzY3UrznVJBBByo8dQldVlmiopE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=MxaiLSyw; arc=none smtp.client-ip=205.220.165.32 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="MxaiLSyw" Received: from pps.filterd (m0246629.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 50OIJo4r001375; Fri, 24 Jan 2025 23:55:06 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc 
:content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=corp-2023-11-20; bh=SBsij 5SflBt2Meq/0t4PQFvud6q4J6+i2o2Phuq8JSQ=; b=MxaiLSywB8uB8wmqgMOmK yl8K0Aklj279cAA7G4xPcodpnwSRWYGHQOHf2DOpQym6+foRxx8LjV2Ka+hBsOID wz/yQUVE2ZaekBfUZm+dXTq0fwV6+vmgZ0s4W2NhDX9NMl8lD3Yf5hQ6Q7yRo5lY uB9MHV3wuG+pWVfvAm2LlhlCqYAugk4pv/pHWz3LwOPVIss0pYx49iubxWfXyAQN aBTBKPoW/rxgHA26oxVTbXk4H9FzDJV1+qkU8XlBoGAenQTWpXa3S2aXGdfQgdOO pyFF2LgodTDNlH5Beu6mZ/FBTfrHCThLeH6AwK+oT/UHmlU8NQJNZ1IUmzHD4F26 A== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 44awyh5rk1-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:06 +0000 (GMT) Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 50OMD09R036431; Fri, 24 Jan 2025 23:55:05 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 44917u4a2t-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:05 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 50ONsxPi018051; Fri, 24 Jan 2025 23:55:04 GMT Received: from localhost.us.oracle.com (dhcp-10-65-130-174.vpn.oracle.com [10.65.130.174]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 44917u49ww-2; Fri, 24 Jan 2025 23:55:04 +0000 From: Anthony Yznaga To: akpm@linux-foundation.org, willy@infradead.org, markhemm@googlemail.com, viro@zeniv.linux.org.uk, david@redhat.com, khalid@kernel.org Cc: anthony.yznaga@oracle.com, jthoughton@google.com, corbet@lwn.net, dave.hansen@intel.com, kirill@shutemov.name, 
 luto@kernel.org, brauner@kernel.org, arnd@arndb.de, ebiederm@xmission.com, catalin.marinas@arm.com, mingo@redhat.com, peterz@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, tglx@linutronix.de, cgroups@vger.kernel.org, x86@kernel.org, linux-doc@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhiramat@kernel.org, rostedt@goodmis.org, vasily.averin@linux.dev, xhao@linux.alibaba.com, pcc@google.com, neilb@suse.de, maz@kernel.org
Subject: [PATCH 01/20] mm: Add msharefs filesystem
Date: Fri, 24 Jan 2025 15:54:35 -0800
Message-ID: <20250124235454.84587-2-anthony.yznaga@oracle.com>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20250124235454.84587-1-anthony.yznaga@oracle.com>
References: <20250124235454.84587-1-anthony.yznaga@oracle.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2025-01-24_10,2025-01-23_01,2024-11-22_01
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 phishscore=0 mlxscore=0 spamscore=0 mlxlogscore=999 bulkscore=0 malwarescore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2411120000 definitions=main-2501240162
X-Proofpoint-ORIG-GUID: H3g567xUR2NiiHY65FwJbZPol2W-orgQ
X-Proofpoint-GUID: H3g567xUR2NiiHY65FwJbZPol2W-orgQ
Content-Type: text/plain; charset="utf-8"

From: Khalid Aziz

Add a ram-based filesystem that contains page table sharing
information and files that enable processes to share page tables.
This patch adds the basic filesystem that can be mounted and a
CONFIG_MSHARE option for compiling support in a kernel.

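As a rough userspace illustration of the new magic number (not part of this patch), a program could test whether a path lives on msharefs by comparing the `f_type` reported by statfs(2) against MSHARE_MAGIC. The helper names `magic_to_tag()` and `is_msharefs()` are hypothetical, and MSHARE_MAGIC is redefined locally on the assumption that installed headers do not carry it yet:

```c
#include <stdint.h>
#include <sys/vfs.h>

/*
 * Value this patch adds to include/uapi/linux/magic.h. Redefined here
 * (assumption) because installed uapi headers may predate the patch.
 */
#ifndef MSHARE_MAGIC
#define MSHARE_MAGIC 0x4d534852 /* "MSHR" */
#endif

/* Hypothetical helper: spell a 32-bit magic as its four ASCII bytes. */
static void magic_to_tag(uint32_t magic, char out[5])
{
	out[0] = (char)((magic >> 24) & 0xff);
	out[1] = (char)((magic >> 16) & 0xff);
	out[2] = (char)((magic >> 8) & 0xff);
	out[3] = (char)(magic & 0xff);
	out[4] = '\0';
}

/*
 * Hypothetical helper: returns 1 if path is on msharefs, 0 if it is on
 * some other filesystem, and -1 if statfs() fails.
 */
static int is_msharefs(const char *path)
{
	struct statfs st;

	if (statfs(path, &st) != 0)
		return -1;
	return (unsigned long)st.f_type == MSHARE_MAGIC;
}
```

A caller would typically run `is_msharefs("/sys/fs/mshare")` after mounting and treat any result other than 1 as "msharefs not available".
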
Signed-off-by: Khalid Aziz
Signed-off-by: Anthony Yznaga
---
 Documentation/filesystems/msharefs.rst | 107 +++++++++++++++++++++++++
 include/uapi/linux/magic.h | 1 +
 mm/Kconfig | 9 +++
 mm/Makefile | 4 +
 mm/mshare.c | 96 ++++++++++++++++++++++
 5 files changed, 217 insertions(+)
 create mode 100644 Documentation/filesystems/msharefs.rst
 create mode 100644 mm/mshare.c

diff --git a/Documentation/filesystems/msharefs.rst b/Documentation/filesystems/msharefs.rst
new file mode 100644
index 000000000000..c3c7168aa18f
--- /dev/null
+++ b/Documentation/filesystems/msharefs.rst
@@ -0,0 +1,107 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================================================
+msharefs - a filesystem to support shared page tables
+=====================================================
+
+msharefs is a ram-based filesystem that allows multiple processes to
+share page table entries for shared pages. To enable support for
+msharefs the kernel must be compiled with CONFIG_MSHARE set.
+
+msharefs is typically mounted like this::
+
+    mount -t msharefs none /sys/fs/mshare
+
+Creating a file on msharefs creates a new shared region. All
+processes mapping that region map it using shared page table
+entries. ioctls are used to initialize or retrieve the start address
+and size of a shared region and to map objects in the shared
+region. Note that an msharefs file is a control file for the
+shared region and does not contain the contents of the region
+itself.
+
+Here are the basic steps for using mshare:
+
+1. Mount msharefs on /sys/fs/mshare::
+
+    mount -t msharefs msharefs /sys/fs/mshare
+
+2. mshare regions have alignment and size requirements. The start
+   address of a region must be aligned to a fixed boundary, and the
+   size of the region must be a multiple of that same value. The
+   value can be obtained by reading the file
+   ``/sys/fs/mshare/mshare_info``, which returns a number in text
+   format.
+
+3. For the process creating an mshare region:
+
+   a. Create a file on /sys/fs/mshare, for example:
+
+   .. code-block:: c
+
+      fd = open("/sys/fs/mshare/shareme",
+                O_RDWR|O_CREAT|O_EXCL, 0600);
+
+   b. Establish the starting address and size of the region:
+
+   .. code-block:: c
+
+      struct mshare_info minfo;
+
+      minfo.start = TB(2);
+      minfo.size = BUFFER_SIZE;
+      ioctl(fd, MSHAREFS_SET_SIZE, &minfo);
+
+   c. Map some memory in the region:
+
+   .. code-block:: c
+
+      struct mshare_create mcreate;
+
+      mcreate.addr = TB(2);
+      mcreate.size = BUFFER_SIZE;
+      mcreate.offset = 0;
+      mcreate.prot = PROT_READ | PROT_WRITE;
+      mcreate.flags = MAP_ANONYMOUS | MAP_SHARED | MAP_FIXED;
+      mcreate.fd = -1;
+
+      ioctl(fd, MSHAREFS_CREATE_MAPPING, &mcreate);
+
+   d. Map the mshare region into the process:
+
+   .. code-block:: c
+
+      mmap((void *)TB(2), BUFFER_SIZE,
+           PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+
+   e. Read and write the mshare region normally.
+
+
+4. For processes attaching an mshare region:
+
+   a. Open the file on msharefs, for example:
+
+   .. code-block:: c
+
+      fd = open("/sys/fs/mshare/shareme", O_RDWR);
+
+   b. Get information about the mshare'd region from the file:
+
+   .. code-block:: c
+
+      struct mshare_info minfo;
+
+      ioctl(fd, MSHAREFS_GET_SIZE, &minfo);
+
+   c. Map the mshare'd region into the process:
+
+   .. code-block:: c
+
+      mmap((void *)minfo.start, minfo.size,
+           PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+
+5. To delete the mshare region:
+
+.. code-block:: c
+
+   unlink("/sys/fs/mshare/shareme");
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index bb575f3ab45e..e53dd6063cba 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -103,5 +103,6 @@
 #define DEVMEM_MAGIC 0x454d444d /* "DMEM" */
 #define SECRETMEM_MAGIC 0x5345434d /* "SECM" */
 #define PID_FS_MAGIC 0x50494446 /* "PIDF" */
+#define MSHARE_MAGIC 0x4d534852 /* "MSHR" */
 
 #endif /* __LINUX_MAGIC_H__ */
diff --git a/mm/Kconfig b/mm/Kconfig
index 1b501db06417..ba3dbe31f86a 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1358,6 +1358,15 @@ config PT_RECLAIM
 
 	  Note: now only empty user PTE page table pages will be reclaimed.
 
+config MSHARE
+	bool "Mshare"
+	depends on MMU
+	help
+	  Enable msharefs: A ram-based filesystem that allows multiple
+	  processes to share page table entries for shared pages. A file
+	  created on msharefs represents a shared region where all processes
+	  mapping that region will map objects within it with shared PTEs.
+	  Ioctls are used to configure and map objects into the shared region.
 
 source "mm/damon/Kconfig"
 
diff --git a/mm/Makefile b/mm/Makefile
index 850386a67b3e..68bc967863f9 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -48,6 +48,10 @@ ifdef CONFIG_64BIT
 mmu-$(CONFIG_MMU) += mseal.o
 endif
 
+ifdef CONFIG_MSHARE
+mmu-$(CONFIG_MMU) += mshare.o
+endif
+
 obj-y := filemap.o mempool.o oom_kill.o fadvise.o \
 	   maccess.o page-writeback.o folio-compat.o \
 	   readahead.o swap.o truncate.o vmscan.o shrinker.o \
diff --git a/mm/mshare.c b/mm/mshare.c
new file mode 100644
index 000000000000..49d32e0c20d2
--- /dev/null
+++ b/mm/mshare.c
@@ -0,0 +1,96 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Enable cooperating processes to share page table between
+ * them to reduce the extra memory consumed by multiple copies
+ * of page tables.
+ *
+ * This code adds an in-memory filesystem - msharefs.
+ * msharefs is used to manage page table sharing
+ *
+ *
+ * Copyright (C) 2024 Oracle Corp. All rights reserved.
+ * Author: Khalid Aziz
+ *
+ */
+
+#include
+#include
+#include
+
+static const struct file_operations msharefs_file_operations = {
+	.open = simple_open,
+};
+
+static const struct super_operations mshare_s_ops = {
+	.statfs = simple_statfs,
+};
+
+static int
+msharefs_fill_super(struct super_block *sb, struct fs_context *fc)
+{
+	struct inode *inode;
+
+	sb->s_blocksize = PAGE_SIZE;
+	sb->s_blocksize_bits = PAGE_SHIFT;
+	sb->s_magic = MSHARE_MAGIC;
+	sb->s_op = &mshare_s_ops;
+	sb->s_time_gran = 1;
+
+	inode = new_inode(sb);
+	if (!inode)
+		return -ENOMEM;
+
+	inode->i_ino = 1;
+	inode->i_mode = S_IFDIR | 0777;
+	simple_inode_init_ts(inode);
+	inode->i_op = &simple_dir_inode_operations;
+	inode->i_fop = &simple_dir_operations;
+	set_nlink(inode, 2);
+
+	sb->s_root = d_make_root(inode);
+	if (!sb->s_root)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int
+msharefs_get_tree(struct fs_context *fc)
+{
+	return get_tree_nodev(fc, msharefs_fill_super);
+}
+
+static const struct fs_context_operations msharefs_context_ops = {
+	.get_tree = msharefs_get_tree,
+};
+
+static int
+mshare_init_fs_context(struct fs_context *fc)
+{
+	fc->ops = &msharefs_context_ops;
+	return 0;
+}
+
+static struct file_system_type mshare_fs = {
+	.name = "msharefs",
+	.init_fs_context = mshare_init_fs_context,
+	.kill_sb = kill_litter_super,
+};
+
+static int __init
+mshare_init(void)
+{
+	int ret;
+
+	ret = sysfs_create_mount_point(fs_kobj, "mshare");
+	if (ret)
+		return ret;
+
+	ret = register_filesystem(&mshare_fs);
+	if (ret)
+		sysfs_remove_mount_point(fs_kobj, "mshare");
+
+	return ret;
+}
+
+core_initcall(mshare_init);
-- 
2.43.5

From nobody Tue Feb 10 17:08:43 2026
Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No
client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 643C41F152E; Fri, 24 Jan 2025 23:56:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.165.32 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762970; cv=none; b=A+SMiKiMIzkGnXyZNZOT8RqRFGSDVqFJORTPeJDwoDUtyjuVrqv/ZmGHASUWMHjZ8Uj2MeVk5o5S8ey/wQkSmiIrM8KhswHH4R0wyclGxs+1qTPx90bSzQKi17FrnqUIG+cboK+SvBfp2WErScdCS7qlSMNM+5l7Fre/gwYiG3k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762970; c=relaxed/simple; bh=vJC5l/Xtc8mUerZxSYMEwyUFjgnyCqGHI/uVzckoJfM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=rVhwr+C/d21mPOy+bGbUY0VuEh9DOsjMbwZxeb0M4MgrsJpno1H/Dv9ckYqYjcySjcWSX268O+VmyrIPOIMBHh/6IpSlevjESegbhNJr0EgQyrLx0oiIK+FmKDLZg71OmoASMHJDdWweA8WMLeRBYH1WRLjP4Scutknm0Yfqo5w= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=HLHVaWZH; arc=none smtp.client-ip=205.220.165.32 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="HLHVaWZH" Received: from pps.filterd (m0246617.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 50OI9Khf021995; Fri, 24 Jan 2025 23:55:11 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=corp-2023-11-20; bh=In3/Q U715xLhM8W3K3qF4Q+QnYxI0SeFwCtvEoqrM48=; b=HLHVaWZH0xw0fUWfpJLKK Yj4Z6fAZf9KRr4nhNheCUsucOkOPRtT9rUPzrmzkRBCKuu1sVUtXUNSLOfpV8vnO 
mzsnDq8PVtK0+cPlMv21opwInIiL9RXaIcA8qUMCwfBtDkTa+Cw2l6Hgjb4lxtL1 8wHiwShDXIcH5fbohFiO6wkw3j7XMRY3aV3hfxkKGc80voJOUS8yjDJwnl2tCx7o 8XHCVk4ukhdWd+KUJuxLfcCnVRXk6LKglh3ldQ9aFYOEdZD7NYB639sjv6SFJPNt p1Xgj7uGT9HaVf/GhAYRM9w1lsOsYc35T/QPV1hWYjMmqw/v5CETdhBxzWFAVgZV Q== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 44awpx5vpr-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:11 +0000 (GMT) Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 50OMl7J0036434; Fri, 24 Jan 2025 23:55:10 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 44917u4a4w-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:10 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 50ONsxPk018051; Fri, 24 Jan 2025 23:55:09 GMT Received: from localhost.us.oracle.com (dhcp-10-65-130-174.vpn.oracle.com [10.65.130.174]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 44917u49ww-3; Fri, 24 Jan 2025 23:55:08 +0000 From: Anthony Yznaga To: akpm@linux-foundation.org, willy@infradead.org, markhemm@googlemail.com, viro@zeniv.linux.org.uk, david@redhat.com, khalid@kernel.org Cc: anthony.yznaga@oracle.com, jthoughton@google.com, corbet@lwn.net, dave.hansen@intel.com, kirill@shutemov.name, luto@kernel.org, brauner@kernel.org, arnd@arndb.de, ebiederm@xmission.com, catalin.marinas@arm.com, mingo@redhat.com, peterz@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com, hannes@cmpxchg.org, 
 mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, tglx@linutronix.de, cgroups@vger.kernel.org, x86@kernel.org, linux-doc@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhiramat@kernel.org, rostedt@goodmis.org, vasily.averin@linux.dev, xhao@linux.alibaba.com, pcc@google.com, neilb@suse.de, maz@kernel.org
Subject: [PATCH 02/20] mm/mshare: pre-populate msharefs with information file
Date: Fri, 24 Jan 2025 15:54:36 -0800
Message-ID: <20250124235454.84587-3-anthony.yznaga@oracle.com>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20250124235454.84587-1-anthony.yznaga@oracle.com>
References: <20250124235454.84587-1-anthony.yznaga@oracle.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2025-01-24_10,2025-01-23_01,2024-11-22_01
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 phishscore=0 mlxscore=0 spamscore=0 mlxlogscore=999 bulkscore=0 malwarescore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2411120000 definitions=main-2501240162
X-Proofpoint-ORIG-GUID: HTWbFrhMb8YrF3VvR_cxKBeK5wrE_S0U
X-Proofpoint-GUID: HTWbFrhMb8YrF3VvR_cxKBeK5wrE_S0U
Content-Type: text/plain; charset="utf-8"

From: Khalid Aziz

Users of mshare need to know the size and alignment requirement for
shared regions. Pre-populate msharefs with a file, mshare_info, that
provides this information. For now, pagetable sharing is hardcoded to
be at the PUD level.

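For illustration only (not part of this patch), a userspace consumer might read mshare_info and validate a candidate region against it roughly as follows. The sketch assumes the file holds a single decimal number followed by a newline, as the read handler in this patch produces; the helper names `parse_mshare_align()`, `read_mshare_align()` and `mshare_region_ok()` are hypothetical:

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse the decimal text read from mshare_info.
 * Returns 0 for malformed or out-of-range input.
 */
static unsigned long parse_mshare_align(const char *text)
{
	char *end;
	unsigned long val;

	/* Reject signs, whitespace and empty strings up front. */
	if (text[0] < '0' || text[0] > '9')
		return 0;

	errno = 0;
	val = strtoul(text, &end, 10);
	if (errno || (*end != '\n' && *end != '\0'))
		return 0;
	return val;
}

/* Hypothetical helper: read the requirement from an info file path. */
static unsigned long read_mshare_align(const char *path)
{
	char buf[80];
	unsigned long val = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return 0;
	if (fgets(buf, sizeof(buf), f))
		val = parse_mshare_align(buf);
	fclose(f);
	return val;
}

/* Nonzero if start and size both satisfy the reported requirement. */
static int mshare_region_ok(unsigned long start, unsigned long size,
			    unsigned long align)
{
	return align && size && start % align == 0 && size % align == 0;
}
```

A caller would pass the result of `read_mshare_align("/sys/fs/mshare/mshare_info")` as `align` and refuse to set up a region when `mshare_region_ok()` returns 0.
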
Signed-off-by: Khalid Aziz
Signed-off-by: Anthony Yznaga
---
 mm/mshare.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 75 insertions(+), 2 deletions(-)

diff --git a/mm/mshare.c b/mm/mshare.c
index 49d32e0c20d2..6d3760d1af8e 100644
--- a/mm/mshare.c
+++ b/mm/mshare.c
@@ -17,18 +17,74 @@
 #include
 #include
 
+const unsigned long mshare_align = P4D_SIZE;
+
 static const struct file_operations msharefs_file_operations = {
 	.open = simple_open,
 };
 
+struct msharefs_info {
+	struct dentry *info_dentry;
+};
+
+static ssize_t
+mshare_info_read(struct file *file, char __user *buf, size_t nbytes,
+		 loff_t *ppos)
+{
+	char s[80];
+
+	sprintf(s, "%ld\n", mshare_align);
+	return simple_read_from_buffer(buf, nbytes, ppos, s, strlen(s));
+}
+
+static const struct file_operations mshare_info_ops = {
+	.read = mshare_info_read,
+	.llseek = noop_llseek,
+};
+
 static const struct super_operations mshare_s_ops = {
 	.statfs = simple_statfs,
 };
 
+static int
+msharefs_create_mshare_info(struct super_block *sb)
+{
+	struct msharefs_info *info = sb->s_fs_info;
+	struct dentry *root = sb->s_root;
+	struct dentry *dentry;
+	struct inode *inode;
+	int ret;
+
+	ret = -ENOMEM;
+	inode = new_inode(sb);
+	if (!inode)
+		goto out;
+
+	inode->i_ino = 2;
+	simple_inode_init_ts(inode);
+	inode_init_owner(&nop_mnt_idmap, inode, NULL, S_IFREG | 0444);
+	inode->i_fop = &mshare_info_ops;
+
+	dentry = d_alloc_name(root, "mshare_info");
+	if (!dentry)
+		goto out;
+
+	info->info_dentry = dentry;
+	d_add(dentry, inode);
+
+	return 0;
+out:
+	iput(inode);
+
+	return ret;
+}
+
 static int
 msharefs_fill_super(struct super_block *sb, struct fs_context *fc)
 {
+	struct msharefs_info *info;
 	struct inode *inode;
+	int ret;
 
 	sb->s_blocksize = PAGE_SIZE;
 	sb->s_blocksize_bits = PAGE_SHIFT;
@@ -36,6 +92,12 @@ msharefs_fill_super(struct super_block *sb, struct fs_context *fc)
 	sb->s_op = &mshare_s_ops;
 	sb->s_time_gran = 1;
 
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (!info)
+		return -ENOMEM;
+
+	sb->s_fs_info = info;
+
 	inode = new_inode(sb);
 	if (!inode)
 		return -ENOMEM;
@@ -51,7 +113,9 @@ msharefs_fill_super(struct super_block *sb, struct fs_context *fc)
 	if (!sb->s_root)
 		return -ENOMEM;
 
-	return 0;
+	ret = msharefs_create_mshare_info(sb);
+
+	return ret;
 }
 
 static int
@@ -71,10 +135,19 @@ mshare_init_fs_context(struct fs_context *fc)
 	return 0;
 }
 
+static void
+msharefs_kill_super(struct super_block *sb)
+{
+	struct msharefs_info *info = sb->s_fs_info;
+
+	kfree(info);
+	kill_litter_super(sb);
+}
+
 static struct file_system_type mshare_fs = {
 	.name = "msharefs",
 	.init_fs_context = mshare_init_fs_context,
-	.kill_sb = kill_litter_super,
+	.kill_sb = msharefs_kill_super,
 };
 
 static int __init
-- 
2.43.5

From nobody Tue Feb 10 17:08:43 2026
Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3046F1EEA2C; Fri, 24 Jan 2025 23:56:12 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.165.32
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762974; cv=none; b=gXPCSketluHsf20hhBGq29Mjy431hKgrG7V/aYTQ5rJnyiMzFIuwX0bs6+faaK/vRRbFs1wvc/iFzTtIh0X4Mocxm6C6pAOX+hG/ToctAMUt2lJqK3j1DeTptQ4WJQF5KtWN8pEhp3Q1uRVIpYkrtcvXMJfN+q6+a6K7JafFQgo=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762974; c=relaxed/simple; bh=ac6bMDUbpSNsLS1pkOOBz2xI7DS0JNfwYfCLr8acXrI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=aEQ08VqpBh9e3aXYsfSY0CNrrj8pbnOjmbBQUVzScpXpZBGp8oAzU/vid9PUinXHTU6zhEHSHtxRLm+x+SkoH9DGc2KS6A9r82pJTTw+DjOLzNtEsJPNPM5WH7YP5Gj9vkVpMg2l+zMYN6wmKmCx8DsuTJzMAh4LM7Np0MBLjaE=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass
(p=reject dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=afhvDSIm; arc=none smtp.client-ip=205.220.165.32 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="afhvDSIm" Received: from pps.filterd (m0333521.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 50OIRdZF000609; Fri, 24 Jan 2025 23:55:15 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=corp-2023-11-20; bh=nqagY yeBhaL2RlbKKTLc6lxU3rvGauZsyUuzovUQ98I=; b=afhvDSImLqAMVCsYQmNcG ariAxBqtOlnnMHGzw4+P9TuWi+BLYUBCKMY/F/pj5x5FO7305VUzzvylCnuwmP4B hhYHZynes5p/hwSoIKvX5ccNh78E9B6di+gQeN+eIe16U3Df3EiBZIxNruSiHRU1 SJ50wwpE0RCkh3J6hYADyKFwjnQbcBOZwctnm2XyerXnxLsMg1bRN4OGjrNW+imS 2mYM2mcM+4MR+bLIYKhWG2LvVwU9fbAiSjcoxgT8/1PTvpcKSwoyu2z+mjgfJzs6 mVUJ4aXylk0CMjxwTAbeXUPl4y5yw2BzkPNJ2p8eCfMQMM1C7R1iqiPhQIjQHyEG w== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 44b06j5j3u-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:14 +0000 (GMT) Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 50OLP7ek036648; Fri, 24 Jan 2025 23:55:14 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 44917u4a6c-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 
bits=256 verify=OK); Fri, 24 Jan 2025 23:55:14 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 50ONsxPm018051; Fri, 24 Jan 2025 23:55:13 GMT Received: from localhost.us.oracle.com (dhcp-10-65-130-174.vpn.oracle.com [10.65.130.174]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 44917u49ww-4; Fri, 24 Jan 2025 23:55:12 +0000 From: Anthony Yznaga To: akpm@linux-foundation.org, willy@infradead.org, markhemm@googlemail.com, viro@zeniv.linux.org.uk, david@redhat.com, khalid@kernel.org Cc: anthony.yznaga@oracle.com, jthoughton@google.com, corbet@lwn.net, dave.hansen@intel.com, kirill@shutemov.name, luto@kernel.org, brauner@kernel.org, arnd@arndb.de, ebiederm@xmission.com, catalin.marinas@arm.com, mingo@redhat.com, peterz@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, tglx@linutronix.de, cgroups@vger.kernel.org, x86@kernel.org, linux-doc@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhiramat@kernel.org, rostedt@goodmis.org, vasily.averin@linux.dev, xhao@linux.alibaba.com, pcc@google.com, neilb@suse.de, maz@kernel.org Subject: [PATCH 03/20] mm/mshare: make msharefs writable and support directories Date: Fri, 24 Jan 2025 15:54:37 -0800 Message-ID: <20250124235454.84587-4-anthony.yznaga@oracle.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250124235454.84587-1-anthony.yznaga@oracle.com> References: <20250124235454.84587-1-anthony.yznaga@oracle.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-Virus-Version: vendor=baseguard 
 engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2025-01-24_10,2025-01-23_01,2024-11-22_01
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 phishscore=0 mlxscore=0 spamscore=0 mlxlogscore=781 bulkscore=0 malwarescore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2411120000 definitions=main-2501240162
X-Proofpoint-ORIG-GUID: q87c7Eu1K_1GSRczZ328BcrLzGhppaPX
X-Proofpoint-GUID: q87c7Eu1K_1GSRczZ328BcrLzGhppaPX
Content-Type: text/plain; charset="utf-8"

From: Khalid Aziz

Make msharefs filesystem writable and allow creating directories
to support better access control to mshare'd regions defined in
msharefs.

Signed-off-by: Khalid Aziz
Signed-off-by: Anthony Yznaga
---
 mm/mshare.c | 117 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 116 insertions(+), 1 deletion(-)

diff --git a/mm/mshare.c b/mm/mshare.c
index 6d3760d1af8e..b755346da827 100644
--- a/mm/mshare.c
+++ b/mm/mshare.c
@@ -19,14 +19,129 @@
 
 const unsigned long mshare_align = P4D_SIZE;
 
+static const struct inode_operations msharefs_dir_inode_ops;
+static const struct inode_operations msharefs_file_inode_ops;
+
 static const struct file_operations msharefs_file_operations = {
 	.open = simple_open,
 };
 
+static struct inode
+*msharefs_get_inode(struct mnt_idmap *idmap, struct super_block *sb,
+		    const struct inode *dir, umode_t mode)
+{
+	struct inode *inode = new_inode(sb);
+
+	if (!inode)
+		return ERR_PTR(-ENOMEM);
+
+	inode->i_ino = get_next_ino();
+	inode_init_owner(&nop_mnt_idmap, inode, dir, mode);
+	simple_inode_init_ts(inode);
+
+	switch (mode & S_IFMT) {
+	case S_IFREG:
+		inode->i_op = &msharefs_file_inode_ops;
+		inode->i_fop = &msharefs_file_operations;
+		break;
+	case S_IFDIR:
+		inode->i_op = &msharefs_dir_inode_ops;
+		inode->i_fop = &simple_dir_operations;
+		inc_nlink(inode);
+		break;
+	default:
+		discard_new_inode(inode);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return inode;
+}
+
+static int
+msharefs_mknod(struct mnt_idmap *idmap, struct inode *dir,
+	       struct dentry *dentry, umode_t mode)
+{
+	struct inode *inode;
+
+	inode = msharefs_get_inode(idmap, dir->i_sb, dir, mode);
+	if (IS_ERR(inode))
+		return PTR_ERR(inode);
+
+	d_instantiate(dentry, inode);
+	dget(dentry);
+	inode_set_mtime_to_ts(dir, inode_set_ctime_current(dir));
+
+	return 0;
+}
+
+static int
+msharefs_create(struct mnt_idmap *idmap, struct inode *dir,
+		struct dentry *dentry, umode_t mode, bool excl)
+{
+	return msharefs_mknod(idmap, dir, dentry, mode | S_IFREG);
+}
+
+static int
+msharefs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
+	       struct dentry *dentry, umode_t mode)
+{
+	int ret = msharefs_mknod(idmap, dir, dentry, mode | S_IFDIR);
+
+	if (!ret)
+		inc_nlink(dir);
+	return ret;
+}
+
 struct msharefs_info {
 	struct dentry *info_dentry;
 };
 
+static inline bool
+is_msharefs_info_file(const struct dentry *dentry)
+{
+	struct msharefs_info *info = dentry->d_sb->s_fs_info;
+
+	return info->info_dentry == dentry;
+}
+
+static int
+msharefs_rename(struct mnt_idmap *idmap,
+		struct inode *old_dir, struct dentry *old_dentry,
+		struct inode *new_dir, struct dentry *new_dentry,
+		unsigned int flags)
+{
+	if (is_msharefs_info_file(old_dentry) ||
+	    is_msharefs_info_file(new_dentry))
+		return -EPERM;
+
+	return simple_rename(idmap, old_dir, old_dentry, new_dir,
+			     new_dentry, flags);
+}
+
+static int
+msharefs_unlink(struct inode *dir, struct dentry *dentry)
+{
+	if (is_msharefs_info_file(dentry))
+		return -EPERM;
+
+	return simple_unlink(dir, dentry);
+}
+
+static const struct inode_operations msharefs_file_inode_ops = {
+	.setattr = simple_setattr,
+	.getattr = simple_getattr,
+};
+
+static const struct inode_operations msharefs_dir_inode_ops = {
+	.create = msharefs_create,
+	.lookup = simple_lookup,
+	.link = simple_link,
+	.unlink = msharefs_unlink,
+	.mkdir = msharefs_mkdir,
+	.rmdir = simple_rmdir,
+	.rename = msharefs_rename,
+};
+
 static ssize_t
 mshare_info_read(struct file *file, char __user *buf, size_t nbytes,
 		 loff_t *ppos)
@@ -105,7 +220,7 @@ msharefs_fill_super(struct super_block *sb, struct fs_context *fc)
 	inode->i_ino = 1;
 	inode->i_mode = S_IFDIR | 0777;
 	simple_inode_init_ts(inode);
-	inode->i_op = &simple_dir_inode_operations;
+	inode->i_op = &msharefs_dir_inode_ops;
 	inode->i_fop = &simple_dir_operations;
 	set_nlink(inode, 2);
 
-- 
2.43.5

From nobody Tue Feb 10 17:08:43 2026
Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AA0CE1EE009; Fri, 24 Jan 2025 23:56:05 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.177.32
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762967; cv=none; b=GaP/HU3RW/61ujcA+vv620VE17jYkVnQiybboY1u3Zb7y2UIWuUbzvdG29kqalKk+sk3XzsxpFw76ejgJ39yX3DBThXHNV8aIXCorIpS/DUcdWFLoYua8AGqjoacgus/HJQbMAN0xqKrswXD+YOrMlIYYikK7fU4f9K3dD8EO+A=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762967; c=relaxed/simple; bh=7T0+gS5pbd7zyOp8YcgTaGqBemKbGNrkHRa0N6K/1kk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=tNl4OXXb2/J+8mDFXtI2BZuQa7CMHnToMujPXc/UhvR+EIrblcv1Rocw8LRYpxgGxgKMBqwL+JtRqHmd3d1eIdq+RqQePk4OHdnLOA09inMjsM2erzbb+8p750FVEHqiiJO8KPJAZl6eZ3Z1SJh0Alli5wL+ibvrJ2l5jHVIcdk=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=QY/j1Qke; arc=none smtp.client-ip=205.220.177.32
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass
smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="QY/j1Qke" Received: from pps.filterd (m0246631.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 50OIV8E3018170; Fri, 24 Jan 2025 23:55:19 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=corp-2023-11-20; bh=NhoD2 UyohCxfveoHgDD+cR1/Lz5zOg29Zxa7SdPOnus=; b=QY/j1QkegXiviWjeo98nh F0qYOSLXLhPu8lHvxPBkZWcdoJTdL2kQQ5KKhw+r7FvhKfeT89mU/GBPMUJZNV3J PLsm+Z+tUfUcTvEMYs/I9TZQg/kCUageWG+NC9Ow86/c7xqZ6XfG7HIwmrRcphEF +HN9Liqi6GKNxKo2ChglsrDyGdD6jGWhUk28dqmv7mwWG6OZYzVbfs0c30TFBalR OpDG5IKFMnf11Dpt5fmN2XHOxGC1WWE47hJZcGx69mrCkjQIGSQGxoJ1wXQjKiWq zw7sr5SS7cLQITieF9uBlt92uCVh0tlWN+AiCW9hSbdu9Fwbafc5QECw6zdjIMir A== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 44b96am96g-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:18 +0000 (GMT) Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 50ON3eWg036495; Fri, 24 Jan 2025 23:55:17 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 44917u4a7h-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:17 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 50ONsxPo018051; Fri, 24 Jan 2025 23:55:16 GMT Received: from localhost.us.oracle.com (dhcp-10-65-130-174.vpn.oracle.com 
[10.65.130.174]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 44917u49ww-5; Fri, 24 Jan 2025 23:55:16 +0000 From: Anthony Yznaga To: akpm@linux-foundation.org, willy@infradead.org, markhemm@googlemail.com, viro@zeniv.linux.org.uk, david@redhat.com, khalid@kernel.org Cc: anthony.yznaga@oracle.com, jthoughton@google.com, corbet@lwn.net, dave.hansen@intel.com, kirill@shutemov.name, luto@kernel.org, brauner@kernel.org, arnd@arndb.de, ebiederm@xmission.com, catalin.marinas@arm.com, mingo@redhat.com, peterz@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, tglx@linutronix.de, cgroups@vger.kernel.org, x86@kernel.org, linux-doc@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhiramat@kernel.org, rostedt@goodmis.org, vasily.averin@linux.dev, xhao@linux.alibaba.com, pcc@google.com, neilb@suse.de, maz@kernel.org Subject: [PATCH 04/20] mm/mshare: allocate an mm_struct for msharefs files Date: Fri, 24 Jan 2025 15:54:38 -0800 Message-ID: <20250124235454.84587-5-anthony.yznaga@oracle.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250124235454.84587-1-anthony.yznaga@oracle.com> References: <20250124235454.84587-1-anthony.yznaga@oracle.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2025-01-24_10,2025-01-23_01,2024-11-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 phishscore=0 mlxscore=0 spamscore=0 mlxlogscore=829 bulkscore=0 malwarescore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2411120000 definitions=main-2501240162 
X-Proofpoint-GUID: Zg9k0kA44bw59zVgGS2FFhExNRlmmq9n X-Proofpoint-ORIG-GUID: Zg9k0kA44bw59zVgGS2FFhExNRlmmq9n Content-Type: text/plain; charset="utf-8" When a new file is created under msharefs, allocate a new mm_struct to be associated with it for the lifetime of the file. The mm_struct will hold the VMAs and pagetables for the mshare region the file represents. Signed-off-by: Khalid Aziz Signed-off-by: Anthony Yznaga --- mm/mshare.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 60 insertions(+) diff --git a/mm/mshare.c b/mm/mshare.c index b755346da827..060292fb6a00 100644 --- a/mm/mshare.c +++ b/mm/mshare.c @@ -19,6 +19,10 @@ =20 const unsigned long mshare_align =3D P4D_SIZE; =20 +struct mshare_data { + struct mm_struct *mm; +}; + static const struct inode_operations msharefs_dir_inode_ops; static const struct inode_operations msharefs_file_inode_ops; =20 @@ -26,11 +30,51 @@ static const struct file_operations msharefs_file_opera= tions =3D { .open =3D simple_open, }; =20 +static int +msharefs_fill_mm(struct inode *inode) +{ + struct mm_struct *mm; + struct mshare_data *m_data =3D NULL; + int ret =3D 0; + + mm =3D mm_alloc(); + if (!mm) { + ret =3D -ENOMEM; + goto err_free; + } + + mm->mmap_base =3D mm->task_size =3D 0; + + m_data =3D kzalloc(sizeof(*m_data), GFP_KERNEL); + if (!m_data) { + ret =3D -ENOMEM; + goto err_free; + } + m_data->mm =3D mm; + inode->i_private =3D m_data; + + return 0; + +err_free: + if (mm) + mmput(mm); + kfree(m_data); + return ret; +} + +static void +msharefs_delmm(struct mshare_data *m_data) +{ + mmput(m_data->mm); + kfree(m_data); +} + static struct inode *msharefs_get_inode(struct mnt_idmap *idmap, struct super_block *sb, const struct inode *dir, umode_t mode) { struct inode *inode =3D new_inode(sb); + int ret; =20 if (!inode) return ERR_PTR(-ENOMEM); @@ -43,6 +87,11 @@ static struct inode case S_IFREG: inode->i_op =3D &msharefs_file_inode_ops; inode->i_fop =3D &msharefs_file_operations; + ret =3D 
msharefs_fill_mm(inode); + if (ret) { + discard_new_inode(inode); + inode =3D ERR_PTR(ret); + } break; case S_IFDIR: inode->i_op =3D &msharefs_dir_inode_ops; @@ -142,6 +191,16 @@ static const struct inode_operations msharefs_dir_inod= e_ops =3D { .rename =3D msharefs_rename, }; =20 +static void +mshare_evict_inode(struct inode *inode) +{ + struct mshare_data *m_data =3D inode->i_private; + + if (m_data) + msharefs_delmm(m_data); + clear_inode(inode); +} + static ssize_t mshare_info_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos) @@ -159,6 +218,7 @@ static const struct file_operations mshare_info_ops =3D= { =20 static const struct super_operations mshare_s_ops =3D { .statfs =3D simple_statfs, + .evict_inode =3D mshare_evict_inode, }; =20 static int --=20 2.43.5 From nobody Tue Feb 10 17:08:43 2026 Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E85521F541E; Fri, 24 Jan 2025 23:56:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.177.32 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762971; cv=none; b=cZRE6Q8WLr80VM87Dwr94k955TMgHP2Xs7oNDLJ06UBI8hegg9e2Zx/IDuTyh6udJdAhkbnFaF54eWmER6tSmIpRiiip5MaSNv6RzvyE/20L1kby9UomBY4tYI7e/7vogUOVJSEj5T6uEiYNUra4hAMfQ0fcK3oKF0xMaIlu4zY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762971; c=relaxed/simple; bh=217bX5CbLlem3zqXe/vk/U08gUEHdHM1T00iFoP3OzA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=oFXDTBrVThtUP17s4dqpRAT7C7Rh6Ht1x5WneuoU2U4dENCVlrXOSIa+WKh/pl2HHxwORbrQu2WVPIUOVrvcQbv8SdyAuIV6UKUA067vSExNxMskNTj2xDXwMfRHmlu9Jb2CKYwwMaFROKtN7aPThUtihtwAO9E/gh6Muy0PjIk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) 
header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=U17qv4kg; arc=none smtp.client-ip=205.220.177.32 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="U17qv4kg" Received: from pps.filterd (m0246630.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 50OINfWM031123; Fri, 24 Jan 2025 23:55:22 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=corp-2023-11-20; bh=YR94n q9P1Qyhdr85jFXsDeNLWyF6Jz4qdsVA+7ycWG8=; b=U17qv4kgLrflB4RHAeog7 4wlpmeoJpaSoQ3N6swcZcCacC+QK6zUVbGVxCJmnF7CR/w80FJ7sR+odgKJ3UuIi OtXw3u5uE7nk55riixTIRmxB1uecSItgWbCILUpnr38OgT7yMveyfZgnrtM3Mwc5 R4hrlqSpm3uGwIGQujCqMhibGcC6BLSM7MvvyvSMWBxnn4qJF4zPCQNI5ZbLscNt +fjr0vMc5f5X+fkUYSZjrH2NoU3EDMtHPKSjXjV4COMwxKAV42siTVIk3coAlQZM 9Cr4ZMVkNtfJ+Z0mriz976U2A8GWiJ/ceDvqqwJGKc2kAhsj0MAfAxxUNcjwFLZZ g== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 4485nsmvmr-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:22 +0000 (GMT) Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 50OMmRjc036584; Fri, 24 Jan 2025 23:55:21 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 44917u4a92-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 
verify=OK); Fri, 24 Jan 2025 23:55:21 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 50ONsxPq018051; Fri, 24 Jan 2025 23:55:20 GMT Received: from localhost.us.oracle.com (dhcp-10-65-130-174.vpn.oracle.com [10.65.130.174]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 44917u49ww-6; Fri, 24 Jan 2025 23:55:19 +0000 From: Anthony Yznaga To: akpm@linux-foundation.org, willy@infradead.org, markhemm@googlemail.com, viro@zeniv.linux.org.uk, david@redhat.com, khalid@kernel.org Cc: anthony.yznaga@oracle.com, jthoughton@google.com, corbet@lwn.net, dave.hansen@intel.com, kirill@shutemov.name, luto@kernel.org, brauner@kernel.org, arnd@arndb.de, ebiederm@xmission.com, catalin.marinas@arm.com, mingo@redhat.com, peterz@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, tglx@linutronix.de, cgroups@vger.kernel.org, x86@kernel.org, linux-doc@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhiramat@kernel.org, rostedt@goodmis.org, vasily.averin@linux.dev, xhao@linux.alibaba.com, pcc@google.com, neilb@suse.de, maz@kernel.org Subject: [PATCH 05/20] mm/mshare: Add ioctl support Date: Fri, 24 Jan 2025 15:54:39 -0800 Message-ID: <20250124235454.84587-6-anthony.yznaga@oracle.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250124235454.84587-1-anthony.yznaga@oracle.com> References: <20250124235454.84587-1-anthony.yznaga@oracle.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 
definitions=2025-01-24_10,2025-01-23_01,2024-11-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 phishscore=0 mlxscore=0 spamscore=0 mlxlogscore=999 bulkscore=0 malwarescore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2411120000 definitions=main-2501240162 X-Proofpoint-GUID: Z20-vg57O9zfZiKygPd_gm_amo1x_1Ej X-Proofpoint-ORIG-GUID: Z20-vg57O9zfZiKygPd_gm_amo1x_1Ej Content-Type: text/plain; charset="utf-8" From: Khalid Aziz Reserve a range of ioctls for msharefs and add the first two ioctls to get and set the start address and size of an mshare region. Signed-off-by: Khalid Aziz Signed-off-by: Anthony Yznaga --- .../userspace-api/ioctl/ioctl-number.rst | 1 + include/uapi/linux/msharefs.h | 29 ++++++++ mm/mshare.c | 68 +++++++++++++++++++ 3 files changed, 98 insertions(+) create mode 100644 include/uapi/linux/msharefs.h diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documenta= tion/userspace-api/ioctl/ioctl-number.rst index 243f1f1b554a..aa22b5412e4d 100644 --- a/Documentation/userspace-api/ioctl/ioctl-number.rst +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst @@ -303,6 +303,7 @@ Code Seq# Include File = Comments 'v' 20-27 arch/powerpc/include/uapi/asm/vas-api.h VAS API 'v' C0-FF linux/meye.h confl= ict! 'w' all CERN = SCI driver +'x' 00-1F linux/msharefs.h mshar= efs filesystem 'y' 00-1F packe= t based user level communications 'z' 00-3F CAN b= us card conflict! diff --git a/include/uapi/linux/msharefs.h b/include/uapi/linux/msharefs.h new file mode 100644 index 000000000000..c7b509c7e093 --- /dev/null +++ b/include/uapi/linux/msharefs.h @@ -0,0 +1,29 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * msharefs defines a memory region that is shared across processes. + * ioctl is used on files created under msharefs to set various + * attributes on these shared memory regions + * + * + * Copyright (C) 2024 Oracle Corp. All rights reserved. 
+ * Author: Khalid Aziz + */ + +#ifndef _UAPI_LINUX_MSHAREFS_H +#define _UAPI_LINUX_MSHAREFS_H + +#include +#include + +/* + * msharefs specific ioctl commands + */ +#define MSHAREFS_GET_SIZE _IOR('x', 0, struct mshare_info) +#define MSHAREFS_SET_SIZE _IOW('x', 1, struct mshare_info) + +struct mshare_info { + __u64 start; + __u64 size; +}; + +#endif diff --git a/mm/mshare.c b/mm/mshare.c index 060292fb6a00..056cb5a82547 100644 --- a/mm/mshare.c +++ b/mm/mshare.c @@ -10,24 +10,91 @@ * * Copyright (C) 2024 Oracle Corp. All rights reserved. * Author: Khalid Aziz + * Author: Matthew Wilcox * */ =20 #include #include +#include #include +#include =20 const unsigned long mshare_align =3D P4D_SIZE; =20 struct mshare_data { struct mm_struct *mm; + spinlock_t m_lock; + struct mshare_info minfo; }; =20 +static long +msharefs_set_size(struct mm_struct *host_mm, struct mshare_data *m_data, + struct mshare_info *minfo) +{ + /* + * Validate alignment for start address and size + */ + if (!minfo->size || ((minfo->start | minfo->size) & (mshare_align - 1))) { + spin_unlock(&m_data->m_lock); + return -EINVAL; + } + + host_mm->mmap_base =3D minfo->start; + host_mm->task_size =3D minfo->size; + + m_data->minfo.start =3D host_mm->mmap_base; + m_data->minfo.size =3D host_mm->task_size; + spin_unlock(&m_data->m_lock); + + return 0; +} + +static long +msharefs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) +{ + struct mshare_data *m_data =3D filp->private_data; + struct mm_struct *host_mm =3D m_data->mm; + struct mshare_info minfo; + + switch (cmd) { + case MSHAREFS_GET_SIZE: + spin_lock(&m_data->m_lock); + minfo =3D m_data->minfo; + spin_unlock(&m_data->m_lock); + + if (copy_to_user((void __user *)arg, &minfo, sizeof(minfo))) + return -EFAULT; + + return 0; + + case MSHAREFS_SET_SIZE: + if (copy_from_user(&minfo, (struct mshare_info __user *)arg, + sizeof(minfo))) + return -EFAULT; + + /* + * If this mshare region has been set up once already, bail out + */ + 
spin_lock(&m_data->m_lock); + if (m_data->minfo.size !=3D 0) { + spin_unlock(&m_data->m_lock); + return -EINVAL; + } + + return msharefs_set_size(host_mm, m_data, &minfo); + + default: + return -ENOTTY; + } +} + static const struct inode_operations msharefs_dir_inode_ops; static const struct inode_operations msharefs_file_inode_ops; =20 static const struct file_operations msharefs_file_operations =3D { .open =3D simple_open, + .unlocked_ioctl =3D msharefs_ioctl, }; =20 static int @@ -51,6 +118,7 @@ msharefs_fill_mm(struct inode *inode) goto err_free; } m_data->mm =3D mm; + spin_lock_init(&m_data->m_lock); inode->i_private =3D m_data; =20 return 0; --=20 2.43.5 From nobody Tue Feb 10 17:08:43 2026 Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E84EC1F540D; Fri, 24 Jan 2025 23:56:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.177.32 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762971; cv=none; b=ulPOIwIdcsNPOOUFBzFTUTU/6l4O85eWJ1wkdmUbjc2QwZNWdFyp6RhXjDweduQdlAFXS4QcYO9zDOMnSMp84sNIWn7/RryHYTjT6lohp51eGODLIVTNkbtyufBkDutTSVUOH3QN81NQMHYg+219ZowGkaHu9gSkmRTrQNZgt0o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762971; c=relaxed/simple; bh=KA0RXiiZYKNy+BTXtgfifYFrTPjnKGtN7rVQlC/9mqY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=nG3xfyQ9Qhntaj3QO86nCYg146I21H8hNsc+2bBX4PLVXhjF6KZWIpgJVlJ+9cTt5FL7VQ4s9qrzGE8kggftBN40svbUTIVTwZ+98e/MbsUrv4XZWbffBPmC3S/XCd1LllZvD+TXbJq8q7GYgbRzqtk1wpJJDwJTjjtkoNlPtJI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com 
header.b=XLe18pnp; arc=none smtp.client-ip=205.220.177.32 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="XLe18pnp" Received: from pps.filterd (m0246630.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 50OIeKEZ031124; Fri, 24 Jan 2025 23:55:27 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=corp-2023-11-20; bh=Gmt7Q B3NcOmNhQjBszQbUI804S0rjvRagMuhEI7yvaU=; b=XLe18pnpG/WvE0w7WwOXY y5bIblHn+FdWnW185Ykktr4sL6Xalmuzdvdq/qHYMpRSdi04j7yf3G2gLCAOwqeB fG2s3UJ9KB1t74mh62AMLQTe2v8989/wdSHUdojO0jsNTy50uFy027OHPa8QotEv R6Ri1dWaU2E006hYgPKoVSgJf67h/+Jhr1rH5KxT103ZQfoSo0rl3bJ/zsRQI4vh atf7VHTBh+aTE1GNhGUW3jDHJC2/4vgvLBi6Yp3S5ZgtPNJowCSCOF/ZfQknDTFq jC414yxxNtO9fER2M5ALcVKQZveFIoMWPpoENC9GIW3eYktUka4GcR5EX0S0/qs2 Q== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 4485nsmvms-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:26 +0000 (GMT) Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 50OMEpto036739; Fri, 24 Jan 2025 23:55:25 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 44917u4aav-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:25 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com 
(phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 50ONsxPs018051; Fri, 24 Jan 2025 23:55:24 GMT Received: from localhost.us.oracle.com (dhcp-10-65-130-174.vpn.oracle.com [10.65.130.174]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 44917u49ww-7; Fri, 24 Jan 2025 23:55:24 +0000 From: Anthony Yznaga To: akpm@linux-foundation.org, willy@infradead.org, markhemm@googlemail.com, viro@zeniv.linux.org.uk, david@redhat.com, khalid@kernel.org Cc: anthony.yznaga@oracle.com, jthoughton@google.com, corbet@lwn.net, dave.hansen@intel.com, kirill@shutemov.name, luto@kernel.org, brauner@kernel.org, arnd@arndb.de, ebiederm@xmission.com, catalin.marinas@arm.com, mingo@redhat.com, peterz@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, tglx@linutronix.de, cgroups@vger.kernel.org, x86@kernel.org, linux-doc@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhiramat@kernel.org, rostedt@goodmis.org, vasily.averin@linux.dev, xhao@linux.alibaba.com, pcc@google.com, neilb@suse.de, maz@kernel.org Subject: [PATCH 06/20] mm/mshare: Add a vma flag to indicate an mshare region Date: Fri, 24 Jan 2025 15:54:40 -0800 Message-ID: <20250124235454.84587-7-anthony.yznaga@oracle.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250124235454.84587-1-anthony.yznaga@oracle.com> References: <20250124235454.84587-1-anthony.yznaga@oracle.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2025-01-24_10,2025-01-23_01,2024-11-22_01 X-Proofpoint-Spam-Details: 
rule=notspam policy=default score=0 adultscore=0 phishscore=0 mlxscore=0 spamscore=0 mlxlogscore=724 bulkscore=0 malwarescore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2411120000 definitions=main-2501240162 X-Proofpoint-GUID: fAnTHbbqevGjh-8GbPtHZ8_gFx9Uqmua X-Proofpoint-ORIG-GUID: fAnTHbbqevGjh-8GbPtHZ8_gFx9Uqmua Content-Type: text/plain; charset="utf-8" From: Khalid Aziz An mshare region contains zero or more actual vmas that map objects in the mshare range with shared page tables. Signed-off-by: Khalid Aziz Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Anthony Yznaga --- include/linux/mm.h | 19 +++++++++++++++++++ include/trace/events/mmflags.h | 7 +++++++ 2 files changed, 26 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index 8483e09aeb2c..bca7aee40f4d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -440,6 +440,13 @@ extern unsigned int kobjsize(const void *objp); #define VM_DROPPABLE VM_NONE #endif =20 +#ifdef CONFIG_MSHARE +#define VM_MSHARE_BIT 41 +#define VM_MSHARE BIT(VM_MSHARE_BIT) +#else +#define VM_MSHARE VM_NONE +#endif + #ifdef CONFIG_64BIT /* VM is sealed, in vm_flags */ #define VM_SEALED _BITUL(63) @@ -1092,6 +1099,18 @@ static inline bool vma_is_anon_shmem(struct vm_area_= struct *vma) { return false; =20 int vma_is_stack_for_current(struct vm_area_struct *vma); =20 +#ifdef CONFIG_MSHARE +static inline bool vma_is_mshare(const struct vm_area_struct *vma) +{ + return vma->vm_flags & VM_MSHARE; +} +#else +static inline bool vma_is_mshare(const struct vm_area_struct *vma) +{ + return false; +} +#endif + /* flush_tlb_range() takes a vma, not a mm, and can care about flags */ #define TLB_FLUSH_VMA(mm,flags) { .vm_mm =3D (mm), .vm_flags =3D (flags) } =20 diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index 3bc8656c8359..0c7d50ab56cd 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -160,6 +160,12 @@ 
IF_HAVE_PG_ARCH_3(arch_3) # define IF_HAVE_VM_DROPPABLE(flag, name) #endif =20 +#ifdef CONFIG_MSHARE +# define IF_HAVE_VM_MSHARE(flag, name) {flag, name}, +#else +# define IF_HAVE_VM_MSHARE(flag, name) +#endif + #define __def_vmaflag_names \ {VM_READ, "read" }, \ {VM_WRITE, "write" }, \ @@ -193,6 +199,7 @@ IF_HAVE_VM_SOFTDIRTY(VM_SOFTDIRTY, "softdirty" ) \ {VM_HUGEPAGE, "hugepage" }, \ {VM_NOHUGEPAGE, "nohugepage" }, \ IF_HAVE_VM_DROPPABLE(VM_DROPPABLE, "droppable" ) \ +IF_HAVE_VM_MSHARE(VM_MSHARE, "mshare" ) \ {VM_MERGEABLE, "mergeable" } \ =20 #define show_vma_flags(flags) \ --=20 2.43.5 From nobody Tue Feb 10 17:08:43 2026 Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AA06C1E9905; Fri, 24 Jan 2025 23:56:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.177.32 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762967; cv=none; b=ERwLQ49RU/YVEaW6ovzVacy+0JxlOdW1NFxHqXiUgJKtZwi798+Mbc55z41jpM2zW+HA8sHn09pGWqgwd6oNK9nMRfUwAT8P6l9zjoJEDyWqjCmIJ7XfwqleHB3qk1pbjHl8PKH1Y45J58SyU3HyTifVdb+5uuDsTfyEMGhNMSY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762967; c=relaxed/simple; bh=hkODf8pSBZwGyP464LA+YMAYvvWiQaPmursGJe64fc8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=fZNoA1VWZCGnklDK8voG1OCVLXh7YST22SYVBf1S6nF8dxJ4PfGYC6oauBaRVAHL10PkNKZyRI3vYIbQRK08O1ZcyFbhCBQKUtBQNWiB/Rpd9Orbs9D4mQdJhj8e0bXo5C17lI8iwrp86BvyHhBkdWEoT/R+vB50VtZUSPqD39s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=nbz8o9rG; arc=none smtp.client-ip=205.220.177.32 Authentication-Results: 
smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="nbz8o9rG" Received: from pps.filterd (m0333520.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 50OIEkFN002187; Fri, 24 Jan 2025 23:55:30 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=corp-2023-11-20; bh=WMU6L Wuuz0sPUG0ZZmDbq73rQ3vuBHP+1rFTOp/x2RM=; b=nbz8o9rGukImzGsDob9en 39eAIUc2+JsNnnyy1oLFl6BBl0xeOR1PcFdBkSlAA4rzc0+GhaIcJPfVnGLg/+JN Wfmcvc0WXR2fVzZRDhwPHkpPwWKxXt3mhdvCMlfd7qSDhFqmAd5qfiBojuf9+t2/ DnaUiWy3tMqp6/snf+Ooy2j3r6VSq8PQXW3LSAW9zYbTyxssN8BYDgy/WOBkWkeP Ys9AtYq4Nt3ZvDSXjD3H9k72fimLgFlPwjb5zTCe+C7nENNJvedZOE8MQ3qtKRwJ jaZdedCDLHtrPy3KxnVgcpiJ24btlclTgnD1e5hJxh1R4eHwhj+7FCVDEbEnwsn3 Q== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 44awufwwhg-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:30 +0000 (GMT) Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 50OMPchG036590; Fri, 24 Jan 2025 23:55:29 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 44917u4ac7-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:29 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with 
ESMTP id 50ONsxPu018051; Fri, 24 Jan 2025 23:55:28 GMT Received: from localhost.us.oracle.com (dhcp-10-65-130-174.vpn.oracle.com [10.65.130.174]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 44917u49ww-8; Fri, 24 Jan 2025 23:55:27 +0000 From: Anthony Yznaga To: akpm@linux-foundation.org, willy@infradead.org, markhemm@googlemail.com, viro@zeniv.linux.org.uk, david@redhat.com, khalid@kernel.org Cc: anthony.yznaga@oracle.com, jthoughton@google.com, corbet@lwn.net, dave.hansen@intel.com, kirill@shutemov.name, luto@kernel.org, brauner@kernel.org, arnd@arndb.de, ebiederm@xmission.com, catalin.marinas@arm.com, mingo@redhat.com, peterz@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, tglx@linutronix.de, cgroups@vger.kernel.org, x86@kernel.org, linux-doc@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhiramat@kernel.org, rostedt@goodmis.org, vasily.averin@linux.dev, xhao@linux.alibaba.com, pcc@google.com, neilb@suse.de, maz@kernel.org Subject: [PATCH 07/20] mm/mshare: Add mmap support Date: Fri, 24 Jan 2025 15:54:41 -0800 Message-ID: <20250124235454.84587-8-anthony.yznaga@oracle.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250124235454.84587-1-anthony.yznaga@oracle.com> References: <20250124235454.84587-1-anthony.yznaga@oracle.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2025-01-24_10,2025-01-23_01,2024-11-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 phishscore=0 mlxscore=0 spamscore=0 mlxlogscore=765 bulkscore=0 malwarescore=0 
suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2411120000 definitions=main-2501240162 X-Proofpoint-ORIG-GUID: kbiUWFzTnM-XfItFGqup7QCxCSWShv0i X-Proofpoint-GUID: kbiUWFzTnM-XfItFGqup7QCxCSWShv0i Content-Type: text/plain; charset="utf-8" From: Khalid Aziz Add support for mapping an mshare region into a process after the region has been established in msharefs. Disallow operations that could split the resulting msharefs vma such as partial unmaps and protection changes. Fault handling, mapping, unmapping, and protection changes for objects mapped into an mshare region will be done using the shared vmas created for them in the host mm. This functionality will be added in later patches. Signed-off-by: Khalid Aziz Signed-off-by: Anthony Yznaga --- mm/mshare.c | 71 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 71 insertions(+) diff --git a/mm/mshare.c b/mm/mshare.c index 056cb5a82547..529a90fe1602 100644 --- a/mm/mshare.c +++ b/mm/mshare.c @@ -16,6 +16,7 @@ =20 #include #include +#include #include #include #include @@ -28,6 +29,74 @@ struct mshare_data { struct mshare_info minfo; }; =20 +static int mshare_vm_op_split(struct vm_area_struct *vma, unsigned long ad= dr) +{ + return -EINVAL; +} + +static int mshare_vm_op_mprotect(struct vm_area_struct *vma, unsigned long= start, + unsigned long end, unsigned long newflags) +{ + return -EINVAL; +} + +static const struct vm_operations_struct msharefs_vm_ops =3D { + .may_split =3D mshare_vm_op_split, + .mprotect =3D mshare_vm_op_mprotect, +}; + +/* + * msharefs_mmap() - mmap an mshare region + */ +static int +msharefs_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct mshare_data *m_data =3D file->private_data; + + vma->vm_private_data =3D m_data; + vm_flags_set(vma, VM_MSHARE | VM_DONTEXPAND); + vma->vm_ops =3D &msharefs_vm_ops; + + return 0; +} + +static unsigned long +msharefs_get_unmapped_area(struct file *file, unsigned long addr, + unsigned long len, 
unsigned long pgoff, unsigned long flags) +{ + struct mshare_data *m_data =3D file->private_data; + struct mm_struct *mm =3D current->mm; + unsigned long mshare_start, mshare_size; + const unsigned long mmap_end =3D arch_get_mmap_end(addr, len, flags); + + mmap_assert_write_locked(mm); + + if ((flags & MAP_TYPE) =3D=3D MAP_PRIVATE) + return -EINVAL; + + spin_lock(&m_data->m_lock); + mshare_start =3D m_data->minfo.start; + mshare_size =3D m_data->minfo.size; + spin_unlock(&m_data->m_lock); + + if ((mshare_size =3D=3D 0) || (len !=3D mshare_size)) + return -EINVAL; + + if (len > mmap_end - mmap_min_addr) + return -ENOMEM; + + if (addr && (addr !=3D mshare_start)) + return -EINVAL; + + if (flags & MAP_FIXED) + return addr; + + if (find_vma_intersection(mm, mshare_start, mshare_start + mshare_size)) + return -EEXIST; + + return mshare_start; +} + static long msharefs_set_size(struct mm_struct *host_mm, struct mshare_data *m_data, struct mshare_info *minfo) @@ -94,6 +163,8 @@ static const struct inode_operations msharefs_file_inode= _ops; =20 static const struct file_operations msharefs_file_operations =3D { .open =3D simple_open, + .mmap =3D msharefs_mmap, + .get_unmapped_area =3D msharefs_get_unmapped_area, .unlocked_ioctl =3D msharefs_ioctl, }; =20 --=20 2.43.5 From nobody Tue Feb 10 17:08:43 2026 Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 02F541FA161; Fri, 24 Jan 2025 23:56:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.177.32 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762987; cv=none; b=pieuTlw+my+Pd6ZXMLnYZZARxf9BsfHjrZh7yQ1KOpLdFaXl9hLEvZZJacSVIxr6RsHh0jCo+4zedNi/hIRoWcR97htX6vOwAEUVV8eJ4TiVLPEI1nm+3gN7O8AcHMZu41u8NvMBvQUGwiHWsCp8USnO8v0n50xP5dj90ggzpOY= ARC-Message-Signature: i=1; 
a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762987; c=relaxed/simple; bh=3PW7iCBTwBLcxZKogfBgwoW3ehwvbpc/i/0LbqUN4HM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=iSb7JRT3XMhdYqyehw9/Lyz1SKZHs02gFrxIgEuDVO9OKJ0lOsgh3s0fAyQ2Tb9xJs8hE8hIpxc4k2fOAcI9iqDGt//9o3D/Rnj2w7I+BaTgyDBM/QPfRmjnQrTDuDqlmqYit81j3Hq6mYcmGG8z0YzfQ+q+HpV2YXWpg2ziPq4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=dkhH1ABN; arc=none smtp.client-ip=205.220.177.32 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="dkhH1ABN" Received: from pps.filterd (m0246632.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 50OIHwxb022439; Fri, 24 Jan 2025 23:55:33 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=corp-2023-11-20; bh=rU9rP udGLwBZN+ul8ATDIQZHL+5JLK1AK68Xl6t+yJk=; b=dkhH1ABNUHDgkMbcNjOVe HLOlO3zpfZ6/zcXgH1MWSMApJp53Zwemc5AMBv8VgZ8H72pMejt0H3iho7yOmvzF H9xMiQUdAlws7OQLQt3lEzupvIdFAz9lcZVHNY52vnuTO1jN4+4NOIRkaG0aXtrA L01uc+qtH7o9rswF60olv2Vbr/OTaxT5KAyuoPLHYUCrd1ijEpIlZlcF0e0FXYcT iEBP96177UVbNmeouIy0f2vKY7Z3tP7ni1ruGvCXQ6tLrF4NkSSbvbYlOZXQNQuo FddCVtbP+byHiQ2Hg2u0Y9K411peP9nCUuiZ90jxeGRg+PLZXMcNjZr1ChereVuS g== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 4485qm4y4a-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 
24 Jan 2025 23:55:33 +0000 (GMT) Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 50OMEElh036463; Fri, 24 Jan 2025 23:55:32 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 44917u4acw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:32 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 50ONsxPw018051; Fri, 24 Jan 2025 23:55:31 GMT Received: from localhost.us.oracle.com (dhcp-10-65-130-174.vpn.oracle.com [10.65.130.174]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 44917u49ww-9; Fri, 24 Jan 2025 23:55:31 +0000 From: Anthony Yznaga To: akpm@linux-foundation.org, willy@infradead.org, markhemm@googlemail.com, viro@zeniv.linux.org.uk, david@redhat.com, khalid@kernel.org Cc: anthony.yznaga@oracle.com, jthoughton@google.com, corbet@lwn.net, dave.hansen@intel.com, kirill@shutemov.name, luto@kernel.org, brauner@kernel.org, arnd@arndb.de, ebiederm@xmission.com, catalin.marinas@arm.com, mingo@redhat.com, peterz@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, tglx@linutronix.de, cgroups@vger.kernel.org, x86@kernel.org, linux-doc@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhiramat@kernel.org, rostedt@goodmis.org, vasily.averin@linux.dev, xhao@linux.alibaba.com, pcc@google.com, neilb@suse.de, maz@kernel.org Subject: [PATCH 08/20] mm/mshare: flush all TLBs when updating PTEs in an mshare range Date: Fri, 24 Jan 2025 15:54:42 -0800 
Message-ID: <20250124235454.84587-9-anthony.yznaga@oracle.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250124235454.84587-1-anthony.yznaga@oracle.com> References: <20250124235454.84587-1-anthony.yznaga@oracle.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2025-01-24_10,2025-01-23_01,2024-11-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 phishscore=0 mlxscore=0 spamscore=0 mlxlogscore=999 bulkscore=0 malwarescore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2411120000 definitions=main-2501240162 X-Proofpoint-GUID: G0_gkNzHGYYH-z86dDhJuuCCuwBORxu- X-Proofpoint-ORIG-GUID: G0_gkNzHGYYH-z86dDhJuuCCuwBORxu- Content-Type: text/plain; charset="utf-8" Unlike the mm of a task, an mshare host mm is not updated on context switch. In particular this means that mm_cpumask is never updated which results in TLB flushes for updates to mshare PTEs only being done on the local CPU. To ensure entries are flushed for non-local TLBs, set up an mmu notifier on the mshare mm and use the .arch_invalidate_secondary_tlbs callback to flush all TLBs. arch_invalidate_secondary_tlbs guarantees that TLB entries will be flushed before pages are freed when unmapping pages in an mshare region. 
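The effect described above can be illustrated with a small userspace model. This is only a sketch, not kernel code: `struct tlb_model`, `flush_local_and_mask()`, `flush_all()` and `count_stale()` are hypothetical names invented for the illustration, and `NR_CPUS` is fixed at 4 here. It shows why a flush driven by a cpumask that is never updated leaves stale entries on remote CPUs, while an unconditional flush of all CPUs does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4

/* Hypothetical model: one "this CPU's TLB holds a stale entry" flag per CPU. */
struct tlb_model {
	bool stale[NR_CPUS];
};

/*
 * Models a mask-driven flush: the local CPU is always flushed, remote CPUs
 * only if their bit is set in cpumask. For an mshare host mm the mask is
 * never updated, so in this model it stays 0.
 */
static void flush_local_and_mask(struct tlb_model *t, int local_cpu,
				 uint32_t cpumask)
{
	assert(local_cpu >= 0 && local_cpu < NR_CPUS);

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu == local_cpu || (cpumask & (1u << cpu)))
			t->stale[cpu] = false;
}

/* Models the approach taken by the notifier callback: flush every CPU. */
static void flush_all(struct tlb_model *t)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		t->stale[cpu] = false;
}

/* Count CPUs that still hold a stale entry. */
static int count_stale(const struct tlb_model *t)
{
	int n = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (t->stale[cpu])
			n++;
	return n;
}
```

In this model, starting from four stale CPUs, a mask-driven flush with an empty mask clears only the local CPU, whereas `flush_all()` clears all of them. The trade-off is that flushing every CPU is heavier than a targeted flush.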
Signed-off-by: Anthony Yznaga --- mm/mshare.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/mm/mshare.c b/mm/mshare.c index 529a90fe1602..8dca4199dd01 100644 --- a/mm/mshare.c +++ b/mm/mshare.c @@ -17,9 +17,11 @@ #include #include #include +#include #include #include #include +#include =20 const unsigned long mshare_align =3D P4D_SIZE; =20 @@ -27,6 +29,17 @@ struct mshare_data { struct mm_struct *mm; spinlock_t m_lock; struct mshare_info minfo; + struct mmu_notifier mn; +}; + +static void mshare_invalidate_tlbs(struct mmu_notifier *mn, struct mm_stru= ct *mm, + unsigned long start, unsigned long end) +{ + flush_tlb_all(); +} + +static const struct mmu_notifier_ops mshare_mmu_ops =3D { + .arch_invalidate_secondary_tlbs =3D mshare_invalidate_tlbs, }; =20 static int mshare_vm_op_split(struct vm_area_struct *vma, unsigned long ad= dr) @@ -191,6 +204,10 @@ msharefs_fill_mm(struct inode *inode) m_data->mm =3D mm; spin_lock_init(&m_data->m_lock); inode->i_private =3D m_data; + m_data->mn.ops =3D &mshare_mmu_ops; + ret =3D mmu_notifier_register(&m_data->mn, mm); + if (ret) + goto err_free; =20 return 0; =20 --=20 2.43.5 From nobody Tue Feb 10 17:08:43 2026 Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5933B1F151A; Fri, 24 Jan 2025 23:56:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.177.32 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762970; cv=none; b=DsNsX57RJXjHqHn2blEvrRlNX7hcm5Q+eu1NemGshnxrdQYm6m+Nx+5YD/5IEoGRujOqUl7HWvHjIbRX6KeRW90EsgUeqs6Ld+B2r/1JNdV+7yn5Ls3yy6XOvOOaCy0Ah2PXr9xHcfmFfzAb8D+EaN9v8SEIPNVGSaG5TaBSk2s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762970; c=relaxed/simple; 
bh=EXQ9/MByyHkI899jmdslLv1+kSCXuduI6Ees0Ya4Lok=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=QQphH4uNmSBJuQbcdh8roiHupDBA3Eyjew3y5jdJPnkHeq0CgdozccmosmKkE5Wk+rwtLX2MRj4YJJ2wehJ/ALuPleQstN7bMr3Y/E+sT5BalZBwQ45Hju9XQzgn6nHQc9Fved/oSJF6ujROodL7EQO0fvTlwpwA1Usg88tKsxI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=ein04UV3; arc=none smtp.client-ip=205.220.177.32 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="ein04UV3" Received: from pps.filterd (m0333520.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 50OIGFs8002260; Fri, 24 Jan 2025 23:55:37 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=corp-2023-11-20; bh=ZcO5N eJEjDQdCDY/fuqAGnADUV80X4TyMt4r7D6ggW4=; b=ein04UV3awDrUwjxe3S2F MV9y3gYFGT9/MmyXGDXQb/BrB4uwuWGc4p+o69qDmO4VbJa80nrnfelxKQ0j20S6 DLl0AF/0hXWyLfa/sqN6uW5ekxmZ5JVfq3CNVgsTyu7UnodQaM8GBZR3YRcxOQoe BKUSS40IOOJhkFcsGTpMTXRMr9tc9v1iEpBT2RNUqT1hEbiEX5dF9CXP35nIF053 hgACxW5JUNXzK+NZd/ubylr/8d2N+Ra8/1T0WMsz81NyTR+P1len+ibflq/QY0Lf R2oXk8MelD8F9VONynGg7P60sbsQv9iplPAtzA5S2se/DYERWIE6fN0vKy0dvWDq g== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 44awufwwhj-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:36 +0000 (GMT) Received: from pps.filterd 
(phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 50OLJ9Zl036493; Fri, 24 Jan 2025 23:55:35 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 44917u4adr-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:35 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 50ONsxQ0018051; Fri, 24 Jan 2025 23:55:34 GMT Received: from localhost.us.oracle.com (dhcp-10-65-130-174.vpn.oracle.com [10.65.130.174]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 44917u49ww-10; Fri, 24 Jan 2025 23:55:34 +0000 From: Anthony Yznaga To: akpm@linux-foundation.org, willy@infradead.org, markhemm@googlemail.com, viro@zeniv.linux.org.uk, david@redhat.com, khalid@kernel.org Cc: anthony.yznaga@oracle.com, jthoughton@google.com, corbet@lwn.net, dave.hansen@intel.com, kirill@shutemov.name, luto@kernel.org, brauner@kernel.org, arnd@arndb.de, ebiederm@xmission.com, catalin.marinas@arm.com, mingo@redhat.com, peterz@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, tglx@linutronix.de, cgroups@vger.kernel.org, x86@kernel.org, linux-doc@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhiramat@kernel.org, rostedt@goodmis.org, vasily.averin@linux.dev, xhao@linux.alibaba.com, pcc@google.com, neilb@suse.de, maz@kernel.org Subject: [PATCH 09/20] sched/numa: do not scan msharefs vmas Date: Fri, 24 Jan 2025 15:54:43 -0800 Message-ID: <20250124235454.84587-10-anthony.yznaga@oracle.com> X-Mailer: git-send-email 2.43.5 
In-Reply-To: <20250124235454.84587-1-anthony.yznaga@oracle.com> References: <20250124235454.84587-1-anthony.yznaga@oracle.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2025-01-24_10,2025-01-23_01,2024-11-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 phishscore=0 mlxscore=0 spamscore=0 mlxlogscore=932 bulkscore=0 malwarescore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2411120000 definitions=main-2501240162 X-Proofpoint-ORIG-GUID: RLJBBpgx7vozRd6-hI4ZYSvtQXLVXi04 X-Proofpoint-GUID: RLJBBpgx7vozRd6-hI4ZYSvtQXLVXi04 Content-Type: text/plain; charset="utf-8" Scanning an msharefs vma results in changes to the shared page table, but with TLB flushes only going to the process with the vma.
Signed-off-by: Anthony Yznaga --- kernel/sched/fair.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 3e9ca38512de..e9aa1e35f40e 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3374,7 +3374,8 @@ static void task_numa_work(struct callback_head *work) =20 for (; vma; vma =3D vma_next(&vmi)) { if (!vma_migratable(vma) || !vma_policy_mof(vma) || - is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) { + is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP) || + vma_is_mshare(vma)) { trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_UNSUITABLE); continue; } --=20 2.43.5 From nobody Tue Feb 10 17:08:43 2026 Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DEFEC1EE7AC; Fri, 24 Jan 2025 23:56:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.177.32 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762968; cv=none; b=rHwXqzEVCA25bInQ2BIz4Hk/L7UWU/gdL+A29RzdWDrQ3w7QHh//VwFLC91ryX5f5nN9Gy5WYRZY8vt/OUj0rF1r3FiDIlFsAylU4js0EpoP0sRvYfzoTvH1SBjfZE6vJ1WcH7Ov5C2erHofJcrqNNwf3ykH1fLuSuLUDhKvqvc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762968; c=relaxed/simple; bh=5T7XZddVEIPtBMu8+bgxjFq296KeaC/MWrT9OSVvO4s=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=oRyjs3WK5jjLHLYQRiUhj8SEQvQmwoXbzjjSQTiMVXv8uNCrwnwFISK2lFY+zpr8pWyRnzNFrbunfrvI2ngaXvAykMHlrGzB8k82t93I11jkCIEDOCI8V8u7m0N+O8Va4ntpy/FOu5rf1zsSLcT92LKQVHqSPp/f0kdJlsGfO0E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=UXJrjoNs; 
arc=none smtp.client-ip=205.220.177.32 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="UXJrjoNs" Received: from pps.filterd (m0246632.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 50OIvxBh022712; Fri, 24 Jan 2025 23:55:41 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=corp-2023-11-20; bh=/eSgh HqoT5c0J4YzgWIiRZ+XGq6p2xYWtpvcjE+ANB4=; b=UXJrjoNsfNb4WKRHZ5xkw OEpw+EPE0TRIcETStVh274mVMU/ko8F9hL67gSxzr76IzNOQKPesaZEDIek7KNQo lTIOIiv8DRhEIC8wock9il8CVWN62KiAPGD112LB69JF6EcmZpHbQQyF4QZ7sbt1 rn3+NWx5FKLfy01BnPhLbUWAxq+AEdhO/sy850W6MLIBaCiVRDw2UNyvhIAR4ofG xbShQ0LMrZ4goJ6FKuRIenPA1Cszgvlb0tYwJtkOIhLiGfy5IkaK5OkK7mVGAx66 2lHmeqb8SHErOXUSAJy4e7cWQmBAGpn2IiZyZ7FthV6IG7ozjw+QWPoUmFjH8KS5 Q== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 4485qm4y4e-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:40 +0000 (GMT) Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 50OLP7er036648; Fri, 24 Jan 2025 23:55:39 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 44917u4aer-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:39 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com 
(phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 50ONsxQ2018051; Fri, 24 Jan 2025 23:55:38 GMT Received: from localhost.us.oracle.com (dhcp-10-65-130-174.vpn.oracle.com [10.65.130.174]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 44917u49ww-11; Fri, 24 Jan 2025 23:55:38 +0000 From: Anthony Yznaga To: akpm@linux-foundation.org, willy@infradead.org, markhemm@googlemail.com, viro@zeniv.linux.org.uk, david@redhat.com, khalid@kernel.org Cc: anthony.yznaga@oracle.com, jthoughton@google.com, corbet@lwn.net, dave.hansen@intel.com, kirill@shutemov.name, luto@kernel.org, brauner@kernel.org, arnd@arndb.de, ebiederm@xmission.com, catalin.marinas@arm.com, mingo@redhat.com, peterz@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, tglx@linutronix.de, cgroups@vger.kernel.org, x86@kernel.org, linux-doc@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhiramat@kernel.org, rostedt@goodmis.org, vasily.averin@linux.dev, xhao@linux.alibaba.com, pcc@google.com, neilb@suse.de, maz@kernel.org Subject: [PATCH 10/20] mm: add mmap_read_lock_killable_nested() Date: Fri, 24 Jan 2025 15:54:44 -0800 Message-ID: <20250124235454.84587-11-anthony.yznaga@oracle.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250124235454.84587-1-anthony.yznaga@oracle.com> References: <20250124235454.84587-1-anthony.yznaga@oracle.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2025-01-24_10,2025-01-23_01,2024-11-22_01 X-Proofpoint-Spam-Details: rule=notspam 
policy=default score=0 adultscore=0 phishscore=0 mlxscore=0 spamscore=0 mlxlogscore=825 bulkscore=0 malwarescore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2411120000 definitions=main-2501240162 X-Proofpoint-GUID: VWyly9zRQiKlVwOlv3-7fatJDdYAY01l X-Proofpoint-ORIG-GUID: VWyly9zRQiKlVwOlv3-7fatJDdYAY01l Content-Type: text/plain; charset="utf-8" This will be used to support mshare functionality where the read lock on an mshare host mm is taken while holding the lock on a process mm. Signed-off-by: Anthony Yznaga --- include/linux/mmap_lock.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h index 45a21faa3ff6..4671b4435d2a 100644 --- a/include/linux/mmap_lock.h +++ b/include/linux/mmap_lock.h @@ -191,6 +191,13 @@ static inline void mmap_read_lock(struct mm_struct *mm) __mmap_lock_trace_acquire_returned(mm, false, true); } =20 +static inline void mmap_read_lock_nested(struct mm_struct *mm, int subclas= s) +{ + __mmap_lock_trace_start_locking(mm, false); + down_read_nested(&mm->mmap_lock, subclass); + __mmap_lock_trace_acquire_returned(mm, false, true); +} + static inline int mmap_read_lock_killable(struct mm_struct *mm) { int ret; --=20 2.43.5 From nobody Tue Feb 10 17:08:43 2026 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 985E51F12EC; Fri, 24 Jan 2025 23:56:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.165.32 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762984; cv=none; b=uzhXgNtXjapkCDps0u2+wjpYTYNu23jY/MP8h/s4u4JzAJrPYZVm3lvFt7Ydp3sdtoNJvcC/Kt4VJ9/qP1ldUCZegTC8FLshaJXRIfy9Ex76yfXAKot3CAAmeea3SF3f63r6RecQcfKJsSDkqaN7GrmboRnyJEC3EFbHXenfan8= ARC-Message-Signature: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1737762984; c=relaxed/simple; bh=6xQbFeFQ9z80aCp3di2KRUyJ0K7BqfnERU/9u4Qz0q0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=QLr2MbzK/edMm1VuGMBANVQ4+K4rPfCaui94SDTHyiUgE4suzrN5ocqrGrGxfleSL95W2tAJm7vk/3nn7khhVUghWCoerpqLTGBiDwSrExmUBDVsCQrHeKjn2Z80a43ywe9Z1qNFX0MFHNVhhb5hOcrA5+O3mgxwQw7H7ALd0XM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=QeFtJJdY; arc=none smtp.client-ip=205.220.165.32 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="QeFtJJdY" Received: from pps.filterd (m0246617.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 50OIExIp022002; Fri, 24 Jan 2025 23:55:44 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=corp-2023-11-20; bh=OSIj2 mJijhCNko8dcXyJfuTUdjY7NtJAOi00OHXcvao=; b=QeFtJJdYB0wtvXXNjl/QM MPDJrbyYUVLf9oTAT9Q1GPOvNcwMRB/wQjVpDMKhajG22HNnFiLBqaUY4k9lk+Ch p+HSZLNHXxr6eUHaE1S7kk8q9oFnK7c0bR3w9v/pxvPddRrPLiJ8EFvdkZ+MksWF kkNtUZMettwjyChukyChxUmO4z5Z6wpnm0a7epnj4d56SjuEL1ws3QX/AiXktMNC kWNWqqyzD5/dcni/8xsWCkA+gvfBNoHQTEyIJWLR4DRHSIgRunNbVBi2Zx0tqZ+x ony2WUw8mZyEiA/55WfKntKeHX9zSZgPpASTVJgawqgXcW31dZtBBWMWuOFY2Ra+ A== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 44awpx5vpy-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 
23:55:43 +0000 (GMT) Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 50OM0FXi036437; Fri, 24 Jan 2025 23:55:43 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 44917u4af9-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:43 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 50ONsxQ4018051; Fri, 24 Jan 2025 23:55:42 GMT Received: from localhost.us.oracle.com (dhcp-10-65-130-174.vpn.oracle.com [10.65.130.174]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 44917u49ww-12; Fri, 24 Jan 2025 23:55:42 +0000 From: Anthony Yznaga To: akpm@linux-foundation.org, willy@infradead.org, markhemm@googlemail.com, viro@zeniv.linux.org.uk, david@redhat.com, khalid@kernel.org Cc: anthony.yznaga@oracle.com, jthoughton@google.com, corbet@lwn.net, dave.hansen@intel.com, kirill@shutemov.name, luto@kernel.org, brauner@kernel.org, arnd@arndb.de, ebiederm@xmission.com, catalin.marinas@arm.com, mingo@redhat.com, peterz@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, tglx@linutronix.de, cgroups@vger.kernel.org, x86@kernel.org, linux-doc@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhiramat@kernel.org, rostedt@goodmis.org, vasily.averin@linux.dev, xhao@linux.alibaba.com, pcc@google.com, neilb@suse.de, maz@kernel.org Subject: [PATCH 11/20] mm: add and use unmap_page_range vm_ops hook Date: Fri, 24 Jan 2025 15:54:45 -0800 Message-ID: 
<20250124235454.84587-12-anthony.yznaga@oracle.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250124235454.84587-1-anthony.yznaga@oracle.com> References: <20250124235454.84587-1-anthony.yznaga@oracle.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2025-01-24_10,2025-01-23_01,2024-11-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 phishscore=0 mlxscore=0 spamscore=0 mlxlogscore=999 bulkscore=0 malwarescore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2411120000 definitions=main-2501240162 X-Proofpoint-ORIG-GUID: MwcrHgPHqQLpE3U3E5SYpIbs-4Wi_R5L X-Proofpoint-GUID: MwcrHgPHqQLpE3U3E5SYpIbs-4Wi_R5L Content-Type: text/plain; charset="utf-8" Special handling is needed when unmapping a hugetlb vma and will be needed when unmapping an msharefs vma once support is added for handling faults in an mshare region. 
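The dispatch pattern this patch relies on (an optional per-vma hook with a generic fallback) can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the kernel implementation: `struct vma_model`, `struct vm_ops_model`, `generic_unmap_range()`, `special_unmap_range()` and `unmap_range()` are hypothetical stand-ins, and the counters exist only so the behaviour can be observed.

```c
#include <assert.h>
#include <stddef.h>

struct vma_model;

/* Hypothetical stand-in for an operations table with one optional hook. */
struct vm_ops_model {
	/* NULL means "no special handling, use the generic path". */
	void (*unmap_range)(struct vma_model *vma,
			    unsigned long start, unsigned long end);
};

/* Hypothetical stand-in for a vma; counters record which path ran. */
struct vma_model {
	const struct vm_ops_model *ops;
	int generic_calls;
	int hook_calls;
};

/* Generic path, used when the vma supplies no hook. */
static void generic_unmap_range(struct vma_model *vma,
				unsigned long start, unsigned long end)
{
	assert(start <= end);
	vma->generic_calls++;
}

/* A special-case implementation, as a hugetlb-like vma type might provide. */
static void special_unmap_range(struct vma_model *vma,
				unsigned long start, unsigned long end)
{
	assert(start <= end);
	vma->hook_calls++;
}

/* Single entry point: prefer the vma's hook, otherwise fall back. */
static void unmap_range(struct vma_model *vma,
			unsigned long start, unsigned long end)
{
	if (vma->ops && vma->ops->unmap_range)
		vma->ops->unmap_range(vma, start, end);
	else
		generic_unmap_range(vma, start, end);
}

static const struct vm_ops_model special_ops = {
	.unmap_range = special_unmap_range,
};
```

The point of the design is that callers no longer need to test for each special vma type; a vma type that needs special handling supplies a hook, and everything else takes the generic path.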
Signed-off-by: Anthony Yznaga --- include/linux/mm.h | 10 ++++++++++ ipc/shm.c | 17 +++++++++++++++++ mm/hugetlb.c | 25 +++++++++++++++++++++++++ mm/memory.c | 36 +++++++++++++----------------------- 4 files changed, 65 insertions(+), 23 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index bca7aee40f4d..1314af11596d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -39,6 +39,7 @@ struct anon_vma_chain; struct user_struct; struct pt_regs; struct folio_batch; +struct zap_details; =20 extern int sysctl_page_lock_unfairness; =20 @@ -687,8 +688,17 @@ struct vm_operations_struct { */ struct page *(*find_special_page)(struct vm_area_struct *vma, unsigned long addr); + void (*unmap_page_range)(struct mmu_gather *tlb, + struct vm_area_struct *vma, + unsigned long addr, unsigned long end, + struct zap_details *details); }; =20 +void __unmap_page_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, + unsigned long addr, unsigned long end, + struct zap_details *details); + #ifdef CONFIG_NUMA_BALANCING static inline void vma_numab_state_init(struct vm_area_struct *vma) { diff --git a/ipc/shm.c b/ipc/shm.c index 99564c870084..cadd551e60b9 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -585,6 +585,22 @@ static struct mempolicy *shm_get_policy(struct vm_area= _struct *vma, } #endif =20 +static void shm_unmap_page_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, + unsigned long addr, unsigned long end, + struct zap_details *details) +{ + struct file *file =3D vma->vm_file; + struct shm_file_data *sfd =3D shm_file_data(file); + + if (sfd->vm_ops->unmap_page_range) { + sfd->vm_ops->unmap_page_range(tlb, vma, addr, end, details); + return; + } + + __unmap_page_range(tlb, vma, addr, end, details); +} + static int shm_mmap(struct file *file, struct vm_area_struct *vma) { struct shm_file_data *sfd =3D shm_file_data(file); @@ -685,6 +701,7 @@ static const struct vm_operations_struct shm_vm_ops =3D= { .set_policy =3D shm_set_policy, .get_policy =3D 
shm_get_policy, #endif + .unmap_page_range =3D shm_unmap_page_range, }; =20 /** diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 87761b042ed0..ac3ef62a3dc4 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5147,6 +5147,30 @@ static vm_fault_t hugetlb_vm_op_fault(struct vm_faul= t *vmf) return 0; } =20 +static void hugetlb_vm_op_unmap_page_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, + unsigned long addr, unsigned long end, + struct zap_details *details) +{ + zap_flags_t zap_flags =3D details ? details->zap_flags : 0; + + /* + * It is undesirable to test vma->vm_file as it + * should be non-null for valid hugetlb area. + * However, vm_file will be NULL in the error + * cleanup path of mmap_region. When + * hugetlbfs ->mmap method fails, + * mmap_region() nullifies vma->vm_file + * before calling this function to clean up. + * Since no pte has actually been setup, it is + * safe to do nothing in this case. + */ + if (!vma->vm_file) + return; + + __unmap_hugepage_range(tlb, vma, addr, end, NULL, zap_flags); +} + /* * When a new function is introduced to vm_operations_struct and added * to hugetlb_vm_ops, please consider adding the function to shm_vm_ops. 
@@ -5160,6 +5184,7 @@ const struct vm_operations_struct hugetlb_vm_ops =3D { .close =3D hugetlb_vm_op_close, .may_split =3D hugetlb_vm_op_split, .pagesize =3D hugetlb_vm_op_pagesize, + .unmap_page_range =3D hugetlb_vm_op_unmap_page_range, }; =20 static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page, diff --git a/mm/memory.c b/mm/memory.c index 2a20e3810534..20bafbb10ea7 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1875,7 +1875,7 @@ static inline unsigned long zap_p4d_range(struct mmu_= gather *tlb, return addr; } =20 -void unmap_page_range(struct mmu_gather *tlb, +void __unmap_page_range(struct mmu_gather *tlb, struct vm_area_struct *vma, unsigned long addr, unsigned long end, struct zap_details *details) @@ -1895,6 +1895,16 @@ void unmap_page_range(struct mmu_gather *tlb, tlb_end_vma(tlb, vma); } =20 +void unmap_page_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, + unsigned long addr, unsigned long end, + struct zap_details *details) +{ + if (vma->vm_ops && vma->vm_ops->unmap_page_range) + vma->vm_ops->unmap_page_range(tlb, vma, addr, end, details); + else + __unmap_page_range(tlb, vma, addr, end, details); +} =20 static void unmap_single_vma(struct mmu_gather *tlb, struct vm_area_struct *vma, unsigned long start_addr, @@ -1916,28 +1926,8 @@ static void unmap_single_vma(struct mmu_gather *tlb, if (unlikely(vma->vm_flags & VM_PFNMAP)) untrack_pfn(vma, 0, 0, mm_wr_locked); =20 - if (start !=3D end) { - if (unlikely(is_vm_hugetlb_page(vma))) { - /* - * It is undesirable to test vma->vm_file as it - * should be non-null for valid hugetlb area. - * However, vm_file will be NULL in the error - * cleanup path of mmap_region. When - * hugetlbfs ->mmap method fails, - * mmap_region() nullifies vma->vm_file - * before calling this function to clean up. - * Since no pte has actually been setup, it is - * safe to do nothing in this case. - */ - if (vma->vm_file) { - zap_flags_t zap_flags =3D details ? 
- details->zap_flags : 0; - __unmap_hugepage_range(tlb, vma, start, end, - NULL, zap_flags); - } - } else - unmap_page_range(tlb, vma, start, end, details); - } + if (start !=3D end) + unmap_page_range(tlb, vma, start, end, details); } =20 /** --=20 2.43.5 From nobody Tue Feb 10 17:08:43 2026 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 332F11EEA2C; Fri, 24 Jan 2025 23:56:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.165.32 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762980; cv=none; b=VjRNvnUDG9AD6y/Yp+1WcuY780rfohBHP4+3qY5Li0jbysL2ZtPdeLnLUbJUXGjeJgtM5fu7hZc0FqzFf2NFM7EK7tKKXtHJBroUQ+5KIoHgm4Rndna4UodeJ7lpFJDf+N91qKpTOl9GXxwSOx8Wbgz+aDx5IyN/8HVdcSMaYSI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762980; c=relaxed/simple; bh=lzvBO4yX3D/dD9KhQ92TR352Jq+289sMGHtFZgSsg8A=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=jiOrisofpYsoqJQ+0eLR7iB3vE3zVO+TcNTTibQb53Ir0r6Bkak9Ybekcec2X5FrzEXn9YrqbQFJ5YfCYyTkkWP6/G9Ut48b8SEt5HazbGLz/1E9ge1mFf1elI3eyo5kOga1r1cidoc0UakP4ThQ2eIKChU5Q+iTQimaob7mA2E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=Mm1LIh12; arc=none smtp.client-ip=205.220.165.32 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="Mm1LIh12" Received: from pps.filterd 
(m0246627.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 50OIbuNm019103; Fri, 24 Jan 2025 23:55:48 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=corp-2023-11-20; bh=brN5Y FFSbK9qb6sU8kyqnzGs0ZV4dIdkz0Y2OGeQQDI=; b=Mm1LIh125wbmWKLr3jwj4 R7menthV7n7TpEZr/0aypVito0pT68YHzWA2WWLbgm9qulCEOARfgF22jhK4ghzu JuFx9Tq5FDeJqPnFnV+tQfW55CPAHFNEg5nuyZD0XTE2kdsyH4gNFjFfdHuOH9ze 7sYTwIl4yAYK1WjO1snF2PKEEohEso8ZJ2pTjai5F6/SXKLeGDoRNdftR0fnjKV5 iuzXnAnoHPloNzwNDPGr1PwAmV+vfmrRujHz5+hIW2/YF2IwYvAYEi2v6xh5aP3e NHyYjMFdt6qYoyOw47Z5bbiE65oHmOmMD1L48ApCP5PIAkliRqBYxIaESjoAgv/q A== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 4485qaw404-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:47 +0000 (GMT) Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 50OLoFNG036500; Fri, 24 Jan 2025 23:55:47 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 44917u4ag8-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:46 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 50ONsxQ6018051; Fri, 24 Jan 2025 23:55:45 GMT Received: from localhost.us.oracle.com (dhcp-10-65-130-174.vpn.oracle.com [10.65.130.174]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 44917u49ww-13; Fri, 24 Jan 2025 23:55:45 +0000 From: Anthony Yznaga To: akpm@linux-foundation.org, 
willy@infradead.org, markhemm@googlemail.com, viro@zeniv.linux.org.uk, david@redhat.com, khalid@kernel.org Cc: anthony.yznaga@oracle.com, jthoughton@google.com, corbet@lwn.net, dave.hansen@intel.com, kirill@shutemov.name, luto@kernel.org, brauner@kernel.org, arnd@arndb.de, ebiederm@xmission.com, catalin.marinas@arm.com, mingo@redhat.com, peterz@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, tglx@linutronix.de, cgroups@vger.kernel.org, x86@kernel.org, linux-doc@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhiramat@kernel.org, rostedt@goodmis.org, vasily.averin@linux.dev, xhao@linux.alibaba.com, pcc@google.com, neilb@suse.de, maz@kernel.org Subject: [PATCH 12/20] mm/mshare: prepare for page table sharing support Date: Fri, 24 Jan 2025 15:54:46 -0800 Message-ID: <20250124235454.84587-13-anthony.yznaga@oracle.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250124235454.84587-1-anthony.yznaga@oracle.com> References: <20250124235454.84587-1-anthony.yznaga@oracle.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2025-01-24_10,2025-01-23_01,2024-11-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 phishscore=0 mlxscore=0 spamscore=0 mlxlogscore=999 bulkscore=0 malwarescore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2411120000 definitions=main-2501240162 X-Proofpoint-GUID: np-h1M02s4XSflXejt2j3zPrBIjWPfJo X-Proofpoint-ORIG-GUID: np-h1M02s4XSflXejt2j3zPrBIjWPfJo Content-Type: text/plain; charset="utf-8" From: Khalid Aziz In preparation for 
enabling the handling of page faults in an mshare region provide a way to link an mshare shared page table to a process page table and otherwise find the actual vma in order to handle a page fault. Modify the unmap path to ensure that page tables in mshare regions are unlinked and kept intact when a process exits or an mshare region is explicitly unmapped. Signed-off-by: Khalid Aziz Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Anthony Yznaga --- include/linux/mm.h | 6 +++++ mm/memory.c | 38 ++++++++++++++++++++++------ mm/mshare.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 98 insertions(+), 8 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 1314af11596d..9889c4757f45 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1110,11 +1110,17 @@ static inline bool vma_is_anon_shmem(struct vm_area= _struct *vma) { return false; int vma_is_stack_for_current(struct vm_area_struct *vma); =20 #ifdef CONFIG_MSHARE +vm_fault_t find_shared_vma(struct vm_area_struct **vma, unsigned long *add= rp); static inline bool vma_is_mshare(const struct vm_area_struct *vma) { return vma->vm_flags & VM_MSHARE; } #else +static inline vm_fault_t find_shared_vma(struct vm_area_struct **vma, unsi= gned long *addrp) +{ + WARN_ON_ONCE(1); + return VM_FAULT_SIGBUS; +} static inline bool vma_is_mshare(const struct vm_area_struct *vma) { return false; diff --git a/mm/memory.c b/mm/memory.c index 20bafbb10ea7..9374bb184a5f 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -263,7 +263,8 @@ static inline void free_pud_range(struct mmu_gather *tl= b, p4d_t *p4d, =20 static inline void free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd, unsigned long addr, unsigned long end, - unsigned long floor, unsigned long ceiling) + unsigned long floor, unsigned long ceiling, + bool shared_pud) { p4d_t *p4d; unsigned long next; @@ -275,7 +276,10 @@ static inline void free_p4d_range(struct mmu_gather *t= lb, pgd_t *pgd, next =3D p4d_addr_end(addr, end); if 
(p4d_none_or_clear_bad(p4d)) continue; - free_pud_range(tlb, p4d, addr, next, floor, ceiling); + if (unlikely(shared_pud)) + p4d_clear(p4d); + else + free_pud_range(tlb, p4d, addr, next, floor, ceiling); } while (p4d++, addr =3D next, addr !=3D end); =20 start &=3D PGDIR_MASK; @@ -297,9 +301,10 @@ static inline void free_p4d_range(struct mmu_gather *t= lb, pgd_t *pgd, /* * This function frees user-level page tables of a process. */ -void free_pgd_range(struct mmu_gather *tlb, +static void __free_pgd_range(struct mmu_gather *tlb, unsigned long addr, unsigned long end, - unsigned long floor, unsigned long ceiling) + unsigned long floor, unsigned long ceiling, + bool shared_pud) { pgd_t *pgd; unsigned long next; @@ -355,10 +360,17 @@ void free_pgd_range(struct mmu_gather *tlb, next =3D pgd_addr_end(addr, end); if (pgd_none_or_clear_bad(pgd)) continue; - free_p4d_range(tlb, pgd, addr, next, floor, ceiling); + free_p4d_range(tlb, pgd, addr, next, floor, ceiling, shared_pud); } while (pgd++, addr =3D next, addr !=3D end); } =20 +void free_pgd_range(struct mmu_gather *tlb, + unsigned long addr, unsigned long end, + unsigned long floor, unsigned long ceiling) +{ + __free_pgd_range(tlb, addr, end, floor, ceiling, false); +} + void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas, struct vm_area_struct *vma, unsigned long floor, unsigned long ceiling, bool mm_wr_locked) @@ -395,9 +407,12 @@ void free_pgtables(struct mmu_gather *tlb, struct ma_s= tate *mas, =20 /* * Optimization: gather nearby vmas into one call down + * + * Do not free the shared page tables of an mshare region. 
*/ while (next && next->vm_start <=3D vma->vm_end + PMD_SIZE - && !is_vm_hugetlb_page(next)) { + && !is_vm_hugetlb_page(next) + && !vma_is_mshare(next)) { vma =3D next; next =3D mas_find(mas, ceiling - 1); if (unlikely(xa_is_zero(next))) @@ -408,9 +423,11 @@ void free_pgtables(struct mmu_gather *tlb, struct ma_s= tate *mas, unlink_file_vma_batch_add(&vb, vma); } unlink_file_vma_batch_final(&vb); - free_pgd_range(tlb, addr, vma->vm_end, - floor, next ? next->vm_start : ceiling); + __free_pgd_range(tlb, addr, vma->vm_end, + floor, next ? next->vm_start : ceiling, + vma_is_mshare(vma)); } + vma =3D next; } while (vma); } @@ -6148,6 +6165,11 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vm= a, unsigned long address, if (ret) goto out; =20 + if (unlikely(vma_is_mshare(vma))) { + WARN_ON_ONCE(1); + return VM_FAULT_SIGBUS; + } + if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE, flags & FAULT_FLAG_INSTRUCTION, flags & FAULT_FLAG_REMOTE)) { diff --git a/mm/mshare.c b/mm/mshare.c index 8dca4199dd01..9ada1544aeb1 100644 --- a/mm/mshare.c +++ b/mm/mshare.c @@ -42,6 +42,56 @@ static const struct mmu_notifier_ops mshare_mmu_ops =3D { .arch_invalidate_secondary_tlbs =3D mshare_invalidate_tlbs, }; =20 +static p4d_t *walk_to_p4d(struct mm_struct *mm, unsigned long addr) +{ + pgd_t *pgd; + p4d_t *p4d; + + pgd =3D pgd_offset(mm, addr); + p4d =3D p4d_alloc(mm, pgd, addr); + if (!p4d) + return NULL; + + return p4d; +} + +/* Returns holding the host mm's lock for read. Caller must release. 
*/ +vm_fault_t +find_shared_vma(struct vm_area_struct **vmap, unsigned long *addrp) +{ + struct vm_area_struct *vma, *guest =3D *vmap; + struct mshare_data *m_data =3D guest->vm_private_data; + struct mm_struct *host_mm =3D m_data->mm; + unsigned long host_addr; + p4d_t *p4d, *guest_p4d; + + mmap_read_lock_nested(host_mm, SINGLE_DEPTH_NESTING); + host_addr =3D *addrp - guest->vm_start + host_mm->mmap_base; + p4d =3D walk_to_p4d(host_mm, host_addr); + guest_p4d =3D walk_to_p4d(guest->vm_mm, *addrp); + if (!p4d_same(*guest_p4d, *p4d)) { + set_p4d(guest_p4d, *p4d); + mmap_read_unlock(host_mm); + return VM_FAULT_NOPAGE; + } + + *addrp =3D host_addr; + vma =3D find_vma(host_mm, host_addr); + + /* XXX: expand stack? */ + if (vma && vma->vm_start > host_addr) + vma =3D NULL; + + *vmap =3D vma; + + /* + * release host mm lock unless a matching vma is found + */ + if (!vma) + mmap_read_unlock(host_mm); + return 0; +} + static int mshare_vm_op_split(struct vm_area_struct *vma, unsigned long ad= dr) { return -EINVAL; @@ -53,9 +103,21 @@ static int mshare_vm_op_mprotect(struct vm_area_struct = *vma, unsigned long start return -EINVAL; } =20 +static void mshare_vm_op_unmap_page_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, + unsigned long addr, unsigned long end, + struct zap_details *details) +{ + /* + * The msharefs vma is being unmapped. Do not unmap pages in the + * mshare region itself. 
+ */ +} + static const struct vm_operations_struct msharefs_vm_ops =3D { .may_split =3D mshare_vm_op_split, .mprotect =3D mshare_vm_op_mprotect, + .unmap_page_range =3D mshare_vm_op_unmap_page_range, }; =20 /* --=20 2.43.5 From nobody Tue Feb 10 17:08:43 2026 Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 68A8D1F9F52; Fri, 24 Jan 2025 23:56:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.177.32 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762985; cv=none; b=g8zNkWAOO7yusgIJ1kQlqToxcmFu+oxbBaY+Ig5HlAx4496tt6uHuoEpXlyBilCKfbFT5FA0QiyhD2ldJ+M/fgvJk1ipH085Hs1O/pyLEhFylOgjIhkPeEy0NVJR8H9R6NH9ViuMleR47p5xorMiKY6ygrVjN+AZDoTz/0Yl84s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762985; c=relaxed/simple; bh=xCdkXXItEa2FHdDdBHgXXWt/ZfsvCW5Qe+7mnLT88Cs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=dNGHjYE6K/TJBdqKBuVvKMf9B3alDmRuPCXdrlE/lqxb9vnR6RaAEVXXr0R959VAqstP0Vdzk4zoQutbiidvlS/KVodJ46T4WQPnpkyUptRrlZq8mQXQkaH4AtJvixnOBtGfPVyyiSAL6ezIgIIymMMY+L8y16COTSWBm+u4M1g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=XeEkIkrW; arc=none smtp.client-ip=205.220.177.32 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="XeEkIkrW" Received: from pps.filterd (m0246630.ppops.net [127.0.0.1]) by 
mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 50OIeKEb031124; Fri, 24 Jan 2025 23:55:51 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=corp-2023-11-20; bh=nCdkv 5pMD/BW8EJZHRB98QYPFY1ik8038tE9HN9GQDQ=; b=XeEkIkrWDwXm5HtebbhBD p3lABo0GMrvK4ZvGeh56K+Ji9vVwuOJ6CfP9WQgDNnIg/fylBvrEnda76gj1hlaH djlWZy7JPMD3YWvJ07uS/u2FQf/WNtgihIgs+ahOJw7m476tTNkDwMXzkkvgVlMb WaAhY1K9BmKIjf9cBYWQCBii9KnDOWEQ86r5Jgcb+DoKoPYGmvQzlxO/2s1/HjSE 2X8wi2WkZkemqVoSzjBSlGdAQCPFca5ePjZctgTNBHixFaMq6YMIwXCF0STwxFVX 1V8aXMFIhRqshW3vli+id7PjxsJ0s6TGm5KFe5oka0l703yelWNl+uHD03Bpstfy Q== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 4485nsmvn0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:50 +0000 (GMT) Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 50OMD09a036431; Fri, 24 Jan 2025 23:55:50 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 44917u4ahx-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:49 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 50ONsxQ8018051; Fri, 24 Jan 2025 23:55:48 GMT Received: from localhost.us.oracle.com (dhcp-10-65-130-174.vpn.oracle.com [10.65.130.174]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 44917u49ww-14; Fri, 24 Jan 2025 23:55:48 +0000 From: Anthony Yznaga To: akpm@linux-foundation.org, willy@infradead.org, 
markhemm@googlemail.com, viro@zeniv.linux.org.uk, david@redhat.com, khalid@kernel.org Cc: anthony.yznaga@oracle.com, jthoughton@google.com, corbet@lwn.net, dave.hansen@intel.com, kirill@shutemov.name, luto@kernel.org, brauner@kernel.org, arnd@arndb.de, ebiederm@xmission.com, catalin.marinas@arm.com, mingo@redhat.com, peterz@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, tglx@linutronix.de, cgroups@vger.kernel.org, x86@kernel.org, linux-doc@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhiramat@kernel.org, rostedt@goodmis.org, vasily.averin@linux.dev, xhao@linux.alibaba.com, pcc@google.com, neilb@suse.de, maz@kernel.org Subject: [PATCH 13/20] x86/mm: enable page table sharing Date: Fri, 24 Jan 2025 15:54:47 -0800 Message-ID: <20250124235454.84587-14-anthony.yznaga@oracle.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250124235454.84587-1-anthony.yznaga@oracle.com> References: <20250124235454.84587-1-anthony.yznaga@oracle.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2025-01-24_10,2025-01-23_01,2024-11-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 phishscore=0 mlxscore=0 spamscore=0 mlxlogscore=999 bulkscore=0 malwarescore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2411120000 definitions=main-2501240162 X-Proofpoint-GUID: m7wXHyW69pLXZ8gl0E-xcyVHCx0Mq1Za X-Proofpoint-ORIG-GUID: m7wXHyW69pLXZ8gl0E-xcyVHCx0Mq1Za Content-Type: text/plain; charset="utf-8" Enable x86 support for handling page faults in an mshare region by redirecting 
page faults to operate on the mshare mm_struct and vmas contained in it. Some permissions checks are done using vma flags in architecture-specific fault handling code so the actual vma needed to complete the handling is acquired before calling handle_mm_fault(). Because of this an ARCH_SUPPORTS_MSHARE config option is added. Signed-off-by: Anthony Yznaga --- arch/Kconfig | 3 +++ arch/x86/Kconfig | 1 + arch/x86/mm/fault.c | 37 ++++++++++++++++++++++++++++++++++++- mm/Kconfig | 2 +- 4 files changed, 41 insertions(+), 2 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index 6682b2a53e34..32474cdcb882 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -1640,6 +1640,9 @@ config HAVE_ARCH_PFN_VALID config ARCH_SUPPORTS_DEBUG_PAGEALLOC bool =20 +config ARCH_SUPPORTS_MSHARE + bool + config ARCH_SUPPORTS_PAGE_TABLE_CHECK bool =20 diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 2e1a3e4386de..453a39098dfa 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -120,6 +120,7 @@ config X86 select ARCH_SUPPORTS_ACPI select ARCH_SUPPORTS_ATOMIC_RMW select ARCH_SUPPORTS_DEBUG_PAGEALLOC + select ARCH_SUPPORTS_MSHARE if X86_64 select ARCH_SUPPORTS_PAGE_TABLE_CHECK if X86_64 select ARCH_SUPPORTS_NUMA_BALANCING if X86_64 select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <=3D 4096 diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index e6c469b323cc..4b55ade61a01 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1217,6 +1217,8 @@ void do_user_addr_fault(struct pt_regs *regs, struct mm_struct *mm; vm_fault_t fault; unsigned int flags =3D FAULT_FLAG_DEFAULT; + bool is_shared_vma; + unsigned long addr; =20 tsk =3D current; mm =3D tsk->mm; @@ -1330,6 +1332,12 @@ void do_user_addr_fault(struct pt_regs *regs, if (!vma) goto lock_mmap; =20 + /* mshare does not support per-VMA locks yet */ + if (vma_is_mshare(vma)) { + vma_end_read(vma); + goto lock_mmap; + } + if (unlikely(access_error(error_code, vma))) { bad_area_access_error(regs, error_code, address, NULL, 
vma); count_vm_vma_lock_event(VMA_LOCK_SUCCESS); @@ -1358,17 +1366,38 @@ void do_user_addr_fault(struct pt_regs *regs, lock_mmap: =20 retry: + addr =3D address; + is_shared_vma =3D false; vma =3D lock_mm_and_find_vma(mm, address, regs); if (unlikely(!vma)) { bad_area_nosemaphore(regs, error_code, address); return; } =20 + if (unlikely(vma_is_mshare(vma))) { + fault =3D find_shared_vma(&vma, &addr); + + if (fault) { + mmap_read_unlock(mm); + goto done; + } + + if (!vma) { + mmap_read_unlock(mm); + bad_area_nosemaphore(regs, error_code, address); + return; + } + + is_shared_vma =3D true; + } + /* * Ok, we have a good vm_area for this memory access, so * we can handle it.. */ if (unlikely(access_error(error_code, vma))) { + if (unlikely(is_shared_vma)) + mmap_read_unlock(vma->vm_mm); bad_area_access_error(regs, error_code, address, mm, vma); return; } @@ -1386,7 +1415,11 @@ void do_user_addr_fault(struct pt_regs *regs, * userland). The return to userland is identified whenever * FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in flags. */ - fault =3D handle_mm_fault(vma, address, flags, regs); + fault =3D handle_mm_fault(vma, addr, flags, regs); + + if (unlikely(is_shared_vma) && ((fault & VM_FAULT_COMPLETED) || + (fault & VM_FAULT_RETRY) || fault_signal_pending(fault, regs))) + mmap_read_unlock(mm); =20 if (fault_signal_pending(fault, regs)) { /* @@ -1414,6 +1447,8 @@ void do_user_addr_fault(struct pt_regs *regs, goto retry; } =20 + if (unlikely(is_shared_vma)) + mmap_read_unlock(vma->vm_mm); mmap_read_unlock(mm); done: if (likely(!(fault & VM_FAULT_ERROR))) diff --git a/mm/Kconfig b/mm/Kconfig index ba3dbe31f86a..4fc056bb5643 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1360,7 +1360,7 @@ config PT_RECLAIM =20 config MSHARE bool "Mshare" - depends on MMU + depends on MMU && ARCH_SUPPORTS_MSHARE help Enable msharefs: A ram-based filesystem that allows multiple processes to share page table entries for shared pages. 
A file --=20 2.43.5 From nobody Tue Feb 10 17:08:43 2026 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7DCD91F9F67; Fri, 24 Jan 2025 23:56:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.165.32 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762986; cv=none; b=h1jwcBBFpWoQWc68M1TCdMsfj0d5tW9pcpW09i6USfE2K2TuNaZ8Q2moiS0laXw6ipVM9IZ2Raas8SbDWYgsu/p8NJYMwoOjjQt0Ad6/KE9HGD4UW2OzMCMCA6qLbXJmiP03hci3qetsfvaYELLNUdErPHgqr1ofcYNHDiVVGXo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737762986; c=relaxed/simple; bh=VDs2JcgsMRcQoxPc5jG5PW9l2zO89tAiKcTbOIEaz+s=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Ts+MqoHLL40TpsBnuBqA2hGDVexgjIfWAudEcdp1GFliAfsySV2BJmGVyL3/S675eAm3v21DdAwdt2tD9ppEd3Q/ZLCnWXNtwYFhTCv//yd63lGEkL57R/jU2ON2wur14aC/1vJFRAIHzlj0Rync3mP0MPeK4EBmOPP1IqKg0iE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=fUnoSigK; arc=none smtp.client-ip=205.220.165.32 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="fUnoSigK" Received: from pps.filterd (m0246617.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 50OI8hMZ022258; Fri, 24 Jan 2025 23:55:54 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc 
:content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=corp-2023-11-20; bh=yJXvn Yoxkm1NK3qwW6jmL1jgVFm2IaD/yhkctg4U92M=; b=fUnoSigK9HLwd6NsS95Dr GyOpPYzCA/4kAye89j7nfUI1NddpqygTwjTKIAx8WdCkHtrXLbtzP35MIHAsmrUW YVajTmDtXQjTlPNn92R8YLpqiHdUn002WI1acMbJkbOE6cMhRzIe1TEbnETKclZv PO8InVxSqjARDaOLg0THwBHKBh7H5l7Xp4Mhgk1Fml5N0jH9z71iYiikv6v2wV/f gSumYFsJlOo8v1MNe/KVyxtq253t9KB/+8zgIauIoMpKEGsrxO4wJMhdS9HhNoxx qdHFU0fU68u6E+2F/P0wK00JlGdSUsPJJNY44LDZVvFxwFLeMFT0nDibDEg32Rfu w== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 44awpx5vq1-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:53 +0000 (GMT) Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 50OLKNdP036521; Fri, 24 Jan 2025 23:55:53 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 44917u4ajv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:55:53 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 50ONsxQA018051; Fri, 24 Jan 2025 23:55:52 GMT Received: from localhost.us.oracle.com (dhcp-10-65-130-174.vpn.oracle.com [10.65.130.174]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 44917u49ww-15; Fri, 24 Jan 2025 23:55:52 +0000 From: Anthony Yznaga To: akpm@linux-foundation.org, willy@infradead.org, markhemm@googlemail.com, viro@zeniv.linux.org.uk, david@redhat.com, khalid@kernel.org Cc: anthony.yznaga@oracle.com, jthoughton@google.com, corbet@lwn.net, dave.hansen@intel.com, kirill@shutemov.name, 
luto@kernel.org, brauner@kernel.org, arnd@arndb.de, ebiederm@xmission.com, catalin.marinas@arm.com, mingo@redhat.com, peterz@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, tglx@linutronix.de, cgroups@vger.kernel.org, x86@kernel.org, linux-doc@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhiramat@kernel.org, rostedt@goodmis.org, vasily.averin@linux.dev, xhao@linux.alibaba.com, pcc@google.com, neilb@suse.de, maz@kernel.org Subject: [PATCH 14/20] mm: create __do_mmap() to take an mm_struct * arg Date: Fri, 24 Jan 2025 15:54:48 -0800 Message-ID: <20250124235454.84587-15-anthony.yznaga@oracle.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250124235454.84587-1-anthony.yznaga@oracle.com> References: <20250124235454.84587-1-anthony.yznaga@oracle.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2025-01-24_10,2025-01-23_01,2024-11-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 phishscore=0 mlxscore=0 spamscore=0 mlxlogscore=915 bulkscore=0 malwarescore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2411120000 definitions=main-2501240162 X-Proofpoint-ORIG-GUID: Oa18II2kZqrAU3hlh8jjOS5UYIop2-kj X-Proofpoint-GUID: Oa18II2kZqrAU3hlh8jjOS5UYIop2-kj Content-Type: text/plain; charset="utf-8" In preparation for mapping objects into an mshare region, create __do_mmap() to allow mapping into a specified mm. There are no functional changes otherwise. 
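
For readers unfamiliar with this kind of refactor, the shape of the change can be sketched in plain userspace C. This is only an illustration, not kernel code: struct toy_mm, toy_current_mm, toy_mmap() and toy_mmap_in_mm() are hypothetical names invented here, and the "mapping" is just a bump allocator. The kernel names the explicit-mm variant with a leading "__"; the sketch avoids that prefix because such identifiers are reserved in userspace C.

```c
/* Hypothetical stand-in for struct mm_struct; not the kernel type. */
struct toy_mm {
	unsigned long next_addr;	/* next free address in this toy address space */
};

/* Hypothetical stand-in for current->mm. */
static struct toy_mm toy_current_mm = { 0x1000 };

/*
 * Explicit-mm variant (plays the role of __do_mmap()): the caller
 * chooses which address space the mapping is placed in.
 */
static unsigned long toy_mmap_in_mm(unsigned long len, struct toy_mm *mm)
{
	unsigned long addr = mm->next_addr;

	mm->next_addr += len;	/* reserve [addr, addr + len) */
	return addr;
}

/*
 * Original entry point (plays the role of do_mmap()): the signature is
 * unchanged and the implicit mm is simply forwarded, so existing
 * callers see no functional change.
 */
static inline unsigned long toy_mmap(unsigned long len)
{
	return toy_mmap_in_mm(len, &toy_current_mm);
}
```

In this sketch, toy_mmap(0x2000) allocates from the implicit context exactly as before, while a caller holding some other struct toy_mm (standing in for an mshare host mm) can pass it to toy_mmap_in_mm() directly.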
Signed-off-by: Anthony Yznaga --- include/linux/mm.h | 16 ++++++++++++++++ mm/mmap.c | 7 +++---- mm/vma.c | 15 +++++++-------- mm/vma.h | 2 +- 4 files changed, 27 insertions(+), 13 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 9889c4757f45..80429d1a6ae4 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3398,10 +3398,26 @@ get_unmapped_area(struct file *file, unsigned long = addr, unsigned long len, return __get_unmapped_area(file, addr, len, pgoff, flags, 0); } =20 +#ifdef CONFIG_MMU +unsigned long __do_mmap(struct file *file, unsigned long addr, + unsigned long len, unsigned long prot, unsigned long flags, + vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, + struct list_head *uf, struct mm_struct *mm); +static inline unsigned long do_mmap(struct file *file, unsigned long addr, + unsigned long len, unsigned long prot, unsigned long flags, + vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, + struct list_head *uf) +{ + return __do_mmap(file, addr, len, prot, flags, vm_flags, pgoff, + populate, uf, current->mm); +} +#else extern unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf); +#endif + extern int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, unsigned long start, size_t len, struct list_head *uf, bool unlock); diff --git a/mm/mmap.c b/mm/mmap.c index cda01071c7b1..2d327b148bfc 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -334,13 +334,12 @@ static inline bool file_mmap_ok(struct file *file, st= ruct inode *inode, * Returns: Either an error, or the address at which the requested mapping= has * been performed. 
*/ -unsigned long do_mmap(struct file *file, unsigned long addr, +unsigned long __do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, - struct list_head *uf) + struct list_head *uf, struct mm_struct *mm) { - struct mm_struct *mm =3D current->mm; int pkey =3D 0; =20 *populate =3D 0; @@ -558,7 +557,7 @@ unsigned long do_mmap(struct file *file, unsigned long = addr, vm_flags |=3D VM_NORESERVE; } =20 - addr =3D mmap_region(file, addr, len, vm_flags, pgoff, uf); + addr =3D mmap_region(file, addr, len, vm_flags, pgoff, uf, mm); if (!IS_ERR_VALUE(addr) && ((vm_flags & VM_LOCKED) || (flags & (MAP_POPULATE | MAP_NONBLOCK)) =3D=3D MAP_POPULATE)) diff --git a/mm/vma.c b/mm/vma.c index af1d549b179c..28942701e301 100644 --- a/mm/vma.c +++ b/mm/vma.c @@ -2433,9 +2433,8 @@ static void __mmap_complete(struct mmap_state *map, s= truct vm_area_struct *vma) =20 static unsigned long __mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, - struct list_head *uf) + struct list_head *uf, struct mm_struct *mm) { - struct mm_struct *mm =3D current->mm; struct vm_area_struct *vma =3D NULL; int error; VMA_ITERATOR(vmi, mm, addr); @@ -2485,13 +2484,13 @@ static unsigned long __mmap_region(struct file *fil= e, unsigned long addr, =20 /** * mmap_region() - Actually perform the userland mapping of a VMA into - * current->mm with known, aligned and overflow-checked @addr and @len, and + * mm with known, aligned and overflow-checked @addr and @len, and * correctly determined VMA flags @vm_flags and page offset @pgoff. * * This is an internal memory management function, and should not be used * directly. * - * The caller must write-lock current->mm->mmap_lock. + * The caller must write-lock mm->mmap_lock. 
  *
  * @file: If a file-backed mapping, a pointer to the struct file describing the
  * file to be mapped, otherwise NULL.
@@ -2508,12 +2507,12 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
  */
 unsigned long mmap_region(struct file *file, unsigned long addr,
 			  unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
-			  struct list_head *uf)
+			  struct list_head *uf, struct mm_struct *mm)
 {
 	unsigned long ret;
 	bool writable_file_mapping = false;
 
-	mmap_assert_write_locked(current->mm);
+	mmap_assert_write_locked(mm);
 
 	/* Check to see if MDWE is applicable. */
 	if (map_deny_write_exec(vm_flags, vm_flags))
@@ -2532,13 +2531,13 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		writable_file_mapping = true;
 	}
 
-	ret = __mmap_region(file, addr, len, vm_flags, pgoff, uf);
+	ret = __mmap_region(file, addr, len, vm_flags, pgoff, uf, mm);
 
 	/* Clear our write mapping regardless of error. */
 	if (writable_file_mapping)
 		mapping_unmap_writable(file->f_mapping);
 
-	validate_mm(current->mm);
+	validate_mm(mm);
 	return ret;
 }
 
diff --git a/mm/vma.h b/mm/vma.h
index a2e8710b8c47..e704f56577f3 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -243,7 +243,7 @@ void mm_drop_all_locks(struct mm_struct *mm);
 
 unsigned long mmap_region(struct file *file, unsigned long addr,
 			  unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
-			  struct list_head *uf);
+			  struct list_head *uf, struct mm_struct *mm);
 
 int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *brkvma,
 		 unsigned long addr, unsigned long request, unsigned long flags);
-- 
2.43.5

From nobody Tue Feb 10 17:08:43 2026
From: Anthony Yznaga
Subject: [PATCH 15/20] mm: pass the mm in vma_munmap_struct
Date: Fri, 24 Jan 2025 15:54:49 -0800
Message-ID: <20250124235454.84587-16-anthony.yznaga@oracle.com>
In-Reply-To: <20250124235454.84587-1-anthony.yznaga@oracle.com>
References: <20250124235454.84587-1-anthony.yznaga@oracle.com>

Allow unmap to work with an mshare host mm.

Signed-off-by: Anthony Yznaga
---
 mm/vma.c | 10 ++++++----
 mm/vma.h |  1 +
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/mm/vma.c b/mm/vma.c
index 28942701e301..60a37a9eb15e 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -1174,7 +1174,7 @@ static void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
 	struct vm_area_struct *vma;
 	struct mm_struct *mm;
 
-	mm = current->mm;
+	mm = vms->mm;
 	mm->map_count -= vms->vma_count;
 	mm->locked_vm -= vms->locked_vm;
 	if (vms->unlock)
@@ -1382,13 +1382,15 @@ static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
  * @start: The aligned start address to munmap
  * @end: The aligned end address to munmap
  * @uf: The userfaultfd list_head
+ * @mm: The mm struct
  * @unlock: Unlock after the operation. Only unlocked on success
  */
 static void init_vma_munmap(struct vma_munmap_struct *vms,
 		struct vma_iterator *vmi, struct vm_area_struct *vma,
 		unsigned long start, unsigned long end, struct list_head *uf,
-		bool unlock)
+		struct mm_struct *mm, bool unlock)
 {
+	vms->mm = mm;
 	vms->vmi = vmi;
 	vms->vma = vma;
 	if (vma) {
@@ -1432,7 +1434,7 @@ int do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	struct vma_munmap_struct vms;
 	int error;
 
-	init_vma_munmap(&vms, vmi, vma, start, end, uf, unlock);
+	init_vma_munmap(&vms, vmi, vma, start, end, uf, mm, unlock);
 	error = vms_gather_munmap_vmas(&vms, &mas_detach);
 	if (error)
 		goto gather_failed;
@@ -2229,7 +2231,7 @@ static int __mmap_prepare(struct mmap_state *map, struct list_head *uf)
 
 	/* Find the first overlapping VMA and initialise unmap state. */
 	vms->vma = vma_find(vmi, map->end);
-	init_vma_munmap(vms, vmi, vms->vma, map->addr, map->end, uf,
+	init_vma_munmap(vms, vmi, vms->vma, map->addr, map->end, uf, map->mm,
 			/* unlock = */ false);
 
 	/* OK, we have overlapping VMAs - prepare to unmap them. */
diff --git a/mm/vma.h b/mm/vma.h
index e704f56577f3..03d69321312d 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -49,6 +49,7 @@ struct vma_munmap_struct {
 	unsigned long exec_vm;
 	unsigned long stack_vm;
 	unsigned long data_vm;
+	struct mm_struct *mm;
 };
 
 enum vma_merge_state {
-- 
2.43.5

From nobody Tue Feb 10 17:08:43 2026
From: Anthony Yznaga
Subject: [PATCH 16/20] mshare: add MSHAREFS_CREATE_MAPPING
Date: Fri, 24 Jan 2025 15:54:50 -0800
Message-ID: <20250124235454.84587-17-anthony.yznaga@oracle.com>
In-Reply-To: <20250124235454.84587-1-anthony.yznaga@oracle.com>
References: <20250124235454.84587-1-anthony.yznaga@oracle.com>

Add an ioctl for mapping objects within an mshare region. The arguments
are the same as mmap(). Only shared anonymous memory mapped with MAP_FIXED
is supported initially.

Signed-off-by: Anthony Yznaga
---
 include/uapi/linux/msharefs.h |  9 +++++
 mm/mshare.c                   | 65 +++++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+)

diff --git a/include/uapi/linux/msharefs.h b/include/uapi/linux/msharefs.h
index c7b509c7e093..fea0afdf000d 100644
--- a/include/uapi/linux/msharefs.h
+++ b/include/uapi/linux/msharefs.h
@@ -20,10 +20,19 @@
  */
 #define MSHAREFS_GET_SIZE	_IOR('x', 0, struct mshare_info)
 #define MSHAREFS_SET_SIZE	_IOW('x', 1, struct mshare_info)
+#define MSHAREFS_CREATE_MAPPING	_IOW('x', 2, struct mshare_create)
 
 struct mshare_info {
 	__u64 start;
 	__u64 size;
 };
 
+struct mshare_create {
+	__u64 addr;
+	__u64 size;
+	__u64 offset;
+	__u32 prot;
+	__u32 flags;
+	__u32 fd;
+};
 #endif
diff --git a/mm/mshare.c b/mm/mshare.c
index 9ada1544aeb1..d70f10210b46 100644
--- a/mm/mshare.c
+++ b/mm/mshare.c
@@ -194,12 +194,60 @@ msharefs_set_size(struct mm_struct *host_mm, struct mshare_data *m_data,
 	return 0;
 }
 
+static long
+msharefs_create_mapping(struct mm_struct *host_mm, struct mshare_data *m_data,
+			struct mshare_create *mcreate)
+{
+	unsigned long mshare_start, mshare_end;
+	unsigned long mapped_addr;
+	unsigned long populate = 0;
+	unsigned long addr = mcreate->addr;
+	unsigned long size = mcreate->size;
+	unsigned int fd = mcreate->fd;
+	int prot = mcreate->prot;
+	int flags = mcreate->flags;
+	vm_flags_t vm_flags;
+	int err = -EINVAL;
+
+	mshare_start = m_data->minfo.start;
+	mshare_end = mshare_start + m_data->minfo.size;
+
+	if ((addr < mshare_start) || (addr >= mshare_end) ||
+	    (addr + size > mshare_end))
+		goto out;
+
+	/*
+	 * Only anonymous shared memory at fixed addresses is allowed for now.
+	 */
+	if ((flags & (MAP_SHARED | MAP_FIXED)) != (MAP_SHARED | MAP_FIXED))
+		goto out;
+	if (fd != -1)
+		goto out;
+
+	if (mmap_write_lock_killable(host_mm)) {
+		err = -EINTR;
+		goto out;
+	}
+
+	err = 0;
+	mapped_addr = __do_mmap(NULL, addr, size, prot, flags, vm_flags,
+				0, &populate, NULL, host_mm);
+
+	if (IS_ERR_VALUE(mapped_addr))
+		err = (long)mapped_addr;
+
+	mmap_write_unlock(host_mm);
+out:
+	return err;
+}
+
 static long
 msharefs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	struct mshare_data *m_data = filp->private_data;
 	struct mm_struct *host_mm = m_data->mm;
 	struct mshare_info minfo;
+	struct mshare_create mcreate;
 
 	switch (cmd) {
 	case MSHAREFS_GET_SIZE:
@@ -228,6 +276,23 @@ msharefs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 
 		return msharefs_set_size(host_mm, m_data, &minfo);
 
+	case MSHAREFS_CREATE_MAPPING:
+		if (copy_from_user(&mcreate, (struct mshare_create __user *)arg,
+			sizeof(mcreate)))
+			return -EFAULT;
+
+		/*
+		 * validate mshare region
+		 */
+		spin_lock(&m_data->m_lock);
+		if (m_data->minfo.size == 0) {
+			spin_unlock(&m_data->m_lock);
+			return -EINVAL;
+		}
+		spin_unlock(&m_data->m_lock);
+
+		return msharefs_create_mapping(host_mm, m_data, &mcreate);
+
 	default:
 		return -ENOTTY;
 	}
-- 
2.43.5

From nobody Tue Feb 10 17:08:43 2026
From: Anthony Yznaga
Subject: [PATCH 17/20] mshare: add MSHAREFS_UNMAP
Date: Fri, 24 Jan 2025 15:54:51 -0800
Message-ID: <20250124235454.84587-18-anthony.yznaga@oracle.com>
In-Reply-To: <20250124235454.84587-1-anthony.yznaga@oracle.com>
References: <20250124235454.84587-1-anthony.yznaga@oracle.com>

Add an ioctl for unmapping objects in an mshare region.
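
For illustration only, and not part of this patch: a minimal userspace sketch of how the two ioctls added so far might be driven. The struct layouts and ioctl numbers are copied from the include/uapi/linux/msharefs.h hunks in this series. The helper names (mshare_fill_create, mshare_range_ok, mshare_create_mapping, mshare_unmap) are hypothetical. Passing MAP_ANONYMOUS is an assumption, since the kernel side shown here only checks for MAP_SHARED | MAP_FIXED and fd == -1. The range check is a userspace mirror of the kernel's bounds test, rearranged so that addr + size cannot wrap.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/types.h>

/* Copied from the uapi header in this series (not yet in installed headers). */
struct mshare_create {
	__u64 addr;
	__u64 size;
	__u64 offset;
	__u32 prot;
	__u32 flags;
	__u32 fd;
};

struct mshare_unmap {
	__u64 addr;
	__u64 size;
};

#define MSHAREFS_CREATE_MAPPING	_IOW('x', 2, struct mshare_create)
#define MSHAREFS_UNMAP		_IOW('x', 3, struct mshare_unmap)

/* Hypothetical helper: fill a request for shared anonymous memory at addr. */
static void mshare_fill_create(struct mshare_create *mc, uint64_t addr,
			       uint64_t size, int prot)
{
	assert(mc != NULL);
	memset(mc, 0, sizeof(*mc));	/* also clears trailing padding */
	mc->addr = addr;
	mc->size = size;
	mc->offset = 0;
	mc->prot = (uint32_t)prot;
	/* MAP_ANONYMOUS is an assumption; the patch requires SHARED | FIXED. */
	mc->flags = MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED;
	mc->fd = (uint32_t)-1;		/* the patch rejects any other value */
}

/* Hypothetical helper: is [addr, addr + size) inside the mshare region? */
static int mshare_range_ok(uint64_t start, uint64_t region_size,
			   uint64_t addr, uint64_t size)
{
	uint64_t end = start + region_size;

	if (addr < start || addr >= end)
		return 0;
	return size <= end - addr;	/* avoids computing addr + size */
}

/* Returns 0 on success, -1 with errno set on failure (as ioctl() does). */
static int mshare_create_mapping(int fd, uint64_t addr, uint64_t size, int prot)
{
	struct mshare_create mc;

	mshare_fill_create(&mc, addr, size, prot);
	return ioctl(fd, MSHAREFS_CREATE_MAPPING, &mc);
}

/* Returns 0 on success, -1 with errno set on failure (as ioctl() does). */
static int mshare_unmap(int fd, uint64_t addr, uint64_t size)
{
	struct mshare_unmap mu = { .addr = addr, .size = size };

	return ioctl(fd, MSHAREFS_UNMAP, &mu);
}
```

Here fd would be an open descriptor for an msharefs file whose size has already been set with MSHAREFS_SET_SIZE. Per the ioctl handlers in this series, both requests fail with EINVAL while the region size is still 0 or when the range falls outside the region.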
Signed-off-by: Anthony Yznaga --- include/uapi/linux/msharefs.h | 7 ++++++ mm/mshare.c | 44 +++++++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+) diff --git a/include/uapi/linux/msharefs.h b/include/uapi/linux/msharefs.h index fea0afdf000d..f7af1f2b5ee7 100644 --- a/include/uapi/linux/msharefs.h +++ b/include/uapi/linux/msharefs.h @@ -21,6 +21,7 @@ #define MSHAREFS_GET_SIZE _IOR('x', 0, struct mshare_info) #define MSHAREFS_SET_SIZE _IOW('x', 1, struct mshare_info) #define MSHAREFS_CREATE_MAPPING _IOW('x', 2, struct mshare_create) +#define MSHAREFS_UNMAP _IOW('x', 3, struct mshare_unmap) =20 struct mshare_info { __u64 start; @@ -35,4 +36,10 @@ struct mshare_create { __u32 flags; __u32 fd; }; + +struct mshare_unmap { + __u64 addr; + __u64 size; +}; + #endif diff --git a/mm/mshare.c b/mm/mshare.c index d70f10210b46..8f53b8132895 100644 --- a/mm/mshare.c +++ b/mm/mshare.c @@ -241,6 +241,32 @@ msharefs_create_mapping(struct mm_struct *host_mm, str= uct mshare_data *m_data, return err; } =20 +static long +msharefs_unmap(struct mm_struct *host_mm, struct mshare_data *m_data, + struct mshare_unmap *m_unmap) +{ + unsigned long mshare_start, mshare_end; + unsigned long addr =3D m_unmap->addr; + unsigned long size =3D m_unmap->size; + int err; + + mshare_start =3D m_data->minfo.start; + mshare_end =3D mshare_start + m_data->minfo.size; + + if ((addr < mshare_start) || (addr >=3D mshare_end) || + (addr + size > mshare_end)) + return -EINVAL; + + if (mmap_write_lock_killable(host_mm)) + return -EINTR; + + err =3D do_munmap(host_mm, addr, size, NULL); + + mmap_write_unlock(host_mm); + + return err; +} + static long msharefs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) { @@ -248,6 +274,7 @@ msharefs_ioctl(struct file *filp, unsigned int cmd, uns= igned long arg) struct mm_struct *host_mm =3D m_data->mm; struct mshare_info minfo; struct mshare_create mcreate; + struct mshare_unmap m_unmap; =20 switch (cmd) { case MSHAREFS_GET_SIZE: @@ -293,6 
+320,23 @@ msharefs_ioctl(struct file *filp, unsigned int cmd, un= signed long arg) =20 return msharefs_create_mapping(host_mm, m_data, &mcreate); =20 + case MSHAREFS_UNMAP: + if (copy_from_user(&m_unmap, (struct mshare_unmap __user *)arg, + sizeof(m_unmap))) + return -EFAULT; + + /* + * validate mshare region + */ + spin_lock(&m_data->m_lock); + if (m_data->minfo.size =3D=3D 0) { + spin_unlock(&m_data->m_lock); + return -EINVAL; + } + spin_unlock(&m_data->m_lock); + + return msharefs_unmap(host_mm, m_data, &m_unmap); + default: return -ENOTTY; } --=20 2.43.5 From nobody Tue Feb 10 17:08:43 2026 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0E7E31F76D3; Fri, 24 Jan 2025 23:57:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.165.32 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737763021; cv=none; b=redh08TH2F7AEn9dO27TvW7va9szr+DKAYYjzcdaUqs234UXpHk65JQ0gxB2YVwS8hkXC8ZmT3krOidPU1nFRPmiaHGeA5j0Ryfk4hfmDTy509i1ST5NyrjAXcnbgBkv7qprKf/Raq3Q+4B198YU/xpxTxzG2S1zg7Yu/KDXDUU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737763021; c=relaxed/simple; bh=DKA9dIletWKBiV4Hg22ND8mbQB18lfg8FRVKfVLg9x0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=q7+BckCjPjEuZU5XQWrAX5bMeNOM37p0RSToILeBcPp3MTfsQ+vDub6WZg1/UuOTMMuH3E6QYLm8E08ev31Dfl2JMJpG66J7H2QDVzXofIya1cK2LuFC32/nTYY+IqfrqcScJH0EYoBnTMlk2GXpePHsz35YzuiXgrlMHbdHPlA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=NBNz1vpN; arc=none smtp.client-ip=205.220.165.32 Authentication-Results: smtp.subspace.kernel.org; 
dmarc=pass (p=reject dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="NBNz1vpN" Received: from pps.filterd (m0246629.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 50OIXg65000799; Fri, 24 Jan 2025 23:56:07 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=corp-2023-11-20; bh=8uftA rRhojvCPMIRnGj1x1yBh7nth+rSToT9QfbKQbE=; b=NBNz1vpNVG2DdlahpKAac mHO3DpyehCcOesPJsPnk36jiCZRFlXyz9Y8PTUI7Hnr4gFTFjbSxnZ8oKQoK/e3D YShuehhAQHh4kFuqiX6xE1Dt2E881mSqfHP3bmUwYsgQSJlS2PjaQJNVyT9/WOZS QqpVHaIbVebMBSuDVaH9otVoCtmhY/9pMaDlI/MuoyXFDkG/AksQjFluDgd4yr3H 8+3PIMBHFEW/B1nL1bh/H1xpLlGX5rDCDlAlZV6E6aC8MIJStkhyOxHe+pAt6ki5 1a1OSflAkFUO3oEeHVB/HBKm+9KteLBZVAsvfXVp3WGJZD8idzzQ1RJdlD0LU/Ak A== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 44awyh5rkj-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:56:07 +0000 (GMT) Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 50OLoFNH036500; Fri, 24 Jan 2025 23:56:06 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 44917u4ap1-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:56:06 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 50ONsxQI018051; 
Fri, 24 Jan 2025 23:56:05 GMT Received: from localhost.us.oracle.com (dhcp-10-65-130-174.vpn.oracle.com [10.65.130.174]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 44917u49ww-19; Fri, 24 Jan 2025 23:56:05 +0000 From: Anthony Yznaga To: akpm@linux-foundation.org, willy@infradead.org, markhemm@googlemail.com, viro@zeniv.linux.org.uk, david@redhat.com, khalid@kernel.org Cc: anthony.yznaga@oracle.com, jthoughton@google.com, corbet@lwn.net, dave.hansen@intel.com, kirill@shutemov.name, luto@kernel.org, brauner@kernel.org, arnd@arndb.de, ebiederm@xmission.com, catalin.marinas@arm.com, mingo@redhat.com, peterz@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, tglx@linutronix.de, cgroups@vger.kernel.org, x86@kernel.org, linux-doc@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhiramat@kernel.org, rostedt@goodmis.org, vasily.averin@linux.dev, xhao@linux.alibaba.com, pcc@google.com, neilb@suse.de, maz@kernel.org Subject: [PATCH 18/20] mm/mshare: provide a way to identify an mm as an mshare host mm Date: Fri, 24 Jan 2025 15:54:52 -0800 Message-ID: <20250124235454.84587-19-anthony.yznaga@oracle.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250124235454.84587-1-anthony.yznaga@oracle.com> References: <20250124235454.84587-1-anthony.yznaga@oracle.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2025-01-24_10,2025-01-23_01,2024-11-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 phishscore=0 mlxscore=0 spamscore=0 mlxlogscore=924 bulkscore=0 malwarescore=0 
suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2411120000 definitions=main-2501240162 X-Proofpoint-ORIG-GUID: seWEfcu2KQdw5VizmqhNebvh3voJaHP1 X-Proofpoint-GUID: seWEfcu2KQdw5VizmqhNebvh3voJaHP1 Content-Type: text/plain; charset="utf-8" Add new mm flag, MMF_MSHARE. Signed-off-by: Anthony Yznaga --- include/linux/mm_types.h | 2 ++ mm/mshare.c | 1 + 2 files changed, 3 insertions(+) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 5f1b2dc788e2..dfbeb50e4c9b 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -1642,6 +1642,8 @@ enum { #define MMF_TOPDOWN 31 /* mm searches top down by default */ #define MMF_TOPDOWN_MASK (1 << MMF_TOPDOWN) =20 +#define MMF_MSHARE 32 /* mm is an mshare host mm */ + #define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\ MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK |\ MMF_VM_MERGE_ANY_MASK | MMF_TOPDOWN_MASK) diff --git a/mm/mshare.c b/mm/mshare.c index 8f53b8132895..4c3f6c2410d6 100644 --- a/mm/mshare.c +++ b/mm/mshare.c @@ -365,6 +365,7 @@ msharefs_fill_mm(struct inode *inode) goto err_free; } =20 + set_bit(MMF_MSHARE, &mm->flags); mm->mmap_base =3D mm->task_size =3D 0; =20 m_data =3D kzalloc(sizeof(*m_data), GFP_KERNEL); --=20 2.43.5 From nobody Tue Feb 10 17:08:43 2026 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 947E91EEA33; Fri, 24 Jan 2025 23:58:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.165.32 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737763090; cv=none; b=l2RV9FXAbX7V1u3CB3qsgEwQ5fEVzHn71kLSTx922W6djiChN0ro5YAY7FEri5cfDhGMui5O1RrBMvIhGFyWjLbPQr4ttXv0Xm1OwmmOqxrpooomIi5j3R0lavg2sd8zFjrEU2hS4O9jnc9k0E6WCs/842NuPp18c8s94cY1SZc= ARC-Message-Signature: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1737763090; c=relaxed/simple; bh=f19dEU0waQS5E0P/wf9Asn87RUd8SB9dbmqgvMGp6K8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Igs5iDYaQjJ9u4POoYrbxnWEt6VvXP8313NCz4eW1xLjBdSAHjiMPtSyu5YI26Sb1lLaHAKxUmDSr9q6sFeV7MjVJKPhN6/7l8Of5R2BPXq9Gnr7vPwFK5uVoEbGfJvHNnROly3hINJic2Hd6tr8e+axyfJ0fGmd3h59STB3fjg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=B3fjZHZb; arc=none smtp.client-ip=205.220.165.32 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="B3fjZHZb" Received: from pps.filterd (m0246629.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 50OIBhnG000613; Fri, 24 Jan 2025 23:56:10 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=corp-2023-11-20; bh=SM5Ww 7fWlIZajv0rUMSJsMcTKUFDtvzpjq2xibJD7zc=; b=B3fjZHZbj/1OatG3T8blh RsVs7af8iubXIbp5pCA/LmBolWeub9ZWDjgfFw/eUeRi9kdXt4spr34dzUUkjVYH 8YFGdxJ4Z5/RpbERJYmTWBrzS9ds8FNE2A0kyKEfgzObXPdQl7S+wid3oS7YndzN RZR9201pZ9PZfSijDDQmyNa/mBgI2y7nL9PF9mqYFcwu3MYFtqG6NtUY/EBfUsRO 9kYqDw1UwZamvlgb0Lzhzui1+zNUkKTJbSnBNmyZJxKmd7jmxIsBEVT1go/vz6CP ybAZTHMbWjqjXO8Y8QBEbLr+RzKlZtiw3vGENMpoLhyb+Mkvr3LTEbk1ar79lSPf Q== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 44awyh5rkm-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 
23:56:10 +0000 (GMT) Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 50OMPchS036590; Fri, 24 Jan 2025 23:56:09 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 44917u4aq7-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:56:09 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 50ONsxQK018051; Fri, 24 Jan 2025 23:56:08 GMT Received: from localhost.us.oracle.com (dhcp-10-65-130-174.vpn.oracle.com [10.65.130.174]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 44917u49ww-20; Fri, 24 Jan 2025 23:56:08 +0000 From: Anthony Yznaga To: akpm@linux-foundation.org, willy@infradead.org, markhemm@googlemail.com, viro@zeniv.linux.org.uk, david@redhat.com, khalid@kernel.org Cc: anthony.yznaga@oracle.com, jthoughton@google.com, corbet@lwn.net, dave.hansen@intel.com, kirill@shutemov.name, luto@kernel.org, brauner@kernel.org, arnd@arndb.de, ebiederm@xmission.com, catalin.marinas@arm.com, mingo@redhat.com, peterz@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, tglx@linutronix.de, cgroups@vger.kernel.org, x86@kernel.org, linux-doc@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhiramat@kernel.org, rostedt@goodmis.org, vasily.averin@linux.dev, xhao@linux.alibaba.com, pcc@google.com, neilb@suse.de, maz@kernel.org Subject: [PATCH 19/20] mm/mshare: get memcg from current->mm instead of mshare mm Date: Fri, 24 Jan 2025 15:54:53 -0800 Message-ID: 
<20250124235454.84587-20-anthony.yznaga@oracle.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250124235454.84587-1-anthony.yznaga@oracle.com> References: <20250124235454.84587-1-anthony.yznaga@oracle.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2025-01-24_10,2025-01-23_01,2024-11-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 phishscore=0 mlxscore=0 spamscore=0 mlxlogscore=912 bulkscore=0 malwarescore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2411120000 definitions=main-2501240162 X-Proofpoint-ORIG-GUID: mhtNEYRHB3aSsRiiq5ZgkAR2eaM2NkII X-Proofpoint-GUID: mhtNEYRHB3aSsRiiq5ZgkAR2eaM2NkII Content-Type: text/plain; charset="utf-8" Because handle_mm_fault() may operate on a vma from an mshare host mm, the mm passed to cgroup functions count_memcg_event_mm() and get_mem_cgroup_from_mm() may be an mshare host mm. These functions find a memcg by dereferencing mm->owner which is set when an mm is allocated. Since the task that created an mshare file may exit before the file is deleted, use current->mm instead to find the memcg to update or charge to. This may not be the right solution but is hopefully a good starting point. If charging should always go to a single memcg associated with the mshare file, perhaps active_memcg could be used. 
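
As an illustration only, the selection rule described above can be modelled in userspace C. This is a sketch under stated assumptions, not kernel code: `struct memcg`, `struct task`, `struct mm`, the `is_mshare_host` field (standing in for `test_bit(MMF_MSHARE, &mm->flags)`) and the helper `memcg_for_fault()` are all hypothetical names introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for kernel objects; not kernel API. */
struct memcg { const char *name; };
struct task  { struct memcg *memcg; };
struct mm {
	bool is_mshare_host;	/* models test_bit(MMF_MSHARE, &mm->flags) */
	struct task *owner;	/* models mm->owner; NULL for an mshare host mm */
};

/*
 * Pick the memcg to account a fault against. When the fault is being
 * handled on an mshare host mm, fall back to the faulting task's own mm,
 * because the host mm has no owner that can be dereferenced.
 * Returns NULL when no memcg can be found.
 */
static struct memcg *memcg_for_fault(struct mm *fault_mm, struct mm *current_mm)
{
	struct mm *mm = fault_mm;

	assert(fault_mm != NULL);

	if (mm->is_mshare_host)
		mm = current_mm;
	if (!mm || !mm->owner)
		return NULL;
	return mm->owner->memcg;
}
```

In this model a fault taken through the host mm is accounted to whichever task happened to fault, which is why charges for one shared region can end up spread over several memcgs.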
Signed-off-by: Anthony Yznaga
---
 include/linux/memcontrol.h | 3 +++
 mm/memcontrol.c            | 3 ++-
 mm/mshare.c                | 3 +++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 6e74b8254d9b..e458ca80e833 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -987,6 +987,9 @@ static inline void count_memcg_events_mm(struct mm_struct *mm,
 	if (mem_cgroup_disabled())
 		return;
 
+	if (test_bit(MMF_MSHARE, &mm->flags))
+		mm = current->mm;
+
 	rcu_read_lock();
 	memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
 	if (likely(memcg))
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 46f8b372d212..ba6267615ee6 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -938,7 +938,8 @@ struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
 		mm = current->mm;
 		if (unlikely(!mm))
 			return root_mem_cgroup;
-	}
+	} else if (test_bit(MMF_MSHARE, &mm->flags))
+		mm = current->mm;
 
 	rcu_read_lock();
 	do {
diff --git a/mm/mshare.c b/mm/mshare.c
index 4c3f6c2410d6..5cc416cfd78c 100644
--- a/mm/mshare.c
+++ b/mm/mshare.c
@@ -381,6 +381,9 @@ msharefs_fill_mm(struct inode *inode)
 	if (ret)
 		goto err_free;
 
+#ifdef CONFIG_MEMCG
+	mm->owner = NULL;
+#endif
 	return 0;
 
 err_free:
-- 
2.43.5

From nobody Tue Feb 10 17:08:43 2026 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8D8E41F4729; Fri, 24 Jan 2025 23:56:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.165.32 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737763008; cv=none; b=dmlPqhlmrlQQC0m+72VgNmPrLLL4mqeG3Y8UplaSO23yU8jGzSK5rjrOcBiCi/D7JplnKXZx7vGtyXO5Xs5Mcbkfmr9bXCm1MOo2FwRlJpHgjc3htPuBFzLS7guHqiJhcWSEcIWpuxMsNyE08WWwTXv7ABJGujiLd5CKdS2lw9Y= ARC-Message-Signature:
i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737763008; c=relaxed/simple; bh=r9ZQ12ZIgNTFTk7kaqzMBVjLOWGf8XLqE9dCL3V25Co=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=pzIzt3/kycmiijU/DTdFc/zdM6w9zjLJh896VtMfB/aJ3orb86q3rtPndaXzkYxRMmun+je3jKhUjuA12Cb9tUVj5BUCCGzYeg8qYQTpIuIwVCFZNlm2K1K/6jxk/57efOgYwrUwa9bWwNYYX+wGSP3Nb/lUAaSy9ejelYHV8JE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=TKAFsMxw; arc=none smtp.client-ip=205.220.165.32 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="TKAFsMxw" Received: from pps.filterd (m0333521.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 50OIDTDv002066; Fri, 24 Jan 2025 23:56:14 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=corp-2023-11-20; bh=W/gkk 8s5Xxm+HTDJna5mPHZqxpMgtw4DEDxvy6ObgxU=; b=TKAFsMxw1dXtdtOqEkfGV AVdqkRY2RaTdu3ZmNNPD24QQMG/BLy7lFLPKQesBM2vgbBHmiE9tOyS6uYXZD9a8 igkuBfXhy7Q+VQB0Bm8CIozCZ7QQEBlmnDjvf+UaytMwnp0OMnxpxlBDMaVKYQmq P9D0laJVFb14OHi14s8rtz/KnzjNr/Fh6nc8gkVLCGAATk+D/arv9PRIvMfr7kHG 6/lLtbfBmh/jzKaJR+mgTQutGpbydQuuNkiX+EmfQy6FHGubtdh8EJOxsuwy9JOT yAsDLorshtbQQvpBZtJ7Z37vUlSH8xD+TAbBo5nI9nKMmO7mYH7lUg8XO5iRsEmU A== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 44b06j5j4a-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); 
Fri, 24 Jan 2025 23:56:13 +0000 (GMT) Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 50OMEEls036463; Fri, 24 Jan 2025 23:56:13 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 44917u4aqt-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 24 Jan 2025 23:56:13 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 50ONsxQM018051; Fri, 24 Jan 2025 23:56:12 GMT Received: from localhost.us.oracle.com (dhcp-10-65-130-174.vpn.oracle.com [10.65.130.174]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 44917u49ww-21; Fri, 24 Jan 2025 23:56:11 +0000 From: Anthony Yznaga To: akpm@linux-foundation.org, willy@infradead.org, markhemm@googlemail.com, viro@zeniv.linux.org.uk, david@redhat.com, khalid@kernel.org Cc: anthony.yznaga@oracle.com, jthoughton@google.com, corbet@lwn.net, dave.hansen@intel.com, kirill@shutemov.name, luto@kernel.org, brauner@kernel.org, arnd@arndb.de, ebiederm@xmission.com, catalin.marinas@arm.com, mingo@redhat.com, peterz@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, tglx@linutronix.de, cgroups@vger.kernel.org, x86@kernel.org, linux-doc@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhiramat@kernel.org, rostedt@goodmis.org, vasily.averin@linux.dev, xhao@linux.alibaba.com, pcc@google.com, neilb@suse.de, maz@kernel.org Subject: [PATCH 20/20] mm/mshare: associate a mem cgroup with an mshare file Date: Fri, 24 Jan 2025 15:54:54 -0800 Message-ID: 
<20250124235454.84587-21-anthony.yznaga@oracle.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250124235454.84587-1-anthony.yznaga@oracle.com> References: <20250124235454.84587-1-anthony.yznaga@oracle.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2025-01-24_10,2025-01-23_01,2024-11-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 phishscore=0 mlxscore=0 spamscore=0 mlxlogscore=999 bulkscore=0 malwarescore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2411120000 definitions=main-2501240162 X-Proofpoint-ORIG-GUID: Z6OzEU8bsVr0C8NMG9E5ROuvsfbs4fjT X-Proofpoint-GUID: Z6OzEU8bsVr0C8NMG9E5ROuvsfbs4fjT Content-Type: text/plain; charset="utf-8" This patch shows one approach to associating a specific mem cgroup to an mshare file and was inspired by code in mem_cgroup_sk_alloc(). Essentially when a process creates an mshare region, a reference is taken on the mem cgroup that the process belongs to and a pointer to the memcg is saved. At fault time set_active_memcg() is used to temporarily enable charging of __GFP_ACCOUNT allocations to the saved memcg. This does consolidate pagetable charges to a single memcg, but there are issues to address such as how to handle the case where the memcg is deleted but becomes a hidden, zombie memcg because the mshare file has a reference to it. 
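
The save/restore shape used at fault time can be sketched in userspace C as follows. This is only a model under stated assumptions: `struct memcg`, the global `active_memcg`, and the helpers `set_active()`, `charge()` and `handle_fault()` are hypothetical stand-ins, with `set_active()` imitating how the diff uses the return value of set_active_memcg() to restore the previous setting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup; only tracks a charge counter. */
struct memcg { long charged; };

/* Models the "active memcg" override consulted by accounted allocations. */
static struct memcg *active_memcg;

/* Install a new override and return the previous one so it can be restored. */
static struct memcg *set_active(struct memcg *memcg)
{
	struct memcg *old = active_memcg;

	active_memcg = memcg;
	return old;
}

/* Charge n units to the override if one is set, else to the caller's own memcg. */
static void charge(struct memcg *own, long n)
{
	struct memcg *target = active_memcg ? active_memcg : own;

	assert(target != NULL);
	target->charged += n;
}

/*
 * Models the fault path: charges are redirected to the memcg saved with the
 * mshare file only for the duration of the fault, then the old override is
 * put back.
 */
static void handle_fault(struct memcg *own, struct memcg *mshare_memcg, long n)
{
	struct memcg *saved = NULL;

	if (mshare_memcg)
		saved = set_active(mshare_memcg);
	charge(own, n);
	if (mshare_memcg)
		set_active(saved);
}
```

With a saved memcg present, every fault on the shared region is charged to that one memcg regardless of which task faulted; without one, the charge falls back to the faulting task's own memcg.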
Signed-off-by: Anthony Yznaga
---
 arch/x86/mm/fault.c | 11 +++++++++++
 include/linux/mm.h  |  5 +++++
 mm/mshare.c         | 33 +++++++++++++++++++++++++++++++++
 3 files changed, 49 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 4b55ade61a01..1b50417f68ad 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -21,6 +21,7 @@
 #include
 #include /* find_and_lock_vma() */
 #include
+#include
 
 #include /* boot_cpu_has, ... */
 #include /* dotraplinkage, ... */
@@ -1219,6 +1220,8 @@ void do_user_addr_fault(struct pt_regs *regs,
 	unsigned int flags = FAULT_FLAG_DEFAULT;
 	bool is_shared_vma;
 	unsigned long addr;
+	struct mem_cgroup *mshare_memcg;
+	struct mem_cgroup *memcg;
 
 	tsk = current;
 	mm = tsk->mm;
@@ -1375,6 +1378,8 @@ void do_user_addr_fault(struct pt_regs *regs,
 	}
 
 	if (unlikely(vma_is_mshare(vma))) {
+		mshare_memcg = get_mshare_memcg(vma);
+
 		fault = find_shared_vma(&vma, &addr);
 
 		if (fault) {
@@ -1402,6 +1407,9 @@ void do_user_addr_fault(struct pt_regs *regs,
 		return;
 	}
 
+	if (is_shared_vma && mshare_memcg)
+		memcg = set_active_memcg(mshare_memcg);
+
 	/*
 	 * If for any reason at all we couldn't handle the fault,
 	 * make sure we exit gracefully rather than endlessly redo
@@ -1417,6 +1425,9 @@ void do_user_addr_fault(struct pt_regs *regs,
 	 */
 	fault = handle_mm_fault(vma, addr, flags, regs);
 
+	if (is_shared_vma && mshare_memcg)
+		set_active_memcg(memcg);
+
 	if (unlikely(is_shared_vma) && ((fault & VM_FAULT_COMPLETED) ||
 	    (fault & VM_FAULT_RETRY) || fault_signal_pending(fault, regs)))
 		mmap_read_unlock(mm);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 80429d1a6ae4..eaa304d22a9d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1110,12 +1110,17 @@ static inline bool vma_is_anon_shmem(struct vm_area_struct *vma) { return false;
 int vma_is_stack_for_current(struct vm_area_struct *vma);
 
 #ifdef CONFIG_MSHARE
+struct mem_cgroup *get_mshare_memcg(struct vm_area_struct *vma);
 vm_fault_t find_shared_vma(struct vm_area_struct **vma, unsigned long *addrp);
 static inline bool vma_is_mshare(const struct vm_area_struct *vma)
 {
 	return vma->vm_flags & VM_MSHARE;
 }
 #else
+static inline struct mem_cgroup *get_mshare_memcg(struct vm_area_struct *vma)
+{
+	return NULL;
+}
 static inline vm_fault_t find_shared_vma(struct vm_area_struct **vma, unsigned long *addrp)
 {
 	WARN_ON_ONCE(1);
diff --git a/mm/mshare.c b/mm/mshare.c
index 5cc416cfd78c..a56e56c90aaa 100644
--- a/mm/mshare.c
+++ b/mm/mshare.c
@@ -16,6 +16,7 @@
 
 #include
 #include
+#include
 #include
 #include
 #include
@@ -30,8 +31,22 @@ struct mshare_data {
 	spinlock_t m_lock;
 	struct mshare_info minfo;
 	struct mmu_notifier mn;
+#ifdef CONFIG_MEMCG
+	struct mem_cgroup *memcg;
+#endif
 };
 
+struct mem_cgroup *get_mshare_memcg(struct vm_area_struct *vma)
+{
+	struct mshare_data *m_data = vma->vm_private_data;
+
+#ifdef CONFIG_MEMCG
+	return m_data->memcg;
+#else
+	return NULL;
+#endif
+}
+
 static void mshare_invalidate_tlbs(struct mmu_notifier *mn, struct mm_struct *mm,
 		unsigned long start, unsigned long end)
 {
@@ -358,6 +373,9 @@ msharefs_fill_mm(struct inode *inode)
 	struct mm_struct *mm;
 	struct mshare_data *m_data = NULL;
 	int ret = 0;
+#ifdef CONFIG_MEMCG
+	struct mem_cgroup *memcg;
+#endif
 
 	mm = mm_alloc();
 	if (!mm) {
@@ -383,6 +401,17 @@ msharefs_fill_mm(struct inode *inode)
 
 #ifdef CONFIG_MEMCG
 	mm->owner = NULL;
+
+	rcu_read_lock();
+	memcg = mem_cgroup_from_task(current);
+	if (mem_cgroup_is_root(memcg))
+		goto out;
+	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
+		goto out;
+	if (css_tryget(&memcg->css))
+		m_data->memcg = memcg;
+out:
+	rcu_read_unlock();
 #endif
 	return 0;
 
@@ -396,6 +425,10 @@ msharefs_fill_mm(struct inode *inode)
 static void
 msharefs_delmm(struct mshare_data *m_data)
 {
+#ifdef CONFIG_MEMCG
+	if (m_data->memcg)
+		css_put(&m_data->memcg->css);
+#endif
 	mmput(m_data->mm);
 	kfree(m_data);
 }
-- 
2.43.5