From nobody Sat Feb  7 22:46:41 2026
Received: from mail-pl1-f201.google.com (mail-pl1-f201.google.com
 [209.85.214.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2C6BF245AFC
	for <linux-kernel@vger.kernel.org>; Thu, 13 Feb 2025 22:47:01 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=209.85.214.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1739486823; cv=none;
 b=etqdHzpjBsmAif2JBOjEjjk46z1j9sQPDWiU7koEQwZYY+f3saUO/DlZ2btzjPy5+WYDnEHpSg+Kb4NZioKGN3wk7ZM1OkvSwt/mTutE6ylLqfBdvebq5fNMmrcns1tu180qt2iACy6YQGhcvDHRyvMicaBuYZJg5Wr7WSMz8Kc=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1739486823; c=relaxed/simple;
	bh=DnHwwceX94T6CFAGCwdOIkJAjmpCwFYCPBayU0ptLsM=;
	h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From:
	 To:Cc:Content-Type;
 b=edKKtQw5xz5+uI5mRjjl8W4lb8j2F+1NC6IedeerVPzHwjU84yZ3m8SZ8G+nzc3wov8lCCwHeOZXVhIMJbJkxPd0gOJ1x3FBvCb19DfKuZvDb+D7WoIKtAPYXWwTFk2Pl9AoRiQlHJYleUxEyuFnIH7OpqfU0MNLP+/9HPQ+RHU=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=google.com;
 spf=pass smtp.mailfrom=flex--surenb.bounces.google.com;
 dkim=pass (2048-bit key) header.d=google.com header.i=@google.com
 header.b=qKcVF4q/; arc=none smtp.client-ip=209.85.214.201
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=google.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=flex--surenb.bounces.google.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=google.com header.i=@google.com
 header.b="qKcVF4q/"
Received: by mail-pl1-f201.google.com with SMTP id
 d9443c01a7336-21f6cd48c56so21398645ad.3
        for <linux-kernel@vger.kernel.org>;
 Thu, 13 Feb 2025 14:47:01 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20230601; t=1739486821; x=1740091621;
 darn=vger.kernel.org;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=ZTYk4jMv74ux8pRXqzfOQIPTG5wd2Gv112zjF6xUHQA=;
        b=qKcVF4q/5JcwevH7gLqvXH9DAd4s9lKN7kuCXV2eXtxgpCYHOHxgVdMxz5l6RzPeft
         Dv+lENIk1QpAptdKelxPzloLOtBHMa4XrwBuVF2BnnMtTtBXtyrfTdYShinnYx8DVI4Q
         mt5bIrGtdj+98tw4y/OUmrLKY3PbE5XEyopFOUJFY2wsvG5bal43dvH8xMLVEmXfkx+U
         emKgaTd5i3FRKhZntFIxXPOIgM9OuKOsqQMdSpKIf0r3zasFFcXA5ojgtNb821GrAyJq
         iIkfM0fKwOck5RZeF1U+CfcuC+PuGcGygRf1EVEsVsD5o9ymGAQGZcWSH4hOAroUM6he
         72vQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1739486821; x=1740091621;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=ZTYk4jMv74ux8pRXqzfOQIPTG5wd2Gv112zjF6xUHQA=;
        b=C6y4PCLd0A3n0zpD1J2S4wqeVDWZL55QajQsk/wtfOy1Xp11O6XFafrAeYJ1wMfP7P
         0PrrbqJ0t0QRHk86AY24IT/BEkZvV29yZPeyWZvoGpMSudnNz0wV8sznHiBuDSCtDy0S
         Cg18MLkr6Abzbk+fGz0VN/M1R6LEzv67ritnD21vNinLjg1nDtcI+F3ik+l8nB5ViUgM
         37xOjXAnBTPJHYmIw6fV8DuE6ZOz4KYIdhiOjNZaHSdkIBqT6DHAkbyNIcvh9iGzv4oq
         lvQbtzQLOZLUC1sj2SkJM8qJ4avg1ttRE5VkUuF3/oqoUEAJG6IbrXGFJuryQsjZph6P
         R21w==
X-Forwarded-Encrypted: i=1;
 AJvYcCV8RFZ8VpZSo2OB3U2odg3PY3BVi02ZfkyuAEPwXYjr/4JDaf2/dsOLvo3iKx7PGuKquQ521cBJofXz3jE=@vger.kernel.org
X-Gm-Message-State: AOJu0YxkPuDlmF7CUvnv2DfJuLf5YlG+OT+FzSgBZHvf/Lgboh2TJ1xZ
	w7nUJtAbGw7EvJUjOdZ3qikfA/35AhJIKi+VlfVW2LQX3hI4cO6j9fL32y99ntfmiFC40cCjAMS
	lag==
X-Google-Smtp-Source: 
 AGHT+IGoxbQEHrRgpcds0LXGn7Wsw+mSSdilT5DRw00yKSF5iCDYNJNuQzJz7chpHTNiW9yQ8A9B90jlKU8=
X-Received: from pgcu129.prod.google.com ([2002:a63:7987:0:b0:ad5:53f5:6975])
 (user=surenb job=prod-delivery.src-stubby-dispatcher) by
 2002:a05:6a21:112:b0:1ee:7b6c:e2f4
 with SMTP id adf61e73a8af0-1ee7b6ce592mr3099897637.26.1739486821322; Thu, 13
 Feb 2025 14:47:01 -0800 (PST)
Date: Thu, 13 Feb 2025 14:46:38 -0800
In-Reply-To: <20250213224655.1680278-1-surenb@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
References: <20250213224655.1680278-1-surenb@google.com>
X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog
Message-ID: <20250213224655.1680278-2-surenb@google.com>
Subject: [PATCH v10 01/18] mm: introduce vma_start_read_locked{_nested}
 helpers
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
	vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
	dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
	lokeshgidra@google.com, minchan@google.com, jannh@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
	klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com, surenb@google.com,
	"Liam R. Howlett" <Liam.Howlett@Oracle.com>
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Introduce helper functions which can be used to read-lock a VMA when
holding mmap_lock for read.  Replace direct accesses to vma->vm_lock with
these new helpers.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Shivank Garg <shivankg@amd.com>
---
 include/linux/mm.h | 24 ++++++++++++++++++++++++
 mm/userfaultfd.c   | 22 +++++-----------------
 2 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 838097542939..16b3cd3de29a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -735,6 +735,30 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	return true;
 }
 
+/*
+ * Use only while holding mmap read lock which guarantees that locking will not
+ * fail (nobody can concurrently write-lock the vma). vma_start_read() should
+ * not be used in such cases because it might fail due to mm_lock_seq overflow.
+ * This functionality is used to obtain vma read lock and drop the mmap read lock.
+ */
+static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int subclass)
+{
+	mmap_assert_locked(vma->vm_mm);
+	down_read_nested(&vma->vm_lock->lock, subclass);
+}
+
+/*
+ * Use only while holding mmap read lock which guarantees that locking will not
+ * fail (nobody can concurrently write-lock the vma). vma_start_read() should
+ * not be used in such cases because it might fail due to mm_lock_seq overflow.
+ * This functionality is used to obtain vma read lock and drop the mmap read lock.
+ */
+static inline void vma_start_read_locked(struct vm_area_struct *vma)
+{
+	mmap_assert_locked(vma->vm_mm);
+	down_read(&vma->vm_lock->lock);
+}
+
 static inline void vma_end_read(struct vm_area_struct *vma)
 {
 	rcu_read_lock(); /* keeps vma alive till the end of up_read */
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index af3dfc3633db..4527c385935b 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -84,16 +84,8 @@ static struct vm_area_struct *uffd_lock_vma(struct mm_struct *mm,
 
 	mmap_read_lock(mm);
 	vma = find_vma_and_prepare_anon(mm, address);
-	if (!IS_ERR(vma)) {
-		/*
-		 * We cannot use vma_start_read() as it may fail due to
-		 * false locked (see comment in vma_start_read()). We
-		 * can avoid that by directly locking vm_lock under
-		 * mmap_lock, which guarantees that nobody can lock the
-		 * vma for write (vma_start_write()) under us.
-		 */
-		down_read(&vma->vm_lock->lock);
-	}
+	if (!IS_ERR(vma))
+		vma_start_read_locked(vma);
 
 	mmap_read_unlock(mm);
 	return vma;
@@ -1491,14 +1483,10 @@ static int uffd_move_lock(struct mm_struct *mm,
 	mmap_read_lock(mm);
 	err = find_vmas_mm_locked(mm, dst_start, src_start, dst_vmap, src_vmap);
 	if (!err) {
-		/*
-		 * See comment in uffd_lock_vma() as to why not using
-		 * vma_start_read() here.
-		 */
-		down_read(&(*dst_vmap)->vm_lock->lock);
+		vma_start_read_locked(*dst_vmap);
 		if (*dst_vmap != *src_vmap)
-			down_read_nested(&(*src_vmap)->vm_lock->lock,
-					 SINGLE_DEPTH_NESTING);
+			vma_start_read_locked_nested(*src_vmap,
+						SINGLE_DEPTH_NESTING);
 	}
 	mmap_read_unlock(mm);
 	return err;
-- 
2.48.1.601.g30ceb7b040-goog
From nobody Sat Feb  7 22:46:41 2026
Received: from mail-pl1-f202.google.com (mail-pl1-f202.google.com
 [209.85.214.202])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id C3397270EB2
	for <linux-kernel@vger.kernel.org>; Thu, 13 Feb 2025 22:47:04 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=209.85.214.202
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1739486826; cv=none;
 b=T0JrXfBBAOcMzhRZhTQc03/1/a/CAlIGEY4HnziSkB2st4VIEm5Ys44XuJbrRlhaZeHtxEMoET1APP/xNe1H06qCoolyfqrrGOLaMFmtLSa9zZHmZjVuOuUVgU9UFoUI+lxd3PwZfz3sqdGgaLDwDmBTBwzYnGHyWSCcqbVuHFU=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1739486826; c=relaxed/simple;
	bh=KABNpsb25nd8x3O/oXZpCeD64dxAEJiUS2WGVDITMD4=;
	h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From:
	 To:Cc:Content-Type;
 b=G7tM8zXP8KKukLEXaKb1DpsOIGAok4l4PmAQHkmpRSY2ANFKOzP9jZVVijlDYS0xODwVp2P4soygU40pssxKQop9+eUmlyxi6s8pJRNXu/t8lbpvH+peL2MQ8NjkrfLdi3DZke1dxhj+30P83cx9q+IG6X2FcaoCEjNps11rKPk=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=google.com;
 spf=pass smtp.mailfrom=flex--surenb.bounces.google.com;
 dkim=pass (2048-bit key) header.d=google.com header.i=@google.com
 header.b=mfuOxG4q; arc=none smtp.client-ip=209.85.214.202
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=google.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=flex--surenb.bounces.google.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=google.com header.i=@google.com
 header.b="mfuOxG4q"
Received: by mail-pl1-f202.google.com with SMTP id
 d9443c01a7336-21f379b7bb3so26395825ad.2
        for <linux-kernel@vger.kernel.org>;
 Thu, 13 Feb 2025 14:47:04 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20230601; t=1739486824; x=1740091624;
 darn=vger.kernel.org;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=nixUczrejX4FKBcz+HGm8JxV/4Rz6UEjTIeiIS536Tg=;
        b=mfuOxG4qHIiNgbAI52ssBmz1Y9eamsYLh7Nq9JRHwOBAS166WRb/oNfXk+i8CpNMm+
         kbK07TYg2BghHPo1+5S9+0NsETT3tzwS0SsTn4dZaFEAxh5HiILSzJgiO9TxITuhHX0z
         A8Rmj4yURrkesWynjVdFZMke8i+IVxJr+t6VwI6ngjDvLXjWQTnTurjpE4bz8RWAgc8M
         NUnz0FHzOyHgt++snPFYgLlwLH/5JLFn7JEY82q0IGM0NU0IBP4OKdreFZfyqvf9Ylu6
         Sj3Bx2aj77KcHFdYr0ExaYoeC9Ghp6Rjug2s9CMwKp2nVUfsxDTXihg/uiTUSk32RryU
         Y1rg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1739486824; x=1740091624;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=nixUczrejX4FKBcz+HGm8JxV/4Rz6UEjTIeiIS536Tg=;
        b=XjWif7bzqhfYTEQcUPXf4roYmCbv32AOw6KqNFEQ5kPA3GfQh/Uawpvv8ZV1lHNs4r
         HiASgPU3YC75ablrqDcs8lFisDgsJpYbMxQsUAgN0XElI3CDKQlKNKth3WLvuPAqk4BP
         JlozJfIY17QcIyYEm0v1WNVHEN+FAnhGEqD6tkLy8x5AgzCt/HtbJ3lKTkkVhZJ+mY5l
         kmbW5xKMk9Uo1sU6xxiZDhdMvUyWhgs+Jrv+hpmkRV/iEYofBjY5EFnbG7q1K6MV23FZ
         ZvAti4eaVT21kT2lLXGNXoeeQQXohyxBWIXNVGFOoKYziCv48UqzP4AZFydwVohAXzlD
         oKgA==
X-Forwarded-Encrypted: i=1;
 AJvYcCUnyqFP0om6QyM9cm26fmtu+bjuuAzkKgxIDWZ3uQWs7+AaeetksLyN5VJJGEaa4jg8cZWZLCqc+gr/HC4=@vger.kernel.org
X-Gm-Message-State: AOJu0YygV2UpK/1TDQ/7K+CDKXaxstnbSMj+IvOTkTnqkR+J2/0MHwpD
	G9nXMvrD0BH5Jxb8LI08TaqYT4E98z6+lBcxkD+7aZRnm6CrkFq8VxYD7D8pFw0MY7VBZH/F6UA
	MPQ==
X-Google-Smtp-Source: 
 AGHT+IHexWPTyEo/ZnzJkqYLOzvXPZYi2c9eLV49bS4E26ucZsJK7lHyl46qKggig4vIRb/FsPaNzCiwY30=
X-Received: from pgjp13.prod.google.com ([2002:a63:e64d:0:b0:ad8:73cf:4390])
 (user=surenb job=prod-delivery.src-stubby-dispatcher) by
 2002:a05:6a20:c998:b0:1ee:1e05:206d
 with SMTP id adf61e73a8af0-1ee5e5b7c8amr15452640637.21.1739486823384; Thu, 13
 Feb 2025 14:47:03 -0800 (PST)
Date: Thu, 13 Feb 2025 14:46:39 -0800
In-Reply-To: <20250213224655.1680278-1-surenb@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
References: <20250213224655.1680278-1-surenb@google.com>
X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog
Message-ID: <20250213224655.1680278-3-surenb@google.com>
Subject: [PATCH v10 02/18] mm: move per-vma lock into vm_area_struct
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
	vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
	dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
	lokeshgidra@google.com, minchan@google.com, jannh@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
	klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com, surenb@google.com,
	"Liam R. Howlett" <Liam.Howlett@Oracle.com>
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Back when per-vma locks were introduced, vm_lock was moved out of
vm_area_struct in [1] because of the performance regression caused by
false cacheline sharing.  Recent investigation [2] revealed that the
regression is limited to the rather old Broadwell microarchitecture, and
even there it can be mitigated by disabling adjacent cacheline
prefetching [3].

Splitting a single logical structure into multiple ones leads to more
complicated management, extra pointer dereferences and overall less
maintainable code.  When the split-away part is a lock, it complicates
things even further.  With no performance benefit, there is no reason
for this split.  Merging vm_lock back into vm_area_struct also allows
vm_area_struct to use SLAB_TYPESAFE_BY_RCU later in this patchset.  Move
vm_lock back into vm_area_struct, aligning it at a cacheline boundary,
and make the vm_area_struct cache cacheline-aligned as well.  With a
kernel compiled using defconfig, this causes VMA memory consumption to
grow from 160 (vm_area_struct) + 40 (vm_lock) bytes to 256 bytes:

    slabinfo before:
     <name>           ... <objsize> <objperslab> <pagesperslab> : ...
     vma_lock         ...     40  102    1 : ...
     vm_area_struct   ...    160   51    2 : ...

    slabinfo after moving vm_lock:
     <name>           ... <objsize> <objperslab> <pagesperslab> : ...
     vm_area_struct   ...    256   32    2 : ...
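
The 256-byte size can be reproduced with a small layout model.  This
sketch assumes 64-byte cache lines and approximates
____cacheline_aligned_in_smp (which, on SMP builds, aligns the member to
SMP_CACHE_BYTES) with C11 alignas.  The rest of vm_area_struct is treated
as an opaque byte array, and the model struct names are hypothetical.
Any prefix larger than 128 and at most 192 bytes places vm_lock at offset
192, and 192 + 40 = 232 rounds up to the next 64-byte multiple, 256.

```c
#include <stdalign.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines. */
#define CACHELINE_BYTES 64

/* Stand-in for struct vma_lock: 40 bytes, per the slabinfo above. */
struct vma_lock_model {
	unsigned char rwsem[40];
};

/* Hypothetical layouts: every other vm_area_struct field is opaque padding. */
struct vma_model_160 {
	unsigned char other_fields[160];
	alignas(CACHELINE_BYTES) struct vma_lock_model vm_lock;
};

struct vma_model_176 {
	unsigned char other_fields[176];
	alignas(CACHELINE_BYTES) struct vma_lock_model vm_lock;
};
```

Both the 160- and the 176-byte variants end up at 256 bytes, which is why
the "after" slabinfo is the same for either starting size.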

Aggregate VMA memory consumption per 1000 VMAs grows from 50 to 64 pages,
which is 5.5MB per 100000 VMAs.  Note that the size of this structure
depends on the kernel configuration, and the original size is typically
higher than 160 bytes.  Therefore these calculations are close to the
worst-case scenario.  A more realistic vm_area_struct usage before this
change is:

     <name>           ... <objsize> <objperslab> <pagesperslab> : ...
     vma_lock         ...     40  102    1 : ...
     vm_area_struct   ...    176   46    2 : ...

Aggregate VMA memory consumption per 1000 VMAs grows from 54 to 64 pages,
which is 3.9MB per 100000 VMAs.  This memory consumption growth can be
addressed later by optimizing the vm_lock.
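
The page counts above follow from the slabinfo columns: each slab holds
<objperslab> objects in <pagesperslab> pages, and a partially used slab
still costs whole pages.  A minimal sketch of that arithmetic, assuming
4 KiB pages and objects packed with no per-slab overhead; slab_pages()
and objs_per_slab() are hypothetical helpers, not kernel functions.

```c
/* Assumption: 4 KiB pages and no per-slab overhead inside the pages. */
#define PAGE_BYTES 4096UL

/* Hypothetical helper: how many objects of objsize bytes fit in one slab. */
static unsigned long objs_per_slab(unsigned long objsize,
				   unsigned long pagesperslab)
{
	return pagesperslab * PAGE_BYTES / objsize;
}

/* Hypothetical helper: pages needed for nr objects; slabs are allocated whole. */
static unsigned long slab_pages(unsigned long nr, unsigned long objperslab,
				unsigned long pagesperslab)
{
	unsigned long slabs = (nr + objperslab - 1) / objperslab;	/* round up */

	return slabs * pagesperslab;
}
```

With the slabinfo values quoted above, 1000 VMAs need 10 + 40 = 50 pages
before the change, 10 + 44 = 54 pages in the 176-byte case, and 64 pages
after it.  The 14- and 10-page differences scale to 1400 and 1000 pages
per 100000 VMAs, i.e. 5734400 and 4096000 bytes (about 5.5 and 3.9 MiB).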

[1] https://lore.kernel.org/all/20230227173632.3292573-34-surenb@google.com/
[2] https://lore.kernel.org/all/ZsQyI%2F087V34JoIt@xsang-OptiPlex-9020/
[3] https://lore.kernel.org/all/CAJuCfpEisU8Lfe96AYJDZ+OM4NoPmnw9bP53cT_kbfP_pR+-2g@mail.gmail.com/

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Shivank Garg <shivankg@amd.com>
---
 include/linux/mm.h               | 28 ++++++++++--------
 include/linux/mm_types.h         |  6 ++--
 kernel/fork.c                    | 49 ++++----------------------------
 tools/testing/vma/vma_internal.h | 33 +++++----------------
 4 files changed, 32 insertions(+), 84 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 16b3cd3de29a..e75fae95b48d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -697,6 +697,12 @@ static inline void vma_numab_state_free(struct vm_area_struct *vma) {}
 #endif /* CONFIG_NUMA_BALANCING */
 
 #ifdef CONFIG_PER_VMA_LOCK
+static inline void vma_lock_init(struct vm_area_struct *vma)
+{
+	init_rwsem(&vma->vm_lock.lock);
+	vma->vm_lock_seq = UINT_MAX;
+}
+
 /*
  * Try to read-lock a vma. The function is allowed to occasionally yield false
  * locked result to avoid performance overhead, in which case we fall back to
@@ -714,7 +720,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq.sequence))
 		return false;
 
-	if (unlikely(down_read_trylock(&vma->vm_lock->lock) == 0))
+	if (unlikely(down_read_trylock(&vma->vm_lock.lock) == 0))
 		return false;
 
 	/*
@@ -729,7 +735,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	 * This pairs with RELEASE semantics in vma_end_write_all().
 	 */
 	if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&vma->vm_mm->mm_lock_seq))) {
-		up_read(&vma->vm_lock->lock);
+		up_read(&vma->vm_lock.lock);
 		return false;
 	}
 	return true;
@@ -744,7 +750,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int subclass)
 {
 	mmap_assert_locked(vma->vm_mm);
-	down_read_nested(&vma->vm_lock->lock, subclass);
+	down_read_nested(&vma->vm_lock.lock, subclass);
 }
 
 /*
@@ -756,13 +762,13 @@ static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int
 static inline void vma_start_read_locked(struct vm_area_struct *vma)
 {
 	mmap_assert_locked(vma->vm_mm);
-	down_read(&vma->vm_lock->lock);
+	down_read(&vma->vm_lock.lock);
 }
 
 static inline void vma_end_read(struct vm_area_struct *vma)
 {
 	rcu_read_lock(); /* keeps vma alive till the end of up_read */
-	up_read(&vma->vm_lock->lock);
+	up_read(&vma->vm_lock.lock);
 	rcu_read_unlock();
 }
 
@@ -791,7 +797,7 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 	if (__is_vma_write_locked(vma, &mm_lock_seq))
 		return;
 
-	down_write(&vma->vm_lock->lock);
+	down_write(&vma->vm_lock.lock);
 	/*
 	 * We should use WRITE_ONCE() here because we can have concurrent reads
 	 * from the early lockless pessimistic check in vma_start_read().
@@ -799,7 +805,7 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 	 * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy.
 	 */
 	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
-	up_write(&vma->vm_lock->lock);
+	up_write(&vma->vm_lock.lock);
 }
 
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
@@ -811,7 +817,7 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 
 static inline void vma_assert_locked(struct vm_area_struct *vma)
 {
-	if (!rwsem_is_locked(&vma->vm_lock->lock))
+	if (!rwsem_is_locked(&vma->vm_lock.lock))
 		vma_assert_write_locked(vma);
 }
 
@@ -844,6 +850,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 
 #else /* CONFIG_PER_VMA_LOCK */
 
+static inline void vma_lock_init(struct vm_area_struct *vma) {}
 static inline bool vma_start_read(struct vm_area_struct *vma)
 		{ return false; }
 static inline void vma_end_read(struct vm_area_struct *vma) {}
@@ -878,10 +885,6 @@ static inline void assert_fault_locked(struct vm_fault *vmf)
 
 extern const struct vm_operations_struct vma_dummy_vm_ops;
 
-/*
- * WARNING: vma_init does not initialize vma->vm_lock.
- * Use vm_area_alloc()/vm_area_free() if vma needs locking.
- */
 static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 {
 	memset(vma, 0, sizeof(*vma));
@@ -890,6 +893,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
 	vma_mark_detached(vma, false);
 	vma_numab_state_init(vma);
+	vma_lock_init(vma);
 }
 
 /* Use when VMA is not part of the VMA tree and needs no locking */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8efafef4637e..8a645bcb2b31 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -740,8 +740,6 @@ struct vm_area_struct {
 	 * slowpath.
 	 */
 	unsigned int vm_lock_seq;
-	/* Unstable RCU readers are allowed to read this. */
-	struct vma_lock *vm_lock;
 #endif
 
 	/*
@@ -794,6 +792,10 @@ struct vm_area_struct {
 	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
 #endif
 	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
+#ifdef CONFIG_PER_VMA_LOCK
+	/* Unstable RCU readers are allowed to read this. */
+	struct vma_lock vm_lock ____cacheline_aligned_in_smp;
+#endif
 } __randomize_layout;
 
 #ifdef CONFIG_NUMA
diff --git a/kernel/fork.c b/kernel/fork.c
index 735405a9c5f3..bdbabe73fb29 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -436,35 +436,6 @@ static struct kmem_cache *vm_area_cachep;
 /* SLAB cache for mm_struct structures (tsk->mm) */
 static struct kmem_cache *mm_cachep;
 
-#ifdef CONFIG_PER_VMA_LOCK
-
-/* SLAB cache for vm_area_struct.lock */
-static struct kmem_cache *vma_lock_cachep;
-
-static bool vma_lock_alloc(struct vm_area_struct *vma)
-{
-	vma->vm_lock = kmem_cache_alloc(vma_lock_cachep, GFP_KERNEL);
-	if (!vma->vm_lock)
-		return false;
-
-	init_rwsem(&vma->vm_lock->lock);
-	vma->vm_lock_seq =3D UINT_MAX;
-
-	return true;
-}
-
-static inline void vma_lock_free(struct vm_area_struct *vma)
-{
-	kmem_cache_free(vma_lock_cachep, vma->vm_lock);
-}
-
-#else /* CONFIG_PER_VMA_LOCK */
-
-static inline bool vma_lock_alloc(struct vm_area_struct *vma) { return true; }
-static inline void vma_lock_free(struct vm_area_struct *vma) {}
-
-#endif /* CONFIG_PER_VMA_LOCK */
-
 struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
 {
 	struct vm_area_struct *vma;
@@ -474,10 +445,6 @@ struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
 		return NULL;
 
 	vma_init(vma, mm);
-	if (!vma_lock_alloc(vma)) {
-		kmem_cache_free(vm_area_cachep, vma);
-		return NULL;
-	}
 
 	return vma;
 }
@@ -496,10 +463,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	 * will be reinitialized.
 	 */
 	data_race(memcpy(new, orig, sizeof(*new)));
-	if (!vma_lock_alloc(new)) {
-		kmem_cache_free(vm_area_cachep, new);
-		return NULL;
-	}
+	vma_lock_init(new);
 	INIT_LIST_HEAD(&new->anon_vma_chain);
 	vma_numab_state_init(new);
 	dup_anon_vma_name(orig, new);
@@ -511,7 +475,6 @@ void __vm_area_free(struct vm_area_struct *vma)
 {
 	vma_numab_state_free(vma);
 	free_anon_vma_name(vma);
-	vma_lock_free(vma);
 	kmem_cache_free(vm_area_cachep, vma);
 }
 
@@ -522,7 +485,7 @@ static void vm_area_free_rcu_cb(struct rcu_head *head)
 						  vm_rcu);
 
 	/* The vma should not be locked while being destroyed. */
-	VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock->lock), vma);
+	VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock.lock), vma);
 	__vm_area_free(vma);
 }
 #endif
@@ -3200,11 +3163,9 @@ void __init proc_caches_init(void)
 			sizeof(struct fs_struct), 0,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
 			NULL);
-
-	vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
-#ifdef CONFIG_PER_VMA_LOCK
-	vma_lock_cachep = KMEM_CACHE(vma_lock, SLAB_PANIC|SLAB_ACCOUNT);
-#endif
+	vm_area_cachep = KMEM_CACHE(vm_area_struct,
+			SLAB_HWCACHE_ALIGN|SLAB_NO_MERGE|SLAB_PANIC|
+			SLAB_ACCOUNT);
 	mmap_init();
 	nsproxy_cache_init();
 }
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index bb273927af0f..4506e6fb3c6f 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -275,10 +275,10 @@ struct vm_area_struct {
 	/*
 	 * Can only be written (using WRITE_ONCE()) while holding both:
 	 *  - mmap_lock (in write mode)
-	 *  - vm_lock->lock (in write mode)
+	 *  - vm_lock.lock (in write mode)
 	 * Can be read reliably while holding one of:
 	 *  - mmap_lock (in read or write mode)
-	 *  - vm_lock->lock (in read or write mode)
+	 *  - vm_lock.lock (in read or write mode)
 	 * Can be read unreliably (using READ_ONCE()) for pessimistic bailout
 	 * while holding nothing (except RCU to keep the VMA struct allocated).
 	 *
@@ -287,7 +287,7 @@ struct vm_area_struct {
 	 * slowpath.
 	 */
 	unsigned int vm_lock_seq;
-	struct vma_lock *vm_lock;
+	struct vma_lock vm_lock;
 #endif
 
 	/*
@@ -464,17 +464,10 @@ static inline struct vm_area_struct *vma_next(struct vma_iterator *vmi)
 	return mas_find(&vmi->mas, ULONG_MAX);
 }
 
-static inline bool vma_lock_alloc(struct vm_area_struct *vma)
+static inline void vma_lock_init(struct vm_area_struct *vma)
 {
-	vma->vm_lock = calloc(1, sizeof(struct vma_lock));
-
-	if (!vma->vm_lock)
-		return false;
-
-	init_rwsem(&vma->vm_lock->lock);
+	init_rwsem(&vma->vm_lock.lock);
 	vma->vm_lock_seq = UINT_MAX;
-
-	return true;
 }
 
 static inline void vma_assert_write_locked(struct vm_area_struct *);
@@ -497,6 +490,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_ops = &vma_dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
 	vma_mark_detached(vma, false);
+	vma_lock_init(vma);
 }
 
 static inline struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
@@ -507,10 +501,6 @@ static inline struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
 		return NULL;
 
 	vma_init(vma, mm);
-	if (!vma_lock_alloc(vma)) {
-		free(vma);
-		return NULL;
-	}
 
 	return vma;
 }
@@ -523,10 +513,7 @@ static inline struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 		return NULL;
 
 	memcpy(new, orig, sizeof(*new));
-	if (!vma_lock_alloc(new)) {
-		free(new);
-		return NULL;
-	}
+	vma_lock_init(new);
 	INIT_LIST_HEAD(&new->anon_vma_chain);
 
 	return new;
@@ -696,14 +683,8 @@ static inline void mpol_put(struct mempolicy *)
 {
 }
 
-static inline void vma_lock_free(struct vm_area_struct *vma)
-{
-	free(vma->vm_lock);
-}
-
 static inline void __vm_area_free(struct vm_area_struct *vma)
 {
-	vma_lock_free(vma);
 	free(vma);
 }
 
-- 
2.48.1.601.g30ceb7b040-goog
From nobody Sat Feb  7 22:46:41 2026
Received: from mail-pl1-f202.google.com (mail-pl1-f202.google.com
 [209.85.214.202])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id C8CC1270ECA
	for <linux-kernel@vger.kernel.org>; Thu, 13 Feb 2025 22:47:06 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=209.85.214.202
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1739486828; cv=none;
 b=qcstt1KSMZ4/+2rQjO39+Cw+uFsIQ2Z4uwJweFBrVaJrNHGzR6sItBELN6HY1U3d/8O/YlsJ1bWw9XRIMBaCZ8On+gE4JULrQTS0Ate2kWBZo+GpOk+OD3yuQsXlttkKRno1G9SZEf5J7t3UQ/n8EqZXErfAcV0kaGwmcIBNpcE=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1739486828; c=relaxed/simple;
	bh=1wvMLI0NBniwlQu60PYpOeJQ3NRJ5/21wehA42VzWGw=;
	h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From:
	 To:Cc:Content-Type;
 b=OH921lx6YjtV2tVsNLO4Q9uH7aiNl5Le9rvncDLGsE9GoXeeydIQNHYyaTtsGE1OyCcR/LCvRMjnVpIJAc41eIeNTGK9GQW2aJSCDEJwyArjPXsVx5pi1XbyU+G7uQK2OrAWcdPE0VSynF98TpQmP27hXNLw0jiZTFc5pQSELTk=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=google.com;
 spf=pass smtp.mailfrom=flex--surenb.bounces.google.com;
 dkim=pass (2048-bit key) header.d=google.com header.i=@google.com
 header.b=Qmx/QaSc; arc=none smtp.client-ip=209.85.214.202
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=google.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=flex--surenb.bounces.google.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=google.com header.i=@google.com
 header.b="Qmx/QaSc"
Received: by mail-pl1-f202.google.com with SMTP id
 d9443c01a7336-220e1593b85so16022145ad.3
        for <linux-kernel@vger.kernel.org>;
 Thu, 13 Feb 2025 14:47:06 -0800 (PST)
Date: Thu, 13 Feb 2025 14:46:40 -0800
In-Reply-To: <20250213224655.1680278-1-surenb@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
References: <20250213224655.1680278-1-surenb@google.com>
X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog
Message-ID: <20250213224655.1680278-4-surenb@google.com>
Subject: [PATCH v10 03/18] mm: mark vma as detached until it's added into vma
 tree
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
	vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
	dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
	lokeshgidra@google.com, minchan@google.com, jannh@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
	klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com, surenb@google.com,
	"Liam R. Howlett" <Liam.Howlett@Oracle.com>
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

The current implementation does not set the detached flag when a VMA is
first allocated.  This does not reflect the real state of the VMA, which
remains detached until it is added into the mm's VMA tree.  Fix this by
marking new VMAs as detached and clearing the detached flag only after
the VMA has been added into the tree.
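
For illustration only, the rule above (a VMA is detached from allocation until it has been stored in the tree) can be modelled in userspace C. This is a sketch under stated assumptions, not kernel code: struct mock_vma, struct mock_tree, mock_vma_init() and mock_tree_store() are hypothetical stand-ins for struct vm_area_struct, the maple tree, vma_init() and the vma_iter store helpers. The "tree" is a fixed-size array and locking is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model; none of these names are kernel APIs.
 * The "tree" is a fixed-size array and locking is ignored.
 */
struct mock_vma {
	unsigned long start, end;
	bool detached;
};

#define MOCK_TREE_MAX 8

struct mock_tree {
	struct mock_vma *slots[MOCK_TREE_MAX];
	size_t nr;
};

/* Like vma_init() after this patch: a fresh VMA starts out detached. */
static void mock_vma_init(struct mock_vma *vma, unsigned long start,
			  unsigned long end)
{
	vma->start = start;
	vma->end = end;
	vma->detached = true;
}

/*
 * Like the store helpers after this patch: clear the detached flag only
 * once the entry is actually in the tree.
 */
static int mock_tree_store(struct mock_tree *tree, struct mock_vma *vma)
{
	assert(vma->detached);	/* only new VMAs are inserted in this model */
	if (tree->nr == MOCK_TREE_MAX)
		return -1;	/* stands in for -ENOMEM; VMA stays detached */
	tree->slots[tree->nr++] = vma;
	vma->detached = false;	/* the vma_mark_attached() step */
	return 0;
}
```

As in the store helpers touched by this patch, a failed store returns before the attach step, so the object stays detached.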

Introduce vma_mark_attached() to make the API more readable and to
simplify a possible future cleanup in which vma->vm_mm might be used to
indicate a detached vma, at which point vma_mark_attached() will need an
additional mm parameter.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Shivank Garg <shivankg@amd.com>
---
 include/linux/mm.h               | 27 ++++++++++++++++++++-------
 kernel/fork.c                    |  4 ++++
 mm/memory.c                      |  2 +-
 mm/vma.c                         |  6 +++---
 mm/vma.h                         |  2 ++
 tools/testing/vma/vma_internal.h | 17 ++++++++++++-----
 6 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e75fae95b48d..cd5ee61e98f2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -821,12 +821,21 @@ static inline void vma_assert_locked(struct vm_area_struct *vma)
 		vma_assert_write_locked(vma);
 }
 
-static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached)
+static inline void vma_mark_attached(struct vm_area_struct *vma)
+{
+	vma->detached = false;
+}
+
+static inline void vma_mark_detached(struct vm_area_struct *vma)
 {
 	/* When detaching vma should be write-locked */
-	if (detached)
-		vma_assert_write_locked(vma);
-	vma->detached = detached;
+	vma_assert_write_locked(vma);
+	vma->detached = true;
+}
+
+static inline bool is_vma_detached(struct vm_area_struct *vma)
+{
+	return vma->detached;
 }
 
 static inline void release_fault_lock(struct vm_fault *vmf)
@@ -857,8 +866,8 @@ static inline void vma_end_read(struct vm_area_struct *vma) {}
 static inline void vma_start_write(struct vm_area_struct *vma) {}
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 		{ mmap_assert_write_locked(vma->vm_mm); }
-static inline void vma_mark_detached(struct vm_area_struct *vma,
-				     bool detached) {}
+static inline void vma_mark_attached(struct vm_area_struct *vma) {}
+static inline void vma_mark_detached(struct vm_area_struct *vma) {}
 
 static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 		unsigned long address)
@@ -891,7 +900,10 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_mm = mm;
 	vma->vm_ops = &vma_dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
-	vma_mark_detached(vma, false);
+#ifdef CONFIG_PER_VMA_LOCK
+	/* vma is not locked, can't use vma_mark_detached() */
+	vma->detached = true;
+#endif
 	vma_numab_state_init(vma);
 	vma_lock_init(vma);
 }
@@ -1086,6 +1098,7 @@ static inline int vma_iter_bulk_store(struct vma_iterator *vmi,
 	if (unlikely(mas_is_err(&vmi->mas)))
 		return -ENOMEM;
 
+	vma_mark_attached(vma);
 	return 0;
 }
 
diff --git a/kernel/fork.c b/kernel/fork.c
index bdbabe73fb29..5bf3e407c795 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -465,6 +465,10 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	data_race(memcpy(new, orig, sizeof(*new)));
 	vma_lock_init(new);
 	INIT_LIST_HEAD(&new->anon_vma_chain);
+#ifdef CONFIG_PER_VMA_LOCK
+	/* vma is not locked, can't use vma_mark_detached() */
+	new->detached = true;
+#endif
 	vma_numab_state_init(new);
 	dup_anon_vma_name(orig, new);
 
diff --git a/mm/memory.c b/mm/memory.c
index a8d6dbd03668..e600a5ff3c7a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6414,7 +6414,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 		goto inval;
 
 	/* Check if the VMA got isolated after we found it */
-	if (vma->detached) {
+	if (is_vma_detached(vma)) {
 		vma_end_read(vma);
 		count_vm_vma_lock_event(VMA_LOCK_MISS);
 		/* The area was replaced with another one */
diff --git a/mm/vma.c b/mm/vma.c
index 39146c19f316..498507d8a262 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -341,7 +341,7 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
 
 	if (vp->remove) {
 again:
-		vma_mark_detached(vp->remove, true);
+		vma_mark_detached(vp->remove);
 		if (vp->file) {
 			uprobe_munmap(vp->remove, vp->remove->vm_start,
 				      vp->remove->vm_end);
@@ -1238,7 +1238,7 @@ static void reattach_vmas(struct ma_state *mas_detach)
 
 	mas_set(mas_detach, 0);
 	mas_for_each(mas_detach, vma, ULONG_MAX)
-		vma_mark_detached(vma, false);
+		vma_mark_attached(vma);
 
 	__mt_destroy(mas_detach->tree);
 }
@@ -1313,7 +1313,7 @@ static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
 		if (error)
 			goto munmap_gather_failed;
 
-		vma_mark_detached(next, true);
+		vma_mark_detached(next);
 		nrpages = vma_pages(next);
 
 		vms->nr_pages += nrpages;
diff --git a/mm/vma.h b/mm/vma.h
index e55e68abfbe3..bffb56afce5f 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -205,6 +205,7 @@ static inline int vma_iter_store_gfp(struct vma_iterator *vmi,
 	if (unlikely(mas_is_err(&vmi->mas)))
 		return -ENOMEM;
 
+	vma_mark_attached(vma);
 	return 0;
 }
 
@@ -437,6 +438,7 @@ static inline void vma_iter_store(struct vma_iterator *vmi,
 
 	__mas_set_range(&vmi->mas, vma->vm_start, vma->vm_end - 1);
 	mas_store_prealloc(&vmi->mas, vma);
+	vma_mark_attached(vma);
 }
 
 static inline unsigned long vma_iter_addr(struct vma_iterator *vmi)
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index 4506e6fb3c6f..f93f7f74f97b 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -471,12 +471,16 @@ static inline void vma_lock_init(struct vm_area_struct *vma)
 }
 
 static inline void vma_assert_write_locked(struct vm_area_struct *);
-static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached)
+static inline void vma_mark_attached(struct vm_area_struct *vma)
+{
+	vma->detached = false;
+}
+
+static inline void vma_mark_detached(struct vm_area_struct *vma)
 {
 	/* When detaching vma should be write-locked */
-	if (detached)
-		vma_assert_write_locked(vma);
-	vma->detached = detached;
+	vma_assert_write_locked(vma);
+	vma->detached = true;
 }
 
 extern const struct vm_operations_struct vma_dummy_vm_ops;
@@ -489,7 +493,8 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_mm = mm;
 	vma->vm_ops = &vma_dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
-	vma_mark_detached(vma, false);
+	/* vma is not locked, can't use vma_mark_detached() */
+	vma->detached = true;
 	vma_lock_init(vma);
 }
 
@@ -515,6 +520,8 @@ static inline struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	memcpy(new, orig, sizeof(*new));
 	vma_lock_init(new);
 	INIT_LIST_HEAD(&new->anon_vma_chain);
+	/* vma is not locked, can't use vma_mark_detached() */
+	new->detached = true;
 
 	return new;
 }
-- 
2.48.1.601.g30ceb7b040-goog
From nobody Sat Feb  7 22:46:41 2026
Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com
 [209.85.216.73])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id DAF66270ED8
	for <linux-kernel@vger.kernel.org>; Thu, 13 Feb 2025 22:47:08 +0000 (UTC)
Received: by mail-pj1-f73.google.com with SMTP id
 98e67ed59e1d1-2fc1eabf4f7so1701442a91.1
        for <linux-kernel@vger.kernel.org>;
 Thu, 13 Feb 2025 14:47:08 -0800 (PST)
Date: Thu, 13 Feb 2025 14:46:41 -0800
In-Reply-To: <20250213224655.1680278-1-surenb@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
References: <20250213224655.1680278-1-surenb@google.com>
X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog
Message-ID: <20250213224655.1680278-5-surenb@google.com>
Subject: [PATCH v10 04/18] mm: introduce vma_iter_store_attached() to use with
 attached vmas
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
	vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
	dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
	lokeshgidra@google.com, minchan@google.com, jannh@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
	klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com, surenb@google.com
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

vma_iter_store() can be used both when adding a new vma and when
updating an existing one. However, existing vmas do not need to be
marked attached, as they are already marked that way. With
vma->detached being a separate flag, double-marking a vma as attached
or detached is not an issue because the flag is simply overwritten
with the same value. However, once we fold this flag into the refcount
later in this series, re-attaching or re-detaching a vma becomes an
issue, since these operations will increment or decrement a refcount.

Introduce vma_iter_store_new() and vma_iter_store_overwrite() to replace
vma_iter_store() and avoid re-attaching a vma during a vma update. Add
assertions in vma_mark_attached()/vma_mark_detached() to catch invalid
usage. Update vma tests to check for vma detached state correctness.
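
A hypothetical userspace sketch of why the split matters once attachment is counted rather than flagged. The int refcount is an assumption standing in for the later conversion mentioned above, not its actual implementation, and all mock_* names are illustrative. mock_store_new() mirrors the structure of vma_iter_store_new() (attach, then overwrite), mock_store_overwrite() only asserts, and mock_warn_on() stands in for WARN_ON_ONCE() but reports every hit so the count can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical model: attachment tracked as a counter instead of a bool.
 * mock_* names are illustrative only, not kernel APIs.
 */
struct mock_vma {
	int refcnt;	/* 0 = detached, 1 = attached */
};

static int mock_warnings;

/* Stands in for WARN_ON_ONCE(), but counts every hit so it can be checked. */
static void mock_warn_on(bool cond, const char *what)
{
	if (cond) {
		mock_warnings++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
}

static void mock_assert_attached(struct mock_vma *vma)
{
	mock_warn_on(vma->refcnt == 0, "vma is not attached");
}

static void mock_assert_detached(struct mock_vma *vma)
{
	mock_warn_on(vma->refcnt != 0, "vma is already attached");
}

/* With a counter, a second attach leaves refcnt at 2 instead of 1. */
static void mock_mark_attached(struct mock_vma *vma)
{
	mock_assert_detached(vma);
	vma->refcnt++;
}

/* Update of an entry already in the tree: assert, never re-attach. */
static void mock_store_overwrite(struct mock_vma *vma)
{
	mock_assert_attached(vma);
	/* the actual tree store would happen here */
}

/* Insertion of a new entry: attach exactly once, then store. */
static void mock_store_new(struct mock_vma *vma)
{
	mock_mark_attached(vma);
	mock_store_overwrite(vma);
}
```

Calling mock_store_new() on an entry that is already in the tree trips the detached assertion and leaves refcnt at 2, which is the double attach the new/overwrite split is meant to rule out.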

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Shivank Garg <shivankg@amd.com>
---
Changes since v9 [1]:
- Change VM_BUG_ON_VMA() to WARN_ON_ONCE() in vma_assert_{attached|detached},
per Lorenzo Stoakes
- Rename vma_iter_store() to vma_iter_store_new(), per Lorenzo Stoakes
- Expand changelog, per Lorenzo Stoakes
- Update vma tests to check for vma detached state correctness,
per Lorenzo Stoakes

[1] https://lore.kernel.org/all/20250111042604.3230628-5-surenb@google.com/

 include/linux/mm.h               | 14 +++++++++++
 mm/nommu.c                       |  4 +--
 mm/vma.c                         | 12 ++++-----
 mm/vma.h                         | 11 +++++++--
 tools/testing/vma/vma.c          | 42 +++++++++++++++++++++++++-------
 tools/testing/vma/vma_internal.h | 10 ++++++++
 6 files changed, 74 insertions(+), 19 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index cd5ee61e98f2..1b8e72888124 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -821,8 +821,19 @@ static inline void vma_assert_locked(struct vm_area_struct *vma)
 		vma_assert_write_locked(vma);
 }
 
+static inline void vma_assert_attached(struct vm_area_struct *vma)
+{
+	WARN_ON_ONCE(vma->detached);
+}
+
+static inline void vma_assert_detached(struct vm_area_struct *vma)
+{
+	WARN_ON_ONCE(!vma->detached);
+}
+
 static inline void vma_mark_attached(struct vm_area_struct *vma)
 {
+	vma_assert_detached(vma);
 	vma->detached = false;
 }
 
@@ -830,6 +841,7 @@ static inline void vma_mark_detached(struct vm_area_struct *vma)
 {
 	/* When detaching vma should be write-locked */
 	vma_assert_write_locked(vma);
+	vma_assert_attached(vma);
 	vma->detached = true;
 }
 
@@ -866,6 +878,8 @@ static inline void vma_end_read(struct vm_area_struct *vma) {}
 static inline void vma_start_write(struct vm_area_struct *vma) {}
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 		{ mmap_assert_write_locked(vma->vm_mm); }
+static inline void vma_assert_attached(struct vm_area_struct *vma) {}
+static inline void vma_assert_detached(struct vm_area_struct *vma) {}
 static inline void vma_mark_attached(struct vm_area_struct *vma) {}
 static inline void vma_mark_detached(struct vm_area_struct *vma) {}
 
diff --git a/mm/nommu.c b/mm/nommu.c
index baa79abdaf03..8b31d8396297 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1191,7 +1191,7 @@ unsigned long do_mmap(struct file *file,
 	setup_vma_to_mm(vma, current->mm);
 	current->mm->map_count++;
 	/* add the VMA to the tree */
-	vma_iter_store(&vmi, vma);
+	vma_iter_store_new(&vmi, vma);
 
 	/* we flush the region from the icache only when the first executable
 	 * mapping of it is made  */
@@ -1356,7 +1356,7 @@ static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
 
 	setup_vma_to_mm(vma, mm);
 	setup_vma_to_mm(new, mm);
-	vma_iter_store(vmi, new);
+	vma_iter_store_new(vmi, new);
 	mm->map_count++;
 	return 0;
 
diff --git a/mm/vma.c b/mm/vma.c
index 498507d8a262..f72b73f57451 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -320,7 +320,7 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
 		 * us to insert it before dropping the locks
 		 * (it may either follow vma or precede it).
 		 */
-		vma_iter_store(vmi, vp->insert);
+		vma_iter_store_new(vmi, vp->insert);
 		mm->map_count++;
 	}
 
@@ -700,7 +700,7 @@ static int commit_merge(struct vma_merge_struct *vmg)
 			      vmg->__adjust_middle_start ? vmg->middle : NULL);
 	vma_set_range(vma, vmg->start, vmg->end, vmg->pgoff);
 	vmg_adjust_set_range(vmg);
-	vma_iter_store(vmg->vmi, vmg->target);
+	vma_iter_store_overwrite(vmg->vmi, vmg->target);
 
 	vma_complete(&vp, vmg->vmi, vma->vm_mm);
 
@@ -1707,7 +1707,7 @@ int vma_link(struct mm_struct *mm, struct vm_area_struct *vma)
 		return -ENOMEM;
 
 	vma_start_write(vma);
-	vma_iter_store(&vmi, vma);
+	vma_iter_store_new(&vmi, vma);
 	vma_link_file(vma);
 	mm->map_count++;
 	validate_mm(mm);
@@ -2386,7 +2386,7 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
 
 	/* Lock the VMA since it is modified after insertion into VMA tree */
 	vma_start_write(vma);
-	vma_iter_store(vmi, vma);
+	vma_iter_store_new(vmi, vma);
 	map->mm->map_count++;
 	vma_link_file(vma);
 
@@ -2862,7 +2862,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
 				anon_vma_interval_tree_pre_update_vma(vma);
 				vma->vm_end = address;
 				/* Overwrite old entry in mtree. */
-				vma_iter_store(&vmi, vma);
+				vma_iter_store_overwrite(&vmi, vma);
 				anon_vma_interval_tree_post_update_vma(vma);
 
 				perf_event_mmap(vma);
@@ -2942,7 +2942,7 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
 				vma->vm_start = address;
 				vma->vm_pgoff -= grow;
 				/* Overwrite old entry in mtree. */
-				vma_iter_store(&vmi, vma);
+				vma_iter_store_overwrite(&vmi, vma);
 				anon_vma_interval_tree_post_update_vma(vma);
 
 				perf_event_mmap(vma);
diff --git a/mm/vma.h b/mm/vma.h
index bffb56afce5f..55be77ff042f 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -413,9 +413,10 @@ static inline struct vm_area_struct *vma_iter_load(struct vma_iterator *vmi)
 }
 
 /* Store a VMA with preallocated memory */
-static inline void vma_iter_store(struct vma_iterator *vmi,
-				  struct vm_area_struct *vma)
+static inline void vma_iter_store_overwrite(struct vma_iterator *vmi,
+					    struct vm_area_struct *vma)
 {
+	vma_assert_attached(vma);
 
 #if defined(CONFIG_DEBUG_VM_MAPLE_TREE)
 	if (MAS_WARN_ON(&vmi->mas, vmi->mas.status != ma_start &&
@@ -438,7 +439,13 @@ static inline void vma_iter_store(struct vma_iterator *vmi,
 
 	__mas_set_range(&vmi->mas, vma->vm_start, vma->vm_end - 1);
 	mas_store_prealloc(&vmi->mas, vma);
+}
+
+static inline void vma_iter_store_new(struct vma_iterator *vmi,
+				      struct vm_area_struct *vma)
+{
 	vma_mark_attached(vma);
+	vma_iter_store_overwrite(vmi, vma);
 }
 
 static inline unsigned long vma_iter_addr(struct vma_iterator *vmi)
diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
index c7ffa71841ca..11f761769b5b 100644
--- a/tools/testing/vma/vma.c
+++ b/tools/testing/vma/vma.c
@@ -74,10 +74,22 @@ static struct vm_area_struct *alloc_vma(struct mm_struct *mm,
 	ret->vm_end = end;
 	ret->vm_pgoff = pgoff;
 	ret->__vm_flags = flags;
+	vma_assert_detached(ret);
 
 	return ret;
 }
 
+/* Helper function to link a VMA into the tree and check that it is attached. */
+static int attach_vma(struct mm_struct *mm, struct vm_area_struct *vma)
+{
+	int res;
+
+	res = vma_link(mm, vma);
+	if (!res)
+		vma_assert_attached(vma);
+	return res;
+}
+
 /* Helper function to allocate a VMA and link it to the tree. */
 static struct vm_area_struct *alloc_and_link_vma(struct mm_struct *mm,
 						 unsigned long start,
@@ -90,7 +102,7 @@ static struct vm_area_struct *alloc_and_link_vma(struct mm_struct *mm,
 	if (vma == NULL)
 		return NULL;
 
-	if (vma_link(mm, vma)) {
+	if (attach_vma(mm, vma)) {
 		vm_area_free(vma);
 		return NULL;
 	}
@@ -108,6 +120,7 @@ static struct vm_area_struct *alloc_and_link_vma(struct mm_struct *mm,
 /* Helper function which provides a wrapper around a merge new VMA operation. */
 static struct vm_area_struct *merge_new(struct vma_merge_struct *vmg)
 {
+	struct vm_area_struct *vma;
 	/*
 	 * For convenience, get prev and next VMAs. Which the new VMA operation
 	 * requires.
@@ -116,7 +129,11 @@ static struct vm_area_struct *merge_new(struct vma_merge_struct *vmg)
 	vmg->prev = vma_prev(vmg->vmi);
 	vma_iter_next_range(vmg->vmi);
 
-	return vma_merge_new_range(vmg);
+	vma = vma_merge_new_range(vmg);
+	if (vma)
+		vma_assert_attached(vma);
+
+	return vma;
 }
 
 /*
@@ -125,7 +142,12 @@ static struct vm_area_struct *merge_new(struct vma_merge_struct *vmg)
  */
 static struct vm_area_struct *merge_existing(struct vma_merge_struct *vmg)
 {
-	return vma_merge_existing_range(vmg);
+	struct vm_area_struct *vma;
+
+	vma = vma_merge_existing_range(vmg);
+	if (vma)
+		vma_assert_attached(vma);
+	return vma;
 }
 
 /*
@@ -260,8 +282,8 @@ static bool test_simple_merge(void)
 		.pgoff = 1,
 	};
 
-	ASSERT_FALSE(vma_link(&mm, vma_left));
-	ASSERT_FALSE(vma_link(&mm, vma_right));
+	ASSERT_FALSE(attach_vma(&mm, vma_left));
+	ASSERT_FALSE(attach_vma(&mm, vma_right));
 
 	vma = merge_new(&vmg);
 	ASSERT_NE(vma, NULL);
@@ -285,7 +307,7 @@ static bool test_simple_modify(void)
 	struct vm_area_struct *init_vma = alloc_vma(&mm, 0, 0x3000, 0, flags);
 	VMA_ITERATOR(vmi, &mm, 0x1000);
 
-	ASSERT_FALSE(vma_link(&mm, init_vma));
+	ASSERT_FALSE(attach_vma(&mm, init_vma));
 
 	/*
 	 * The flags will not be changed, the vma_modify_flags() function
@@ -351,7 +373,7 @@ static bool test_simple_expand(void)
 		.pgoff = 0,
 	};
 
-	ASSERT_FALSE(vma_link(&mm, vma));
+	ASSERT_FALSE(attach_vma(&mm, vma));
 
 	ASSERT_FALSE(expand_existing(&vmg));
 
@@ -372,7 +394,7 @@ static bool test_simple_shrink(void)
 	struct vm_area_struct *vma = alloc_vma(&mm, 0, 0x3000, 0, flags);
 	VMA_ITERATOR(vmi, &mm, 0);
 
-	ASSERT_FALSE(vma_link(&mm, vma));
+	ASSERT_FALSE(attach_vma(&mm, vma));
 
 	ASSERT_FALSE(vma_shrink(&vmi, vma, 0, 0x1000, 0));
 
@@ -1522,11 +1544,11 @@ static bool test_copy_vma(void)
 
 	vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
 	vma_new = copy_vma(&vma, 0, 0x2000, 0, &need_locks);
-
 	ASSERT_NE(vma_new, vma);
 	ASSERT_EQ(vma_new->vm_start, 0);
 	ASSERT_EQ(vma_new->vm_end, 0x2000);
 	ASSERT_EQ(vma_new->vm_pgoff, 0);
+	vma_assert_attached(vma_new);
 
 	cleanup_mm(&mm, &vmi);
 
@@ -1535,6 +1557,7 @@ static bool test_copy_vma(void)
 	vma = alloc_and_link_vma(&mm, 0, 0x2000, 0, flags);
 	vma_next = alloc_and_link_vma(&mm, 0x6000, 0x8000, 6, flags);
 	vma_new = copy_vma(&vma, 0x4000, 0x2000, 4, &need_locks);
+	vma_assert_attached(vma_new);
 
 	ASSERT_EQ(vma_new, vma_next);
 
@@ -1576,6 +1599,7 @@ static bool test_expand_only_mode(void)
 	ASSERT_EQ(vma->vm_pgoff, 3);
 	ASSERT_TRUE(vma_write_started(vma));
 	ASSERT_EQ(vma_iter_addr(&vmi), 0x3000);
+	vma_assert_attached(vma);
 
 	cleanup_mm(&mm, &vmi);
 	return true;
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index f93f7f74f97b..34277842156c 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -470,6 +470,16 @@ static inline void vma_lock_init(struct vm_area_struct *vma)
 	vma->vm_lock_seq = UINT_MAX;
 }
 
+static inline void vma_assert_attached(struct vm_area_struct *vma)
+{
+	WARN_ON_ONCE(vma->detached);
+}
+
+static inline void vma_assert_detached(struct vm_area_struct *vma)
+{
+	WARN_ON_ONCE(!vma->detached);
+}
+
 static inline void vma_assert_write_locked(struct vm_area_struct *);
 static inline void vma_mark_attached(struct vm_area_struct *vma)
 {
-- 
2.48.1.601.g30ceb7b040-goog
From nobody Sat Feb  7 22:46:41 2026
Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com
 [209.85.216.74])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id CAE7C271269
	for <linux-kernel@vger.kernel.org>; Thu, 13 Feb 2025 22:47:10 +0000 (UTC)
Received: by mail-pj1-f74.google.com with SMTP id
 98e67ed59e1d1-2f9da17946fso4901763a91.3
        for <linux-kernel@vger.kernel.org>;
 Thu, 13 Feb 2025 14:47:10 -0800 (PST)
Date: Thu, 13 Feb 2025 14:46:42 -0800
In-Reply-To: <20250213224655.1680278-1-surenb@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
References: <20250213224655.1680278-1-surenb@google.com>
X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog
Message-ID: <20250213224655.1680278-6-surenb@google.com>
Subject: [PATCH v10 05/18] mm: mark vmas detached upon exit
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
	vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
	dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
	lokeshgidra@google.com, minchan@google.com, jannh@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
	klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com, surenb@google.com
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

When exit_mmap() removes vmas belonging to an exiting task, it does not
mark them as detached, since they can't be reached by other tasks and
will be freed shortly. Once we introduce vma reuse, all vmas will have to
be in the detached state before they are freed, to ensure that a reused
vma starts out in a consistent state. Add the missing vma_mark_detached()
call before freeing the vma.
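A minimal userspace sketch of the invariant described above, assuming a simple free list as the reuse mechanism: a recycled object can only be handed back in a known state if every object is marked detached before it is freed. struct obj, obj_alloc(), obj_attach(), obj_mark_detached() and obj_free() are hypothetical names for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical userspace model, not kernel code: a singly linked free
 * list stands in for whatever mechanism later recycles vmas.
 */
struct obj {
	bool detached;		/* true once the object is unreachable */
	struct obj *next_free;	/* link used only while on the free list */
};

static struct obj *free_list;

/* Hand out a recycled object if one is available, else a fresh one. */
static struct obj *obj_alloc(void)
{
	struct obj *o = free_list;

	if (o) {
		free_list = o->next_free;
		/* A recycled object must come back in a known state. */
		assert(o->detached);
		return o;
	}
	o = calloc(1, sizeof(*o));
	if (o)
		o->detached = true;	/* new objects start out detached */
	return o;
}

static void obj_attach(struct obj *o)
{
	o->detached = false;
}

static void obj_mark_detached(struct obj *o)
{
	o->detached = true;
}

/* Freeing is only legal once the object has been marked detached. */
static void obj_free(struct obj *o)
{
	assert(o->detached);
	o->next_free = free_list;
	free_list = o;
}
```

Skipping obj_mark_detached() before obj_free() trips the assertion in obj_free(); that is the state this patch rules out for vmas freed on the unreachable path.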

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Shivank Garg <shivankg@amd.com>
---
Changes since v9 [1]:
- Add Reviewed-by, per Lorenzo Stoakes

[1] https://lore.kernel.org/all/20250111042604.3230628-6-surenb@google.com/

 mm/vma.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/vma.c b/mm/vma.c
index f72b73f57451..a16a83d0253f 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -427,10 +427,12 @@ void remove_vma(struct vm_area_struct *vma, bool unreachable)
 	if (vma->vm_file)
 		fput(vma->vm_file);
 	mpol_put(vma_policy(vma));
-	if (unreachable)
+	if (unreachable) {
+		vma_mark_detached(vma);
 		__vm_area_free(vma);
-	else
+	} else {
 		vm_area_free(vma);
+	}
 }
=20
 /*
--=20
2.48.1.601.g30ceb7b040-goog
From nobody Sat Feb  7 22:46:41 2026
Received: from mail-pl1-f202.google.com (mail-pl1-f202.google.com
 [209.85.214.202])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id D026127128A
	for <linux-kernel@vger.kernel.org>; Thu, 13 Feb 2025 22:47:12 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=209.85.214.202
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1739486834; cv=none;
 b=tI0egMWLHqI9OiDNs1MsUq0lnMAlsxJRn/bXPZh6jBb8jGsX666VkjZ9VYr3PdupW5CQ/sBpCaMkj1mrg8xaaqu8q3FSgyPMevVNdYmushmwya4m3jhKxKw/mDKygdUxogimLJJ/0iNNvU3oX3UGquyR9UV3LdUHN/0EAwQD72o=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1739486834; c=relaxed/simple;
	bh=8Njh8fZrPvZrNtKwBc51MD6ktP5kHik3Z5tjuwiDAWI=;
	h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From:
	 To:Cc:Content-Type;
 b=YG/7n88vyzM76JjS2BomEsnieAwspC22l2D5Eoueflj7EbnlcdnQ8/s6m0xEtZZiK7I+weWwOXXRltuQ+nHV07B0MQ4XXcVr/TMFZ8KrTd7un4b3o6I3vxafn4H3yhcyk9TGxalBwfXVbDmvCQ+tkfvfNa2xUmLq/ubW4p0YV9M=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=google.com;
 spf=pass smtp.mailfrom=flex--surenb.bounces.google.com;
 dkim=pass (2048-bit key) header.d=google.com header.i=@google.com
 header.b=s+bvfVUv; arc=none smtp.client-ip=209.85.214.202
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=google.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=flex--surenb.bounces.google.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=google.com header.i=@google.com
 header.b="s+bvfVUv"
Received: by mail-pl1-f202.google.com with SMTP id
 d9443c01a7336-21f6cd48c56so21400415ad.3
        for <linux-kernel@vger.kernel.org>;
 Thu, 13 Feb 2025 14:47:12 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20230601; t=1739486832; x=1740091632;
 darn=vger.kernel.org;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=f967rAL9Gs1nAaqEwqbs3/KxJ+f7SIc9ekPyYd69/e8=;
        b=s+bvfVUvgOWtFpnu14BOSzEiPG0kIYIzScBGcOYe3kTw+dg7dsui4vqngb3xVAohOO
         Q9cwgcFFFR5x1iEpV/1CpTwDhE/I15zNDeWD8xsl3jFLHjbZRpZX4mkE817bJRVCkoEV
         mjr1tNkGkVge5EkfDz8z5KrCxbDDcDf2jQktO5lo5KIduWOfkbRI0/1lRhrSSOiNos8g
         /xlAjcwYiPRZDyp5jp0gx1rKZIlgGCrC/Vj16Eo6g+e6wbhNzTMIGanj7H4lLMbZXuuc
         efBKGKIfVntD69uf699/Hvw+VdxczLIa1x+wSYVT1UkbGxCIdPB3+byWCbom6AApotQI
         0FAw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1739486832; x=1740091632;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=f967rAL9Gs1nAaqEwqbs3/KxJ+f7SIc9ekPyYd69/e8=;
        b=uIZlSwuQT7lNwnk2pFO6NKRTMk9uJM6bUleAhqzaevqNGY2epHqdP7uyu/i63ZQOVM
         BfQtwVxpbxkST7a0HbBAOc+2w1TUhuKNHVCtotVg67ulsuxR985dQUl0KW9VZg7cUcYn
         TLjinluyoQ+q7mWbIiR1uV82OSLhVYGQPDtJ0R46GmHnadAdi4Ma6cI0lo6kLVG21b/J
         Ls8nvyHDMDvkGz37HdqbjVUD/CId/RvzhwzFVqL3fyDxT0BSmUFWbZOL8YvjtgofbxgU
         EuFXIg5H8QwmARR67jbd/GYGvJ1Af46yI/AVmchE65APH9ClVf+Yf1YnOp7tRZfxiGUs
         aKig==
X-Forwarded-Encrypted: i=1;
 AJvYcCWX3Nz8AThVpgndwc10WavA/YkhxTOw1NfSJVhUiFNqICQviAX5msD6v9ssVNDPQNFfan7Ez5Shi1pUBGY=@vger.kernel.org
X-Gm-Message-State: AOJu0YwWQlL2gAdUNHowBJZbZ75LXBjVp3GWtc8N/aqwkI997HT/FVJg
	TF+Xv9NZqewqVi/d7Ix1NkUKZGufUqYGT2LJLQWemLDQG0BKMhDsQ+wy5NBN0KUbjisnkGBRAiz
	59w==
X-Google-Smtp-Source: 
 AGHT+IGSWz7GU9dfI59mEnX3KWjK84PY7XxuIsdno4nvi5XjdZpP5t6KXvINYXrYmtHx9RdlsY/1nSR1Ez8=
X-Received: from pgy35.prod.google.com ([2002:a63:1863:0:b0:ad5:5841:9f23])
 (user=surenb job=prod-delivery.src-stubby-dispatcher) by
 2002:a05:6a21:3a94:b0:1ee:6b2f:1a2a
 with SMTP id adf61e73a8af0-1ee6b2f1d52mr9549965637.15.1739486832181; Thu, 13
 Feb 2025 14:47:12 -0800 (PST)
Date: Thu, 13 Feb 2025 14:46:43 -0800
In-Reply-To: <20250213224655.1680278-1-surenb@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
References: <20250213224655.1680278-1-surenb@google.com>
X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog
Message-ID: <20250213224655.1680278-7-surenb@google.com>
Subject: [PATCH v10 06/18] types: move struct rcuwait into types.h
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
	vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
	dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
	lokeshgidra@google.com, minchan@google.com, jannh@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
	klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com, surenb@google.com,
	"Liam R. Howlett" <Liam.Howlett@Oracle.com>
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Move the rcuwait struct definition into types.h so that rcuwait can be
used without including rcuwait.h, which pulls in other headers. Without
this change, mm_types.h can't use rcuwait due to the following circular
dependency:

mm_types.h -> rcuwait.h -> signal.h -> mm_types.h
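The move works because struct rcuwait only stores a pointer to struct task_struct, and C accepts a pointer to a struct whose definition has not been seen, so no header defining task_struct is needed to define or embed it. A single-file sketch, assuming __rcu expands to nothing (it is an annotation for the sparse checker) and using a hypothetical struct mm_like in place of a real structure from mm_types.h; __RCUWAIT_INITIALIZER is taken from rcuwait.h as it appears in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: treat the sparse-only __rcu annotation as empty. */
#define __rcu

/*
 * Only a pointer to struct task_struct is stored. C accepts a pointer to
 * a struct whose definition has not been seen, so no header defining
 * task_struct is needed to define or embed struct rcuwait.
 */
struct rcuwait {
	struct task_struct __rcu *task;
};

/* Same initializer as include/linux/rcuwait.h. */
#define __RCUWAIT_INITIALIZER(name)		\
	{ .task = NULL, }

/* Hypothetical stand-in for a structure declared in mm_types.h. */
struct mm_like {
	struct rcuwait writer_wait;
};
```

Because struct task_struct stays an incomplete type here, this compiles without any scheduler or signal headers, which is what lets types.h host the definition.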

Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Tested-by: Shivank Garg <shivankg@amd.com>
---
Changes since v9 [1]:
- Add Acked-by, per Lorenzo Stoakes

[1] https://lore.kernel.org/all/20250111042604.3230628-7-surenb@google.com/

 include/linux/rcuwait.h | 13 +------------
 include/linux/types.h   | 12 ++++++++++++
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/linux/rcuwait.h b/include/linux/rcuwait.h
index 27343424225c..9ad134a04b41 100644
--- a/include/linux/rcuwait.h
+++ b/include/linux/rcuwait.h
@@ -4,18 +4,7 @@
=20
 #include <linux/rcupdate.h>
 #include <linux/sched/signal.h>
-
-/*
- * rcuwait provides a way of blocking and waking up a single
- * task in an rcu-safe manner.
- *
- * The only time @task is non-nil is when a user is blocked (or
- * checking if it needs to) on a condition, and reset as soon as we
- * know that the condition has succeeded and are awoken.
- */
-struct rcuwait {
-	struct task_struct __rcu *task;
-};
+#include <linux/types.h>
=20
 #define __RCUWAIT_INITIALIZER(name)		\
 	{ .task = NULL, }
diff --git a/include/linux/types.h b/include/linux/types.h
index 1c509ce8f7f6..a3d2182c2686 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -248,5 +248,17 @@ typedef void (*swap_func_t)(void *a, void *b, int size);
 typedef int (*cmp_r_func_t)(const void *a, const void *b, const void *priv);
 typedef int (*cmp_func_t)(const void *a, const void *b);
=20
+/*
+ * rcuwait provides a way of blocking and waking up a single
+ * task in an rcu-safe manner.
+ *
+ * The only time @task is non-nil is when a user is blocked (or
+ * checking if it needs to) on a condition, and reset as soon as we
+ * know that the condition has succeeded and are awoken.
+ */
+struct rcuwait {
+	struct task_struct __rcu *task;
+};
+
 #endif /*  __ASSEMBLY__ */
 #endif /* _LINUX_TYPES_H */
--=20
2.48.1.601.g30ceb7b040-goog
From nobody Sat Feb  7 22:46:41 2026
Received: from mail-pl1-f202.google.com (mail-pl1-f202.google.com
 [209.85.214.202])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id B038327129F
	for <linux-kernel@vger.kernel.org>; Thu, 13 Feb 2025 22:47:14 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=209.85.214.202
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1739486836; cv=none;
 b=GbW9eRae53n3KwIo4G8Sn9HSUFIUud1BlZqDEZRvJZBAvo8xhGJNjhOuXqaLCVeNXifqUj1qlg18aIx0wnURFmKPUKh0lk7sd3AQvgGbEOamA1MwWE1KQfqk7Hr/1pTi4DDhvNIB/wE+NGZbY2MHA7iyMThU/KzpkjgHudbBRTY=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1739486836; c=relaxed/simple;
	bh=JhPdcuk4AGpU93OVpkaMGB6F3cd+gFPx66MB+rFF950=;
	h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From:
	 To:Cc:Content-Type;
 b=YWdPd3Jh4ibP+nWJDiLEF7a2Vf6O71+YIIWQtBRFlIlS7w1O0U0HwWoLOD8ocLNx+EIOKFvpHUzYZHbrQLKBrcGc3mn9LVlO5YVo3Pra9mBTFN9Amg2lYRWMhg6Cvu6gtUkMhdTz7xAksnmiQPKxZfOXnHp6sBkxfRDnlOGCxM0=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=google.com;
 spf=pass smtp.mailfrom=flex--surenb.bounces.google.com;
 dkim=pass (2048-bit key) header.d=google.com header.i=@google.com
 header.b=gFTq+RN9; arc=none smtp.client-ip=209.85.214.202
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=google.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=flex--surenb.bounces.google.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=google.com header.i=@google.com
 header.b="gFTq+RN9"
Received: by mail-pl1-f202.google.com with SMTP id
 d9443c01a7336-21fb94c7fc6so29875675ad.1
        for <linux-kernel@vger.kernel.org>;
 Thu, 13 Feb 2025 14:47:14 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20230601; t=1739486834; x=1740091634;
 darn=vger.kernel.org;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=8lnVYcB8HwzzUuW9qEx+Yoh+VM8+OMN/pADhYBeUDVo=;
        b=gFTq+RN9EKpCzcXhSXt1fQwnI9nLa4aj+w4+MlXgVKkVKvOD6TI4HIz7vf6ONp8Yg7
         rtqCCptndANKS5nzZw1B9Wa2zt58ZLAJsiNB0LNJZ6TJQNmzFNKitwa3FfgG7aeVtQ2i
         20dUX/c3ytQkDzo3ZtY2JIpRAZwnC92bW0j1ujF9xp2fhASgHGyUzZ+CWmnILpwvHsGV
         iPmQTaRFqeSo9mlH1HIhu2CRZbQEPToGfUAGpUIZ+83RBI0L1Y1Ci47xtU0SHK0CyNuq
         4Wbio3sfkzUKDqdGIUHKxg1yhm5D6h9trJx8nVKTze3eXMbtpMVIr5eAuWV0ZHOnW6aq
         y7bg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1739486834; x=1740091634;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=8lnVYcB8HwzzUuW9qEx+Yoh+VM8+OMN/pADhYBeUDVo=;
        b=psRT7zWGuAx2VYddubHI2t/qRwFCY6DIEmj50pB1OCxkJTiU7HA4XzfKeBdslMZgFv
         jGNY2GoWdycMlHBYXAOc7R+7cZZB2ygueCYvus/8/J4dL8MKYyvEDhg0LjxhSzRsGOwD
         WdK7OR6C8iEP6QEI9mARFG3+x/HqY4SRTtdMs2vUg6bcbQXSZYhydCMBXOT0rNtRIJKf
         65kRT63jXNMkV2tj9BYKjTjHATZtK1izhfzZ+YHtWBrGQOVCoal5/0LD/OsgHJeZ1D17
         +37fvzS+0VSX6RDHoCbYQwrio/2I50M47emmQxRyfKyWiFnAVFc45qXUDJKaAB1f7Uvm
         sdlQ==
X-Forwarded-Encrypted: i=1;
 AJvYcCW49iHSq1YgO64t4TFwWuJvQF7cKCyj/ZPtYbNMPHrktrQcVI9KGZT0yhzvRIFLJ8qnlk5uUzytEpnnBY4=@vger.kernel.org
X-Gm-Message-State: AOJu0YyCb0kpwvwqmCrHqLJk74p3Jq24z3i9OPQd5bicsFd0WLEt7naq
	cE5JGI/dqxHlbOdI+0IiqUW180ton17ovGa1879j3nKs6+cxhxQMJemX7satBcvcJrSL2EII/0O
	+kw==
X-Google-Smtp-Source: 
 AGHT+IFvP7Rtv2Mvdra3Yg1NiqesqrI7PbBRDKrXPHHWmHQO1TyYtp4d3ffasZylr6rr8GVtjs5kwYbmVJQ=
X-Received: from pfbgd3.prod.google.com ([2002:a05:6a00:8303:b0:730:8c7f:979])
 (user=surenb job=prod-delivery.src-stubby-dispatcher) by
 2002:a05:6a20:c997:b0:1ee:3b53:b77f
 with SMTP id adf61e73a8af0-1ee6b4013abmr9718076637.37.1739486834016; Thu, 13
 Feb 2025 14:47:14 -0800 (PST)
Date: Thu, 13 Feb 2025 14:46:44 -0800
In-Reply-To: <20250213224655.1680278-1-surenb@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
References: <20250213224655.1680278-1-surenb@google.com>
X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog
Message-ID: <20250213224655.1680278-8-surenb@google.com>
Subject: [PATCH v10 07/18] mm: allow
 vma_start_read_locked/vma_start_read_locked_nested
 to fail
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
	vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
	dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
	lokeshgidra@google.com, minchan@google.com, jannh@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
	klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com, surenb@google.com
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

With the upcoming replacement of vm_lock with vm_refcnt, we need to handle
the possibility of vma_start_read_locked()/vma_start_read_locked_nested()
failing due to refcount overflow. Prepare for this by changing these APIs
to return bool and adjusting their users.
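The caller-side pattern this patch introduces (treat a failed read lock as -EAGAIN and release anything already taken) can be sketched in userspace. Assumptions: struct model_vma, model_start_read(), model_end_read() and model_lock_pair() are hypothetical; a counter with an artificial ceiling stands in for the upcoming vm_refcnt overflow check, and there is no real locking or mmap_lock involved.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model, not kernel code: a reader count with an artificial
 * ceiling stands in for the future vm_refcnt overflow check.
 */
#define MODEL_READERS_MAX 2

struct model_vma {
	int readers;
};

/* Shaped like vma_start_read_locked() after this patch: false on failure. */
static bool model_start_read(struct model_vma *vma)
{
	if (vma->readers >= MODEL_READERS_MAX)
		return false;	/* taking another reference would overflow */
	vma->readers++;
	return true;
}

static void model_end_read(struct model_vma *vma)
{
	assert(vma->readers > 0);
	vma->readers--;
}

/*
 * Same shape as uffd_move_lock() after this patch: lock dst, skip src if
 * it is the same object, and drop dst again if locking src fails.
 */
static int model_lock_pair(struct model_vma *dst, struct model_vma *src)
{
	if (!model_start_read(dst))
		return -EAGAIN;
	if (dst == src)
		return 0;
	if (!model_start_read(src)) {
		model_end_read(dst);	/* undo the first lock */
		return -EAGAIN;
	}
	return 0;
}
```

The property that matters is that a failure leaves no lock held: after the -EAGAIN path, dst's count is back where it started.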

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Shivank Garg <shivankg@amd.com>
---
Changes since v9 [1]:
- Refactor the code, per Lorenzo Stoakes
- Remove Vlastimil's Acked-by since the code has changed

[1] https://lore.kernel.org/all/20250111042604.3230628-8-surenb@google.com/

 include/linux/mm.h |  6 ++++--
 mm/userfaultfd.c   | 30 +++++++++++++++++++++++-------
 2 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1b8e72888124..7fa7c39162fd 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -747,10 +747,11 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
  * not be used in such cases because it might fail due to mm_lock_seq overflow.
  * This functionality is used to obtain vma read lock and drop the mmap read lock.
  */
-static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int subclass)
+static inline bool vma_start_read_locked_nested(struct vm_area_struct *vma, int subclass)
 {
 	mmap_assert_locked(vma->vm_mm);
 	down_read_nested(&vma->vm_lock.lock, subclass);
+	return true;
 }
=20
 /*
@@ -759,10 +760,11 @@ static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int
  * not be used in such cases because it might fail due to mm_lock_seq overflow.
  * This functionality is used to obtain vma read lock and drop the mmap read lock.
  */
-static inline void vma_start_read_locked(struct vm_area_struct *vma)
+static inline bool vma_start_read_locked(struct vm_area_struct *vma)
 {
 	mmap_assert_locked(vma->vm_mm);
 	down_read(&vma->vm_lock.lock);
+	return true;
 }
=20
 static inline void vma_end_read(struct vm_area_struct *vma)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 4527c385935b..867898c4e30b 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -84,8 +84,12 @@ static struct vm_area_struct *uffd_lock_vma(struct mm_struct *mm,
=20
 	mmap_read_lock(mm);
 	vma = find_vma_and_prepare_anon(mm, address);
-	if (!IS_ERR(vma))
-		vma_start_read_locked(vma);
+	if (!IS_ERR(vma)) {
+		bool locked = vma_start_read_locked(vma);
+
+		if (!locked)
+			vma = ERR_PTR(-EAGAIN);
+	}
=20
 	mmap_read_unlock(mm);
 	return vma;
@@ -1482,12 +1486,24 @@ static int uffd_move_lock(struct mm_struct *mm,
=20
 	mmap_read_lock(mm);
 	err = find_vmas_mm_locked(mm, dst_start, src_start, dst_vmap, src_vmap);
-	if (!err) {
-		vma_start_read_locked(*dst_vmap);
-		if (*dst_vmap != *src_vmap)
-			vma_start_read_locked_nested(*src_vmap,
-						SINGLE_DEPTH_NESTING);
+	if (err)
+		goto out;
+
+	if (!vma_start_read_locked(*dst_vmap)) {
+		err = -EAGAIN;
+		goto out;
 	}
+
+	/* Nothing further to do if both vmas are locked. */
+	if (*dst_vmap == *src_vmap)
+		goto out;
+
+	if (!vma_start_read_locked_nested(*src_vmap, SINGLE_DEPTH_NESTING)) {
+		/* Undo dst_vmap locking if src_vmap failed to lock */
+		vma_end_read(*dst_vmap);
+		err = -EAGAIN;
+	}
+out:
 	mmap_read_unlock(mm);
 	return err;
 }
--=20
2.48.1.601.g30ceb7b040-goog
From nobody Sat Feb  7 22:46:41 2026
Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com
 [209.85.216.74])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id A7437271824
	for <linux-kernel@vger.kernel.org>; Thu, 13 Feb 2025 22:47:16 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=209.85.216.74
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1739486838; cv=none;
 b=Y2oDole1A7BbGkhhI/FR0EDpGSaDPa2pyBCIbY80orfFjdlXBo9gXJqqbhK0CgZSDifhzsJFPpt+YFBCxQIGInM5Ru6AU5qlKpzKbDMafCmoyMP6RiqDHv2ahaQ2BOMttTM8kjrcvnuwtJbVLaJIUZ+V9bIfzypi/+VzFqtgpaQ=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1739486838; c=relaxed/simple;
	bh=jd3zK0O73VvJ9q8T3yi04/lCfi5p05ochW8W4AzVfqM=;
	h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From:
	 To:Cc:Content-Type;
 b=qQUPUX/ZL1E/MKm5KlslV8XgQVN0Q7dWDJHVOgqT8H0icxbyUGQE+/8WvjJ2Nw4+ClZlkN4XJOYjoFnioCvkkolu/N4PREgwQt80n/BOA9WMc0UURwIT6+XjS2hOHujM6jbGoUYm5z4qTTDkwlo1TrnBqESgJM2ML+a8nrU75gY=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=google.com;
 spf=pass smtp.mailfrom=flex--surenb.bounces.google.com;
 dkim=pass (2048-bit key) header.d=google.com header.i=@google.com
 header.b=lNddyp9Y; arc=none smtp.client-ip=209.85.216.74
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=google.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=flex--surenb.bounces.google.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=google.com header.i=@google.com
 header.b="lNddyp9Y"
Received: by mail-pj1-f74.google.com with SMTP id
 98e67ed59e1d1-2fc1e7efdffso2828768a91.0
        for <linux-kernel@vger.kernel.org>;
 Thu, 13 Feb 2025 14:47:16 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20230601; t=1739486836; x=1740091636;
 darn=vger.kernel.org;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=PeBZHGW/4v8XWLim1qpAbGeBbAKuN/isMOjrQKZ6x9o=;
        b=lNddyp9YUtSkjoGkvm+zgCFC8vAIcGIm5M4oJWQj1sBdmhLrzEkbS+Qx7af4su84MH
         ft/poeZYLBq43pNX92CDKA839VnY68hfVWVaBGkfO4TM708x8WcfQWEXL4Lv9fCw79Wy
         d7GMkDbJpAzyGa+ZwQ2yn+Jy6C9KX+0R2fXAzyWaxySeRn0sugRxJokKkEOycXVRoTY6
         D18CEZuMCHfM8qy/IlRmoCRXJZwIRXLS59qty9/K0cD7Pc/T5jAoBJ3GgLGLQiGl5mQd
         K3sxJwMrVm5NpJ9DHUz9UCrN3Hv4Z0uK4vRQo7wwd7A68qMP2Ipx6j51KaVVQXscaMce
         DMiA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1739486836; x=1740091636;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=PeBZHGW/4v8XWLim1qpAbGeBbAKuN/isMOjrQKZ6x9o=;
        b=HK8SzdON0tG4q7iWuOkHWVpcy9HQGTZ89m2uYA8jIIJT2nE+TKUNlZFmVxoCCZXcGR
         /9uMRn7jR1yFxoKQ9NTy/GazezClJSnCpngrA23jZctLGdQiF8CEx6BORupCTwsJ7+l6
         xwayLeb+WaRFCpmf4p0Ap68wxQPqtdPGDxMa+XHXu9PocpdCYfQC+aZGwhwaO4X6mnMr
         Z1UXl0qsa8eHBdg1332zo6VDoDT4Ji5ApkuUoN3ssOTqdFmSACQLQHQWBclObrimANNj
         IuNzAvI1j7n7Bg2BBu/2k52+u5PX5EaouHPkTBth5RhxIRRcAgToo0afQmlA1s3oCwZq
         /PdA==
X-Forwarded-Encrypted: i=1;
 AJvYcCUuXlnuyUcC3aMN7EWsZta+FAmde57fLTscYq5kp70pozMV0GCD3DYECYcrfoK0OIrbYdPPZ7tWuQSlvrE=@vger.kernel.org
X-Gm-Message-State: AOJu0YyGVKGTBeH6sB7QhGfBaa9qEtTv6qfO5bubhabbPPiBJJQs88wB
	HynSU5AdtO7edZI96gZ9RyL56tKQ12PbF3dSW5dyVXC/E0agHlatzum5q4pNFLdsEGXuaf/eDNa
	LwQ==
X-Google-Smtp-Source: 
 AGHT+IHfEnz0nPhbgU0f7q7Auc5HNsVT5eJa8wXZ0ZtBNf3PSVTzFb4/Oj/O6W2xc+itT5Ikm9K7oZ7OyHo=
X-Received: from pgvm22.prod.google.com ([2002:a65:62d6:0:b0:ada:4ec0:a7cd])
 (user=surenb job=prod-delivery.src-stubby-dispatcher) by
 2002:a05:6a21:512:b0:1e0:ca33:8ccf
 with SMTP id adf61e73a8af0-1ee6b416346mr10537954637.34.1739486836048; Thu, 13
 Feb 2025 14:47:16 -0800 (PST)
Date: Thu, 13 Feb 2025 14:46:45 -0800
In-Reply-To: <20250213224655.1680278-1-surenb@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
References: <20250213224655.1680278-1-surenb@google.com>
X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog
Message-ID: <20250213224655.1680278-9-surenb@google.com>
Subject: [PATCH v10 08/18] mm: move mmap_init_lock() out of the header file
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
	vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
	dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
	lokeshgidra@google.com, minchan@google.com, jannh@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
	klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com, surenb@google.com
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

mmap_init_lock() is used only from mm_init() in fork.c; therefore, it does
not have to reside in the header file. This move lets us avoid including
additional headers in mmap_lock.h later, when mmap_init_lock() needs to
initialize an rcuwait object.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Tested-by: Shivank Garg <shivankg@amd.com>
---
Changes since v9 [1]:
- Drop inline for mmap_init_lock(), per Lorenzo Stoakes
- Add Reviewed-by, per Lorenzo Stoakes

[1] https://lore.kernel.org/all/20250111042604.3230628-9-surenb@google.com/

 include/linux/mmap_lock.h | 6 ------
 kernel/fork.c             | 6 ++++++
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 45a21faa3ff6..4706c6769902 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -122,12 +122,6 @@ static inline bool mmap_lock_speculate_retry(struct mm_struct *mm, unsigned int
=20
 #endif /* CONFIG_PER_VMA_LOCK */
=20
-static inline void mmap_init_lock(struct mm_struct *mm)
-{
-	init_rwsem(&mm->mmap_lock);
-	mm_lock_seqcount_init(mm);
-}
-
 static inline void mmap_write_lock(struct mm_struct *mm)
 {
 	__mmap_lock_trace_start_locking(mm, true);
diff --git a/kernel/fork.c b/kernel/fork.c
index 5bf3e407c795..f1af413e5aa4 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1230,6 +1230,12 @@ static void mm_init_uprobes_state(struct mm_struct *mm)
 #endif
 }
=20
+static void mmap_init_lock(struct mm_struct *mm)
+{
+	init_rwsem(&mm->mmap_lock);
+	mm_lock_seqcount_init(mm);
+}
+
 static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	struct user_namespace *user_ns)
 {
--=20
2.48.1.601.g30ceb7b040-goog
From nobody Sat Feb  7 22:46:41 2026
Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com
 [209.85.216.73])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id B8AD0274271
	for <linux-kernel@vger.kernel.org>; Thu, 13 Feb 2025 22:47:18 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=209.85.216.73
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1739486840; cv=none;
 b=GquX+6VEUD23toFMStJAP5YAeQKMtATL+hsnzRRCAe7Q0fCvRf2fkmB4amWjMrDcE/vHR7W1b0trY2jaEvHU/ywYn2Fo6Btrv+4neXjudx9x5wmQO0Hud/kGYiTojMbx/8Z0Mn40UzxP7OaCIBfOUfABxsz30T0Qb2VIc8sCfFk=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1739486840; c=relaxed/simple;
	bh=Y4hMd1vQU5kJmXf3djw5SF9D6KSEwo+wW27aKEsq5PI=;
	h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From:
	 To:Cc:Content-Type;
 b=Q2t1m2Ka39NOSoTHTJ0SqFjXhmfEtBLzaoRbHQMGlb++SUR2aWulmpJYmV8QYMIccnizHRu5ZiXCNaH/NV9MZOaBWU2qAPDHnljeZVMuVT4fEGcNumvn2a90sU6aVWhdBLziqN50wIQxPcKw4Tal+kZDCrskt2Toy6R4g4MrE94=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=google.com;
 spf=pass smtp.mailfrom=flex--surenb.bounces.google.com;
 dkim=pass (2048-bit key) header.d=google.com header.i=@google.com
 header.b=g+5+IuDc; arc=none smtp.client-ip=209.85.216.73
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=google.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=flex--surenb.bounces.google.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=google.com header.i=@google.com
 header.b="g+5+IuDc"
Received: by mail-pj1-f73.google.com with SMTP id
 98e67ed59e1d1-2fc1c7c8396so1868133a91.0
        for <linux-kernel@vger.kernel.org>;
 Thu, 13 Feb 2025 14:47:18 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20230601; t=1739486838; x=1740091638;
 darn=vger.kernel.org;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=YNoCxF31vK1w52ANaPw+siaL/yNI9CD2wHk/1nIOSYo=;
        b=g+5+IuDcL2JDSdwxIS2lyLPKLwz9qSKJ6G/fDejQTqxKb552in/KKjFqBX3Z10WUEI
         RljhEmJxkdxAskjo+LHrrhI0SkX1lcPRyAyua/e58b6+JPHhlUQ0PZx1UAZhc011o4OQ
         Y4TPF7Vmnlxa0NsHrBtjwHrRKsnrfYJtOQzyIFPQ/ZDhQ1hPnBMq4LqXCuXRhzUmFA1+
         JavTXxwF/3vHOkxQJ7VGH3VazWShFib2S16v4ZbEXxNXkTKaEfKnz5cvp2W9l2rkkw+h
         pdyih6g/Tw6PRKAhMXKX0YLm4HXq3EfMa22yjK67KMhH6mxSkwxUN2Ak1ZhUDtduNByD
         56Nw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1739486838; x=1740091638;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=YNoCxF31vK1w52ANaPw+siaL/yNI9CD2wHk/1nIOSYo=;
        b=NjgOrvA/BXixg47PF6FxOBHbEphXMT6mI8KZSBW7IWmXbnJGoMqrteXE1vVQnwVqZ2
         dXJO+1uzHoCixcOwyWW4xLkZpE2VbwUchM3glfZf11KjwFdQNp4eqMgGagjJpDMifFMY
         lpQwAuTtKrgxnu3D6JVxi1FNxlBmr6ZyxnqEg2TVJKbvFJofiANijCds0xfHDpHRuMIY
         TTTcb3R/fZJa8Y8EQs4XlkvGKc7tjFTlKeysBlj+DCcGbAVI1Vm2IyZgzVkVeYj/T80P
         b5h/5Q7sepVjCMblstNXKZkQX6ogoAj8gMhLo+a+95uWqa971f42bLEKknbTpH8URP1F
         ovBw==
X-Forwarded-Encrypted: i=1;
 AJvYcCU/vxI77wsn3Wsd7WfBG9aexKtgS1rRNOK25jQl+Ewgfy5XZLs2SUcbO9EjHrjbzp27T9/XUyUnVwL96do=@vger.kernel.org
X-Gm-Message-State: AOJu0Yyo2kzBkF4YNnnTYvmZJ1EdLkcu+Px1Iiqdp0R+X2xHIzA2+5D8
	0Hh98zObyDBjUeXxn5mRi4mflNlPUHzEYNjZtSPL1WqOiWSV3c3GkLoEgb9YAfX4nnoWF4BQ9Lm
	7nQ==
X-Google-Smtp-Source: 
 AGHT+IErQJcr8UIFZH8RFb6dXrkV59J3DVPaTleZ6Gt1R/Rh5Le0xTpOCinKGJEwYZoy9GO+2bPedgwmiK8=
X-Received: from pjm4.prod.google.com ([2002:a17:90b:2fc4:b0:2ea:5613:4d5d])
 (user=surenb job=prod-delivery.src-stubby-dispatcher) by
 2002:a17:90b:54c4:b0:2ee:ab29:1a65
 with SMTP id 98e67ed59e1d1-2fbf5bc07e4mr13915594a91.4.1739486838053; Thu, 13
 Feb 2025 14:47:18 -0800 (PST)
Date: Thu, 13 Feb 2025 14:46:46 -0800
In-Reply-To: <20250213224655.1680278-1-surenb@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
References: <20250213224655.1680278-1-surenb@google.com>
X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog
Message-ID: <20250213224655.1680278-10-surenb@google.com>
Subject: [PATCH v10 09/18] mm: uninline the main body of vma_start_write()
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
	vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
	dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
	lokeshgidra@google.com, minchan@google.com, jannh@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
	klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com, surenb@google.com
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

vma_start_write() is used in many places and will soon grow in size.
It is not used in performance-critical paths, and uninlining it should
limit future code size growth.
No functional changes.
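The split made here (keep the cheap "already write-locked?" check inline, move the locking body out of line) is a general pattern. A sketch with hypothetical names, using a counter instead of a real rw_semaphore; in the patch the body moves to mm/memory.c, while in this single-file sketch __attribute__((noinline)) (a GCC/Clang extension) plays that role.

```c
#include <assert.h>

/* Hypothetical model, not the kernel's vma_start_write(). */
struct model_vma {
	unsigned int lock_seq;	/* sequence of the last write lock taken */
};

static unsigned int slow_path_calls;	/* counts out-of-line calls */

/*
 * Out-of-line body. In the kernel this is where down_write(),
 * WRITE_ONCE(vma->vm_lock_seq, ...) and up_write() end up.
 */
static __attribute__((noinline)) void
model_start_write_slow(struct model_vma *vma, unsigned int mm_lock_seq)
{
	slow_path_calls++;
	vma->lock_seq = mm_lock_seq;
}

/* Inline wrapper: only the cheap "already write-locked?" check is inlined. */
static inline void model_start_write(struct model_vma *vma,
				     unsigned int mm_lock_seq)
{
	if (vma->lock_seq == mm_lock_seq)
		return;
	model_start_write_slow(vma, mm_lock_seq);
}
```

This mirrors the reasoning in the commit message: every caller keeps only the short comparison inline, while the body, which is about to grow, is emitted once.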

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Tested-by: Shivank Garg <shivankg@amd.com>
---
Changes since v9 [1]:
- Add Reviewed-by, per Lorenzo Stoakes

[1] https://lore.kernel.org/all/20250111042604.3230628-10-surenb@google.com/

 include/linux/mm.h | 12 +++---------
 mm/memory.c        | 14 ++++++++++++++
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7fa7c39162fd..557d66e187ff 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -787,6 +787,8 @@ static bool __is_vma_write_locked(struct vm_area_struct *vma, unsigned int *mm_l
 	return (vma->vm_lock_seq == *mm_lock_seq);
 }

+void __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq);
+
 /*
  * Begin writing to a VMA.
  * Exclude concurrent readers under the per-VMA lock until the currently
@@ -799,15 +801,7 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 	if (__is_vma_write_locked(vma, &mm_lock_seq))
 		return;

-	down_write(&vma->vm_lock.lock);
-	/*
-	 * We should use WRITE_ONCE() here because we can have concurrent reads
-	 * from the early lockless pessimistic check in vma_start_read().
-	 * We don't really care about the correctness of that early check, but
-	 * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy.
-	 */
-	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
-	up_write(&vma->vm_lock.lock);
+	__vma_start_write(vma, mm_lock_seq);
 }

 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
diff --git a/mm/memory.c b/mm/memory.c
index e600a5ff3c7a..3d9c5141193f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6393,6 +6393,20 @@ struct vm_area_struct *lock_mm_and_find_vma(struct mm_struct *mm,
 #endif

 #ifdef CONFIG_PER_VMA_LOCK
+void __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq)
+{
+	down_write(&vma->vm_lock.lock);
+	/*
+	 * We should use WRITE_ONCE() here because we can have concurrent reads
+	 * from the early lockless pessimistic check in vma_start_read().
+	 * We don't really care about the correctness of that early check, but
+	 * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy.
+	 */
+	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
+	up_write(&vma->vm_lock.lock);
+}
+EXPORT_SYMBOL_GPL(__vma_start_write);
+
 /*
  * Lookup and lock a VMA under RCU protection. Returned VMA is guaranteed to be
  * stable and not isolated. If the VMA is not found or is being modified the
-- 
2.48.1.601.g30ceb7b040-goog
From nobody Sat Feb  7 22:46:41 2026
Date: Thu, 13 Feb 2025 14:46:47 -0800
In-Reply-To: <20250213224655.1680278-1-surenb@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
References: <20250213224655.1680278-1-surenb@google.com>
X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog
Message-ID: <20250213224655.1680278-11-surenb@google.com>
Subject: [PATCH v10 10/18] refcount: provide ops for cases when object's
 memory can be reused
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
	vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
	dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
	lokeshgidra@google.com, minchan@google.com, jannh@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
	klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com, surenb@google.com, Will Deacon <will@kernel.org>
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Speculative lookups, where a successful inc_not_zero() pins the object
but we still need to double-check that the acquired object is indeed the
one we set out to acquire (identity check), need this validation to
happen *after* the increment.
Similarly, when a new object is initialized and its memory might have
been previously occupied by another object, all stores to initialize the
object should happen *before* refcount initialization.

SLAB_TYPESAFE_BY_RCU is a notable example where this ordering is required
for reference counting.

Add refcount_{add|inc}_not_zero_acquire() to guarantee the proper ordering
between acquiring a reference count on an object and performing the
identity check for that object.
Add refcount_set_release() to guarantee proper ordering between stores
initializing object attributes and the store initializing the refcount.
refcount_set_release() should be done after all other object attributes
are initialized. Once refcount_set_release() is called, the object should
be considered visible to other tasks even if it was not yet added into an
object collection normally used to discover it. This is because other
tasks might have discovered the object previously occupying the same
memory and, after memory reuse, they can succeed in taking a refcount on
the new object and start using it.
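
The producer-side ordering described above can be sketched in userspace C11. This is an illustration under stated assumptions, not the kernel implementation: `struct toy_obj` and `toy_obj_reinit()` are hypothetical names, a plain `atomic_int` replaces refcount_t (no saturation checks), and the release store plays the role of refcount_set_release().

```c
#include <stdatomic.h>

/* Hypothetical object living in type-stable memory, i.e. memory that may be
 * recycled for another object of the same type (as with SLAB_TYPESAFE_BY_RCU). */
struct toy_obj {
	atomic_int ref;		/* toy stand-in for refcount_t */
	atomic_long key;	/* read locklessly by lookups, hence atomic */
	long value;		/* only read after a reference is taken */
};

/* Producer side: re-initialize recycled memory for a new object. */
static void toy_obj_reinit(struct toy_obj *obj, long new_key, long new_value)
{
	/* 1. Initialize every attribute first (relaxed store ~ WRITE_ONCE()). */
	atomic_store_explicit(&obj->key, new_key, memory_order_relaxed);
	obj->value = new_value;

	/*
	 * 2. Publish the count last with release ordering, analogous to
	 * refcount_set_release(&obj->ref, 1). A task whose acquire-ordered
	 * increment reads this value is guaranteed to see the stores above.
	 */
	atomic_store_explicit(&obj->ref, 1, memory_order_release);

	/* 3. Insertion into the lookup collection would follow (omitted). */
}
```

Reversing steps 1 and 2 would let a task that found the old object through a stale pointer take a reference and then read a stale `value`, which is the failure mode refcount_set_release() is meant to prevent.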

Object reuse example to consider:

consumer:
    obj = lookup(collection, key);
    if (!refcount_inc_not_zero_acquire(&obj->ref))
        return;
    if (READ_ONCE(obj->key) != key) { /* identity check */
        put_ref(obj);
        return;
    }
    use(obj->value);

                 producer:
                     remove(collection, obj->key);
                     if (!refcount_dec_and_test(&obj->ref))
                         return;
                     obj->key = KEY_INVALID;
                     free(obj);
                     obj = malloc(); /* obj is reused */
                     obj->key = new_key;
                     obj->value = new_value;
                     refcount_set_release(&obj->ref, 1);
                     add(collection, new_key, obj);
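
The consumer side of the example above can be sketched in userspace C11 as well. Again this is a hedged illustration, not the kernel code: `toy_inc_not_zero_acquire()` approximates refcount_inc_not_zero_acquire() with a compare-exchange loop that uses acquire ordering only on success, `toy_put()` is a hypothetical stand-in for put_ref() that never frees, and `obj` is assumed to come from an RCU-protected lookup so that its memory stays valid.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_obj {
	atomic_int ref;		/* toy stand-in for refcount_t */
	atomic_long key;	/* identity, may change if memory is recycled */
	long value;
};

/* Approximates refcount_inc_not_zero_acquire(): acquire ordering on success. */
static bool toy_inc_not_zero_acquire(atomic_int *ref)
{
	int old = atomic_load_explicit(ref, memory_order_relaxed);

	do {
		if (old == 0)
			return false;	/* object is being freed, do not revive it */
	} while (!atomic_compare_exchange_weak_explicit(ref, &old, old + 1,
							memory_order_acquire,
							memory_order_relaxed));
	return true;
}

/* Hypothetical put_ref(): drops a reference; freeing is omitted here. */
static void toy_put(struct toy_obj *obj)
{
	atomic_fetch_sub_explicit(&obj->ref, 1, memory_order_release);
}

/*
 * Consumer side. @obj came from a lockless lookup (e.g. under RCU), so its
 * memory is valid but it may now hold a different object.
 */
static struct toy_obj *toy_obj_get(struct toy_obj *obj, long key)
{
	if (!toy_inc_not_zero_acquire(&obj->ref))
		return NULL;
	/* Identity check, ordered after the increment by the acquire above. */
	if (atomic_load_explicit(&obj->key, memory_order_relaxed) != key) {
		toy_put(obj);		/* memory was reused for another key */
		return NULL;
	}
	return obj;			/* safe to read obj->value now */
}
```

On compare-exchange failure `old` is refreshed with the current count, so the zero check is repeated on every retry; this mirrors the "not zero" semantics of the kernel helper.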

refcount_{add|inc}_not_zero_acquire() is required to prevent the following
reordering when refcount_inc_not_zero() is used instead:

consumer:
    obj = lookup(collection, key);
    if (READ_ONCE(obj->key) != key) { /* reordered identity check */
        put_ref(obj);
        return;
    }
                 producer:
                     remove(collection, obj->key);
                     if (!refcount_dec_and_test(&obj->ref))
                         return;
                     obj->key = KEY_INVALID;
                     free(obj);
                     obj = malloc(); /* obj is reused */
                     obj->key = new_key;
                     obj->value = new_value;
                     refcount_set_release(&obj->ref, 1);
                     add(collection, new_key, obj);

    if (!refcount_inc_not_zero(&obj->ref))
        return;
    use(obj->value); /* USING WRONG OBJECT */

refcount_set_release() is required to prevent the following reordering
when refcount_set() is used instead:

consumer:
    obj = lookup(collection, key);

                 producer:
                     remove(collection, obj->key);
                     if (!refcount_dec_and_test(&obj->ref))
                         return;
                     obj->key = KEY_INVALID;
                     free(obj);
                     obj = malloc(); /* obj is reused */
                     obj->key = new_key; /* new_key == old_key */
                     refcount_set(&obj->ref, 1);

    if (!refcount_inc_not_zero_acquire(&obj->ref))
        return;
    if (READ_ONCE(obj->key) != key) { /* pass since new_key == old_key */
        put_ref(obj);
        return;
    }
    use(obj->value); /* USING STALE obj->value */

                     obj->value = new_value; /* reordered store */
                     add(collection, key, obj);

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz> #slab
Tested-by: Shivank Garg <shivankg@amd.com>
---
 Documentation/RCU/whatisRCU.rst               |  10 ++
 Documentation/core-api/refcount-vs-atomic.rst |  37 +++++-
 include/linux/refcount.h                      | 106 ++++++++++++++++++
 include/linux/slab.h                          |   9 ++
 4 files changed, 156 insertions(+), 6 deletions(-)

diff --git a/Documentation/RCU/whatisRCU.rst b/Documentation/RCU/whatisRCU.rst
index 1ef5784c1b84..53faeed7c190 100644
--- a/Documentation/RCU/whatisRCU.rst
+++ b/Documentation/RCU/whatisRCU.rst
@@ -971,6 +971,16 @@ unfortunately any spinlock in a ``SLAB_TYPESAFE_BY_RCU`` object must be
 initialized after each and every call to kmem_cache_alloc(), which renders
 reference-free spinlock acquisition completely unsafe.  Therefore, when
 using ``SLAB_TYPESAFE_BY_RCU``, make proper use of a reference counter.
+If using refcount_t, the specialized refcount_{add|inc}_not_zero_acquire()
+and refcount_set_release() APIs should be used to ensure correct operation
+ordering when verifying object identity and when initializing newly
+allocated objects. The acquire ordering in refcount_{add|inc}_not_zero_acquire()
+ensures that identity checks happen *after* the reference count is taken.
+refcount_set_release() should be called after a newly allocated object is
+fully initialized; its release ordering ensures that the new values are
+visible *before* the refcount can be successfully taken by other users.
+Once refcount_set_release() is called, the object should be considered
+visible to other tasks.
 (Those willing to initialize their locks in a kmem_cache constructor
  may also use locking, including cache-friendly sequence locking.)

diff --git a/Documentation/core-api/refcount-vs-atomic.rst b/Documentation/core-api/refcount-vs-atomic.rst
index 79a009ce11df..9551a7bbfd38 100644
--- a/Documentation/core-api/refcount-vs-atomic.rst
+++ b/Documentation/core-api/refcount-vs-atomic.rst
@@ -86,7 +86,19 @@ Memory ordering guarantee changes:
  * none (both fully unordered)


-case 2) - increment-based ops that return no value
+case 2) - non-"Read/Modify/Write" (RMW) ops with release ordering
+-----------------------------------------------------------------
+
+Function changes:
+
+ * atomic_set_release() --> refcount_set_release()
+
+Memory ordering guarantee changes:
+
+ * none (both provide RELEASE ordering)
+
+
+case 3) - increment-based ops that return no value
 --------------------------------------------------

 Function changes:
@@ -98,7 +110,7 @@ Memory ordering guarantee changes:

  * none (both fully unordered)

-case 3) - decrement-based RMW ops that return no value
+case 4) - decrement-based RMW ops that return no value
 ------------------------------------------------------

 Function changes:
@@ -110,7 +122,7 @@ Memory ordering guarantee changes:
  * fully unordered --> RELEASE ordering


-case 4) - increment-based RMW ops that return a value
+case 5) - increment-based RMW ops that return a value
 -----------------------------------------------------

 Function changes:
@@ -126,7 +138,20 @@ Memory ordering guarantees changes:
    result of obtaining pointer to the object!


-case 5) - generic dec/sub decrement-based RMW ops that return a value
+case 6) - increment-based RMW ops with acquire ordering that return a value
+---------------------------------------------------------------------------
+
+Function changes:
+
+ * atomic_inc_not_zero() --> refcount_inc_not_zero_acquire()
+ * no atomic counterpart --> refcount_add_not_zero_acquire()
+
+Memory ordering guarantees changes:
+
+ * fully ordered --> ACQUIRE ordering on success
+
+
+case 7) - generic dec/sub decrement-based RMW ops that return a value
 ---------------------------------------------------------------------

 Function changes:
@@ -139,7 +164,7 @@ Memory ordering guarantees changes:
  * fully ordered --> RELEASE ordering + ACQUIRE ordering on success


-case 6) other decrement-based RMW ops that return a value
+case 8) other decrement-based RMW ops that return a value
 ---------------------------------------------------------

 Function changes:
@@ -154,7 +179,7 @@ Memory ordering guarantees changes:
 .. note:: atomic_add_unless() only provides full order on success.


-case 7) - lock-based RMW
+case 9) - lock-based RMW
 ------------------------

 Function changes:
diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index 35f039ecb272..4589d2e7bfea 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -87,6 +87,15 @@
  * The decrements dec_and_test() and sub_and_test() also provide acquire
  * ordering on success.
  *
+ * refcount_{add|inc}_not_zero_acquire() and refcount_set_release() provide
+ * acquire and release ordering for cases when the memory occupied by the
+ * object might be reused to store another object. This is important for the
+ * cases where secondary validation is required to detect such reuse, e.g.
+ * SLAB_TYPESAFE_BY_RCU. The secondary validation checks have to happen after
+ * the refcount is taken, hence acquire order is necessary. Similarly, when the
+ * object is initialized, all stores to its attributes should be visible before
+ * the refcount is set, otherwise a stale attribute value might be used by
+ * another task which succeeds in taking a refcount to the new object.
  */

 #ifndef _LINUX_REFCOUNT_H
@@ -125,6 +134,31 @@ static inline void refcount_set(refcount_t *r, int n)
 	atomic_set(&r->refs, n);
 }

+/**
+ * refcount_set_release - set a refcount's value with release ordering
+ * @r: the refcount
+ * @n: value to which the refcount will be set
+ *
+ * This function should be used when memory occupied by the object might be
+ * reused to store another object -- consider SLAB_TYPESAFE_BY_RCU.
+ *
+ * Provides release memory ordering which will order previous memory operations
+ * against this store. This ensures all updates to this object are visible
+ * once the refcount is set and stale values from the object previously
+ * occupying this memory are overwritten with new ones.
+ *
+ * This function should be called only after the new object is fully initialized.
+ * After this call the object should be considered visible to other tasks even
+ * if it was not yet added into an object collection normally used to discover
+ * it. This is because other tasks might have discovered the object previously
+ * occupying the same memory and after memory reuse they can succeed in taking
+ * a refcount on the new object and start using it.
+ */
+static inline void refcount_set_release(refcount_t *r, int n)
+{
+	atomic_set_release(&r->refs, n);
+}
+
 /**
  * refcount_read - get a refcount's value
  * @r: the refcount
@@ -178,6 +212,52 @@ static inline __must_check bool refcount_add_not_zero(int i, refcount_t *r)
 	return __refcount_add_not_zero(i, r, NULL);
 }

+static inline __must_check __signed_wrap
+bool __refcount_add_not_zero_acquire(int i, refcount_t *r, int *oldp)
+{
+	int old = refcount_read(r);
+
+	do {
+		if (!old)
+			break;
+	} while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));
+
+	if (oldp)
+		*oldp = old;
+
+	if (unlikely(old < 0 || old + i < 0))
+		refcount_warn_saturate(r, REFCOUNT_ADD_NOT_ZERO_OVF);
+
+	return old;
+}
+
+/**
+ * refcount_add_not_zero_acquire - add a value to a refcount with acquire ordering unless it is 0
+ *
+ * @i: the value to add to the refcount
+ * @r: the refcount
+ *
+ * Will saturate at REFCOUNT_SATURATED and WARN.
+ *
+ * This function should be used when memory occupied by the object might be
+ * reused to store another object -- consider SLAB_TYPESAFE_BY_RCU.
+ *
+ * Provides acquire memory ordering on success; it is assumed the caller has
+ * guaranteed the object memory to be stable (RCU, etc.). It does provide a
+ * control dependency and thereby orders future stores. See the comment on top.
+ *
+ * Use of this function is not recommended for the normal reference counting
+ * use case in which references are taken and released one at a time.  In these
+ * cases, refcount_inc_not_zero_acquire() should instead be used to increment a
+ * reference count.
+ *
+ * Return: false if the passed refcount is 0, true otherwise
+ */
+static inline __must_check bool refcount_add_not_zero_acquire(int i, refcount_t *r)
+{
+	return __refcount_add_not_zero_acquire(i, r, NULL);
+}
+
 static inline __signed_wrap
 void __refcount_add(int i, refcount_t *r, int *oldp)
 {
@@ -236,6 +316,32 @@ static inline __must_check bool refcount_inc_not_zero(refcount_t *r)
 	return __refcount_inc_not_zero(r, NULL);
 }

+static inline __must_check bool __refcount_inc_not_zero_acquire(refcount_t *r, int *oldp)
+{
+	return __refcount_add_not_zero_acquire(1, r, oldp);
+}
+
+/**
+ * refcount_inc_not_zero_acquire - increment a refcount with acquire ordering unless it is 0
+ * @r: the refcount to increment
+ *
+ * Similar to refcount_inc_not_zero(), but provides acquire memory ordering on
+ * success.
+ *
+ * This function should be used when memory occupied by the object might be
+ * reused to store another object -- consider SLAB_TYPESAFE_BY_RCU.
+ *
+ * Provides acquire memory ordering on success; it is assumed the caller has
+ * guaranteed the object memory to be stable (RCU, etc.). It does provide a
+ * control dependency and thereby orders future stores. See the comment on top.
+ *
+ * Return: true if the increment was successful, false otherwise
+ */
+static inline __must_check bool refcount_inc_not_zero_acquire(refcount_t *r)
+{
+	return __refcount_inc_not_zero_acquire(r, NULL);
+}
+
 static inline void __refcount_inc(refcount_t *r, int *oldp)
 {
 	__refcount_add(1, r, oldp);
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 09eedaecf120..ad902a2d692b 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -136,6 +136,15 @@ enum _slab_flag_bits {
  * rcu_read_lock before reading the address, then rcu_read_unlock after
  * taking the spinlock within the structure expected at that address.
  *
+ * Note that the object identity check has to be done *after* acquiring a
+ * reference; therefore the user has to ensure proper ordering for loads.
+ * Similarly, when initializing objects allocated with SLAB_TYPESAFE_BY_RCU,
+ * the newly allocated object has to be fully initialized *before* its
+ * refcount gets initialized, so proper ordering for stores is required.
+ * refcount_{add|inc}_not_zero_acquire() and refcount_set_release() are
+ * designed with the proper ordering required for reference counting objects
+ * allocated with SLAB_TYPESAFE_BY_RCU.
+ *
+ *
  * Note that it is not possible to acquire a lock within a structure
  * allocated with SLAB_TYPESAFE_BY_RCU without first acquiring a reference
  * as described above.  The reason is that SLAB_TYPESAFE_BY_RCU pages
-- 
2.48.1.601.g30ceb7b040-goog
From nobody Sat Feb  7 22:46:41 2026
Date: Thu, 13 Feb 2025 14:46:48 -0800
In-Reply-To: <20250213224655.1680278-1-surenb@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
References: <20250213224655.1680278-1-surenb@google.com>
X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog
Message-ID: <20250213224655.1680278-12-surenb@google.com>
Subject: [PATCH v10 11/18] refcount: introduce
 __refcount_{add|inc}_not_zero_limited_acquire
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
	vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
	dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
	lokeshgidra@google.com, minchan@google.com, jannh@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
	klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com, surenb@google.com
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

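Introduce functions that increase a refcount unless the result would
exceed a given upper limit, in which case they fail and leave the refcount
unchanged. The limit is inclusive: the refcount may reach it but not go
above it. Like refcount_{add|inc}_not_zero_acquire(), they provide acquire
ordering on success. Setting the limit to INT_MAX indicates no limit.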
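
A userspace sketch of the limit check, assuming C11 atomics in place of atomic_try_cmpxchg_acquire(); `toy_add_not_zero_limited_acquire()` and `toy_add_not_zero_acquire()` are hypothetical names and the saturation warning is omitted. Writing the test as `i > limit - old` rather than `old + i > limit` avoids signed overflow when `limit` is INT_MAX, assuming `i` is positive and `old` is non-negative.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace analogue of __refcount_add_not_zero_limited_acquire(). */
static bool toy_add_not_zero_limited_acquire(atomic_int *refs, int i,
					     int *oldp, int limit)
{
	int old = atomic_load_explicit(refs, memory_order_relaxed);

	do {
		if (old == 0)
			break;			/* never revive a dying object */
		if (i > limit - old) {		/* old + i would exceed limit */
			if (oldp)
				*oldp = old;
			return false;		/* refcount left unchanged */
		}
	} while (!atomic_compare_exchange_weak_explicit(refs, &old, old + i,
							memory_order_acquire,
							memory_order_relaxed));
	if (oldp)
		*oldp = old;
	return old != 0;
}

/* No limit: mirrors how the patch re-expresses the unlimited variant. */
static bool toy_add_not_zero_acquire(atomic_int *refs, int i, int *oldp)
{
	return toy_add_not_zero_limited_acquire(refs, i, oldp, INT_MAX);
}
```

For example, with a limit of 2, a count of 1 may be raised to exactly 2, but a further increment fails and leaves the count at 2.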

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Shivank Garg <shivankg@amd.com>
---
Changes since v9 [1]:
- Change refcount limit to be used with xxx_acquire functions

[1] https://lore.kernel.org/all/20250111042604.3230628-11-surenb@google.com/

 include/linux/refcount.h | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index 4589d2e7bfea..80dc023ac2bf 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -213,13 +213,20 @@ static inline __must_check bool refcount_add_not_zero(int i, refcount_t *r)
 }

 static inline __must_check __signed_wrap
-bool __refcount_add_not_zero_acquire(int i, refcount_t *r, int *oldp)
+bool __refcount_add_not_zero_limited_acquire(int i, refcount_t *r, int *oldp,
+					     int limit)
 {
 	int old = refcount_read(r);

 	do {
 		if (!old)
 			break;
+
+		if (i > limit - old) {
+			if (oldp)
+				*oldp = old;
+			return false;
+		}
 	} while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));

 	if (oldp)
@@ -231,6 +238,18 @@ bool __refcount_add_not_zero_acquire(int i, refcount_t *r, int *oldp)
 	return old;
 }

+static inline __must_check bool
+__refcount_inc_not_zero_limited_acquire(refcount_t *r, int *oldp, int limit)
+{
+	return __refcount_add_not_zero_limited_acquire(1, r, oldp, limit);
+}
+
+static inline __must_check __signed_wrap
+bool __refcount_add_not_zero_acquire(int i, refcount_t *r, int *oldp)
+{
+	return __refcount_add_not_zero_limited_acquire(i, r, oldp, INT_MAX);
+}
+
 /**
  * refcount_add_not_zero_acquire - add a value to a refcount with acquire ordering unless it is 0
  *
-- 
2.48.1.601.g30ceb7b040-goog
From nobody Sat Feb  7 22:46:41 2026
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1739486844; x=1740091644;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=wFgt1YPKwtEKZHtY8eCCGOT/871l+BokJrFagsVrDsk=;
        b=ezEWYHA9J/zt7g4MmB+begjvE47XEr9vwVAEbv4m0eEBcDJgR1a+kkIpk+Zb/0x4oV
         JG9ouCGeu9jXFu5Z0l0gnApZMNRZpnBbITyXLFXEJgkeVvPyXgpYFrwMCVBT5KqGtUnT
         nk8jYce/OEtfKDIG4ipBEHatTSa6qMjnbBtJGSsaqCGex/IGNeY2ZhiF2h4018LdMtgd
         BFxU2oidf4ZYlQ0UrhjtjwObbNrFYG3rld95nU17YibqgJE6lYtqTl+fvLaUnIZl0WVZ
         bYj2cfwPdYgpQanXlGOMEAWobmx8TtDyTlHHVawTPCLm7/EG7cu8Mof+5hHC2VXYzxmh
         HKAQ==
X-Forwarded-Encrypted: i=1;
 AJvYcCXYPysopqKx4SGyqFGY70syXAXIPITYBeNSBrauRjWrerFl1r/ILfw8jlF5tMgyQLGto1+rLkKkleHBHps=@vger.kernel.org
X-Gm-Message-State: AOJu0YyOr+bkNcYQr3vhFmZEBx6eCHHK8GkqrwAFFVw/4b3FIQ76NTxX
	9rUPUt1P9MgRkAHOtzQ7AUU3TS3tP8sogzGijZsPSYpW2Ea7l515otkd3F2nSl/Tu6/YXZ7AqvG
	Ouw==
X-Google-Smtp-Source: 
 AGHT+IFBfclnXX1eyWHaQBadHjNWP/bw7xtO4voK2pMlEan4fNA0aBMg12hKdOHhEFlJAYATtSiZC0Oajhs=
X-Received: from plgb14.prod.google.com ([2002:a17:902:d50e:b0:21f:39fb:79d3])
 (user=surenb job=prod-delivery.src-stubby-dispatcher) by
 2002:a17:903:32c8:b0:220:ea90:1925
 with SMTP id d9443c01a7336-220ea9019a9mr22932205ad.35.1739486843944; Thu, 13
 Feb 2025 14:47:23 -0800 (PST)
Date: Thu, 13 Feb 2025 14:46:49 -0800
In-Reply-To: <20250213224655.1680278-1-surenb@google.com>
Mime-Version: 1.0
References: <20250213224655.1680278-1-surenb@google.com>
X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog
Message-ID: <20250213224655.1680278-13-surenb@google.com>
Subject: [PATCH v10 12/18] mm: replace vm_lock and detached flag with a
 reference count
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
	vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
	dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
	lokeshgidra@google.com, minchan@google.com, jannh@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
	klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com, surenb@google.com
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

rw_semaphore is a sizable structure of 40 bytes and consumes
considerable space for each vm_area_struct. However, vma_lock has
two important properties that allow replacing rw_semaphore with a
simpler structure:
1. Readers never wait. They try to take the vma_lock and fall back to
mmap_lock if that fails.
2. Only one writer at a time will ever try to write-lock a vma_lock
because writers first take mmap_lock in write mode.
Because of these properties, full rw_semaphore functionality is not
needed, and we can replace rw_semaphore and the vma->detached flag with
a refcount (vm_refcnt).

When a vma is detached, vm_refcnt is 0, and only a call to
vma_mark_attached() can take it out of this state. Note that, unlike
before, both vma_mark_attached() and vma_mark_detached() must now be
called only after the vma has been write-locked. vma_mark_attached()
sets vm_refcnt to 1 to indicate that the vma has been attached to the
vma tree. When a reader takes the read lock, it increments vm_refcnt
unless the top usable bit of vm_refcnt (0x40000000, VMA_LOCK_OFFSET) is
set, indicating the presence of a writer. When a writer takes the write
lock, it sets the top usable bit to indicate its presence. If there are
readers, the writer waits on the newly introduced mm->vma_writer_wait.
Since all writers take mmap_lock in write mode first, there can be only
one writer at a time. The last reader to release the lock signals the
writer to wake up.
Readers never raise vm_refcnt above VMA_REF_LIMIT (VMA_LOCK_OFFSET - 1),
so heavy read contention cannot overflow into the writer bit; once the
limit is reached, read-locking fails. Readers are expected to handle
such failures.
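
The reader side described above can be modelled in user space with C11 atomics. This is a minimal sketch under stated assumptions, not the kernel code: inc_not_zero_limited() and reader_put() are hypothetical stand-ins for __refcount_inc_not_zero_limited_acquire() and vma_refcount_put(), a plain atomic_int replaces refcount_t, and the rcuwait wakeup is reduced to a boolean return value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Constants mirror the patch; the rest is a simplified user-space model. */
#define LOCK_OFFSET 0x40000000		/* writer bit (VMA_LOCK_OFFSET) */
#define REF_LIMIT   (LOCK_OFFSET - 1)	/* VMA_REF_LIMIT */

/*
 * Hypothetical stand-in for __refcount_inc_not_zero_limited_acquire():
 * increment *cnt unless it is 0 (detached) or the result would exceed
 * limit (writer bit set, or too many readers). *oldp gets the old value.
 */
static bool inc_not_zero_limited(atomic_int *cnt, int *oldp, int limit)
{
	int old = atomic_load_explicit(cnt, memory_order_relaxed);

	do {
		/* limit - old cannot overflow: both are non-negative ints */
		if (old == 0 || 1 > limit - old) {
			*oldp = old;
			return false;
		}
	} while (!atomic_compare_exchange_weak_explicit(cnt, &old, old + 1,
							memory_order_acquire,
							memory_order_relaxed));
	*oldp = old;
	return true;
}

/* A writer is waiting and no readers remain (cf. is_vma_writer_only()). */
static bool writer_only(int cnt)
{
	return (cnt & LOCK_OFFSET) && cnt <= LOCK_OFFSET + 1;
}

/*
 * Hypothetical stand-in for vma_refcount_put(): drop one reference and
 * report whether the waiting writer should be woken.
 */
static bool reader_put(atomic_int *cnt)
{
	int now = atomic_fetch_sub_explicit(cnt, 1, memory_order_release) - 1;

	return now != 0 && writer_only(now);
}
```

Keeping the limit just below the writer bit lets a single comparison reject both a pending writer and a reader count that would otherwise spill into the writer bit.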

In summary:
1. all readers increment vm_refcnt;
2. the writer sets the top usable (writer) bit of vm_refcnt;
3. readers cannot increment vm_refcnt if the writer bit is set;
4. in the presence of readers, the writer must wait for vm_refcnt to drop
to 1 (plus the VMA_LOCK_OFFSET writer bit), indicating an attached vma
with no readers;
5. vm_refcnt overflow is handled by the readers.

While this vm_lock replacement does not yet result in a smaller
vm_area_struct (it stays at 256 bytes due to cacheline alignment), it
allows for further size optimization by structure member regrouping
to bring the size of vm_area_struct below 192 bytes.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Shivank Garg <shivankg@amd.com>
---
Changes since v9 [1]:
- Use __refcount_inc_not_zero_limited_acquire() in vma_start_read(),
per Hillf Danton
- Refactor vma_assert_locked() to avoid reading vm_refcnt when
CONFIG_DEBUG_VM=n, per Mateusz Guzik
- Update changelog, per Wei Yang
- Change vma_start_read() to return ERR_PTR(-EAGAIN) if the vma got
isolated, and change lock_vma_under_rcu() back to detect this condition,
per Wei Yang
- Change VM_BUG_ON_VMA() to WARN_ON_ONCE() when checking vma detached state,
per Lorenzo Stoakes
- Remove Vlastimil's Reviewed-by since code is changed

[1] https://lore.kernel.org/all/20250111042604.3230628-12-surenb@google.com/

 include/linux/mm.h               | 128 ++++++++++++++++++++-----------
 include/linux/mm_types.h         |  22 +++---
 kernel/fork.c                    |  13 ++--
 mm/init-mm.c                     |   1 +
 mm/memory.c                      |  91 +++++++++++++++++++---
 tools/testing/vma/linux/atomic.h |   5 ++
 tools/testing/vma/vma_internal.h |  63 ++++++++-------
 7 files changed, 218 insertions(+), 105 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 557d66e187ff..11a042c27aee 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -32,6 +32,7 @@
 #include <linux/memremap.h>
 #include <linux/slab.h>
 #include <linux/cacheinfo.h>
+#include <linux/rcuwait.h>
 
 struct mempolicy;
 struct anon_vma;
@@ -697,19 +698,54 @@ static inline void vma_numab_state_free(struct vm_area_struct *vma) {}
 #endif /* CONFIG_NUMA_BALANCING */
 
 #ifdef CONFIG_PER_VMA_LOCK
-static inline void vma_lock_init(struct vm_area_struct *vma)
+static inline void vma_lock_init(struct vm_area_struct *vma, bool reset_refcnt)
 {
-	init_rwsem(&vma->vm_lock.lock);
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	static struct lock_class_key lockdep_key;
+
+	lockdep_init_map(&vma->vmlock_dep_map, "vm_lock", &lockdep_key, 0);
+#endif
+	if (reset_refcnt)
+		refcount_set(&vma->vm_refcnt, 0);
 	vma->vm_lock_seq = UINT_MAX;
 }
 
+static inline bool is_vma_writer_only(int refcnt)
+{
+	/*
+	 * With a writer and no readers, refcnt is VMA_LOCK_OFFSET if the vma
+	 * is detached and (VMA_LOCK_OFFSET + 1) if it is attached. Waiting on
+	 * a detached vma happens only in vma_mark_detached() and is a rare
+	 * case, therefore most of the time there will be no unnecessary wakeup.
+	 */
+	return refcnt & VMA_LOCK_OFFSET && refcnt <= VMA_LOCK_OFFSET + 1;
+}
+
+static inline void vma_refcount_put(struct vm_area_struct *vma)
+{
+	/* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
+	struct mm_struct *mm = vma->vm_mm;
+	int oldcnt;
+
+	rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
+	if (!__refcount_dec_and_test(&vma->vm_refcnt, &oldcnt)) {
+
+		if (is_vma_writer_only(oldcnt - 1))
+			rcuwait_wake_up(&mm->vma_writer_wait);
+	}
+}
+
 /*
  * Try to read-lock a vma. The function is allowed to occasionally yield false
  * locked result to avoid performance overhead, in which case we fall back to
  * using mmap_lock. The function should never yield false unlocked result.
+ * Returns the vma on success, NULL on failure to lock and EAGAIN if vma got
+ * detached.
  */
-static inline bool vma_start_read(struct vm_area_struct *vma)
+static inline struct vm_area_struct *vma_start_read(struct vm_area_struct *vma)
 {
+	int oldcnt;
+
 	/*
 	 * Check before locking. A race might cause false locked result.
 	 * We can use READ_ONCE() for the mm_lock_seq here, and don't need
@@ -718,15 +754,25 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	 * need ordering is below.
 	 */
 	if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq.sequence))
-		return false;
+		return NULL;
 
-	if (unlikely(down_read_trylock(&vma->vm_lock.lock) == 0))
-		return false;
+	/*
+	 * If VMA_LOCK_OFFSET is set, __refcount_inc_not_zero_limited_acquire()
+	 * will fail because VMA_REF_LIMIT is less than VMA_LOCK_OFFSET.
+	 * Acquire fence is required here to avoid reordering against later
+	 * vm_lock_seq check and checks inside lock_vma_under_rcu().
+	 */
+	if (unlikely(!__refcount_inc_not_zero_limited_acquire(&vma->vm_refcnt, &oldcnt,
+							      VMA_REF_LIMIT))) {
+		/* return EAGAIN if vma got detached from under us */
+		return oldcnt ? NULL : ERR_PTR(-EAGAIN);
+	}
 
+	rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
 	/*
-	 * Overflow might produce false locked result.
+	 * Overflow of vm_lock_seq/mm_lock_seq might produce false locked result.
 	 * False unlocked result is impossible because we modify and check
-	 * vma->vm_lock_seq under vma->vm_lock protection and mm->mm_lock_seq
+	 * vma->vm_lock_seq under vma->vm_refcnt protection and mm->mm_lock_seq
 	 * modification invalidates all existing locks.
 	 *
 	 * We must use ACQUIRE semantics for the mm_lock_seq so that if we are
@@ -735,10 +781,11 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	 * This pairs with RELEASE semantics in vma_end_write_all().
 	 */
 	if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&vma->vm_mm->mm_lock_seq))) {
-		up_read(&vma->vm_lock.lock);
-		return false;
+		vma_refcount_put(vma);
+		return NULL;
 	}
-	return true;
+
+	return vma;
 }
 
 /*
@@ -749,8 +796,14 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
  */
 static inline bool vma_start_read_locked_nested(struct vm_area_struct *vma, int subclass)
 {
+	int oldcnt;
+
 	mmap_assert_locked(vma->vm_mm);
-	down_read_nested(&vma->vm_lock.lock, subclass);
+	if (unlikely(!__refcount_inc_not_zero_limited_acquire(&vma->vm_refcnt, &oldcnt,
+							      VMA_REF_LIMIT)))
+		return false;
+
+	rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
 	return true;
 }
 
@@ -762,16 +815,12 @@ static inline bool vma_start_read_locked_nested(struct vm_area_struct *vma, int
  */
 static inline bool vma_start_read_locked(struct vm_area_struct *vma)
 {
-	mmap_assert_locked(vma->vm_mm);
-	down_read(&vma->vm_lock.lock);
-	return true;
+	return vma_start_read_locked_nested(vma, 0);
 }
 
 static inline void vma_end_read(struct vm_area_struct *vma)
 {
-	rcu_read_lock(); /* keeps vma alive till the end of up_read */
-	up_read(&vma->vm_lock.lock);
-	rcu_read_unlock();
+	vma_refcount_put(vma);
 }
 
 /* WARNING! Can only be used if mmap_lock is expected to be write-locked */
@@ -813,38 +862,35 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 
 static inline void vma_assert_locked(struct vm_area_struct *vma)
 {
-	if (!rwsem_is_locked(&vma->vm_lock.lock))
-		vma_assert_write_locked(vma);
+	unsigned int mm_lock_seq;
+
+	VM_BUG_ON_VMA(refcount_read(&vma->vm_refcnt) <= 1 &&
+		      !__is_vma_write_locked(vma, &mm_lock_seq), vma);
 }
 
+/*
+ * WARNING: to avoid racing with vma_mark_attached()/vma_mark_detached(), these
+ * assertions should be made either under mmap_write_lock or when the object
+ * has been isolated under mmap_write_lock, ensuring no competing writers.
+ */
 static inline void vma_assert_attached(struct vm_area_struct *vma)
 {
-	WARN_ON_ONCE(vma->detached);
+	WARN_ON_ONCE(!refcount_read(&vma->vm_refcnt));
 }
 
 static inline void vma_assert_detached(struct vm_area_struct *vma)
 {
-	WARN_ON_ONCE(!vma->detached);
+	WARN_ON_ONCE(refcount_read(&vma->vm_refcnt));
 }
 
 static inline void vma_mark_attached(struct vm_area_struct *vma)
 {
-	vma_assert_detached(vma);
-	vma->detached = false;
-}
-
-static inline void vma_mark_detached(struct vm_area_struct *vma)
-{
-	/* When detaching vma should be write-locked */
 	vma_assert_write_locked(vma);
-	vma_assert_attached(vma);
-	vma->detached = true;
+	vma_assert_detached(vma);
+	refcount_set(&vma->vm_refcnt, 1);
 }
 
-static inline bool is_vma_detached(struct vm_area_struct *vma)
-{
-	return vma->detached;
-}
+void vma_mark_detached(struct vm_area_struct *vma);
 
 static inline void release_fault_lock(struct vm_fault *vmf)
 {
@@ -867,9 +913,9 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 
 #else /* CONFIG_PER_VMA_LOCK */
 
-static inline void vma_lock_init(struct vm_area_struct *vma) {}
-static inline bool vma_start_read(struct vm_area_struct *vma)
-		{ return false; }
+static inline void vma_lock_init(struct vm_area_struct *vma, bool reset_refcnt) {}
+static inline struct vm_area_struct *vma_start_read(struct vm_area_struct *vma)
+		{ return NULL; }
 static inline void vma_end_read(struct vm_area_struct *vma) {}
 static inline void vma_start_write(struct vm_area_struct *vma) {}
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
@@ -910,12 +956,8 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_mm = mm;
 	vma->vm_ops = &vma_dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
-#ifdef CONFIG_PER_VMA_LOCK
-	/* vma is not locked, can't use vma_mark_detached() */
-	vma->detached = true;
-#endif
 	vma_numab_state_init(vma);
-	vma_lock_init(vma);
+	vma_lock_init(vma, false);
 }
 
 /* Use when VMA is not part of the VMA tree and needs no locking */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8a645bcb2b31..48ddfedfff83 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -19,6 +19,7 @@
 #include <linux/workqueue.h>
 #include <linux/seqlock.h>
 #include <linux/percpu_counter.h>
+#include <linux/types.h>
 
 #include <asm/mmu.h>
 
@@ -639,9 +640,8 @@ static inline struct anon_vma_name *anon_vma_name_alloc(const char *name)
 }
 #endif
 
-struct vma_lock {
-	struct rw_semaphore lock;
-};
+#define VMA_LOCK_OFFSET	0x40000000
+#define VMA_REF_LIMIT	(VMA_LOCK_OFFSET - 1)
 
 struct vma_numab_state {
 	/*
@@ -719,19 +719,13 @@ struct vm_area_struct {
 	};
 
 #ifdef CONFIG_PER_VMA_LOCK
-	/*
-	 * Flag to indicate areas detached from the mm->mm_mt tree.
-	 * Unstable RCU readers are allowed to read this.
-	 */
-	bool detached;
-
 	/*
 	 * Can only be written (using WRITE_ONCE()) while holding both:
 	 *  - mmap_lock (in write mode)
-	 *  - vm_lock->lock (in write mode)
+	 *  - vm_refcnt bit at VMA_LOCK_OFFSET is set
 	 * Can be read reliably while holding one of:
 	 *  - mmap_lock (in read or write mode)
-	 *  - vm_lock->lock (in read or write mode)
+	 *  - vm_refcnt bit at VMA_LOCK_OFFSET is set or vm_refcnt > 1
 	 * Can be read unreliably (using READ_ONCE()) for pessimistic bailout
 	 * while holding nothing (except RCU to keep the VMA struct allocated).
 	 *
@@ -794,7 +788,10 @@ struct vm_area_struct {
 	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
 #ifdef CONFIG_PER_VMA_LOCK
 	/* Unstable RCU readers are allowed to read this. */
-	struct vma_lock vm_lock ____cacheline_aligned_in_smp;
+	refcount_t vm_refcnt ____cacheline_aligned_in_smp;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	struct lockdep_map vmlock_dep_map;
+#endif
 #endif
 } __randomize_layout;
 
@@ -929,6 +926,7 @@ struct mm_struct {
 					  * by mmlist_lock
 					  */
 #ifdef CONFIG_PER_VMA_LOCK
+		struct rcuwait vma_writer_wait;
 		/*
 		 * This field has lock-like semantics, meaning it is sometimes
 		 * accessed with ACQUIRE/RELEASE semantics.
diff --git a/kernel/fork.c b/kernel/fork.c
index f1af413e5aa4..48a0038f606f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -463,12 +463,8 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	 * will be reinitialized.
 	 */
 	data_race(memcpy(new, orig, sizeof(*new)));
-	vma_lock_init(new);
+	vma_lock_init(new, true);
 	INIT_LIST_HEAD(&new->anon_vma_chain);
-#ifdef CONFIG_PER_VMA_LOCK
-	/* vma is not locked, can't use vma_mark_detached() */
-	new->detached = true;
-#endif
 	vma_numab_state_init(new);
 	dup_anon_vma_name(orig, new);
 
@@ -477,6 +473,8 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 
 void __vm_area_free(struct vm_area_struct *vma)
 {
+	/* The vma should be detached while being destroyed. */
+	vma_assert_detached(vma);
 	vma_numab_state_free(vma);
 	free_anon_vma_name(vma);
 	kmem_cache_free(vm_area_cachep, vma);
@@ -488,8 +486,6 @@ static void vm_area_free_rcu_cb(struct rcu_head *head)
 	struct vm_area_struct *vma = container_of(head, struct vm_area_struct,
 						  vm_rcu);
 
-	/* The vma should not be locked while being destroyed. */
-	VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock.lock), vma);
 	__vm_area_free(vma);
 }
 #endif
@@ -1234,6 +1230,9 @@ static void mmap_init_lock(struct mm_struct *mm)
 {
 	init_rwsem(&mm->mmap_lock);
 	mm_lock_seqcount_init(mm);
+#ifdef CONFIG_PER_VMA_LOCK
+	rcuwait_init(&mm->vma_writer_wait);
+#endif
 }
 
 static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
diff --git a/mm/init-mm.c b/mm/init-mm.c
index 6af3ad675930..4600e7605cab 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -40,6 +40,7 @@ struct mm_struct init_mm = {
 	.arg_lock	=  __SPIN_LOCK_UNLOCKED(init_mm.arg_lock),
 	.mmlist		= LIST_HEAD_INIT(init_mm.mmlist),
 #ifdef CONFIG_PER_VMA_LOCK
+	.vma_writer_wait = __RCUWAIT_INITIALIZER(init_mm.vma_writer_wait),
 	.mm_lock_seq	= SEQCNT_ZERO(init_mm.mm_lock_seq),
 #endif
 	.user_ns	= &init_user_ns,
diff --git a/mm/memory.c b/mm/memory.c
index 3d9c5141193f..528407c0d7cf 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6393,9 +6393,47 @@ struct vm_area_struct *lock_mm_and_find_vma(struct mm_struct *mm,
 #endif
 
 #ifdef CONFIG_PER_VMA_LOCK
+static inline bool __vma_enter_locked(struct vm_area_struct *vma, bool detaching)
+{
+	unsigned int tgt_refcnt = VMA_LOCK_OFFSET;
+
+	/* Additional refcnt if the vma is attached. */
+	if (!detaching)
+		tgt_refcnt++;
+
+	/*
+	 * If vma is detached then only vma_mark_attached() can raise the
+	 * vm_refcnt. mmap_write_lock prevents racing with vma_mark_attached().
+	 */
+	if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
+		return false;
+
+	rwsem_acquire(&vma->vmlock_dep_map, 0, 0, _RET_IP_);
+	rcuwait_wait_event(&vma->vm_mm->vma_writer_wait,
+		   refcount_read(&vma->vm_refcnt) == tgt_refcnt,
+		   TASK_UNINTERRUPTIBLE);
+	lock_acquired(&vma->vmlock_dep_map, _RET_IP_);
+
+	return true;
+}
+
+static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
+{
+	*detached = refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt);
+	rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
+}
+
 void __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq)
 {
-	down_write(&vma->vm_lock.lock);
+	bool locked;
+
+	/*
+	 * __vma_enter_locked() returns false immediately if the vma is not
+	 * attached, otherwise it waits until refcnt is indicating that vma
+	 * is attached with no readers.
+	 */
+	locked = __vma_enter_locked(vma, false);
+
 	/*
 	 * We should use WRITE_ONCE() here because we can have concurrent reads
 	 * from the early lockless pessimistic check in vma_start_read().
@@ -6403,10 +6441,40 @@ void __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq)
 	 * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy.
 	 */
 	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
-	up_write(&vma->vm_lock.lock);
+
+	if (locked) {
+		bool detached;
+
+		__vma_exit_locked(vma, &detached);
+		WARN_ON_ONCE(detached); /* vma should remain attached */
+	}
 }
 EXPORT_SYMBOL_GPL(__vma_start_write);
 
+void vma_mark_detached(struct vm_area_struct *vma)
+{
+	vma_assert_write_locked(vma);
+	vma_assert_attached(vma);
+
+	/*
+	 * We are the only writer, so no need to use vma_refcount_put().
+	 * The condition below is unlikely because the vma has been already
+	 * write-locked and readers can increment vm_refcnt only temporarily
+	 * before they check vm_lock_seq, realize the vma is locked and drop
+	 * back the vm_refcnt. That is a narrow window for observing a raised
+	 * vm_refcnt.
+	 */
+	if (unlikely(!refcount_dec_and_test(&vma->vm_refcnt))) {
+		/* Wait until vma is detached with no readers. */
+		if (__vma_enter_locked(vma, true)) {
+			bool detached;
+
+			__vma_exit_locked(vma, &detached);
+			WARN_ON_ONCE(!detached);
+		}
+	}
+}
+
 /*
  * Lookup and lock a VMA under RCU protection. Returned VMA is guaranteed to be
  * stable and not isolated. If the VMA is not found or is being modified the
@@ -6424,15 +6492,17 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	if (!vma)
 		goto inval;
 
-	if (!vma_start_read(vma))
-		goto inval;
+	vma = vma_start_read(vma);
+	if (IS_ERR_OR_NULL(vma)) {
+		/* Check if the VMA got isolated after we found it */
+		if (PTR_ERR(vma) == -EAGAIN) {
+			count_vm_vma_lock_event(VMA_LOCK_MISS);
+			/* The area was replaced with another one */
+			goto retry;
+		}
 
-	/* Check if the VMA got isolated after we found it */
-	if (is_vma_detached(vma)) {
-		vma_end_read(vma);
-		count_vm_vma_lock_event(VMA_LOCK_MISS);
-		/* The area was replaced with another one */
-		goto retry;
+		/* Failed to lock the VMA */
+		goto inval;
 	}
 	/*
 	 * At this point, we have a stable reference to a VMA: The VMA is
diff --git a/tools/testing/vma/linux/atomic.h b/tools/testing/vma/linux/atomic.h
index 3e1b6adc027b..788c597c4fde 100644
--- a/tools/testing/vma/linux/atomic.h
+++ b/tools/testing/vma/linux/atomic.h
@@ -9,4 +9,9 @@
 #define atomic_set(x, y) uatomic_set(x, y)
 #define U8_MAX UCHAR_MAX
 
+#ifndef atomic_cmpxchg_relaxed
+#define  atomic_cmpxchg_relaxed		uatomic_cmpxchg
+#define  atomic_cmpxchg_release         uatomic_cmpxchg
+#endif /* atomic_cmpxchg_relaxed */
+
 #endif	/* _LINUX_ATOMIC_H */
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index 34277842156c..ba838097d3f6 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -25,7 +25,7 @@
 #include <linux/maple_tree.h>
 #include <linux/mm.h>
 #include <linux/rbtree.h>
-#include <linux/rwsem.h>
+#include <linux/refcount.h>
 
 extern unsigned long stack_guard_gap;
 #ifdef CONFIG_MMU
@@ -135,10 +135,6 @@ typedef __bitwise unsigned int vm_fault_t;
  */
 #define pr_warn_once pr_err
 
-typedef struct refcount_struct {
-	atomic_t refs;
-} refcount_t;
-
 struct kref {
 	refcount_t refcount;
 };
@@ -233,15 +229,12 @@ struct mm_struct {
 	unsigned long flags; /* Must use atomic bitops to access */
 };
=20
-struct vma_lock {
-	struct rw_semaphore lock;
-};
-
-
 struct file {
 	struct address_space	*f_mapping;
 };
=20
+#define VMA_LOCK_OFFSET	0x40000000
+
 struct vm_area_struct {
 	/* The first cache line has the info for VMA tree walking. */
 
@@ -269,16 +262,13 @@ struct vm_area_struct {
 	};
=20
 #ifdef CONFIG_PER_VMA_LOCK
-	/* Flag to indicate areas detached from the mm->mm_mt tree */
-	bool detached;
-
 	/*
 	 * Can only be written (using WRITE_ONCE()) while holding both:
 	 *  - mmap_lock (in write mode)
-	 *  - vm_lock.lock (in write mode)
+	 *  - vm_refcnt bit at VMA_LOCK_OFFSET is set
 	 * Can be read reliably while holding one of:
 	 *  - mmap_lock (in read or write mode)
-	 *  - vm_lock.lock (in read or write mode)
+	 *  - vm_refcnt bit at VMA_LOCK_OFFSET is set or vm_refcnt > 1
 	 * Can be read unreliably (using READ_ONCE()) for pessimistic bailout
 	 * while holding nothing (except RCU to keep the VMA struct allocated).
 	 *
@@ -287,7 +277,6 @@ struct vm_area_struct {
 	 * slowpath.
 	 */
 	unsigned int vm_lock_seq;
-	struct vma_lock vm_lock;
 #endif
 
 	/*
@@ -340,6 +329,10 @@ struct vm_area_struct {
 	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
 #endif
 	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
+#ifdef CONFIG_PER_VMA_LOCK
+	/* Unstable RCU readers are allowed to read this. */
+	refcount_t vm_refcnt;
+#endif
 } __randomize_layout;
 
 struct vm_fault {};
@@ -464,33 +457,40 @@ static inline struct vm_area_struct *vma_next(struct vma_iterator *vmi)
 	return mas_find(&vmi->mas, ULONG_MAX);
 }
 
-static inline void vma_lock_init(struct vm_area_struct *vma)
-{
-	init_rwsem(&vma->vm_lock.lock);
-	vma->vm_lock_seq = UINT_MAX;
-}
-
+/*
+ * WARNING: to avoid racing with vma_mark_attached()/vma_mark_detached(), these
+ * assertions should be made either under mmap_write_lock or when the object
+ * has been isolated under mmap_write_lock, ensuring no competing writers.
+ */
 static inline void vma_assert_attached(struct vm_area_struct *vma)
 {
-	WARN_ON_ONCE(vma->detached);
+	WARN_ON_ONCE(!refcount_read(&vma->vm_refcnt));
 }
 
 static inline void vma_assert_detached(struct vm_area_struct *vma)
 {
-	WARN_ON_ONCE(!vma->detached);
+	WARN_ON_ONCE(refcount_read(&vma->vm_refcnt));
 }
 
 static inline void vma_assert_write_locked(struct vm_area_struct *);
 static inline void vma_mark_attached(struct vm_area_struct *vma)
 {
-	vma->detached = false;
+	vma_assert_write_locked(vma);
+	vma_assert_detached(vma);
+	refcount_set(&vma->vm_refcnt, 1);
 }
 
 static inline void vma_mark_detached(struct vm_area_struct *vma)
 {
-	/* When detaching vma should be write-locked */
 	vma_assert_write_locked(vma);
-	vma->detached = true;
+	vma_assert_attached(vma);
+	/* We are the only writer, so no need to use vma_refcount_put(). */
+	if (unlikely(!refcount_dec_and_test(&vma->vm_refcnt))) {
+		/*
+		 * Reader must have temporarily raised vm_refcnt but it will
+		 * drop it without using the vma since vma is write-locked.
+		 */
+	}
 }
 
 extern const struct vm_operations_struct vma_dummy_vm_ops;
@@ -503,9 +503,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_mm = mm;
 	vma->vm_ops = &vma_dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
-	/* vma is not locked, can't use vma_mark_detached() */
-	vma->detached = true;
-	vma_lock_init(vma);
+	vma->vm_lock_seq = UINT_MAX;
 }
 
 static inline struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
@@ -528,10 +526,9 @@ static inline struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 		return NULL;
 
 	memcpy(new, orig, sizeof(*new));
-	vma_lock_init(new);
+	refcount_set(&new->vm_refcnt, 0);
+	new->vm_lock_seq = UINT_MAX;
 	INIT_LIST_HEAD(&new->anon_vma_chain);
-	/* vma is not locked, can't use vma_mark_detached() */
-	new->detached = true;
 
 	return new;
 }
-- 
2.48.1.601.g30ceb7b040-goog
From nobody Sat Feb  7 22:46:41 2026
Date: Thu, 13 Feb 2025 14:46:50 -0800
In-Reply-To: <20250213224655.1680278-1-surenb@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
References: <20250213224655.1680278-1-surenb@google.com>
X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog
Message-ID: <20250213224655.1680278-14-surenb@google.com>
Subject: [PATCH v10 13/18] mm: move lesser used vma_area_struct members into
 the last cacheline
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
	vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
	dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
	lokeshgidra@google.com, minchan@google.com, jannh@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
	klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com, surenb@google.com
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Move several vm_area_struct members which are rarely or never used
during page fault handling into the last cacheline to better pack
vm_area_struct. As a result, vm_area_struct fits into 3 cachelines
instead of 4. New typical vm_area_struct layout:

struct vm_area_struct {
    union {
        struct {
            long unsigned int vm_start;              /*     0     8 */
            long unsigned int vm_end;                /*     8     8 */
        };                                           /*     0    16 */
        freeptr_t          vm_freeptr;               /*     0     8 */
    };                                               /*     0    16 */
    struct mm_struct *         vm_mm;                /*    16     8 */
    pgprot_t                   vm_page_prot;         /*    24     8 */
    union {
        const vm_flags_t   vm_flags;                 /*    32     8 */
        vm_flags_t         __vm_flags;               /*    32     8 */
    };                                               /*    32     8 */
    unsigned int               vm_lock_seq;          /*    40     4 */

    /* XXX 4 bytes hole, try to pack */

    struct list_head           anon_vma_chain;       /*    48    16 */
    /* --- cacheline 1 boundary (64 bytes) --- */
    struct anon_vma *          anon_vma;             /*    64     8 */
    const struct vm_operations_struct  * vm_ops;     /*    72     8 */
    long unsigned int          vm_pgoff;             /*    80     8 */
    struct file *              vm_file;              /*    88     8 */
    void *                     vm_private_data;      /*    96     8 */
    atomic_long_t              swap_readahead_info;  /*   104     8 */
    struct mempolicy *         vm_policy;            /*   112     8 */
    struct vma_numab_state *   numab_state;          /*   120     8 */
    /* --- cacheline 2 boundary (128 bytes) --- */
    refcount_t          vm_refcnt (__aligned__(64)); /*   128     4 */

    /* XXX 4 bytes hole, try to pack */

    struct {
        struct rb_node     rb (__aligned__(8));      /*   136    24 */
        long unsigned int  rb_subtree_last;          /*   160     8 */
    } __attribute__((__aligned__(8))) shared;        /*   136    32 */
    struct anon_vma_name *     anon_name;            /*   168     8 */
    struct vm_userfaultfd_ctx  vm_userfaultfd_ctx;   /*   176     8 */

    /* size: 192, cachelines: 3, members: 18 */
    /* sum members: 176, holes: 2, sum holes: 8 */
    /* padding: 8 */
    /* forced alignments: 2, forced holes: 1, sum forced holes: 4 */
} __attribute__((__aligned__(64)));
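
As a rough cross-check of the pahole output above, the layout can be
reproduced with a userspace mock. This is only a sketch, assuming an LP64
target with every optional member configured in and 64-byte cachelines;
struct mock_vma and its placeholder member types are hypothetical stand-ins,
not the kernel definitions, and _Alignas(64) stands in for
____cacheline_aligned_in_smp.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins sized like list_head and rb_node on LP64. */
struct mock_list_head { void *next, *prev; };				/* 16 bytes */
struct mock_rb_node { unsigned long parent_color; void *right, *left; };	/* 24 bytes */

/*
 * Simplified mock of the new layout with every optional member present.
 * The vm_freeptr and __vm_flags unions are omitted; they do not move offsets.
 */
struct mock_vma {
	unsigned long vm_start;			/*   0 */
	unsigned long vm_end;			/*   8 */
	void *vm_mm;				/*  16 */
	unsigned long vm_page_prot;		/*  24 */
	unsigned long vm_flags;			/*  32 */
	unsigned int vm_lock_seq;		/*  40, 4-byte hole follows */
	struct mock_list_head anon_vma_chain;	/*  48 */
	void *anon_vma;				/*  64: cacheline 1 */
	const void *vm_ops;			/*  72 */
	unsigned long vm_pgoff;			/*  80 */
	void *vm_file;				/*  88 */
	void *vm_private_data;			/*  96 */
	long swap_readahead_info;		/* 104 */
	void *vm_policy;			/* 112 */
	void *numab_state;			/* 120 */
	_Alignas(64) int vm_refcnt;		/* 128: cacheline 2, 4-byte hole follows */
	struct {
		struct mock_rb_node rb;
		unsigned long rb_subtree_last;
	} shared;				/* 136 */
	void *anon_name;			/* 168 */
	void *vm_userfaultfd_ctx;		/* 176, then 8 bytes of tail padding */
};

_Static_assert(sizeof(void *) == 8, "this sketch assumes an LP64 target");
_Static_assert(offsetof(struct mock_vma, vm_refcnt) == 128, "vm_refcnt starts cacheline 2");
_Static_assert(sizeof(struct mock_vma) == 3 * 64, "three 64-byte cachelines");
```

The 184 bytes of members are rounded up to the 64-byte alignment forced by
vm_refcnt, which is where the 192-byte size and the 8 bytes of padding in
the pahole output come from.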

Memory consumption per 1000 VMAs becomes 48 pages:

    slabinfo after vm_area_struct changes:
     <name>           ... <objsize> <objperslab> <pagesperslab> : ...
     vm_area_struct   ...    192   42    2 : ...
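
The 48-page figure follows from the slabinfo line. A minimal sketch of the
arithmetic, assuming 4 KiB pages and ignoring any per-slab metadata; the
helper names are hypothetical:

```c
#include <assert.h>

/*
 * Back-of-the-envelope slab math. Assumes 4 KiB pages and ignores any
 * per-slab metadata; objsize and pages_per_slab come from slabinfo.
 */
static unsigned int objs_per_slab(unsigned int objsize,
				  unsigned int pages_per_slab,
				  unsigned int page_size)
{
	/* e.g. 2 * 4096 / 192 = 42 (integer division drops the remainder) */
	return pages_per_slab * page_size / objsize;
}

static unsigned int pages_for_objs(unsigned int nr_objs, unsigned int objsize,
				   unsigned int pages_per_slab,
				   unsigned int page_size)
{
	unsigned int per_slab = objs_per_slab(objsize, pages_per_slab, page_size);
	unsigned int slabs = (nr_objs + per_slab - 1) / per_slab;	/* round up */

	return slabs * pages_per_slab;
}
```

With objsize 192 and 2 pages per slab this gives 42 objects per slab,
24 slabs for 1000 VMAs and therefore 48 pages.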

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Shivank Garg <shivankg@amd.com>
---
Changes since v9 [1]:
- Update vm_area_struct for tests, per Lorenzo Stoakes
- Add Reviewed-by, per Lorenzo Stoakes

[1] https://lore.kernel.org/all/20250111042604.3230628-13-surenb@google.com/

 include/linux/mm_types.h         | 38 +++++++++++++++-----------------
 tools/testing/vma/vma_internal.h | 37 +++++++++++++++----------------
 2 files changed, 36 insertions(+), 39 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 48ddfedfff83..63ab51699120 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -735,17 +735,6 @@ struct vm_area_struct {
 	 */
 	unsigned int vm_lock_seq;
 #endif
-
-	/*
-	 * For areas with an address space and backing store,
-	 * linkage into the address_space->i_mmap interval tree.
-	 *
-	 */
-	struct {
-		struct rb_node rb;
-		unsigned long rb_subtree_last;
-	} shared;
-
 	/*
 	 * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
 	 * list, after a COW of one of the file pages.	A MAP_SHARED vma
@@ -765,14 +754,6 @@ struct vm_area_struct {
 	struct file * vm_file;		/* File we map to (can be NULL). */
 	void * vm_private_data;		/* was vm_pte (shared mem) */
 
-#ifdef CONFIG_ANON_VMA_NAME
-	/*
-	 * For private and shared anonymous mappings, a pointer to a null
-	 * terminated string containing the name given to the vma, or NULL if
-	 * unnamed. Serialized by mmap_lock. Use anon_vma_name to access.
-	 */
-	struct anon_vma_name *anon_name;
-#endif
 #ifdef CONFIG_SWAP
 	atomic_long_t swap_readahead_info;
 #endif
@@ -785,7 +766,6 @@ struct vm_area_struct {
 #ifdef CONFIG_NUMA_BALANCING
 	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
 #endif
-	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
 #ifdef CONFIG_PER_VMA_LOCK
 	/* Unstable RCU readers are allowed to read this. */
 	refcount_t vm_refcnt ____cacheline_aligned_in_smp;
@@ -793,6 +773,24 @@ struct vm_area_struct {
 	struct lockdep_map vmlock_dep_map;
 #endif
 #endif
+	/*
+	 * For areas with an address space and backing store,
+	 * linkage into the address_space->i_mmap interval tree.
+	 *
+	 */
+	struct {
+		struct rb_node rb;
+		unsigned long rb_subtree_last;
+	} shared;
+#ifdef CONFIG_ANON_VMA_NAME
+	/*
+	 * For private and shared anonymous mappings, a pointer to a null
+	 * terminated string containing the name given to the vma, or NULL if
+	 * unnamed. Serialized by mmap_lock. Use anon_vma_name to access.
+	 */
+	struct anon_vma_name *anon_name;
+#endif
+	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
 } __randomize_layout;
 
 #ifdef CONFIG_NUMA
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index ba838097d3f6..b385170fbb8f 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -279,16 +279,6 @@ struct vm_area_struct {
 	unsigned int vm_lock_seq;
 #endif
 
-	/*
-	 * For areas with an address space and backing store,
-	 * linkage into the address_space->i_mmap interval tree.
-	 *
-	 */
-	struct {
-		struct rb_node rb;
-		unsigned long rb_subtree_last;
-	} shared;
-
 	/*
 	 * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
 	 * list, after a COW of one of the file pages.	A MAP_SHARED vma
@@ -308,14 +298,6 @@ struct vm_area_struct {
 	struct file * vm_file;		/* File we map to (can be NULL). */
 	void * vm_private_data;		/* was vm_pte (shared mem) */
 
-#ifdef CONFIG_ANON_VMA_NAME
-	/*
-	 * For private and shared anonymous mappings, a pointer to a null
-	 * terminated string containing the name given to the vma, or NULL if
-	 * unnamed. Serialized by mmap_lock. Use anon_vma_name to access.
-	 */
-	struct anon_vma_name *anon_name;
-#endif
 #ifdef CONFIG_SWAP
 	atomic_long_t swap_readahead_info;
 #endif
@@ -328,11 +310,28 @@ struct vm_area_struct {
 #ifdef CONFIG_NUMA_BALANCING
 	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
 #endif
-	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
 #ifdef CONFIG_PER_VMA_LOCK
 	/* Unstable RCU readers are allowed to read this. */
 	refcount_t vm_refcnt;
 #endif
+	/*
+	 * For areas with an address space and backing store,
+	 * linkage into the address_space->i_mmap interval tree.
+	 *
+	 */
+	struct {
+		struct rb_node rb;
+		unsigned long rb_subtree_last;
+	} shared;
+#ifdef CONFIG_ANON_VMA_NAME
+	/*
+	 * For private and shared anonymous mappings, a pointer to a null
+	 * terminated string containing the name given to the vma, or NULL if
+	 * unnamed. Serialized by mmap_lock. Use anon_vma_name to access.
+	 */
+	struct anon_vma_name *anon_name;
+#endif
+	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
 } __randomize_layout;
 
 struct vm_fault {};
-- 
2.48.1.601.g30ceb7b040-goog
From nobody Sat Feb  7 22:46:41 2026
Received: from mail-pl1-f202.google.com (mail-pl1-f202.google.com
 [209.85.214.202])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7DEA7280A39
	for <linux-kernel@vger.kernel.org>; Thu, 13 Feb 2025 22:47:28 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=209.85.214.202
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1739486850; cv=none;
 b=pzOpPt7seouDURvDSM16vewA4q0zpo9Qa+4bmH8aPzqR9lSoEcBKyzL0wmutaghb3ZlkqkzUEBVJRyLzYRmh8BeptILw45ZkEXxC6pGcZ+JoRCVlRZu9jFjeeKy4lGx0u6ZG2DR7ARPWHECnRP1bI4t183zN2z3BkeOV6WXeyzg=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1739486850; c=relaxed/simple;
	bh=I5W26NtkIfZ7r+KXGjsOlCoBqVfLd6FA3hx3qLn7yhE=;
	h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From:
	 To:Cc:Content-Type;
 b=jbd/E+Wor1AKHrLy+frd7Yyde0MMsB8LQvqGPUQ0np53YrOkIxzGMhHAk310Pcm322mW0Jo5AUvou1GOZ57GIZQiJdXCJWDSQvMQzs+bIwNVLQ5dwX/Opnuz7FFaHvEWSfATNwBDvLg+fRac1tWHwhOw2HmyBKTFNLivzlSeqVg=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=google.com;
 spf=pass smtp.mailfrom=flex--surenb.bounces.google.com;
 dkim=pass (2048-bit key) header.d=google.com header.i=@google.com
 header.b=iJQuURRM; arc=none smtp.client-ip=209.85.214.202
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=google.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=flex--surenb.bounces.google.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=google.com header.i=@google.com
 header.b="iJQuURRM"
Received: by mail-pl1-f202.google.com with SMTP id
 d9443c01a7336-220d6018858so19090025ad.2
        for <linux-kernel@vger.kernel.org>;
 Thu, 13 Feb 2025 14:47:28 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20230601; t=1739486848; x=1740091648;
 darn=vger.kernel.org;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=VVzZ14hy4ikDmyPBi6h+538M/HP0SirBYvfDsPDJpOs=;
        b=iJQuURRM+eNGzJd4EMe8to1zSDgOIKMLe4XBPTLiocchArGn1vBGsmEbDq2d6sAVsU
         KCThFXpfb5LGwPbt5d6UAGBfgWRy5IKm3RqGEW/PETu0DMgoQYL4E2Zvx5TfFCNgmk1Z
         gQbn2lMNbuSmOIIHiONmKVXMz8/7y3+xNkWfDwHxorDnakT56fbO6fKTUcSz1hkV2ySe
         jgiwGC4Y9MMuZB9meJkBoz2XKkvXejZiR8aEu9uVHOpEHgLhdU1lNAavkBDS1WyxKSbE
         PHyJHJVBCYRNU31Ymlq8GrWHrodmJoG3stgtFtBQ9pP82JEHvDb2RabMg6raDsfQXoib
         /lng==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1739486848; x=1740091648;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=VVzZ14hy4ikDmyPBi6h+538M/HP0SirBYvfDsPDJpOs=;
        b=U3dUuv4E3xk71CRPDtGmWNTn5zJfuZNckczVLFdrEBCP8DrWphMbXAPLz7FaEJzPMq
         UD/P9fia/DrltVYbhQ6+xSsxRMzJNJEKWALIDGz1ZDAWsnkRYxG1TTFGe1v6aECedMRX
         5Bm+zx6ypIWVdIU/MJgOPUObp4JHk/twAsdStQlMio+JGJggNs4blK1hLdVaS8JKCRP1
         qcsGNFOm2hcYzYc7IBVU+f8xK31RBL0V+03zhQH56guMGOODgfCCzXCqQa9mq08e0/QW
         JAq/gmcUwb80zv+iSrHq3B0E9iycT80OtJL7JJZNzRs4dx+8BfUFe3MIDw2b0MHh3EdW
         U77A==
X-Forwarded-Encrypted: i=1;
 AJvYcCUjHqg1f3fZKdg1j6P8r8tyj4wewel2vAslDCsxzS7oJoqb0vpd4YkketU9HBYOqUz0LN1GxpOJas3pjBc=@vger.kernel.org
X-Gm-Message-State: AOJu0YxvmIPtiaW3IKYpkWo+MmwLXxBWNRQClbdlIKIWvonJU3o6nRnB
	+UBoXryHgb3ftEq2p3ixPAVi1nQoMg2zl3uDc5FqhjyrkkmD5uHbMIUONosZeKrmyYihHIQEwzX
	BUg==
X-Google-Smtp-Source: 
 AGHT+IGwKOeZSc6Bvv2juhw3MUD/Tqd1UKmaxd3x+03s9Ch1BsTNeSistfaqs0trat+JLYOevAMQcp8yyOk=
X-Received: from plhj11.prod.google.com ([2002:a17:903:24b:b0:220:d668:ff81])
 (user=surenb job=prod-delivery.src-stubby-dispatcher) by
 2002:a17:903:32cf:b0:215:a04a:89d5
 with SMTP id d9443c01a7336-220d1eb5718mr75972105ad.2.1739486847961; Thu, 13
 Feb 2025 14:47:27 -0800 (PST)
Date: Thu, 13 Feb 2025 14:46:51 -0800
In-Reply-To: <20250213224655.1680278-1-surenb@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
References: <20250213224655.1680278-1-surenb@google.com>
X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog
Message-ID: <20250213224655.1680278-15-surenb@google.com>
Subject: [PATCH v10 14/18] mm/debug: print vm_refcnt state when dumping the
 vma
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
	vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
	dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
	lokeshgidra@google.com, minchan@google.com, jannh@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
	klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com, surenb@google.com
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

vm_refcnt encodes a number of useful states:
- whether the vma is attached or detached
- the number of current vma readers
- the presence of a vma writer
Let's include it in the vma dump.
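
The change extends the format string and the argument list under the same
#ifdef, relying on adjacent string-literal concatenation so the two stay in
sync. A standalone sketch of that pattern; CONFIG_DEMO_REFCNT, demo_printf()
and demo_dump() are hypothetical names, and the output goes to a buffer
rather than to pr_emerg():

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a Kconfig option such as CONFIG_PER_VMA_LOCK. */
#define CONFIG_DEMO_REFCNT 1

/* Plain function (not a macro), so #ifdef inside its argument list is well defined. */
static int demo_printf(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return ret;
}

static int demo_dump(char *buf, size_t len, unsigned long start,
		     unsigned int refcnt, unsigned long flags)
{
	(void)refcnt;	/* unused when CONFIG_DEMO_REFCNT is not defined */
	return demo_printf(buf, len,
			   "start %lx\n"
#ifdef CONFIG_DEMO_REFCNT
			   "refcnt %x\n"	/* format part, guarded... */
#endif
			   "flags: %#lx\n",
			   start,
#ifdef CONFIG_DEMO_REFCNT
			   refcnt,		/* ...and its argument, same guard */
#endif
			   flags);
}
```

Dropping the #define removes both the "refcnt" line and its argument
together, so the format and the arguments cannot get out of step.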

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Shivank Garg <shivankg@amd.com>
---
Changes since v9 [1]:
- Minimized duplicate code, per Lorenzo Stoakes

[1] https://lore.kernel.org/all/20250111042604.3230628-14-surenb@google.com/

 mm/debug.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mm/debug.c b/mm/debug.c
index e1282b85a877..2d1bd67d957b 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -181,11 +181,17 @@ void dump_vma(const struct vm_area_struct *vma)
 	pr_emerg("vma %px start %px end %px mm %px\n"
 		"prot %lx anon_vma %px vm_ops %px\n"
 		"pgoff %lx file %px private_data %px\n"
+#ifdef CONFIG_PER_VMA_LOCK
+		"refcnt %x\n"
+#endif
 		"flags: %#lx(%pGv)\n",
 		vma, (void *)vma->vm_start, (void *)vma->vm_end, vma->vm_mm,
 		(unsigned long)pgprot_val(vma->vm_page_prot),
 		vma->anon_vma, vma->vm_ops, vma->vm_pgoff,
 		vma->vm_file, vma->vm_private_data,
+#ifdef CONFIG_PER_VMA_LOCK
+		refcount_read(&vma->vm_refcnt),
+#endif
 		vma->vm_flags, &vma->vm_flags);
 }
 EXPORT_SYMBOL(dump_vma);
-- 
2.48.1.601.g30ceb7b040-goog
From nobody Sat Feb  7 22:46:41 2026
Received: from mail-oa1-f74.google.com (mail-oa1-f74.google.com
 [209.85.160.74])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 08CAD280A4D
	for <linux-kernel@vger.kernel.org>; Thu, 13 Feb 2025 22:47:30 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=209.85.160.74
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1739486852; cv=none;
 b=q6AtnokTtaKlzmStrH1t/lcHNeZeO8BKgbuwbVcPfxfHU+7/9I/P8F0axvAgw1odvWC1xdvlLwCakAMaJDRKX79Aq94D1LiZDC4uTl5xSGK89ISiYKoG7hSKqZsv5XmX1ONYblkgwqOrPkDx4UiSRGMfGcUZbUYUVcQJAhdfVzA=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1739486852; c=relaxed/simple;
	bh=5F18Oz1NfRhSsjQbCve5rPYuqz6M7KrBBHSkuVvdo5o=;
	h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From:
	 To:Cc:Content-Type;
 b=p+67uuc5YNy4NL5QXWcX+rH5KVSlC8nK6ZjhRM1Cf+5im8X2r2jhVntMRc6nb4Wc+Vku2YDxlQbAZ44PYMP3vOAIE3I8PMttdHEfCkdwCbENN1VB/uhOJntl6NROBo+rZ46LizgOLFkwinELMMbNaBujC+22zi1CVEAhN6N9IzQ=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=google.com;
 spf=pass smtp.mailfrom=flex--surenb.bounces.google.com;
 dkim=pass (2048-bit key) header.d=google.com header.i=@google.com
 header.b=sv1/ykMK; arc=none smtp.client-ip=209.85.160.74
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=google.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=flex--surenb.bounces.google.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=google.com header.i=@google.com
 header.b="sv1/ykMK"
Received: by mail-oa1-f74.google.com with SMTP id
 586e51a60fabf-29e8124e922so2588689fac.2
        for <linux-kernel@vger.kernel.org>;
 Thu, 13 Feb 2025 14:47:30 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20230601; t=1739486850; x=1740091650;
 darn=vger.kernel.org;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=pXllq5kyB/DC+XX0EOcvwmmu4CtVKAeaHuoDmEpYjhg=;
        b=sv1/ykMKkL9y3ME0ICgFpk7WlBu+Zf8bi7kmHqgkbjMUjOeTUcbsWGsH1jt1kOEmyi
         ngXVp2vv3xqYHRMeDpTh/RE6u6RjiiuHweVq19fAKSiX9BZd0Jm5fZf7ueBdW6dTzSXs
         LASCIBdbnTaFF/MACIrrbnpY4NGBDa51pm1amfAiMKB/K5hLPJlDlNZtYhT4EBr6kf8B
         m5Xp1Z5+x/9uhfkPu9K5e4SXlsMb3n3MUutXJTWLguF7om5zB9LNtCfqXBchPuvYIyaP
         pqEfvywldqTgChyf+t7ieWz7wo0Yst/W5fKltImhJkN26BqvJCiSmVazM5QiwSG0YBcl
         R/+Q==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1739486850; x=1740091650;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=pXllq5kyB/DC+XX0EOcvwmmu4CtVKAeaHuoDmEpYjhg=;
        b=pg4Jkuej4yzSnOCc3kBFrguNyzDSMIBBNxO7SBlyotgjQUiYNe8bjTIgP2+8x6zanh
         GgtoxdAZdbArZD9ykzCl7y4WI2+EnDSecl4AfGLgEr1vMpLeXaSOWpt8jEHMf5AmX7Si
         A50gW7z63Fzb5mSRpfeMLCHJR4+E5hMp5o0QI4DFgcxZ35ptvif2Xl/ttAYKNxDj+zNx
         36Dn5BEa9jWOpd/TY5kmdCIMayZ4uxbGybtFuoV5ZH3snTFDhzMRW8Pizd/ezvvRgrbb
         5UlAl6VaSGnaUJcjFk6rFBgtlgWGbbNkv77yB+76T5shH3yAmzYDXyxaCy4yeJFObxub
         FZZw==
X-Forwarded-Encrypted: i=1;
 AJvYcCXrqCGyvTUePmbDF1jG6L2eG87jvWkQv8Oe9G05i7mgQEcajvOpB4sPNwDlRu5Ep1rWgzvFevCREacBOSE=@vger.kernel.org
X-Gm-Message-State: AOJu0YwyFKZfF1wP0YxAz2DjY54gUhAOHKlnBv2KAKaqeaHz5ii5O+ao
	TkN+LUfH6fq4cSqZ4KSPqfAy+8SpqeR4iWd9zsRwgVnpdPVkL/KCApy/uRqrJ2s3VEUmyvWqyk2
	T4g==
X-Google-Smtp-Source: 
 AGHT+IE3Y43bKeMlQ9EjFiiS4JaeOx4TU+qb97Ij8D1UDPSI8dMXOEq3dWAIQbnurpdiuJXU+uwYfxTupKQ=
X-Received: from oacmp6.prod.google.com
 ([2002:a05:6871:3286:b0:2b8:49d8:2c77])
 (user=surenb job=prod-delivery.src-stubby-dispatcher) by
 2002:a05:6870:2107:b0:2b8:306f:c5ad
 with SMTP id 586e51a60fabf-2b8d65155a8mr5274258fac.13.1739486850086; Thu, 13
 Feb 2025 14:47:30 -0800 (PST)
Date: Thu, 13 Feb 2025 14:46:52 -0800
In-Reply-To: <20250213224655.1680278-1-surenb@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
References: <20250213224655.1680278-1-surenb@google.com>
X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog
Message-ID: <20250213224655.1680278-16-surenb@google.com>
Subject: [PATCH v10 15/18] mm: remove extra vma_numab_state_init() call
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
	vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
	dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
	lokeshgidra@google.com, minchan@google.com, jannh@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
	klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com, surenb@google.com
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

vma_init() already memsets the whole vm_area_struct to 0, so there is
no need for an additional vma_numab_state_init() call.
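
A minimal sketch of why the explicit call is redundant, assuming
vma_numab_state_init() only sets numab_state to NULL. struct demo_vma and
demo_vma_init() are hypothetical, heavily cut-down stand-ins. Once the whole
object is zeroed, a pointer member already reads as NULL (all-bits-zero is
a null pointer on every architecture Linux supports, although ISO C does not
guarantee it):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily cut-down stand-in for vm_area_struct. */
struct demo_vma {
	void *vm_mm;
	void *numab_state;	/* what vma_numab_state_init() would set to NULL */
	unsigned long vm_pgoff;
};

/* Mirrors the shape of vma_init(): zero the whole object, then set fields. */
static void demo_vma_init(struct demo_vma *vma, void *mm)
{
	memset(vma, 0, sizeof(*vma));
	vma->vm_mm = mm;
	/* No "vma->numab_state = NULL;" here: the memset already cleared it. */
}
```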

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Tested-by: Shivank Garg <shivankg@amd.com>
---
Changes since v9 [1]:
- Add Reviewed-by, per Lorenzo Stoakes

[1] https://lore.kernel.org/all/20250111042604.3230628-15-surenb@google.com/

 include/linux/mm.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 11a042c27aee..327cf5944569 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -956,7 +956,6 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_mm = mm;
 	vma->vm_ops = &vma_dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
-	vma_numab_state_init(vma);
 	vma_lock_init(vma, false);
 }
 
-- 
2.48.1.601.g30ceb7b040-goog
From nobody Sat Feb  7 22:46:41 2026
Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com
 [209.85.216.74])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id C9DFC28136A
	for <linux-kernel@vger.kernel.org>; Thu, 13 Feb 2025 22:47:32 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=209.85.216.74
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1739486854; cv=none;
 b=LumxogqlkPSgm/5kHMeT7Tyz6DOrOXt+TF0urbIHc8bspBLqQkp962FDXczHzdDBuJvqSnaSC4Oob4swX7PaskqonO+9PxdWdJfsSnWvWDg+onUogKw8sjZt+hFHVonI+6BBKEMddPLZoeLvxopaIvcy3/1UxLfW7ZuI2yEzYr8=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1739486854; c=relaxed/simple;
	bh=PUJSTy45dtpyUmxXxenkiNMkF+uifqCqHqbn1BfbXCI=;
	h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From:
	 To:Cc:Content-Type;
 b=Pbfz+CoaMkQ3HkcQHmrCwQYgAq9A1N6EENSTVjgXfEmZT5bmSzgSgHQA5wIF1Ur38r9DGH1ElSupCuxF268cR6spSOu9H4InYWCmnRoiaVv8I7gAWmfqX1nz6hjubKlIn6cq5qTE7uJAe90Gl5ztH15sjY/lvNReMVvVhBPXEms=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=google.com;
 spf=pass smtp.mailfrom=flex--surenb.bounces.google.com;
 dkim=pass (2048-bit key) header.d=google.com header.i=@google.com
 header.b=o76BVUCG; arc=none smtp.client-ip=209.85.216.74
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=google.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=flex--surenb.bounces.google.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=google.com header.i=@google.com
 header.b="o76BVUCG"
Received: by mail-pj1-f74.google.com with SMTP id
 98e67ed59e1d1-2fa57c42965so3266887a91.1
        for <linux-kernel@vger.kernel.org>;
 Thu, 13 Feb 2025 14:47:32 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20230601; t=1739486852; x=1740091652;
 darn=vger.kernel.org;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=3PBMMH2moGlibOV+gRj99WpX+TZ+10z7I1DXCC5PdQ8=;
        b=o76BVUCGJCk47Z1ErORyJfPR++QMfO2WsB1YmhK7DedLiyQa3Soe495iIjVeLw4ifq
         Ug3kT43T2uiHGrtpZkHjO4TWMVgP42xMDva07jNf64eEDnBtx6rl/RVFUn/IJU0gtmmg
         9/WqxB94fCPCBN/nNiu7n2GonrsJXnq9b8jDWRMSSQp9cZLltoONRp9dhdoM2DpTzRHt
         2BHXQ88tJzP0kSCf+X2lZlN/cnZwP3690jUB/fHMm+xlDueoSb744qUL/Uey7jn76s4i
         lKP33ZTl7k0XDjjFm7YPNflGtNSRvgeLnbk4YbbT8zK8Ea/MEumt1XdsyEZ8oBbdpcCP
         bjKg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1739486852; x=1740091652;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=3PBMMH2moGlibOV+gRj99WpX+TZ+10z7I1DXCC5PdQ8=;
        b=ocqg0J75//3kTMEETm6wY1jjGL8PllzLiICyRCMw20Wy71YCkIHzMrKO1qOCPvw5P0
         VOxKaEVnpnUIFPcu+Lk8oxWe7lfpyg8VkVgUpbBjtcVew/3u3p8Rjalh/ODj0qmFE3pY
         QS7bNzuMPvKrL3ntX2hnJsNehQW01mIB4yvJeLl7JF6/HFFS0f/cnQ+30h91y1vcr+gz
         BfJBAW+qrLEZmm0OcdK4s0yiequuQbiyulfSssZxtZstqJFFEf7xAhViAF3oytNbDzZk
         aILvGWzsuTVn6WrFLieGJubU1wx44ZGiopwCTRqrM9YFCj25coluHRCJAzsvEvTw0VAP
         M/Cg==
X-Forwarded-Encrypted: i=1;
 AJvYcCUKRHfm8rtM8OmBPgPPaoFFDuCl3rRro4OG4SyGjp9gPS/pfJbwdlYiyhgbsPFs3SBdmKHnEp0CGNXkIao=@vger.kernel.org
X-Gm-Message-State: AOJu0YxpfrZ2rn8L1vCT7KGcKOe2vRLTn4PMXZKP5EaNqN0NXxaLjUrh
	fH6Umk41qAjCgKEPnUxWoSIV8yj8zlnd4i75Xxuy77IQDoqQpKm0DJPlQ9abC4T1BwnR1NniChP
	yWg==
X-Google-Smtp-Source: 
 AGHT+IGMj8N9UltRndmqK4fcmCOY71dV/ZTT0TWqfMv3mL8UIMVUEmmB9CXosWINwjoG+HVa2iESOUWrDew=
X-Received: from pfblc21.prod.google.com
 ([2002:a05:6a00:4f55:b0:730:9378:98c1])
 (user=surenb job=prod-delivery.src-stubby-dispatcher) by
 2002:a05:6a20:2585:b0:1ea:e81c:60fa
 with SMTP id adf61e73a8af0-1ee5c78b20bmr13938165637.20.1739486852080; Thu, 13
 Feb 2025 14:47:32 -0800 (PST)
Date: Thu, 13 Feb 2025 14:46:53 -0800
In-Reply-To: <20250213224655.1680278-1-surenb@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
References: <20250213224655.1680278-1-surenb@google.com>
X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog
Message-ID: <20250213224655.1680278-17-surenb@google.com>
Subject: [PATCH v10 16/18] mm: prepare lock_vma_under_rcu() for vma reuse
 possibility
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
	vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
	dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
	lokeshgidra@google.com, minchan@google.com, jannh@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
	klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com, surenb@google.com
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Once the vma cache is made SLAB_TYPESAFE_BY_RCU, a vma may be freed,
reused and attached to another mm by the time lock_vma_under_rcu() locks
it. lock_vma_under_rcu() must therefore pass the original mm to
vma_start_read() and, after locking the vma, verify that vma->vm_mm has
not changed from under us.
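
The revalidation after locking boils down to checking both the owner and
the address range of the vma that was actually locked. A userspace sketch
with hypothetical demo_mm/demo_rcu_vma types and a hypothetical
demo_vma_is_right_one() helper; on a false result the real code takes the
inval_end_read path and the fault falls back to mmap_lock:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, cut-down stand-ins; not the kernel types. */
struct demo_mm { int id; };
struct demo_rcu_vma {
	struct demo_mm *vm_mm;
	unsigned long vm_start;	/* inclusive */
	unsigned long vm_end;	/* exclusive */
};

/*
 * With a type-safe-by-RCU cache the memory is still a vma after the lookup,
 * but not necessarily the same vma, so re-check identity after locking.
 */
static bool demo_vma_is_right_one(const struct demo_rcu_vma *vma,
				  const struct demo_mm *mm,
				  unsigned long address)
{
	if (vma->vm_mm != mm)
		return false;	/* reused and attached to another mm */
	if (address < vma->vm_start || address >= vma->vm_end)
		return false;	/* range changed before we locked it */
	return true;
}
```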

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Shivank Garg <shivankg@amd.com>
---
 include/linux/mm.h | 12 ++++++++----
 mm/memory.c        |  7 ++++---
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 327cf5944569..88693568c9ef 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -739,10 +739,13 @@ static inline void vma_refcount_put(struct vm_area_struct *vma)
  * Try to read-lock a vma. The function is allowed to occasionally yield false
  * locked result to avoid performance overhead, in which case we fall back to
  * using mmap_lock. The function should never yield false unlocked result.
+ * False locked result is possible if mm_lock_seq overflows or if vma gets
+ * reused and attached to a different mm before we lock it.
  * Returns the vma on success, NULL on failure to lock and EAGAIN if vma got
  * detached.
  */
-static inline struct vm_area_struct *vma_start_read(struct vm_area_struct *vma)
+static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
+						    struct vm_area_struct *vma)
 {
 	int oldcnt;
 
@@ -753,7 +756,7 @@ static inline struct vm_area_struct *vma_start_read(struct vm_area_struct *vma)
 	 * we don't rely on for anything - the mm_lock_seq read against which we
 	 * need ordering is below.
 	 */
-	if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq.sequence))
+	if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(mm->mm_lock_seq.sequence))
 		return NULL;
 
 	/*
@@ -780,7 +783,7 @@ static inline struct vm_area_struct *vma_start_read(struct vm_area_struct *vma)
 	 * after it has been unlocked.
 	 * This pairs with RELEASE semantics in vma_end_write_all().
 	 */
-	if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&vma->vm_mm->mm_lock_seq))) {
+	if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&mm->mm_lock_seq))) {
 		vma_refcount_put(vma);
 		return NULL;
 	}
@@ -914,7 +917,8 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 #else /* CONFIG_PER_VMA_LOCK */
 
 static inline void vma_lock_init(struct vm_area_struct *vma, bool reset_refcnt) {}
-static inline struct vm_area_struct *vma_start_read(struct vm_area_struct *vma)
+static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
+						    struct vm_area_struct *vma)
 		{ return NULL; }
 static inline void vma_end_read(struct vm_area_struct *vma) {}
 static inline void vma_start_write(struct vm_area_struct *vma) {}
diff --git a/mm/memory.c b/mm/memory.c
index 528407c0d7cf..6378a873e7c1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6492,7 +6492,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	if (!vma)
 		goto inval;
 
-	vma = vma_start_read(vma);
+	vma = vma_start_read(mm, vma);
 	if (IS_ERR_OR_NULL(vma)) {
 		/* Check if the VMA got isolated after we found it */
 		if (PTR_ERR(vma) == -EAGAIN) {
@@ -6512,8 +6512,9 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	 * fields are accessible for RCU readers.
 	 */
 
-	/* Check since vm_start/vm_end might change before we lock the VMA */
-	if (unlikely(address < vma->vm_start || address >= vma->vm_end))
+	/* Check if the vma we locked is the right one. */
+	if (unlikely(vma->vm_mm != mm ||
+		     address < vma->vm_start || address >= vma->vm_end))
 		goto inval_end_read;
 
 	rcu_read_unlock();
-- 
2.48.1.601.g30ceb7b040-goog
From nobody Sat Feb  7 22:46:41 2026
Date: Thu, 13 Feb 2025 14:46:54 -0800
In-Reply-To: <20250213224655.1680278-1-surenb@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
References: <20250213224655.1680278-1-surenb@google.com>
X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog
Message-ID: <20250213224655.1680278-18-surenb@google.com>
Subject: [PATCH v10 17/18] mm: make vma cache SLAB_TYPESAFE_BY_RCU
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
	vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
	dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
	lokeshgidra@google.com, minchan@google.com, jannh@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
	klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com, surenb@google.com
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

To enable SLAB_TYPESAFE_BY_RCU for the vma cache, we need to ensure that
lock_vma_under_rcu() detects reuse of a vma object that happens before the
RCU grace period is over.

The current checks are sufficient as long as a vma is detached before it
is freed. The only place where this does not currently happen is
exit_mmap(). Add the missing vma_mark_detached() to exit_mmap().
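
To make the argument concrete, here is a minimal userspace sketch of the
reader side, assuming C11 atomics stand in for refcount_t. The names
struct toy_vma, toy_get_ref_not_zero() and toy_lock_vma() are hypothetical
and only model the idea: a reader can pin an object only while its refcount
is non-zero, so a detached (freed) vma cannot be pinned, and a vma that was
reused and re-attached elsewhere is caught by re-checking its owner mm and
address range after pinning, similar to the vm_mm and vm_start/vm_end checks
in lock_vma_under_rcu().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types, not the kernel's mm_struct/vm_area_struct. */
struct toy_mm { int id; };

struct toy_vma {
	atomic_uint refcnt;         /* 0 == detached, >= 1 == attached (+ readers) */
	struct toy_mm *mm;          /* owner; may change if the object is reused */
	unsigned long start, end;   /* address range; may change on reuse */
};

/* Take a reference only if the object is attached (refcnt != 0). */
static bool toy_get_ref_not_zero(struct toy_vma *v)
{
	unsigned int old = atomic_load_explicit(&v->refcnt, memory_order_relaxed);

	do {
		if (old == 0)
			return false;   /* detached: freed or about to be reused */
	} while (!atomic_compare_exchange_weak_explicit(&v->refcnt, &old, old + 1,
							memory_order_acquire,
							memory_order_relaxed));
	return true;
}

static void toy_put_ref(struct toy_vma *v)
{
	atomic_fetch_sub_explicit(&v->refcnt, 1, memory_order_release);
}

/*
 * The object found by an RCU lookup may have been freed and reused, so
 * after pinning it, re-check that it still belongs to @mm and covers @addr.
 */
static struct toy_vma *toy_lock_vma(struct toy_vma *found, struct toy_mm *mm,
				    unsigned long addr)
{
	if (!toy_get_ref_not_zero(found))
		return NULL;
	if (found->mm != mm || addr < found->start || addr >= found->end) {
		toy_put_ref(found); /* wrong object after reuse: back off */
		return NULL;
	}
	return found;
}
```

A NULL return corresponds to falling back to the mmap_lock path.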

Another issue that might trick lock_vma_under_rcu() during vma reuse is
vm_area_dup(), which copies the entire content of the vma into a new one,
overwriting the new vma's vm_refcnt and temporarily making it appear
attached. A racing lock_vma_under_rcu() that found the vma before it got
reused might then be tricked into operating on the reused vma. To prevent
this, ensure that vm_refcnt stays in the detached state (0) when the vma is
copied and advances to the attached state only after the vma is added to
the vma tree. Introduce vm_area_init_from(), which preserves the new vma's
vm_refcnt, and use it in vm_area_dup(). Since all vmas are detached with no
current readers when they are freed, lock_vma_under_rcu() will not be able
to take a vm_refcnt reference after a vma got detached, even if the vma is
reused. vma_mark_attached() is modified to include a release fence to
ensure all stores to the vma happen before vm_refcnt gets initialized.
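
A rough model of the duplication rule, assuming a toy struct toy_vma with a
C11 atomic refcount in place of vm_refcnt. toy_init_from(),
toy_mark_attached() and toy_is_attached() are hypothetical: the copy helper
assigns every field except the refcount, so a reused destination stays
detached at 0, and attaching publishes the object with a release store that
pairs with an acquire load on the reader side, loosely mirroring
refcount_set_release() in vma_mark_attached().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy type standing in for vm_area_struct. */
struct toy_vma {
	atomic_uint refcnt;   /* 0 == detached; deliberately never copied */
	void *mm;
	unsigned long start, end, flags;
};

/*
 * Copy every field a duplicate needs, except refcnt. A memcpy() of the
 * whole object would also copy the source's non-zero refcnt and make a
 * (possibly reused) destination look attached to racing readers.
 */
static void toy_init_from(const struct toy_vma *src, struct toy_vma *dest)
{
	dest->mm = src->mm;
	dest->start = src->start;
	dest->end = src->end;
	dest->flags = src->flags;
}

/* Publish: stores made before this are visible once refcnt reads as 1. */
static void toy_mark_attached(struct toy_vma *v)
{
	atomic_store_explicit(&v->refcnt, 1, memory_order_release);
}

/* Reader side: this acquire load pairs with the release store above. */
static bool toy_is_attached(struct toy_vma *v)
{
	return atomic_load_explicit(&v->refcnt, memory_order_acquire) != 0;
}
```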

Finally, make vm_area_cachep SLAB_TYPESAFE_BY_RCU. This will facilitate
vm_area_struct reuse and will minimize the number of call_rcu() calls.
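
The effect of a type-stable cache with a chosen free-pointer location can be
sketched with a toy, single-threaded model. Everything here (struct toy_vma,
toy_cache_init(), toy_alloc(), toy_free() and the fixed-size pool) is
hypothetical and is not how SLUB is implemented: freed objects are handed
out again immediately, and the free-list link shares storage with start/end,
much as vm_freeptr shares a union with vm_start/vm_end, so in this model
freeing never overwrites the refcount that racing readers inspect.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Single-threaded toy model: no locking, no RCU. */
struct toy_vma {
	union {
		struct { unsigned long start, end; };
		struct toy_vma *next_free;  /* valid only while on the free list */
	};
	atomic_uint refcnt;                 /* 0 == detached */
	void *mm;
};

#define TOY_POOL 4

static struct toy_vma pool[TOY_POOL];
static struct toy_vma *free_list;

static void toy_cache_init(void)
{
	free_list = NULL;
	for (int i = TOY_POOL - 1; i >= 0; i--) {
		atomic_init(&pool[i].refcnt, 0);
		pool[i].mm = NULL;
		pool[i].next_free = free_list;
		free_list = &pool[i];
	}
}

/* Freed objects are reused right away; only their type is guaranteed. */
static struct toy_vma *toy_alloc(void)
{
	struct toy_vma *v = free_list;

	if (v)
		free_list = v->next_free;
	return v;
}

/* Caller must have detached @v (refcnt == 0) before freeing it. */
static void toy_free(struct toy_vma *v)
{
	v->next_free = free_list;   /* overwrites start/end only */
	free_list = v;
}
```

The overlapping field is chosen so that only data which readers re-validate
after pinning (start/end) gets clobbered; overwriting refcnt instead could
make a freed object look attached.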

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Shivank Garg <shivankg@amd.com>
---
Changes since v9 [1]:
- Use refcount_set_release() in vma_mark_attached(), per Will Deacon

[1] https://lore.kernel.org/all/20250111042604.3230628-17-surenb@google.com/

 include/linux/mm.h               |  4 +-
 include/linux/mm_types.h         | 13 ++++--
 include/linux/slab.h             |  6 ---
 kernel/fork.c                    | 73 ++++++++++++++++++++------------
 mm/mmap.c                        |  3 +-
 mm/vma.c                         | 11 ++---
 mm/vma.h                         |  2 +-
 tools/include/linux/refcount.h   |  5 +++
 tools/testing/vma/linux/atomic.h |  1 +
 tools/testing/vma/vma_internal.h |  9 +---
 10 files changed, 71 insertions(+), 56 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 88693568c9ef..7b21b48627b0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -258,8 +258,6 @@ void setup_initial_init_mm(void *start_code, void *end_code,
 struct vm_area_struct *vm_area_alloc(struct mm_struct *);
 struct vm_area_struct *vm_area_dup(struct vm_area_struct *);
 void vm_area_free(struct vm_area_struct *);
-/* Use only if VMA has no other users */
-void __vm_area_free(struct vm_area_struct *vma);
 
 #ifndef CONFIG_MMU
 extern struct rb_root nommu_region_tree;
@@ -890,7 +888,7 @@ static inline void vma_mark_attached(struct vm_area_struct *vma)
 {
 	vma_assert_write_locked(vma);
 	vma_assert_detached(vma);
-	refcount_set(&vma->vm_refcnt, 1);
+	refcount_set_release(&vma->vm_refcnt, 1);
 }
 
 void vma_mark_detached(struct vm_area_struct *vma);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 63ab51699120..689b2a746189 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -584,6 +584,12 @@ static inline void *folio_get_private(struct folio *folio)
 
 typedef unsigned long vm_flags_t;
 
+/*
+ * freeptr_t represents a SLUB freelist pointer, which might be encoded
+ * and not dereferenceable if CONFIG_SLAB_FREELIST_HARDENED is enabled.
+ */
+typedef struct { unsigned long v; } freeptr_t;
+
 /*
  * A region containing a mapping of a non-memory backed file under NOMMU
  * conditions.  These are held in a global tree and are pinned by the VMAs that
@@ -687,6 +693,9 @@ struct vma_numab_state {
  *
  * Only explicitly marked struct members may be accessed by RCU readers before
  * getting a stable reference.
+ *
+ * WARNING: when adding new members, please update vm_area_init_from() to copy
+ * them during vm_area_struct content duplication.
  */
 struct vm_area_struct {
 	/* The first cache line has the info for VMA tree walking. */
@@ -697,9 +706,7 @@ struct vm_area_struct {
 			unsigned long vm_start;
 			unsigned long vm_end;
 		};
-#ifdef CONFIG_PER_VMA_LOCK
-		struct rcu_head vm_rcu;	/* Used for deferred freeing. */
-#endif
+		freeptr_t vm_freeptr; /* Pointer used by SLAB_TYPESAFE_BY_RCU */
 	};
 
 	/*
diff --git a/include/linux/slab.h b/include/linux/slab.h
index ad902a2d692b..f8924fd6ea26 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -243,12 +243,6 @@ enum _slab_flag_bits {
 #define SLAB_NO_OBJ_EXT		__SLAB_FLAG_UNUSED
 #endif
 
-/*
- * freeptr_t represents a SLUB freelist pointer, which might be encoded
- * and not dereferenceable if CONFIG_SLAB_FREELIST_HARDENED is enabled.
- */
-typedef struct { unsigned long v; } freeptr_t;
-
 /*
  * ZERO_SIZE_PTR will be returned for zero sized kmalloc requests.
  *
diff --git a/kernel/fork.c b/kernel/fork.c
index 48a0038f606f..364b2d4fd3ef 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -449,6 +449,42 @@ struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
 	return vma;
 }
 
+static void vm_area_init_from(const struct vm_area_struct *src,
+			      struct vm_area_struct *dest)
+{
+	dest->vm_mm = src->vm_mm;
+	dest->vm_ops = src->vm_ops;
+	dest->vm_start = src->vm_start;
+	dest->vm_end = src->vm_end;
+	dest->anon_vma = src->anon_vma;
+	dest->vm_pgoff = src->vm_pgoff;
+	dest->vm_file = src->vm_file;
+	dest->vm_private_data = src->vm_private_data;
+	vm_flags_init(dest, src->vm_flags);
+	memcpy(&dest->vm_page_prot, &src->vm_page_prot,
+	       sizeof(dest->vm_page_prot));
+	/*
+	 * src->shared.rb may be modified concurrently when called from
+	 * dup_mmap(), but the clone will reinitialize it.
+	 */
+	data_race(memcpy(&dest->shared, &src->shared, sizeof(dest->shared)));
+	memcpy(&dest->vm_userfaultfd_ctx, &src->vm_userfaultfd_ctx,
+	       sizeof(dest->vm_userfaultfd_ctx));
+#ifdef CONFIG_ANON_VMA_NAME
+	dest->anon_name = src->anon_name;
+#endif
+#ifdef CONFIG_SWAP
+	memcpy(&dest->swap_readahead_info, &src->swap_readahead_info,
+	       sizeof(dest->swap_readahead_info));
+#endif
+#ifndef CONFIG_MMU
+	dest->vm_region = src->vm_region;
+#endif
+#ifdef CONFIG_NUMA
+	dest->vm_policy = src->vm_policy;
+#endif
+}
+
 struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 {
 	struct vm_area_struct *new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
@@ -458,11 +494,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 
 	ASSERT_EXCLUSIVE_WRITER(orig->vm_flags);
 	ASSERT_EXCLUSIVE_WRITER(orig->vm_file);
-	/*
-	 * orig->shared.rb may be modified concurrently, but the clone
-	 * will be reinitialized.
-	 */
-	data_race(memcpy(new, orig, sizeof(*new)));
+	vm_area_init_from(orig, new);
 	vma_lock_init(new, true);
 	INIT_LIST_HEAD(&new->anon_vma_chain);
 	vma_numab_state_init(new);
@@ -471,7 +503,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	return new;
 }
 
-void __vm_area_free(struct vm_area_struct *vma)
+void vm_area_free(struct vm_area_struct *vma)
 {
 	/* The vma should be detached while being destroyed. */
 	vma_assert_detached(vma);
@@ -480,25 +512,6 @@ void __vm_area_free(struct vm_area_struct *vma)
 	kmem_cache_free(vm_area_cachep, vma);
 }
 
-#ifdef CONFIG_PER_VMA_LOCK
-static void vm_area_free_rcu_cb(struct rcu_head *head)
-{
-	struct vm_area_struct *vma = container_of(head, struct vm_area_struct,
-						  vm_rcu);
-
-	__vm_area_free(vma);
-}
-#endif
-
-void vm_area_free(struct vm_area_struct *vma)
-{
-#ifdef CONFIG_PER_VMA_LOCK
-	call_rcu(&vma->vm_rcu, vm_area_free_rcu_cb);
-#else
-	__vm_area_free(vma);
-#endif
-}
-
 static void account_kernel_stack(struct task_struct *tsk, int account)
 {
 	if (IS_ENABLED(CONFIG_VMAP_STACK)) {
@@ -3156,6 +3169,11 @@ void __init mm_cache_init(void)
 
 void __init proc_caches_init(void)
 {
+	struct kmem_cache_args args = {
+		.use_freeptr_offset = true,
+		.freeptr_offset = offsetof(struct vm_area_struct, vm_freeptr),
+	};
+
 	sighand_cachep = kmem_cache_create("sighand_cache",
 			sizeof(struct sighand_struct), 0,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_TYPESAFE_BY_RCU|
@@ -3172,8 +3190,9 @@ void __init proc_caches_init(void)
 			sizeof(struct fs_struct), 0,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
 			NULL);
-	vm_area_cachep = KMEM_CACHE(vm_area_struct,
-			SLAB_HWCACHE_ALIGN|SLAB_NO_MERGE|SLAB_PANIC|
+	vm_area_cachep = kmem_cache_create("vm_area_struct",
+			sizeof(struct vm_area_struct), &args,
+			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_TYPESAFE_BY_RCU|
 			SLAB_ACCOUNT);
 	mmap_init();
 	nsproxy_cache_init();
diff --git a/mm/mmap.c b/mm/mmap.c
index 6401a1d73f4a..15d6cd7cc845 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1305,7 +1305,8 @@ void exit_mmap(struct mm_struct *mm)
 	do {
 		if (vma->vm_flags & VM_ACCOUNT)
 			nr_accounted += vma_pages(vma);
-		remove_vma(vma, /* unreachable = */ true);
+		vma_mark_detached(vma);
+		remove_vma(vma);
 		count++;
 		cond_resched();
 		vma =3D vma_next(&vmi);
diff --git a/mm/vma.c b/mm/vma.c
index a16a83d0253f..c7abef5177cc 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -420,19 +420,14 @@ static bool can_vma_merge_right(struct vma_merge_struct *vmg,
 /*
  * Close a vm structure and free it.
  */
-void remove_vma(struct vm_area_struct *vma, bool unreachable)
+void remove_vma(struct vm_area_struct *vma)
 {
 	might_sleep();
 	vma_close(vma);
 	if (vma->vm_file)
 		fput(vma->vm_file);
 	mpol_put(vma_policy(vma));
-	if (unreachable) {
-		vma_mark_detached(vma);
-		__vm_area_free(vma);
-	} else {
-		vm_area_free(vma);
-	}
+	vm_area_free(vma);
 }
 
 /*
@@ -1218,7 +1213,7 @@ static void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
 	/* Remove and clean up vmas */
 	mas_set(mas_detach, 0);
 	mas_for_each(mas_detach, vma, ULONG_MAX)
-		remove_vma(vma, /* unreachable = */ false);
+		remove_vma(vma);
 
 	vm_unacct_memory(vms->nr_accounted);
 	validate_mm(mm);
diff --git a/mm/vma.h b/mm/vma.h
index 55be77ff042f..7356ca5a22d3 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -218,7 +218,7 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
 		  unsigned long start, size_t len, struct list_head *uf,
 		  bool unlock);
 
-void remove_vma(struct vm_area_struct *vma, bool unreachable);
+void remove_vma(struct vm_area_struct *vma);
 
 void unmap_region(struct ma_state *mas, struct vm_area_struct *vma,
 		struct vm_area_struct *prev, struct vm_area_struct *next);
diff --git a/tools/include/linux/refcount.h b/tools/include/linux/refcount.h
index 36cb29bc57c2..1ace03e1a4f8 100644
--- a/tools/include/linux/refcount.h
+++ b/tools/include/linux/refcount.h
@@ -60,6 +60,11 @@ static inline void refcount_set(refcount_t *r, unsigned int n)
 	atomic_set(&r->refs, n);
 }
 
+static inline void refcount_set_release(refcount_t *r, unsigned int n)
+{
+	atomic_set_release(&r->refs, n);
+}
+
 static inline unsigned int refcount_read(const refcount_t *r)
 {
 	return atomic_read(&r->refs);
diff --git a/tools/testing/vma/linux/atomic.h b/tools/testing/vma/linux/atomic.h
index 788c597c4fde..683383d0f2bf 100644
--- a/tools/testing/vma/linux/atomic.h
+++ b/tools/testing/vma/linux/atomic.h
@@ -7,6 +7,7 @@
 #define atomic_inc(x) uatomic_inc(x)
 #define atomic_read(x) uatomic_read(x)
 #define atomic_set(x, y) uatomic_set(x, y)
+#define atomic_set_release(x, y) uatomic_set(x, y)
 #define U8_MAX UCHAR_MAX
 
 #ifndef atomic_cmpxchg_relaxed
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index b385170fbb8f..572ab2cea763 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -476,7 +476,7 @@ static inline void vma_mark_attached(struct vm_area_struct *vma)
 {
 	vma_assert_write_locked(vma);
 	vma_assert_detached(vma);
-	refcount_set(&vma->vm_refcnt, 1);
+	refcount_set_release(&vma->vm_refcnt, 1);
 }
 
 static inline void vma_mark_detached(struct vm_area_struct *vma)
@@ -696,14 +696,9 @@ static inline void mpol_put(struct mempolicy *)
 {
 }
 
-static inline void __vm_area_free(struct vm_area_struct *vma)
-{
-	free(vma);
-}
-
 static inline void vm_area_free(struct vm_area_struct *vma)
 {
-	__vm_area_free(vma);
+	free(vma);
 }
 
 static inline void lru_add_drain(void)
-- 
2.48.1.601.g30ceb7b040-goog
From nobody Sat Feb  7 22:46:41 2026
Date: Thu, 13 Feb 2025 14:46:55 -0800
In-Reply-To: <20250213224655.1680278-1-surenb@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
References: <20250213224655.1680278-1-surenb@google.com>
X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog
Message-ID: <20250213224655.1680278-19-surenb@google.com>
Subject: [PATCH v10 18/18] docs/mm: document latest changes to vm_lock
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
	vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
	dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
	lokeshgidra@google.com, minchan@google.com, jannh@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
	klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com, surenb@google.com,
	"Liam R. Howlett" <Liam.Howlett@Oracle.com>
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Update the documentation to reflect that vm_lock has been integrated into
the vma and replaced with vm_refcnt.
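
To illustrate the scheme described in the updated documentation, here is a
simplified model using C11 atomics. All names are hypothetical;
TOY_LOCK_OFFSET stands in for the writer bit kept in vm_refcnt, and the mm
sequence number is modelled as a plain counter bumped when the mmap write
lock is released, which simplifies the real mm_lock_seq seqcount. Readers
pin the VMA only while no writer bit is set and then fail if the VMA's
sequence number matches the mm's; the writer sets the bit, waits for
readers to drain, records the mm sequence number and clears the bit.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical values for the writer bit and the reader limit. */
#define TOY_LOCK_OFFSET 0x40000000u
#define TOY_REF_LIMIT   (TOY_LOCK_OFFSET - 1)

struct toy_mm { atomic_uint lock_seq; };

struct toy_vma {
	struct toy_mm *mm;
	atomic_uint refcnt;     /* 0: detached, 1: attached, >1: + readers */
	atomic_uint lock_seq;   /* == mm->lock_seq means write-locked */
};

/* Increment unless detached (0) or a writer bit pushes us past the limit. */
static bool toy_inc_not_zero_limited(atomic_uint *r)
{
	unsigned int old = atomic_load_explicit(r, memory_order_relaxed);

	do {
		if (old == 0 || old >= TOY_REF_LIMIT)
			return false;
	} while (!atomic_compare_exchange_weak_explicit(r, &old, old + 1,
							memory_order_acquire,
							memory_order_relaxed));
	return true;
}

/* Model of vma_start_read(): pin, then check the VMA is not write-locked. */
static bool toy_start_read(struct toy_vma *vma)
{
	if (!toy_inc_not_zero_limited(&vma->refcnt))
		return false;
	if (atomic_load_explicit(&vma->lock_seq, memory_order_relaxed) ==
	    atomic_load_explicit(&vma->mm->lock_seq, memory_order_acquire)) {
		atomic_fetch_sub_explicit(&vma->refcnt, 1, memory_order_release);
		return false;   /* write-locked: caller falls back to mmap_lock */
	}
	return true;
}

static void toy_end_read(struct toy_vma *vma)
{
	atomic_fetch_sub_explicit(&vma->refcnt, 1, memory_order_release);
}

/* Model of vma_start_write(); assumes the mmap write lock is held and @vma is attached. */
static void toy_start_write(struct toy_vma *vma)
{
	unsigned int mm_seq = atomic_load_explicit(&vma->mm->lock_seq,
						   memory_order_relaxed);

	if (atomic_load_explicit(&vma->lock_seq, memory_order_relaxed) == mm_seq)
		return;         /* already write-locked */
	/* Set the writer bit: new readers now fail the limit check. */
	atomic_fetch_add_explicit(&vma->refcnt, TOY_LOCK_OFFSET, memory_order_acquire);
	/* Wait until only the "attached" reference remains. */
	while (atomic_load_explicit(&vma->refcnt, memory_order_acquire) !=
	       TOY_LOCK_OFFSET + 1)
		sched_yield();
	atomic_store_explicit(&vma->lock_seq, mm_seq, memory_order_relaxed);
	/* Clear the writer bit; lock_seq alone now marks the VMA write-locked. */
	atomic_fetch_sub_explicit(&vma->refcnt, TOY_LOCK_OFFSET, memory_order_release);
}

/* Model of releasing the mmap write lock: bumping the mm seq unlocks all VMAs. */
static void toy_end_write_all(struct toy_mm *mm)
{
	atomic_fetch_add_explicit(&mm->lock_seq, 1, memory_order_release);
}
```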

Document the newly introduced vma_start_read_locked{_nested} functions.
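
A sketch of why these variants still return a status, under a hypothetical
toy model (C11 atomics, TOY_REF_LIMIT as an assumed reader limit): with the
mmap read lock held no writer can be active, so there is no sequence-number
check and no contention failure, but taking the reference can still fail if
the counter is 0 or has reached the limit.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical values, mirroring the idea of a writer bit and a limit. */
#define TOY_LOCK_OFFSET 0x40000000u
#define TOY_REF_LIMIT   (TOY_LOCK_OFFSET - 1)

struct toy_vma { atomic_uint refcnt; };

/*
 * Model of vma_start_read_locked(): the caller is assumed to hold the mmap
 * read lock, so only the reference itself can fail, never the seq check.
 */
static bool toy_start_read_locked(struct toy_vma *vma)
{
	unsigned int old = atomic_load_explicit(&vma->refcnt, memory_order_relaxed);

	do {
		if (old == 0 || old >= TOY_REF_LIMIT)
			return false;   /* detached or too many readers */
	} while (!atomic_compare_exchange_weak_explicit(&vma->refcnt, &old, old + 1,
							memory_order_acquire,
							memory_order_relaxed));
	return true;
}
```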

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Tested-by: Shivank Garg <shivankg@amd.com>
---
Changes since v9 [1]:
- Updated documentation, per Lorenzo Stoakes
- Add Reviewed-by, per Lorenzo Stoakes

[1] https://lore.kernel.org/all/20250111042604.3230628-18-surenb@google.com/

 Documentation/mm/process_addrs.rst | 44 ++++++++++++++++++------------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/Documentation/mm/process_addrs.rst b/Documentation/mm/process_addrs.rst
index 81417fa2ed20..e6756e78b476 100644
--- a/Documentation/mm/process_addrs.rst
+++ b/Documentation/mm/process_addrs.rst
@@ -716,9 +716,14 @@ calls :c:func:`!rcu_read_lock` to ensure that the VMA is looked up in an RCU
 critical section, then attempts to VMA lock it via :c:func:`!vma_start_read`,
 before releasing the RCU lock via :c:func:`!rcu_read_unlock`.
 
-VMA read locks hold the read lock on the :c:member:`!vma->vm_lock` semaphore for
-their duration and the caller of :c:func:`!lock_vma_under_rcu` must release it
-via :c:func:`!vma_end_read`.
+In cases where the caller already holds the mmap read lock, :c:func:`!vma_start_read_locked`
+and :c:func:`!vma_start_read_locked_nested` can be used. These functions do not
+fail due to lock contention, but the caller should still check their return values
+in case they fail for other reasons.
+
+VMA read locks increment the :c:member:`!vma.vm_refcnt` reference counter for their
+duration, and the caller of :c:func:`!lock_vma_under_rcu` must drop it via
+:c:func:`!vma_end_read`.
 
 VMA **write** locks are acquired via :c:func:`!vma_start_write` in instances where a
 VMA is about to be modified, unlike :c:func:`!vma_start_read` the lock is always
@@ -726,9 +731,9 @@ acquired. An mmap write lock **must** be held for the duration of the VMA write
 lock, releasing or downgrading the mmap write lock also releases the VMA write
 lock so there is no :c:func:`!vma_end_write` function.
 
-Note that a semaphore write lock is not held across a VMA lock. Rather, a
-sequence number is used for serialisation, and the write semaphore is only
-acquired at the point of write lock to update this.
+Note that when write-locking a VMA, :c:member:`!vma.vm_refcnt` is temporarily
+modified so that readers can detect the presence of a writer. The reference counter is
+restored once the VMA sequence number used for serialisation is updated.
 
 This ensures the semantics we require - VMA write locks provide exclusive write
 access to the VMA.
@@ -738,7 +743,7 @@ Implementation details
 
 The VMA lock mechanism is designed to be a lightweight means of avoiding the use
 of the heavily contended mmap lock. It is implemented using a combination of a
-read/write semaphore and sequence numbers belonging to the containing
+reference counter and sequence numbers belonging to the containing
 :c:struct:`!struct mm_struct` and the VMA.
 
 Read locks are acquired via :c:func:`!vma_start_read`, which is an optimistic
@@ -779,28 +784,31 @@ release of any VMA locks on its release makes sense, as you would never want to
 keep VMAs locked across entirely separate write operations. It also maintains
 correct lock ordering.
 
-Each time a VMA read lock is acquired, we acquire a read lock on the
-:c:member:`!vma->vm_lock` read/write semaphore and hold it, while checking that
-the sequence count of the VMA does not match that of the mm.
+Each time a VMA read lock is acquired, we increment the :c:member:`!vma.vm_refcnt`
+reference counter and check that the sequence count of the VMA does not match
+that of the mm.
 
-If it does, the read lock fails. If it does not, we hold the lock, excluding
-writers, but permitting other readers, who will also obtain this lock under RCU.
+If it does, the read lock fails and :c:member:`!vma.vm_refcnt` is dropped.
+If it does not, we keep the reference counter raised, excluding writers, but
+permitting other readers, who can also obtain this lock under RCU.
 
 Importantly, maple tree operations performed in :c:func:`!lock_vma_under_rcu`
 are also RCU safe, so the whole read lock operation is guaranteed to function
 correctly.
 
-On the write side, we acquire a write lock on the :c:member:`!vma->vm_lock`
-read/write semaphore, before setting the VMA's sequence number under this lock,
-also simultaneously holding the mmap write lock.
+On the write side, we set a bit in :c:member:`!vma.vm_refcnt` that readers cannot
+modify, and wait for all readers to drop their references.
+Once there are no readers, the VMA's sequence number is set to match that of
+the mm. The mmap write lock is held during this entire operation.
 
 This way, if any read locks are in effect, :c:func:`!vma_start_write` will sleep
 until these are finished and mutual exclusion is achieved.
 
-After setting the VMA's sequence number, the lock is released, avoiding
-complexity with a long-term held write lock.
+After setting the VMA's sequence number, the bit in :c:member:`!vma.vm_refcnt`
+indicating a writer is cleared. From this point on, the VMA's sequence number will
+indicate the VMA's write-locked state until the mmap write lock is dropped or downgraded.
 
-This clever combination of a read/write semaphore and sequence count allows for
+This clever combination of a reference counter and sequence count allows for
 fast RCU-based per-VMA lock acquisition (especially on page fault, though
 utilised elsewhere) with minimal complexity around lock ordering.
 
-- 
2.48.1.601.g30ceb7b040-goog