From nobody Fri Dec 19 17:37:36 2025
Date: Sun, 17 Nov 2024 00:09:27 -0800
In-Reply-To: <20241117080931.600731-1-surenb@google.com>
References: <20241117080931.600731-1-surenb@google.com>
Message-ID: <20241117080931.600731-2-surenb@google.com>
Subject: [PATCH v3 1/5] mm: introduce vma_start_read_locked{_nested} helpers
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: willy@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com,
    mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com,
    oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com,
    peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org,
    brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
    minchan@google.com, jannh@google.com, shakeel.butt@linux.dev,
    souravpanda@google.com, pasha.tatashin@soleen.com, corbet@lwn.net,
    linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    kernel-team@android.com, surenb@google.com

Introduce helper functions which can be used to read-lock a VMA when
holding mmap_lock for read. Replace direct accesses to vma->vm_lock
with these new helpers.

Signed-off-by: Suren Baghdasaryan
Reviewed-by: Davidlohr Bueso
Reviewed-by: Lorenzo Stoakes
---
 include/linux/mm.h | 24 ++++++++++++++++++++++++
 mm/userfaultfd.c   | 22 +++++-----------------
 2 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index fecd47239fa9..1ba2e480ae63 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -722,6 +722,30 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	return true;
 }
 
+/*
+ * Use only while holding mmap read lock which guarantees that locking will not
+ * fail (nobody can concurrently write-lock the vma). vma_start_read() should
+ * not be used in such cases because it might fail due to mm_lock_seq overflow.
+ * This functionality is used to obtain vma read lock and drop the mmap read lock.
+ */
+static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int subclass)
+{
+	mmap_assert_locked(vma->vm_mm);
+	down_read_nested(&vma->vm_lock->lock, subclass);
+}
+
+/*
+ * Use only while holding mmap read lock which guarantees that locking will not
+ * fail (nobody can concurrently write-lock the vma). vma_start_read() should
+ * not be used in such cases because it might fail due to mm_lock_seq overflow.
+ * This functionality is used to obtain vma read lock and drop the mmap read lock.
+ */
+static inline void vma_start_read_locked(struct vm_area_struct *vma)
+{
+	mmap_assert_locked(vma->vm_mm);
+	down_read(&vma->vm_lock->lock);
+}
+
 static inline void vma_end_read(struct vm_area_struct *vma)
 {
 	rcu_read_lock(); /* keeps vma alive till the end of up_read */
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 60a0be33766f..87db4b32b82a 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -84,16 +84,8 @@ static struct vm_area_struct *uffd_lock_vma(struct mm_struct *mm,
 
 	mmap_read_lock(mm);
 	vma = find_vma_and_prepare_anon(mm, address);
-	if (!IS_ERR(vma)) {
-		/*
-		 * We cannot use vma_start_read() as it may fail due to
-		 * false locked (see comment in vma_start_read()). We
-		 * can avoid that by directly locking vm_lock under
-		 * mmap_lock, which guarantees that nobody can lock the
-		 * vma for write (vma_start_write()) under us.
-		 */
-		down_read(&vma->vm_lock->lock);
-	}
+	if (!IS_ERR(vma))
+		vma_start_read_locked(vma);
 
 	mmap_read_unlock(mm);
 	return vma;
@@ -1476,14 +1468,10 @@ static int uffd_move_lock(struct mm_struct *mm,
 	mmap_read_lock(mm);
 	err = find_vmas_mm_locked(mm, dst_start, src_start, dst_vmap, src_vmap);
 	if (!err) {
-		/*
-		 * See comment in uffd_lock_vma() as to why not using
-		 * vma_start_read() here.
-		 */
-		down_read(&(*dst_vmap)->vm_lock->lock);
+		vma_start_read_locked(*dst_vmap);
 		if (*dst_vmap != *src_vmap)
-			down_read_nested(&(*src_vmap)->vm_lock->lock,
-					 SINGLE_DEPTH_NESTING);
+			vma_start_read_locked_nested(*src_vmap,
+						SINGLE_DEPTH_NESTING);
 	}
 	mmap_read_unlock(mm);
 	return err;
-- 
2.47.0.338.g60cca15819-goog
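[Not part of the patch above -- an illustrative sketch of the calling convention the new helpers assume, modeled on uffd_lock_vma() from this patch. The wrapper lock_vma_for_read() and the plain find_vma() lookup are hypothetical stand-ins; only vma_start_read_locked(), vma_end_read() and the mmap_lock calls come from the series.]

/*
 * Sketch only: look a VMA up under mmap_lock (read), take the per-VMA
 * read lock (which cannot fail while mmap_lock is held), then drop
 * mmap_lock and keep only the finer-grained VMA lock.
 */
static struct vm_area_struct *lock_vma_for_read(struct mm_struct *mm,
						unsigned long address)
{
	struct vm_area_struct *vma;

	mmap_read_lock(mm);
	vma = find_vma(mm, address);
	if (vma && vma->vm_start <= address)
		vma_start_read_locked(vma);	/* cannot fail under mmap_lock */
	else
		vma = NULL;
	mmap_read_unlock(mm);

	/* The caller releases the VMA with vma_end_read(vma). */
	return vma;
}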
From nobody Fri Dec 19 17:37:36 2025
Date: Sun, 17 Nov 2024 00:09:28 -0800
In-Reply-To: <20241117080931.600731-1-surenb@google.com>
References: <20241117080931.600731-1-surenb@google.com>
Message-ID: <20241117080931.600731-3-surenb@google.com>
Subject: [PATCH v3 2/5] mm: move per-vma lock into vm_area_struct
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: willy@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com,
    mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com,
    oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com,
    peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org,
    brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
    minchan@google.com, jannh@google.com, shakeel.butt@linux.dev,
    souravpanda@google.com, pasha.tatashin@soleen.com, corbet@lwn.net,
    linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    kernel-team@android.com, surenb@google.com

Back when per-vma locks were introduced, vm_lock was moved out of
vm_area_struct in [1] because of the performance regression caused by
false cacheline sharing. Recent investigation [2] revealed that the
regression is limited to a rather old Broadwell microarchitecture and
even there it can be mitigated by disabling adjacent cacheline
prefetching, see [3].
Splitting a single logical structure into multiple ones leads to more
complicated management, extra pointer dereferences and overall less
maintainable code. When that split-away part is a lock, it complicates
things even further. With no performance benefits, there are no reasons
for this split. Merging the vm_lock back into vm_area_struct also allows
vm_area_struct to use SLAB_TYPESAFE_BY_RCU later in this patchset.

Move vm_lock back into vm_area_struct, aligning it at the cacheline
boundary and changing the cache to be cacheline-aligned as well. With a
kernel compiled using defconfig, this causes VMA memory consumption to
grow from 160 (vm_area_struct) + 40 (vm_lock) bytes to 256 bytes:

    slabinfo before:
     ... : ...
     vma_lock ... 40 102 1 : ...
     vm_area_struct ... 160 51 2 : ...

    slabinfo after moving vm_lock:
     ... : ...
     vm_area_struct ... 256 32 2 : ...

Aggregate VMA memory consumption per 1000 VMAs grows from 50 to 64 pages,
which is 5.5MB per 100000 VMAs. Note that the size of this structure is
dependent on the kernel configuration and typically the original size is
higher than 160 bytes. Therefore these calculations are close to the worst
case scenario. A more realistic vm_area_struct usage before this change is:

    ... : ...
    vma_lock ... 40 102 1 : ...
    vm_area_struct ... 176 46 2 : ...

Aggregate VMA memory consumption per 1000 VMAs grows from 54 to 64 pages,
which is 3.9MB per 100000 VMAs. This memory consumption growth can be
addressed later by optimizing the vm_lock.

[1] https://lore.kernel.org/all/20230227173632.3292573-34-surenb@google.com/
[2] https://lore.kernel.org/all/ZsQyI%2F087V34JoIt@xsang-OptiPlex-9020/
[3] https://lore.kernel.org/all/CAJuCfpEisU8Lfe96AYJDZ+OM4NoPmnw9bP53cT_kbfP_pR+-2g@mail.gmail.com/

Signed-off-by: Suren Baghdasaryan
Reviewed-by: Lorenzo Stoakes
---
 include/linux/mm.h               | 28 ++++++++++--------
 include/linux/mm_types.h         |  6 ++--
 kernel/fork.c                    | 49 ++++----------------------------
 tools/testing/vma/vma_internal.h | 33 +++++----------------
 4 files changed, 32 insertions(+), 84 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1ba2e480ae63..737c003b0a1e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -684,6 +684,12 @@ static inline void vma_numab_state_free(struct vm_area_struct *vma) {}
 #endif /* CONFIG_NUMA_BALANCING */
 
 #ifdef CONFIG_PER_VMA_LOCK
+static inline void vma_lock_init(struct vm_area_struct *vma)
+{
+	init_rwsem(&vma->vm_lock.lock);
+	vma->vm_lock_seq = UINT_MAX;
+}
+
 /*
  * Try to read-lock a vma. The function is allowed to occasionally yield false
  * locked result to avoid performance overhead, in which case we fall back to
@@ -701,7 +707,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq.sequence))
 		return false;
 
-	if (unlikely(down_read_trylock(&vma->vm_lock->lock) == 0))
+	if (unlikely(down_read_trylock(&vma->vm_lock.lock) == 0))
 		return false;
 
 	/*
@@ -716,7 +722,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
	 * This pairs with RELEASE semantics in vma_end_write_all().
*/ if (unlikely(vma->vm_lock_seq =3D=3D raw_read_seqcount(&vma->vm_mm->mm_lo= ck_seq))) { - up_read(&vma->vm_lock->lock); + up_read(&vma->vm_lock.lock); return false; } return true; @@ -731,7 +737,7 @@ static inline bool vma_start_read(struct vm_area_struct= *vma) static inline void vma_start_read_locked_nested(struct vm_area_struct *vma= , int subclass) { mmap_assert_locked(vma->vm_mm); - down_read_nested(&vma->vm_lock->lock, subclass); + down_read_nested(&vma->vm_lock.lock, subclass); } =20 /* @@ -743,13 +749,13 @@ static inline void vma_start_read_locked_nested(struc= t vm_area_struct *vma, int static inline void vma_start_read_locked(struct vm_area_struct *vma) { mmap_assert_locked(vma->vm_mm); - down_read(&vma->vm_lock->lock); + down_read(&vma->vm_lock.lock); } =20 static inline void vma_end_read(struct vm_area_struct *vma) { rcu_read_lock(); /* keeps vma alive till the end of up_read */ - up_read(&vma->vm_lock->lock); + up_read(&vma->vm_lock.lock); rcu_read_unlock(); } =20 @@ -778,7 +784,7 @@ static inline void vma_start_write(struct vm_area_struc= t *vma) if (__is_vma_write_locked(vma, &mm_lock_seq)) return; =20 - down_write(&vma->vm_lock->lock); + down_write(&vma->vm_lock.lock); /* * We should use WRITE_ONCE() here because we can have concurrent reads * from the early lockless pessimistic check in vma_start_read(). @@ -786,7 +792,7 @@ static inline void vma_start_write(struct vm_area_struc= t *vma) * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy. */ WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq); - up_write(&vma->vm_lock->lock); + up_write(&vma->vm_lock.lock); } =20 static inline void vma_assert_write_locked(struct vm_area_struct *vma) @@ -798,7 +804,7 @@ static inline void vma_assert_write_locked(struct vm_ar= ea_struct *vma) =20 static inline void vma_assert_locked(struct vm_area_struct *vma) { - if (!rwsem_is_locked(&vma->vm_lock->lock)) + if (!rwsem_is_locked(&vma->vm_lock.lock)) vma_assert_write_locked(vma); } =20 @@ -831,6 +837,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_str= uct *mm, =20 #else /* CONFIG_PER_VMA_LOCK */ =20 +static inline void vma_lock_init(struct vm_area_struct *vma) {} static inline bool vma_start_read(struct vm_area_struct *vma) { return false; } static inline void vma_end_read(struct vm_area_struct *vma) {} @@ -865,10 +872,6 @@ static inline void assert_fault_locked(struct vm_fault= *vmf) =20 extern const struct vm_operations_struct vma_dummy_vm_ops; =20 -/* - * WARNING: vma_init does not initialize vma->vm_lock. - * Use vm_area_alloc()/vm_area_free() if vma needs locking. - */ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *= mm) { memset(vma, 0, sizeof(*vma)); @@ -877,6 +880,7 @@ static inline void vma_init(struct vm_area_struct *vma,= struct mm_struct *mm) INIT_LIST_HEAD(&vma->anon_vma_chain); vma_mark_detached(vma, false); vma_numab_state_init(vma); + vma_lock_init(vma); } =20 /* Use when VMA is not part of the VMA tree and needs no locking */ diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 80fef38d9d64..5c4bfdcfac72 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -716,8 +716,6 @@ struct vm_area_struct { * slowpath. */ unsigned int vm_lock_seq; - /* Unstable RCU readers are allowed to read this. 
*/ - struct vma_lock *vm_lock; #endif =20 /* @@ -770,6 +768,10 @@ struct vm_area_struct { struct vma_numab_state *numab_state; /* NUMA Balancing state */ #endif struct vm_userfaultfd_ctx vm_userfaultfd_ctx; +#ifdef CONFIG_PER_VMA_LOCK + /* Unstable RCU readers are allowed to read this. */ + struct vma_lock vm_lock ____cacheline_aligned_in_smp; +#endif } __randomize_layout; =20 #ifdef CONFIG_NUMA diff --git a/kernel/fork.c b/kernel/fork.c index 0061cf2450ef..7823797e31d2 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -436,35 +436,6 @@ static struct kmem_cache *vm_area_cachep; /* SLAB cache for mm_struct structures (tsk->mm) */ static struct kmem_cache *mm_cachep; =20 -#ifdef CONFIG_PER_VMA_LOCK - -/* SLAB cache for vm_area_struct.lock */ -static struct kmem_cache *vma_lock_cachep; - -static bool vma_lock_alloc(struct vm_area_struct *vma) -{ - vma->vm_lock =3D kmem_cache_alloc(vma_lock_cachep, GFP_KERNEL); - if (!vma->vm_lock) - return false; - - init_rwsem(&vma->vm_lock->lock); - vma->vm_lock_seq =3D UINT_MAX; - - return true; -} - -static inline void vma_lock_free(struct vm_area_struct *vma) -{ - kmem_cache_free(vma_lock_cachep, vma->vm_lock); -} - -#else /* CONFIG_PER_VMA_LOCK */ - -static inline bool vma_lock_alloc(struct vm_area_struct *vma) { return tru= e; } -static inline void vma_lock_free(struct vm_area_struct *vma) {} - -#endif /* CONFIG_PER_VMA_LOCK */ - struct vm_area_struct *vm_area_alloc(struct mm_struct *mm) { struct vm_area_struct *vma; @@ -474,10 +445,6 @@ struct vm_area_struct *vm_area_alloc(struct mm_struct = *mm) return NULL; =20 vma_init(vma, mm); - if (!vma_lock_alloc(vma)) { - kmem_cache_free(vm_area_cachep, vma); - return NULL; - } =20 return vma; } @@ -496,10 +463,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_stru= ct *orig) * will be reinitialized. */ data_race(memcpy(new, orig, sizeof(*new))); - if (!vma_lock_alloc(new)) { - kmem_cache_free(vm_area_cachep, new); - return NULL; - } + vma_lock_init(new); INIT_LIST_HEAD(&new->anon_vma_chain); vma_numab_state_init(new); dup_anon_vma_name(orig, new); @@ -511,7 +475,6 @@ void __vm_area_free(struct vm_area_struct *vma) { vma_numab_state_free(vma); free_anon_vma_name(vma); - vma_lock_free(vma); kmem_cache_free(vm_area_cachep, vma); } =20 @@ -522,7 +485,7 @@ static void vm_area_free_rcu_cb(struct rcu_head *head) vm_rcu); =20 /* The vma should not be locked while being destroyed. 
 */
-	VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock->lock), vma);
+	VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock.lock), vma);
 	__vm_area_free(vma);
 }
 #endif
@@ -3168,11 +3131,9 @@ void __init proc_caches_init(void)
 			sizeof(struct fs_struct), 0,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
 			NULL);
-
-	vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
-#ifdef CONFIG_PER_VMA_LOCK
-	vma_lock_cachep = KMEM_CACHE(vma_lock, SLAB_PANIC|SLAB_ACCOUNT);
-#endif
+	vm_area_cachep = KMEM_CACHE(vm_area_struct,
+			SLAB_HWCACHE_ALIGN|SLAB_NO_MERGE|SLAB_PANIC|
+			SLAB_ACCOUNT);
 	mmap_init();
 	nsproxy_cache_init();
 }
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index 1d9fc97b8e80..11c2c38ca4e8 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -230,10 +230,10 @@ struct vm_area_struct {
 	/*
 	 * Can only be written (using WRITE_ONCE()) while holding both:
 	 *  - mmap_lock (in write mode)
-	 *  - vm_lock->lock (in write mode)
+	 *  - vm_lock.lock (in write mode)
 	 * Can be read reliably while holding one of:
 	 *  - mmap_lock (in read or write mode)
-	 *  - vm_lock->lock (in read or write mode)
+	 *  - vm_lock.lock (in read or write mode)
 	 * Can be read unreliably (using READ_ONCE()) for pessimistic bailout
 	 * while holding nothing (except RCU to keep the VMA struct allocated).
 	 *
@@ -242,7 +242,7 @@ struct vm_area_struct {
 	 * slowpath.
 	 */
 	unsigned int vm_lock_seq;
-	struct vma_lock *vm_lock;
+	struct vma_lock vm_lock;
 #endif
 
 /*
@@ -408,17 +408,10 @@ static inline struct vm_area_struct *vma_next(struct vma_iterator *vmi)
 	return mas_find(&vmi->mas, ULONG_MAX);
 }
 
-static inline bool vma_lock_alloc(struct vm_area_struct *vma)
+static inline void vma_lock_init(struct vm_area_struct *vma)
 {
-	vma->vm_lock = calloc(1, sizeof(struct vma_lock));
-
-	if (!vma->vm_lock)
-		return false;
-
-	init_rwsem(&vma->vm_lock->lock);
+	init_rwsem(&vma->vm_lock.lock);
 	vma->vm_lock_seq = UINT_MAX;
-
-	return true;
 }
 
 static inline void vma_assert_write_locked(struct vm_area_struct *);
@@ -439,6 +432,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_ops = &vma_dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
 	vma_mark_detached(vma, false);
+	vma_lock_init(vma);
 }
 
 static inline struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
@@ -449,10 +443,6 @@ static inline struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
 		return NULL;
 
 	vma_init(vma, mm);
-	if (!vma_lock_alloc(vma)) {
-		free(vma);
-		return NULL;
-	}
 
 	return vma;
 }
@@ -465,10 +455,7 @@ static inline struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 		return NULL;
 
 	memcpy(new, orig, sizeof(*new));
-	if (!vma_lock_alloc(new)) {
-		free(new);
-		return NULL;
-	}
+	vma_lock_init(new);
 	INIT_LIST_HEAD(&new->anon_vma_chain);
 
 	return new;
 }
@@ -638,14 +625,8 @@ static inline void mpol_put(struct mempolicy *)
 {
 }
 
-static inline void vma_lock_free(struct vm_area_struct *vma)
-{
-	free(vma->vm_lock);
-}
-
 static inline void __vm_area_free(struct vm_area_struct *vma)
 {
-	vma_lock_free(vma);
 	free(vma);
 }
 
-- 
2.47.0.338.g60cca15819-goog
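[Not part of the series -- a small illustrative check. It assumes, as the patch above states, that the embedded vm_lock is placed at a cacheline boundary via ____cacheline_aligned_in_smp; the assertion itself is hypothetical and not in the patch.]

#include <linux/cache.h>
#include <linux/mm_types.h>

/* The embedded vm_lock should begin on its own cache line. */
static_assert(offsetof(struct vm_area_struct, vm_lock) % SMP_CACHE_BYTES == 0,
	      "vm_lock is expected to start at a cacheline boundary");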
From nobody Fri Dec 19 17:37:36 2025
Date: Sun, 17 Nov 2024 00:09:29 -0800
In-Reply-To:
<20241117080931.600731-1-surenb@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20241117080931.600731-1-surenb@google.com> X-Mailer: git-send-email 2.47.0.338.g60cca15819-goog Message-ID: <20241117080931.600731-4-surenb@google.com> Subject: [PATCH v3 3/5] mm: mark vma as detached until it's added into vma tree From: Suren Baghdasaryan To: akpm@linux-foundation.org Cc: willy@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com, hughd@google.com, minchan@google.com, jannh@google.com, shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com, corbet@lwn.net, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@android.com, surenb@google.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Current implementation does not set detached flag when a VMA is first allocated. This does not represent the real state of the VMA, which is detached until it is added into mm's VMA tree. Fix this by marking new VMAs as detached and resetting detached flag only after VMA is added into a tree. Introduce vma_mark_attached() to make the API more readable and to simplify possible future cleanup when vma->vm_mm might be used to indicate detached vma and vma_mark_attached() will need an additional mm parameter. Signed-off-by: Suren Baghdasaryan --- include/linux/mm.h | 27 ++++++++++++++++++++------- kernel/fork.c | 4 ++++ mm/memory.c | 2 +- mm/vma.c | 6 +++--- mm/vma.h | 2 ++ tools/testing/vma/vma_internal.h | 17 ++++++++++++----- 6 files changed, 42 insertions(+), 16 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 737c003b0a1e..dd1b6190df28 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -808,12 +808,21 @@ static inline void vma_assert_locked(struct vm_area_s= truct *vma) vma_assert_write_locked(vma); } =20 -static inline void vma_mark_detached(struct vm_area_struct *vma, bool deta= ched) +static inline void vma_mark_attached(struct vm_area_struct *vma) +{ + vma->detached =3D false; +} + +static inline void vma_mark_detached(struct vm_area_struct *vma) { /* When detaching vma should be write-locked */ - if (detached) - vma_assert_write_locked(vma); - vma->detached =3D detached; + vma_assert_write_locked(vma); + vma->detached =3D true; +} + +static inline bool is_vma_detached(struct vm_area_struct *vma) +{ + return vma->detached; } =20 static inline void release_fault_lock(struct vm_fault *vmf) @@ -844,8 +853,8 @@ static inline void vma_end_read(struct vm_area_struct *= vma) {} static inline void vma_start_write(struct vm_area_struct *vma) {} static inline void vma_assert_write_locked(struct vm_area_struct *vma) { mmap_assert_write_locked(vma->vm_mm); } -static inline void vma_mark_detached(struct vm_area_struct *vma, - bool detached) {} +static inline void vma_mark_attached(struct vm_area_struct *vma) {} +static inline void vma_mark_detached(struct vm_area_struct *vma) {} =20 static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *= mm, unsigned long address) @@ -878,7 +887,10 @@ static inline void vma_init(struct vm_area_struct *vma= , struct mm_struct *mm) vma->vm_mm =3D mm; vma->vm_ops 
=3D &vma_dummy_vm_ops; INIT_LIST_HEAD(&vma->anon_vma_chain); - vma_mark_detached(vma, false); +#ifdef CONFIG_PER_VMA_LOCK + /* vma is not locked, can't use vma_mark_detached() */ + vma->detached =3D true; +#endif vma_numab_state_init(vma); vma_lock_init(vma); } @@ -1073,6 +1085,7 @@ static inline int vma_iter_bulk_store(struct vma_iter= ator *vmi, if (unlikely(mas_is_err(&vmi->mas))) return -ENOMEM; =20 + vma_mark_attached(vma); return 0; } =20 diff --git a/kernel/fork.c b/kernel/fork.c index 7823797e31d2..f0cec673583c 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -465,6 +465,10 @@ struct vm_area_struct *vm_area_dup(struct vm_area_stru= ct *orig) data_race(memcpy(new, orig, sizeof(*new))); vma_lock_init(new); INIT_LIST_HEAD(&new->anon_vma_chain); +#ifdef CONFIG_PER_VMA_LOCK + /* vma is not locked, can't use vma_mark_detached() */ + new->detached =3D true; +#endif vma_numab_state_init(new); dup_anon_vma_name(orig, new); =20 diff --git a/mm/memory.c b/mm/memory.c index 209885a4134f..d0197a0c0996 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6279,7 +6279,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_s= truct *mm, goto inval; =20 /* Check if the VMA got isolated after we found it */ - if (vma->detached) { + if (is_vma_detached(vma)) { vma_end_read(vma); count_vm_vma_lock_event(VMA_LOCK_MISS); /* The area was replaced with another one */ diff --git a/mm/vma.c b/mm/vma.c index 8a454a7bbc80..73104d434567 100644 --- a/mm/vma.c +++ b/mm/vma.c @@ -295,7 +295,7 @@ static void vma_complete(struct vma_prepare *vp, struct= vma_iterator *vmi, =20 if (vp->remove) { again: - vma_mark_detached(vp->remove, true); + vma_mark_detached(vp->remove); if (vp->file) { uprobe_munmap(vp->remove, vp->remove->vm_start, vp->remove->vm_end); @@ -1220,7 +1220,7 @@ static void reattach_vmas(struct ma_state *mas_detach) =20 mas_set(mas_detach, 0); mas_for_each(mas_detach, vma, ULONG_MAX) - vma_mark_detached(vma, false); + vma_mark_attached(vma); =20 __mt_destroy(mas_detach->tree); } @@ -1295,7 +1295,7 @@ static int vms_gather_munmap_vmas(struct vma_munmap_s= truct *vms, if (error) goto munmap_gather_failed; =20 - vma_mark_detached(next, true); + vma_mark_detached(next); nrpages =3D vma_pages(next); =20 vms->nr_pages +=3D nrpages; diff --git a/mm/vma.h b/mm/vma.h index 388d34748674..2e680f357ace 100644 --- a/mm/vma.h +++ b/mm/vma.h @@ -162,6 +162,7 @@ static inline int vma_iter_store_gfp(struct vma_iterato= r *vmi, if (unlikely(mas_is_err(&vmi->mas))) return -ENOMEM; =20 + vma_mark_attached(vma); return 0; } =20 @@ -385,6 +386,7 @@ static inline void vma_iter_store(struct vma_iterator *= vmi, =20 __mas_set_range(&vmi->mas, vma->vm_start, vma->vm_end - 1); mas_store_prealloc(&vmi->mas, vma); + vma_mark_attached(vma); } =20 static inline unsigned long vma_iter_addr(struct vma_iterator *vmi) diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_inter= nal.h index 11c2c38ca4e8..2fed366d20ef 100644 --- a/tools/testing/vma/vma_internal.h +++ b/tools/testing/vma/vma_internal.h @@ -414,13 +414,17 @@ static inline void vma_lock_init(struct vm_area_struc= t *vma) vma->vm_lock_seq =3D UINT_MAX; } =20 +static inline void vma_mark_attached(struct vm_area_struct *vma) +{ + vma->detached =3D false; +} + static inline void vma_assert_write_locked(struct vm_area_struct *); -static inline void vma_mark_detached(struct vm_area_struct *vma, bool deta= ched) +static inline void vma_mark_detached(struct vm_area_struct *vma) { /* When detaching vma should be write-locked */ - if (detached) - 
vma_assert_write_locked(vma);
-	vma->detached = detached;
+	vma_assert_write_locked(vma);
+	vma->detached = true;
 }
 
 extern const struct vm_operations_struct vma_dummy_vm_ops;
@@ -431,7 +435,8 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_mm = mm;
 	vma->vm_ops = &vma_dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
-	vma_mark_detached(vma, false);
+	/* vma is not locked, can't use vma_mark_detached() */
+	vma->detached = true;
 	vma_lock_init(vma);
 }
 
@@ -457,6 +462,8 @@ static inline struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	memcpy(new, orig, sizeof(*new));
 	vma_lock_init(new);
 	INIT_LIST_HEAD(&new->anon_vma_chain);
+	/* vma is not locked, can't use vma_mark_detached() */
+	new->detached = true;
 
 	return new;
 }
-- 
2.47.0.338.g60cca15819-goog
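[Not part of the patch -- a brief sketch of how the detached flag is expected to flow after this change. example_insert_vma() is a hypothetical caller; the helpers it uses (is_vma_detached(), vma_iter_prealloc(), vma_iter_store()) are the ones introduced or touched by this series.]

/*
 * Sketch only: a newly allocated vma starts out detached and becomes
 * attached as part of inserting it into the VMA tree.
 */
static int example_insert_vma(struct vma_iterator *vmi, struct vm_area_struct *vma)
{
	VM_WARN_ON_ONCE(!is_vma_detached(vma));	/* fresh vmas are detached */

	if (vma_iter_prealloc(vmi, vma))
		return -ENOMEM;
	vma_iter_store(vmi, vma);	/* insertion marks the vma attached */
	return 0;
}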
From nobody Fri Dec 19 17:37:36 2025
Date: Sun, 17 Nov 2024 00:09:30 -0800
In-Reply-To: <20241117080931.600731-1-surenb@google.com>
References: <20241117080931.600731-1-surenb@google.com>
Message-ID: <20241117080931.600731-5-surenb@google.com>
Subject: [PATCH v3 4/5] mm: make vma cache SLAB_TYPESAFE_BY_RCU
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: willy@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com,
    mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com,
    oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com,
    peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org,
    brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
    minchan@google.com, jannh@google.com, shakeel.butt@linux.dev,
    souravpanda@google.com, pasha.tatashin@soleen.com, corbet@lwn.net,
    linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    kernel-team@android.com, surenb@google.com

To enable SLAB_TYPESAFE_BY_RCU for vma cache we need to ensure that
object reuse before RCU grace period is over will be detected inside
lock_vma_under_rcu(). lock_vma_under_rcu() enters RCU read section,
finds the vma at the given address, locks the vma and checks if it got
detached or remapped to cover a different address range. These last
checks are there to ensure that the vma was not modified after we found
it but before locking it. vma reuse introduces several new possibilities:

1. vma can be reused after it was found but before it is locked;
2. vma can be reused and reinitialized (including changing its vm_mm)
   while being locked in vma_start_read();
3. vma can be reused and reinitialized after it was found but before
   it is locked, then attached at a new address or to a new mm while
   being read-locked;

For case #1 current checks will help detecting cases when:
- vma was reused but not yet added into the tree (detached check)
- vma was reused at a different address range (address check);

We are missing the check for vm_mm to ensure the reused vma was not
attached to a different mm. This patch adds the missing check.
For case #2, we pass mm to vma_start_read() to prevent access to
unstable vma->vm_mm. For case #3, we write-lock the vma in
vma_mark_attached(), ensuring that vma does not get re-attached while
read-locked by a user of the vma before it was recycled. This
write-locking should not cause performance issues because contention
during vma_mark_attached() can happen only in the rare vma reuse case.
Even when this happens, it's the slowpath (write-lock) which will be
waiting, not the page fault path.

After these provisions, SLAB_TYPESAFE_BY_RCU is added to vm_area_cachep.
This will facilitate vm_area_struct reuse and will minimize the number
of call_rcu() calls. Adding a freeptr_t into vm_area_struct (unioned
with vm_start/vm_end) could avoid bloating the structure; however,
custom free pointers are currently not supported in combination with a
ctor (see the comment for kmem_cache_args.freeptr_offset).

Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm.h               | 48 ++++++++++++++++++++++++-----
 include/linux/mm_types.h         | 13 +++-----
 kernel/fork.c                    | 53 +++++++++++++++++++-------------
 mm/memory.c                      |  7 +++--
 mm/vma.c                         |  2 +-
 tools/testing/vma/vma_internal.h |  7 +++--
 6 files changed, 86 insertions(+), 44 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index dd1b6190df28..d8e10e1e34ad 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -257,7 +257,7 @@ struct vm_area_struct *vm_area_alloc(struct mm_struct *);
 struct vm_area_struct *vm_area_dup(struct vm_area_struct *);
 void vm_area_free(struct vm_area_struct *);
 /* Use only if VMA has no other users */
-void __vm_area_free(struct vm_area_struct *vma);
+void vm_area_free_unreachable(struct vm_area_struct *vma);
 
 #ifndef CONFIG_MMU
 extern struct rb_root nommu_region_tree;
@@ -690,12 +690,32 @@ static inline void vma_lock_init(struct vm_area_struct *vma)
 	vma->vm_lock_seq = UINT_MAX;
 }
 
+#define VMA_BEFORE_LOCK offsetof(struct vm_area_struct, vm_lock)
+#define VMA_LOCK_END(vma) \
+	(((void *)(vma)) + offsetofend(struct vm_area_struct, vm_lock))
+#define VMA_AFTER_LOCK \
+	(sizeof(struct vm_area_struct) - offsetofend(struct vm_area_struct, vm_lock))
+
+static inline void vma_clear(struct vm_area_struct *vma)
+{
+	/* Preserve vma->vm_lock */
+	memset(vma, 0, VMA_BEFORE_LOCK);
+	memset(VMA_LOCK_END(vma), 0, VMA_AFTER_LOCK);
+}
+
+static inline void vma_copy(struct vm_area_struct *new, struct vm_area_struct *orig)
+{
+	/* Preserve vma->vm_lock */
+	data_race(memcpy(new, orig, VMA_BEFORE_LOCK));
+	data_race(memcpy(VMA_LOCK_END(new), VMA_LOCK_END(orig), VMA_AFTER_LOCK));
+}
+
 /*
  * Try to read-lock a vma. The function is allowed to occasionally yield false
  * locked result to avoid performance overhead, in which case we fall back to
  * using mmap_lock. The function should never yield false unlocked result.
  */
-static inline bool vma_start_read(struct vm_area_struct *vma)
+static inline bool vma_start_read(struct mm_struct *mm, struct vm_area_struct *vma)
 {
 	/*
 	 * Check before locking. A race might cause false locked result.
@@ -704,7 +724,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
	 * we don't rely on for anything - the mm_lock_seq read against which we
	 * need ordering is below.
*/ - if (READ_ONCE(vma->vm_lock_seq) =3D=3D READ_ONCE(vma->vm_mm->mm_lock_seq.= sequence)) + if (READ_ONCE(vma->vm_lock_seq) =3D=3D READ_ONCE(mm->mm_lock_seq.sequence= )) return false; =20 if (unlikely(down_read_trylock(&vma->vm_lock.lock) =3D=3D 0)) @@ -721,7 +741,7 @@ static inline bool vma_start_read(struct vm_area_struct= *vma) * after it has been unlocked. * This pairs with RELEASE semantics in vma_end_write_all(). */ - if (unlikely(vma->vm_lock_seq =3D=3D raw_read_seqcount(&vma->vm_mm->mm_lo= ck_seq))) { + if (unlikely(vma->vm_lock_seq =3D=3D raw_read_seqcount(&mm->mm_lock_seq))= ) { up_read(&vma->vm_lock.lock); return false; } @@ -810,7 +830,18 @@ static inline void vma_assert_locked(struct vm_area_st= ruct *vma) =20 static inline void vma_mark_attached(struct vm_area_struct *vma) { + /* vma shoudn't be already attached */ + VM_BUG_ON_VMA(!vma->detached, vma); + + /* + * Lock here can be contended only if the vma got reused after + * lock_vma_under_rcu() found it but before it had a chance to + * read-lock it. Write-locking the vma guarantees that the vma + * won't be attached until all its old users are out. + */ + down_write(&vma->vm_lock.lock); vma->detached =3D false; + up_write(&vma->vm_lock.lock); } =20 static inline void vma_mark_detached(struct vm_area_struct *vma) @@ -847,7 +878,11 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_st= ruct *mm, #else /* CONFIG_PER_VMA_LOCK */ =20 static inline void vma_lock_init(struct vm_area_struct *vma) {} -static inline bool vma_start_read(struct vm_area_struct *vma) +static inline void vma_clear(struct vm_area_struct *vma) + { memset(vma, 0, sizeof(*vma)); } +static inline void vma_copy(struct vm_area_struct *new, struct vm_area_str= uct *orig) + { data_race(memcpy(new, orig, sizeof(*new))); } +static inline bool vma_start_read(struct mm_struct *mm, struct vm_area_str= uct *vma) { return false; } static inline void vma_end_read(struct vm_area_struct *vma) {} static inline void vma_start_write(struct vm_area_struct *vma) {} @@ -883,7 +918,7 @@ extern const struct vm_operations_struct vma_dummy_vm_o= ps; =20 static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *= mm) { - memset(vma, 0, sizeof(*vma)); + vma_clear(vma); vma->vm_mm =3D mm; vma->vm_ops =3D &vma_dummy_vm_ops; INIT_LIST_HEAD(&vma->anon_vma_chain); @@ -892,7 +927,6 @@ static inline void vma_init(struct vm_area_struct *vma,= struct mm_struct *mm) vma->detached =3D true; #endif vma_numab_state_init(vma); - vma_lock_init(vma); } =20 /* Use when VMA is not part of the VMA tree and needs no locking */ diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 5c4bfdcfac72..8f6b0c935c2b 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -667,15 +667,10 @@ struct vma_numab_state { struct vm_area_struct { /* The first cache line has the info for VMA tree walking. */ =20 - union { - struct { - /* VMA covers [vm_start; vm_end) addresses within mm */ - unsigned long vm_start; - unsigned long vm_end; - }; -#ifdef CONFIG_PER_VMA_LOCK - struct rcu_head vm_rcu; /* Used for deferred freeing. 
*/ -#endif + struct { + /* VMA covers [vm_start; vm_end) addresses within mm */ + unsigned long vm_start; + unsigned long vm_end; }; =20 /* diff --git a/kernel/fork.c b/kernel/fork.c index f0cec673583c..76c68b041f8a 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -436,6 +436,11 @@ static struct kmem_cache *vm_area_cachep; /* SLAB cache for mm_struct structures (tsk->mm) */ static struct kmem_cache *mm_cachep; =20 +static void vm_area_ctor(void *data) +{ + vma_lock_init(data); +} + struct vm_area_struct *vm_area_alloc(struct mm_struct *mm) { struct vm_area_struct *vma; @@ -462,8 +467,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struc= t *orig) * orig->shared.rb may be modified concurrently, but the clone * will be reinitialized. */ - data_race(memcpy(new, orig, sizeof(*new))); - vma_lock_init(new); + vma_copy(new, orig); INIT_LIST_HEAD(&new->anon_vma_chain); #ifdef CONFIG_PER_VMA_LOCK /* vma is not locked, can't use vma_mark_detached() */ @@ -475,32 +479,37 @@ struct vm_area_struct *vm_area_dup(struct vm_area_str= uct *orig) return new; } =20 -void __vm_area_free(struct vm_area_struct *vma) +static void __vm_area_free(struct vm_area_struct *vma, bool unreachable) { +#ifdef CONFIG_PER_VMA_LOCK + /* + * With SLAB_TYPESAFE_BY_RCU, vma can be reused and we need + * vma->detached to be set before vma is returned into the cache. + * This way reused object won't be used by readers until it's + * initialized and reattached. + * If vma is unreachable, there can be no other users and we + * can set vma->detached directly with no risk of a race. + * If vma is reachable, then it should have been already detached + * under vma write-lock or it was never attached. + */ + if (unreachable) + vma->detached =3D true; + else + VM_BUG_ON_VMA(!is_vma_detached(vma), vma); +#endif vma_numab_state_free(vma); free_anon_vma_name(vma); kmem_cache_free(vm_area_cachep, vma); } =20 -#ifdef CONFIG_PER_VMA_LOCK -static void vm_area_free_rcu_cb(struct rcu_head *head) +void vm_area_free(struct vm_area_struct *vma) { - struct vm_area_struct *vma =3D container_of(head, struct vm_area_struct, - vm_rcu); - - /* The vma should not be locked while being destroyed. 
 */
-	VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock.lock), vma);
-	__vm_area_free(vma);
+	__vm_area_free(vma, false);
 }
-#endif
 
-void vm_area_free(struct vm_area_struct *vma)
+void vm_area_free_unreachable(struct vm_area_struct *vma)
 {
-#ifdef CONFIG_PER_VMA_LOCK
-	call_rcu(&vma->vm_rcu, vm_area_free_rcu_cb);
-#else
-	__vm_area_free(vma);
-#endif
+	__vm_area_free(vma, true);
 }
 
 static void account_kernel_stack(struct task_struct *tsk, int account)
@@ -3135,9 +3144,11 @@ void __init proc_caches_init(void)
 			sizeof(struct fs_struct), 0,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
 			NULL);
-	vm_area_cachep = KMEM_CACHE(vm_area_struct,
-			SLAB_HWCACHE_ALIGN|SLAB_NO_MERGE|SLAB_PANIC|
-			SLAB_ACCOUNT);
+	vm_area_cachep = kmem_cache_create("vm_area_struct",
+			sizeof(struct vm_area_struct), 0,
+			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_TYPESAFE_BY_RCU|
+			SLAB_ACCOUNT, vm_area_ctor);
+
 	mmap_init();
 	nsproxy_cache_init();
 }
diff --git a/mm/memory.c b/mm/memory.c
index d0197a0c0996..c8a3e820ed66 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6275,7 +6275,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	if (!vma)
 		goto inval;
 
-	if (!vma_start_read(vma))
+	if (!vma_start_read(mm, vma))
 		goto inval;
 
 	/* Check if the VMA got isolated after we found it */
@@ -6292,8 +6292,9 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	 * fields are accessible for RCU readers.
 	 */
 
-	/* Check since vm_start/vm_end might change before we lock the VMA */
-	if (unlikely(address < vma->vm_start || address >= vma->vm_end))
+	/* Check since vm_mm/vm_start/vm_end might change before we lock the VMA */
+	if (unlikely(vma->vm_mm != mm ||
+		     address < vma->vm_start || address >= vma->vm_end))
 		goto inval_end_read;
 
 	rcu_read_unlock();
diff --git a/mm/vma.c b/mm/vma.c
index 73104d434567..050b83df3df2 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -382,7 +382,7 @@ void remove_vma(struct vm_area_struct *vma, bool unreachable)
 		fput(vma->vm_file);
 	mpol_put(vma_policy(vma));
 	if (unreachable)
-		__vm_area_free(vma);
+		vm_area_free_unreachable(vma);
 	else
 		vm_area_free(vma);
 }
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index 2fed366d20ef..fd668d6cafc0 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -632,14 +632,15 @@ static inline void mpol_put(struct mempolicy *)
 {
 }
 
-static inline void __vm_area_free(struct vm_area_struct *vma)
+static inline void vm_area_free(struct vm_area_struct *vma)
 {
 	free(vma);
 }
 
-static inline void vm_area_free(struct vm_area_struct *vma)
+static inline void vm_area_free_unreachable(struct vm_area_struct *vma)
 {
-	__vm_area_free(vma);
+	vma->detached = true;
+	vm_area_free(vma);
 }
 
 static inline void lru_add_drain(void)
-- 
2.47.0.338.g60cca15819-goog
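[Not part of the patch -- a condensed sketch of the lookup-and-revalidate pattern that SLAB_TYPESAFE_BY_RCU relies on, mirroring what lock_vma_under_rcu() does after this change. example_lookup_vma() is hypothetical; the checks shown (detached, vm_mm, address range) are the ones the patch adds or keeps.]

/*
 * Sketch only: an object found under RCU may have been freed and
 * reused, so after locking it every identifying field is rechecked.
 */
static struct vm_area_struct *example_lookup_vma(struct mm_struct *mm,
						 unsigned long address)
{
	MA_STATE(mas, &mm->mm_mt, address, address);
	struct vm_area_struct *vma;

	rcu_read_lock();
	vma = mas_walk(&mas);
	if (!vma || !vma_start_read(mm, vma))
		goto not_found;

	/* Recheck identity: the slab object may have been recycled. */
	if (is_vma_detached(vma) || vma->vm_mm != mm ||
	    address < vma->vm_start || address >= vma->vm_end) {
		vma_end_read(vma);
		goto not_found;
	}
	rcu_read_unlock();
	return vma;	/* returned read-locked; caller uses vma_end_read() */

not_found:
	rcu_read_unlock();
	return NULL;
}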
From nobody Fri Dec 19 17:37:36 2025
Date: Sun, 17 Nov 2024 00:09:31 -0800
In-Reply-To: <20241117080931.600731-1-surenb@google.com>
References: <20241117080931.600731-1-surenb@google.com>
Message-ID: <20241117080931.600731-6-surenb@google.com>
Subject: [PATCH v3 5/5] docs/mm: document latest changes to vm_lock
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: willy@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com,
    mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com,
    oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com,
    peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org,
    brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
    minchan@google.com, jannh@google.com, shakeel.butt@linux.dev,
    souravpanda@google.com, pasha.tatashin@soleen.com, corbet@lwn.net,
    linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    kernel-team@android.com, surenb@google.com

Change the documentation to reflect that vm_lock is integrated into vma.
Document newly introduced vma_start_read_locked{_nested} functions.

Signed-off-by: Suren Baghdasaryan
Reviewed-by: Lorenzo Stoakes
---
 Documentation/mm/process_addrs.rst | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/Documentation/mm/process_addrs.rst b/Documentation/mm/process_addrs.rst
index 1bf7ad010fc0..a18450b6496d 100644
--- a/Documentation/mm/process_addrs.rst
+++ b/Documentation/mm/process_addrs.rst
@@ -686,7 +686,11 @@ calls :c:func:`!rcu_read_lock` to ensure that the VMA is looked up in an RCU
 critical section, then attempts to VMA lock it via :c:func:`!vma_start_read`,
 before releasing the RCU lock via :c:func:`!rcu_read_unlock`.
 
-VMA read locks hold the read lock on the :c:member:`!vma->vm_lock` semaphore for
+In cases when the user already holds mmap read lock, :c:func:`!vma_start_read_locked`
+and :c:func:`!vma_start_read_locked_nested` can be used. These functions always
+succeed in acquiring VMA read lock.
+
+VMA read locks hold the read lock on the :c:member:`!vma.vm_lock` semaphore for
 their duration and the caller of :c:func:`!lock_vma_under_rcu` must release it
 via :c:func:`!vma_end_read`.
 
@@ -750,7 +754,7 @@ keep VMAs locked across entirely separate write operations. It also maintains
 correct lock ordering.
 
 Each time a VMA read lock is acquired, we acquire a read lock on the
-:c:member:`!vma->vm_lock` read/write semaphore and hold it, while checking that
+:c:member:`!vma.vm_lock` read/write semaphore and hold it, while checking that
 the sequence count of the VMA does not match that of the mm.
 
 If it does, the read lock fails. If it does not, we hold the lock, excluding
@@ -760,7 +764,7 @@ Importantly, maple tree operations performed in :c:func:`!lock_vma_under_rcu`
 are also RCU safe, so the whole read lock operation is guaranteed to function
 correctly.
 
-On the write side, we acquire a write lock on the :c:member:`!vma->vm_lock`
+On the write side, we acquire a write lock on the :c:member:`!vma.vm_lock`
 read/write semaphore, before setting the VMA's sequence number under this lock,
 also simultaneously holding the mmap write lock.
 
-- 
2.47.0.338.g60cca15819-goog
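[Not part of the series -- a short sketch of the write-side pairing described in the documentation patch above. example_modify_vma() is hypothetical; mmap_write_lock()/mmap_write_unlock() and vma_start_write() are existing APIs, and mmap_write_unlock() calls vma_end_write_all() to release the VMA write locks collectively.]

/*
 * Sketch only: the write side takes the mmap write lock, write-locks the
 * VMA (raising its sequence count so readers bail out), and all VMA
 * write locks are dropped when the mmap write lock is released.
 */
static void example_modify_vma(struct mm_struct *mm, struct vm_area_struct *vma)
{
	mmap_write_lock(mm);
	vma_start_write(vma);	/* readers now fail vma_start_read() */
	/* ... modify the VMA here ... */
	mmap_write_unlock(mm);	/* vma_end_write_all() releases write-locked VMAs */
}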