Date: Fri, 10 Jan 2025 20:25:48 -0800
In-Reply-To: <20250111042604.3230628-1-surenb@google.com>
Message-ID: <20250111042604.3230628-2-surenb@google.com>
Subject: [PATCH v9 01/17] mm: introduce vma_start_read_locked{_nested} helpers
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
	vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
	dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
	lokeshgidra@google.com, minchan@google.com, jannh@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
	klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com, surenb@google.com, "Liam R. Howlett"

Introduce helper functions which can be used to read-lock a VMA when
holding mmap_lock for read. Replace direct accesses to vma->vm_lock with
these new helpers.

Signed-off-by: Suren Baghdasaryan
Reviewed-by: Lorenzo Stoakes
Reviewed-by: Davidlohr Bueso
Reviewed-by: Shakeel Butt
Reviewed-by: Vlastimil Babka
Reviewed-by: Liam R. Howlett
Tested-by: Shivank Garg
---
 include/linux/mm.h | 24 ++++++++++++++++++++++++
 mm/userfaultfd.c   | 22 +++++-----------------
 2 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8483e09aeb2c..1c0250c187f6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -735,6 +735,30 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	return true;
 }
 
+/*
+ * Use only while holding mmap read lock which guarantees that locking will not
+ * fail (nobody can concurrently write-lock the vma). vma_start_read() should
+ * not be used in such cases because it might fail due to mm_lock_seq overflow.
+ * This functionality is used to obtain vma read lock and drop the mmap read lock.
+ */
+static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int subclass)
+{
+	mmap_assert_locked(vma->vm_mm);
+	down_read_nested(&vma->vm_lock->lock, subclass);
+}
+
+/*
+ * Use only while holding mmap read lock which guarantees that locking will not
+ * fail (nobody can concurrently write-lock the vma). vma_start_read() should
+ * not be used in such cases because it might fail due to mm_lock_seq overflow.
+ * This functionality is used to obtain vma read lock and drop the mmap read lock.
+ */
+static inline void vma_start_read_locked(struct vm_area_struct *vma)
+{
+	mmap_assert_locked(vma->vm_mm);
+	down_read(&vma->vm_lock->lock);
+}
+
 static inline void vma_end_read(struct vm_area_struct *vma)
 {
 	rcu_read_lock(); /* keeps vma alive till the end of up_read */
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index af3dfc3633db..4527c385935b 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -84,16 +84,8 @@ static struct vm_area_struct *uffd_lock_vma(struct mm_struct *mm,
 
 	mmap_read_lock(mm);
 	vma = find_vma_and_prepare_anon(mm, address);
-	if (!IS_ERR(vma)) {
-		/*
-		 * We cannot use vma_start_read() as it may fail due to
-		 * false locked (see comment in vma_start_read()). We
-		 * can avoid that by directly locking vm_lock under
-		 * mmap_lock, which guarantees that nobody can lock the
-		 * vma for write (vma_start_write()) under us.
-		 */
-		down_read(&vma->vm_lock->lock);
-	}
+	if (!IS_ERR(vma))
+		vma_start_read_locked(vma);
 
 	mmap_read_unlock(mm);
 	return vma;
@@ -1491,14 +1483,10 @@ static int uffd_move_lock(struct mm_struct *mm,
 	mmap_read_lock(mm);
 	err = find_vmas_mm_locked(mm, dst_start, src_start, dst_vmap, src_vmap);
 	if (!err) {
-		/*
-		 * See comment in uffd_lock_vma() as to why not using
-		 * vma_start_read() here.
-		 */
-		down_read(&(*dst_vmap)->vm_lock->lock);
+		vma_start_read_locked(*dst_vmap);
 		if (*dst_vmap != *src_vmap)
-			down_read_nested(&(*src_vmap)->vm_lock->lock,
-					 SINGLE_DEPTH_NESTING);
+			vma_start_read_locked_nested(*src_vmap,
+						SINGLE_DEPTH_NESTING);
 	}
 	mmap_read_unlock(mm);
 	return err;
-- 
2.47.1.613.gc27f4b7a9f-goog

Date: Fri, 10 Jan 2025 20:25:49 -0800
In-Reply-To: <20250111042604.3230628-1-surenb@google.com>
Message-ID: <20250111042604.3230628-3-surenb@google.com>
Subject: [PATCH v9 02/17] mm: move per-vma lock into vm_area_struct
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

Back when per-vma locks were introduced, vm_lock was moved out of
vm_area_struct in [1] because of the performance regression caused by
false cacheline sharing. Recent investigation [2] revealed that the
regression is limited to a rather old Broadwell microarchitecture and
even there it can be mitigated by disabling adjacent cacheline
prefetching, see [3].

Splitting a single logical structure into multiple ones leads to more
complicated management, extra pointer dereferences and overall less
maintainable code. When that split-away part is a lock, it complicates
things even further. With no performance benefits, there is no reason
for this split. Merging the vm_lock back into vm_area_struct also allows
vm_area_struct to use SLAB_TYPESAFE_BY_RCU later in this patchset.

Move vm_lock back into vm_area_struct, aligning it at the cacheline
boundary and changing the cache to be cacheline-aligned as well.
With a kernel compiled using defconfig, this causes VMA memory
consumption to grow from 160 (vm_area_struct) + 40 (vm_lock) bytes to
256 bytes:

    slabinfo before:
     name             ... objsize objperslab pagesperslab : ...
     vma_lock         ...     40        102            1 : ...
     vm_area_struct   ...    160         51            2 : ...

    slabinfo after moving vm_lock:
     name             ... objsize objperslab pagesperslab : ...
     vm_area_struct   ...    256         32            2 : ...

Aggregate VMA memory consumption per 1000 VMAs grows from 50 to 64 pages,
which is 5.5MB per 100000 VMAs. Note that the size of this structure is
dependent on the kernel configuration and typically the original size is
higher than 160 bytes. Therefore these calculations are close to the
worst case scenario. A more realistic vm_area_struct usage before this
change is:

     name             ... objsize objperslab pagesperslab : ...
     vma_lock         ...     40        102            1 : ...
     vm_area_struct   ...    176         46            2 : ...

Aggregate VMA memory consumption per 1000 VMAs grows from 54 to 64 pages,
which is 3.9MB per 100000 VMAs.
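The per-1000 page counts above can be reproduced from the slabinfo columns (objects per slab, pages per slab). The following is a minimal C sketch. It assumes that each cache's slab count is rounded up, because a partly filled slab still occupies whole pages. `slab_pages()` and `struct toy_vma` are hypothetical names used only for illustration. The toy struct is not the real `vm_area_struct` layout: it assumes a 64-byte cacheline and roughly 160 bytes of other fields, and it only mimics the effect of `____cacheline_aligned_in_smp` on the object size.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/*
 * Hypothetical helper: pages used by nr_objs objects in one slab cache,
 * given that cache's objperslab and pagesperslab values. The slab count
 * is rounded up because a partly filled slab still uses all its pages.
 */
static size_t slab_pages(size_t nr_objs, size_t objs_per_slab,
			 size_t pages_per_slab)
{
	assert(objs_per_slab > 0);
	size_t slabs = (nr_objs + objs_per_slab - 1) / objs_per_slab;

	return slabs * pages_per_slab;
}

/*
 * Hypothetical toy layout, NOT the real vm_area_struct: about 160 bytes of
 * other fields followed by a 40-byte lock aligned to an assumed 64-byte
 * cacheline. The lock lands at offset 192, and the struct is padded to a
 * multiple of its 64-byte alignment, so sizeof() becomes 256.
 */
struct toy_vma {
	unsigned char other_fields[160];
	alignas(64) unsigned char vm_lock[40];
};
```

For example, `slab_pages(1000, 102, 1) + slab_pages(1000, 51, 2)` gives 10 + 40 = 50 pages, and `slab_pages(1000, 32, 2)` gives 64. The per-100000 figures appear to come from scaling the per-1000 difference by 100. This gives (64 - 50) * 100 = 1400 pages, about 5.5 MiB with 4 KiB pages, and (64 - 54) * 100 = 1000 pages, about 3.9 MiB.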
This memory consumption growth can be addressed later by optimizing the
vm_lock.

[1] https://lore.kernel.org/all/20230227173632.3292573-34-surenb@google.com/
[2] https://lore.kernel.org/all/ZsQyI%2F087V34JoIt@xsang-OptiPlex-9020/
[3] https://lore.kernel.org/all/CAJuCfpEisU8Lfe96AYJDZ+OM4NoPmnw9bP53cT_kbfP_pR+-2g@mail.gmail.com/

Signed-off-by: Suren Baghdasaryan
Reviewed-by: Lorenzo Stoakes
Reviewed-by: Shakeel Butt
Reviewed-by: Vlastimil Babka
Reviewed-by: Liam R. Howlett
Tested-by: Shivank Garg
---
 include/linux/mm.h               | 28 ++++++++++--------
 include/linux/mm_types.h         |  6 ++--
 kernel/fork.c                    | 49 ++++----------------------------
 tools/testing/vma/vma_internal.h | 33 +++++----------------
 4 files changed, 32 insertions(+), 84 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1c0250c187f6..ed739406b0a7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -697,6 +697,12 @@ static inline void vma_numab_state_free(struct vm_area_struct *vma) {}
 #endif /* CONFIG_NUMA_BALANCING */
 
 #ifdef CONFIG_PER_VMA_LOCK
+static inline void vma_lock_init(struct vm_area_struct *vma)
+{
+	init_rwsem(&vma->vm_lock.lock);
+	vma->vm_lock_seq = UINT_MAX;
+}
+
 /*
  * Try to read-lock a vma. The function is allowed to occasionally yield false
  * locked result to avoid performance overhead, in which case we fall back to
@@ -714,7 +720,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq.sequence))
 		return false;
 
-	if (unlikely(down_read_trylock(&vma->vm_lock->lock) == 0))
+	if (unlikely(down_read_trylock(&vma->vm_lock.lock) == 0))
 		return false;
 
 	/*
@@ -729,7 +735,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	 * This pairs with RELEASE semantics in vma_end_write_all().
 	 */
 	if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&vma->vm_mm->mm_lock_seq))) {
-		up_read(&vma->vm_lock->lock);
+		up_read(&vma->vm_lock.lock);
 		return false;
 	}
 	return true;
@@ -744,7 +750,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int subclass)
 {
 	mmap_assert_locked(vma->vm_mm);
-	down_read_nested(&vma->vm_lock->lock, subclass);
+	down_read_nested(&vma->vm_lock.lock, subclass);
 }
 
 /*
@@ -756,13 +762,13 @@ static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int
 static inline void vma_start_read_locked(struct vm_area_struct *vma)
 {
 	mmap_assert_locked(vma->vm_mm);
-	down_read(&vma->vm_lock->lock);
+	down_read(&vma->vm_lock.lock);
 }
 
 static inline void vma_end_read(struct vm_area_struct *vma)
 {
 	rcu_read_lock(); /* keeps vma alive till the end of up_read */
-	up_read(&vma->vm_lock->lock);
+	up_read(&vma->vm_lock.lock);
 	rcu_read_unlock();
 }
 
@@ -791,7 +797,7 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 	if (__is_vma_write_locked(vma, &mm_lock_seq))
 		return;
 
-	down_write(&vma->vm_lock->lock);
+	down_write(&vma->vm_lock.lock);
 	/*
 	 * We should use WRITE_ONCE() here because we can have concurrent reads
 	 * from the early lockless pessimistic check in vma_start_read().
@@ -799,7 +805,7 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 	 * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy.
 	 */
 	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
-	up_write(&vma->vm_lock->lock);
+	up_write(&vma->vm_lock.lock);
 }
 
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
@@ -811,7 +817,7 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 
 static inline void vma_assert_locked(struct vm_area_struct *vma)
 {
-	if (!rwsem_is_locked(&vma->vm_lock->lock))
+	if (!rwsem_is_locked(&vma->vm_lock.lock))
 		vma_assert_write_locked(vma);
 }
 
@@ -844,6 +850,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 
 #else /* CONFIG_PER_VMA_LOCK */
 
+static inline void vma_lock_init(struct vm_area_struct *vma) {}
 static inline bool vma_start_read(struct vm_area_struct *vma)
 		{ return false; }
 static inline void vma_end_read(struct vm_area_struct *vma) {}
@@ -878,10 +885,6 @@ static inline void assert_fault_locked(struct vm_fault *vmf)
 
 extern const struct vm_operations_struct vma_dummy_vm_ops;
 
-/*
- * WARNING: vma_init does not initialize vma->vm_lock.
- * Use vm_area_alloc()/vm_area_free() if vma needs locking.
- */
 static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 {
 	memset(vma, 0, sizeof(*vma));
@@ -890,6 +893,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
 	vma_mark_detached(vma, false);
 	vma_numab_state_init(vma);
+	vma_lock_init(vma);
 }
 
 /* Use when VMA is not part of the VMA tree and needs no locking */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5f1b2dc788e2..6573d95f1d1e 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -730,8 +730,6 @@ struct vm_area_struct {
 	 * slowpath.
 	 */
 	unsigned int vm_lock_seq;
-	/* Unstable RCU readers are allowed to read this. */
-	struct vma_lock *vm_lock;
 #endif
 
 	/*
@@ -784,6 +782,10 @@ struct vm_area_struct {
 	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
 #endif
 	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
+#ifdef CONFIG_PER_VMA_LOCK
+	/* Unstable RCU readers are allowed to read this. */
+	struct vma_lock vm_lock ____cacheline_aligned_in_smp;
+#endif
 } __randomize_layout;
 
 #ifdef CONFIG_NUMA
diff --git a/kernel/fork.c b/kernel/fork.c
index ded49f18cd95..40a8e615499f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -436,35 +436,6 @@ static struct kmem_cache *vm_area_cachep;
 /* SLAB cache for mm_struct structures (tsk->mm) */
 static struct kmem_cache *mm_cachep;
 
-#ifdef CONFIG_PER_VMA_LOCK
-
-/* SLAB cache for vm_area_struct.lock */
-static struct kmem_cache *vma_lock_cachep;
-
-static bool vma_lock_alloc(struct vm_area_struct *vma)
-{
-	vma->vm_lock = kmem_cache_alloc(vma_lock_cachep, GFP_KERNEL);
-	if (!vma->vm_lock)
-		return false;
-
-	init_rwsem(&vma->vm_lock->lock);
-	vma->vm_lock_seq = UINT_MAX;
-
-	return true;
-}
-
-static inline void vma_lock_free(struct vm_area_struct *vma)
-{
-	kmem_cache_free(vma_lock_cachep, vma->vm_lock);
-}
-
-#else /* CONFIG_PER_VMA_LOCK */
-
-static inline bool vma_lock_alloc(struct vm_area_struct *vma) { return true; }
-static inline void vma_lock_free(struct vm_area_struct *vma) {}
-
-#endif /* CONFIG_PER_VMA_LOCK */
-
 struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
 {
 	struct vm_area_struct *vma;
@@ -474,10 +445,6 @@ struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
 		return NULL;
 
 	vma_init(vma, mm);
-	if (!vma_lock_alloc(vma)) {
-		kmem_cache_free(vm_area_cachep, vma);
-		return NULL;
-	}
 
 	return vma;
 }
@@ -496,10 +463,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	 * will be reinitialized.
 	 */
 	data_race(memcpy(new, orig, sizeof(*new)));
-	if (!vma_lock_alloc(new)) {
-		kmem_cache_free(vm_area_cachep, new);
-		return NULL;
-	}
+	vma_lock_init(new);
 	INIT_LIST_HEAD(&new->anon_vma_chain);
 	vma_numab_state_init(new);
 	dup_anon_vma_name(orig, new);
@@ -511,7 +475,6 @@ void __vm_area_free(struct vm_area_struct *vma)
 {
 	vma_numab_state_free(vma);
 	free_anon_vma_name(vma);
-	vma_lock_free(vma);
 	kmem_cache_free(vm_area_cachep, vma);
 }
 
@@ -522,7 +485,7 @@ static void vm_area_free_rcu_cb(struct rcu_head *head)
 						  vm_rcu);
 
 	/* The vma should not be locked while being destroyed. */
-	VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock->lock), vma);
+	VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock.lock), vma);
 	__vm_area_free(vma);
 }
 #endif
@@ -3188,11 +3151,9 @@ void __init proc_caches_init(void)
 			sizeof(struct fs_struct), 0,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
 			NULL);
-
-	vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
-#ifdef CONFIG_PER_VMA_LOCK
-	vma_lock_cachep = KMEM_CACHE(vma_lock, SLAB_PANIC|SLAB_ACCOUNT);
-#endif
+	vm_area_cachep = KMEM_CACHE(vm_area_struct,
+			SLAB_HWCACHE_ALIGN|SLAB_NO_MERGE|SLAB_PANIC|
+			SLAB_ACCOUNT);
 	mmap_init();
 	nsproxy_cache_init();
 }
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index 2404347fa2c7..96aeb28c81f9 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -274,10 +274,10 @@ struct vm_area_struct {
 	/*
 	 * Can only be written (using WRITE_ONCE()) while holding both:
 	 * - mmap_lock (in write mode)
-	 * - vm_lock->lock (in write mode)
+	 * - vm_lock.lock (in write mode)
 	 * Can be read reliably while holding one of:
 	 * - mmap_lock (in read or write mode)
-	 * - vm_lock->lock (in read or write mode)
+	 * - vm_lock.lock (in read or write mode)
 	 * Can be read unreliably (using READ_ONCE()) for pessimistic bailout
 	 * while holding nothing (except RCU to keep the VMA struct allocated).
 	 *
@@ -286,7 +286,7 @@ struct vm_area_struct {
 	 * slowpath.
 	 */
 	unsigned int vm_lock_seq;
-	struct vma_lock *vm_lock;
+	struct vma_lock vm_lock;
 #endif
 
 	/*
@@ -463,17 +463,10 @@ static inline struct vm_area_struct *vma_next(struct vma_iterator *vmi)
 	return mas_find(&vmi->mas, ULONG_MAX);
 }
 
-static inline bool vma_lock_alloc(struct vm_area_struct *vma)
+static inline void vma_lock_init(struct vm_area_struct *vma)
 {
-	vma->vm_lock = calloc(1, sizeof(struct vma_lock));
-
-	if (!vma->vm_lock)
-		return false;
-
-	init_rwsem(&vma->vm_lock->lock);
+	init_rwsem(&vma->vm_lock.lock);
 	vma->vm_lock_seq = UINT_MAX;
-
-	return true;
 }
 
 static inline void vma_assert_write_locked(struct vm_area_struct *);
@@ -496,6 +489,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_ops = &vma_dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
 	vma_mark_detached(vma, false);
+	vma_lock_init(vma);
 }
 
 static inline struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
@@ -506,10 +500,6 @@ static inline struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
 		return NULL;
 
 	vma_init(vma, mm);
-	if (!vma_lock_alloc(vma)) {
-		free(vma);
-		return NULL;
-	}
 
 	return vma;
 }
@@ -522,10 +512,7 @@ static inline struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 		return NULL;
 
 	memcpy(new, orig, sizeof(*new));
-	if (!vma_lock_alloc(new)) {
-		free(new);
-		return NULL;
-	}
+	vma_lock_init(new);
 	INIT_LIST_HEAD(&new->anon_vma_chain);
 
 	return new;
@@ -695,14 +682,8 @@ static inline void mpol_put(struct mempolicy *)
 {
 }
 
-static inline void vma_lock_free(struct vm_area_struct *vma)
-{
-	free(vma->vm_lock);
-}
-
 static inline void __vm_area_free(struct vm_area_struct *vma)
 {
-	vma_lock_free(vma);
 	free(vma);
 }
 
-- 
2.47.1.613.gc27f4b7a9f-goog

Date: Fri, 10 Jan 2025 20:25:50 -0800
In-Reply-To: <20250111042604.3230628-1-surenb@google.com>
Message-ID: <20250111042604.3230628-4-surenb@google.com>
Subject: [PATCH v9 03/17] mm: mark vma as detached until it's added into vma tree
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

The current implementation does not set the detached flag when a VMA is
first allocated. This does not represent the real state of the VMA, which
is detached until it is added into the mm's VMA tree. Fix this by marking
new VMAs as detached and resetting the detached flag only after the VMA
is added into a tree.

Introduce vma_mark_attached() to make the API more readable and to
simplify possible future cleanup when vma->vm_mm might be used to
indicate a detached vma and vma_mark_attached() will need an additional
mm parameter.

Signed-off-by: Suren Baghdasaryan
Reviewed-by: Shakeel Butt
Reviewed-by: Lorenzo Stoakes
Reviewed-by: Vlastimil Babka
Reviewed-by: Liam R. Howlett
Tested-by: Shivank Garg
---
 include/linux/mm.h               | 27 ++++++++++++++++++++-------
 kernel/fork.c                    |  4 ++++
 mm/memory.c                      |  2 +-
 mm/vma.c                         |  6 +++---
 mm/vma.h                         |  2 ++
 tools/testing/vma/vma_internal.h | 17 ++++++++++++-----
 6 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ed739406b0a7..2b322871da87 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -821,12 +821,21 @@ static inline void vma_assert_locked(struct vm_area_struct *vma)
 		vma_assert_write_locked(vma);
 }
 
-static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached)
+static inline void vma_mark_attached(struct vm_area_struct *vma)
+{
+	vma->detached = false;
+}
+
+static inline void vma_mark_detached(struct vm_area_struct *vma)
 {
 	/* When detaching vma should be write-locked */
-	if (detached)
-		vma_assert_write_locked(vma);
-	vma->detached = detached;
+	vma_assert_write_locked(vma);
+	vma->detached = true;
+}
+
+static inline bool is_vma_detached(struct vm_area_struct *vma)
+{
+	return vma->detached;
 }
 
 static inline void release_fault_lock(struct vm_fault *vmf)
@@ -857,8 +866,8 @@ static inline void vma_end_read(struct vm_area_struct *vma) {}
 static inline void vma_start_write(struct vm_area_struct *vma) {}
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 		{ mmap_assert_write_locked(vma->vm_mm); }
-static inline void vma_mark_detached(struct vm_area_struct *vma,
-				     bool detached) {}
+static inline void vma_mark_attached(struct vm_area_struct *vma) {}
+static inline void vma_mark_detached(struct vm_area_struct *vma) {}
 
 static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 		unsigned long address)
@@ -891,7 +900,10 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_mm = mm;
 	vma->vm_ops = &vma_dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
-	vma_mark_detached(vma, false);
+#ifdef CONFIG_PER_VMA_LOCK
+	/* vma is
not locked, can't use vma_mark_detached() */ + vma->detached =3D true; +#endif vma_numab_state_init(vma); vma_lock_init(vma); } @@ -1086,6 +1098,7 @@ static inline int vma_iter_bulk_store(struct vma_iter= ator *vmi, if (unlikely(mas_is_err(&vmi->mas))) return -ENOMEM; =20 + vma_mark_attached(vma); return 0; } =20 diff --git a/kernel/fork.c b/kernel/fork.c index 40a8e615499f..f2f9e7b427ad 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -465,6 +465,10 @@ struct vm_area_struct *vm_area_dup(struct vm_area_stru= ct *orig) data_race(memcpy(new, orig, sizeof(*new))); vma_lock_init(new); INIT_LIST_HEAD(&new->anon_vma_chain); +#ifdef CONFIG_PER_VMA_LOCK + /* vma is not locked, can't use vma_mark_detached() */ + new->detached =3D true; +#endif vma_numab_state_init(new); dup_anon_vma_name(orig, new); =20 diff --git a/mm/memory.c b/mm/memory.c index 2a20e3810534..d0dee2282325 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6349,7 +6349,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_s= truct *mm, goto inval; =20 /* Check if the VMA got isolated after we found it */ - if (vma->detached) { + if (is_vma_detached(vma)) { vma_end_read(vma); count_vm_vma_lock_event(VMA_LOCK_MISS); /* The area was replaced with another one */ diff --git a/mm/vma.c b/mm/vma.c index af1d549b179c..d603494e69d7 100644 --- a/mm/vma.c +++ b/mm/vma.c @@ -327,7 +327,7 @@ static void vma_complete(struct vma_prepare *vp, struct= vma_iterator *vmi, =20 if (vp->remove) { again: - vma_mark_detached(vp->remove, true); + vma_mark_detached(vp->remove); if (vp->file) { uprobe_munmap(vp->remove, vp->remove->vm_start, vp->remove->vm_end); @@ -1221,7 +1221,7 @@ static void reattach_vmas(struct ma_state *mas_detach) =20 mas_set(mas_detach, 0); mas_for_each(mas_detach, vma, ULONG_MAX) - vma_mark_detached(vma, false); + vma_mark_attached(vma); =20 __mt_destroy(mas_detach->tree); } @@ -1296,7 +1296,7 @@ static int vms_gather_munmap_vmas(struct vma_munmap_s= truct *vms, if (error) goto munmap_gather_failed; =20 - 
vma_mark_detached(next, true); + vma_mark_detached(next); nrpages =3D vma_pages(next); =20 vms->nr_pages +=3D nrpages; diff --git a/mm/vma.h b/mm/vma.h index a2e8710b8c47..2a2668de8d2c 100644 --- a/mm/vma.h +++ b/mm/vma.h @@ -157,6 +157,7 @@ static inline int vma_iter_store_gfp(struct vma_iterato= r *vmi, if (unlikely(mas_is_err(&vmi->mas))) return -ENOMEM; =20 + vma_mark_attached(vma); return 0; } =20 @@ -389,6 +390,7 @@ static inline void vma_iter_store(struct vma_iterator *= vmi, =20 __mas_set_range(&vmi->mas, vma->vm_start, vma->vm_end - 1); mas_store_prealloc(&vmi->mas, vma); + vma_mark_attached(vma); } =20 static inline unsigned long vma_iter_addr(struct vma_iterator *vmi) diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_inter= nal.h index 96aeb28c81f9..47c8b03ffbbd 100644 --- a/tools/testing/vma/vma_internal.h +++ b/tools/testing/vma/vma_internal.h @@ -469,13 +469,17 @@ static inline void vma_lock_init(struct vm_area_struc= t *vma) vma->vm_lock_seq =3D UINT_MAX; } =20 +static inline void vma_mark_attached(struct vm_area_struct *vma) +{ + vma->detached =3D false; +} + static inline void vma_assert_write_locked(struct vm_area_struct *); -static inline void vma_mark_detached(struct vm_area_struct *vma, bool deta= ched) +static inline void vma_mark_detached(struct vm_area_struct *vma) { /* When detaching vma should be write-locked */ - if (detached) - vma_assert_write_locked(vma); - vma->detached =3D detached; + vma_assert_write_locked(vma); + vma->detached =3D true; } =20 extern const struct vm_operations_struct vma_dummy_vm_ops; @@ -488,7 +492,8 @@ static inline void vma_init(struct vm_area_struct *vma,= struct mm_struct *mm) vma->vm_mm =3D mm; vma->vm_ops =3D &vma_dummy_vm_ops; INIT_LIST_HEAD(&vma->anon_vma_chain); - vma_mark_detached(vma, false); + /* vma is not locked, can't use vma_mark_detached() */ + vma->detached =3D true; vma_lock_init(vma); } =20 @@ -514,6 +519,8 @@ static inline struct vm_area_struct *vm_area_dup(struct= 
vm_area_struct *orig) memcpy(new, orig, sizeof(*new)); vma_lock_init(new); INIT_LIST_HEAD(&new->anon_vma_chain); + /* vma is not locked, can't use vma_mark_detached() */ + new->detached =3D true; =20 return new; } --=20 2.47.1.613.gc27f4b7a9f-goog From nobody Sat Feb 7 12:11:39 2026 Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 35A2D154456 for ; Sat, 11 Jan 2025 04:26:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736569579; cv=none; b=R+SZjGcfx3IoeWfk/SW8MiNTTmpAVrdqFyFc4+lyDpNpusd306EgY7g4/iEBI25+Ffokv56jRKhN0sMU8WffmwoivOWTbhKT53OHsuh6pcWPYO8cqlQf3aueow+Mu8vRvTpDSaSawBqqFfnKFOHkqnB0evdEjDDYXe3toXoLD38= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736569579; c=relaxed/simple; bh=3wsxpxXJ0DyGL/HXs8ZOZgd3qcoYzOfkpO1qULBMBm8=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=kQ9B4k082ggQ7j2BzNq5r5cyUUoyhp5m+egt5JSedn598ixDSllI729exVW497EFybmCxqKrYtFkskMZMm1V/7ymcUGrr6HJ6KHva5RBoBttogCSAsfvfFMhhdZeMAX171FurKyGBHD/mrfMNas3wHYcq/Uz8nLJrOuSfOVlgDg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--surenb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=Y12chbOi; arc=none smtp.client-ip=209.85.216.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--surenb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Y12chbOi" 
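The patch above introduces a state rule: a VMA starts out detached, becomes attached only once it is stored in the tree, and may only be detached again under the write lock. That rule can be modelled in user space. The sketch below is a hypothetical simplification. struct toy_vma and the toy_* helpers are illustrative stand-ins, not kernel APIs, and both the tree insertion and the real locking are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of a VMA; not the kernel's struct. */
struct toy_vma {
	bool detached;     /* true until the VMA is stored in a tree */
	bool write_locked; /* stands in for the per-VMA write lock */
};

/* Like vma_init(): a fresh VMA is in no tree, so it starts detached.
 * It is not locked here, so the flag is set directly rather than
 * through toy_mark_detached(). */
static void toy_vma_init(struct toy_vma *vma)
{
	vma->write_locked = false;
	vma->detached = true;
}

static void toy_mark_attached(struct toy_vma *vma)
{
	vma->detached = false;
}

/* Detaching requires the write lock, mirroring vma_assert_write_locked(). */
static void toy_mark_detached(struct toy_vma *vma)
{
	assert(vma->write_locked);
	vma->detached = true;
}

static bool toy_is_detached(const struct toy_vma *vma)
{
	return vma->detached;
}

/* Stand-in for vma_iter_store(): the tree insertion itself is omitted;
 * only the state change after a successful store is modelled. */
static void toy_tree_store(struct toy_vma *vma)
{
	toy_mark_attached(vma);
}
```

A reader modelled on lock_vma_under_rcu() would call toy_is_detached() after taking its read lock and back off if it returns true. That is the role is_vma_detached() plays in the patch.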
Date: Fri, 10 Jan 2025 20:25:51 -0800
In-Reply-To: <20250111042604.3230628-1-surenb@google.com>
Message-ID: <20250111042604.3230628-5-surenb@google.com>
Subject: [PATCH v9 04/17] mm: introduce vma_iter_store_attached() to use with attached vmas
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
 lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
 vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
 mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
 oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
 dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
 lokeshgidra@google.com, minchan@google.com, jannh@google.com,
 shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
 klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
 linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 kernel-team@android.com, surenb@google.com

vma_iter_store() can be used both when adding a new vma and when updating
an existing one. However, existing vmas do not need to be marked attached,
as they are already marked that way. Introduce vma_iter_store_attached() to
be used with already attached vmas.

Signed-off-by: Suren Baghdasaryan
Reviewed-by: Vlastimil Babka
Tested-by: Shivank Garg
---
 include/linux/mm.h | 12 ++++++++++++
 mm/vma.c           |  8 ++++----
 mm/vma.h           | 11 +++++++++--
 3 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2b322871da87..2f805f1a0176 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -821,6 +821,16 @@ static inline void vma_assert_locked(struct vm_area_struct *vma)
 	vma_assert_write_locked(vma);
 }
 
+static inline void vma_assert_attached(struct vm_area_struct *vma)
+{
+	VM_BUG_ON_VMA(vma->detached, vma);
+}
+
+static inline void vma_assert_detached(struct vm_area_struct *vma)
+{
+	VM_BUG_ON_VMA(!vma->detached, vma);
+}
+
 static inline void vma_mark_attached(struct vm_area_struct *vma)
 {
 	vma->detached = false;
@@ -866,6 +876,8 @@ static inline void vma_end_read(struct vm_area_struct *vma) {}
 static inline void vma_start_write(struct vm_area_struct *vma) {}
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 		{ mmap_assert_write_locked(vma->vm_mm); }
+static inline void vma_assert_attached(struct vm_area_struct *vma) {}
+static inline void vma_assert_detached(struct vm_area_struct *vma) {}
 static inline void vma_mark_attached(struct vm_area_struct *vma) {}
 static inline void vma_mark_detached(struct vm_area_struct *vma) {}
 
diff --git a/mm/vma.c b/mm/vma.c
index d603494e69d7..b9cf552e120c 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -660,14 +660,14 @@ static int commit_merge(struct vma_merge_struct *vmg,
 	vma_set_range(vmg->vma, vmg->start, vmg->end, vmg->pgoff);
 
 	if (expanded)
-		vma_iter_store(vmg->vmi, vmg->vma);
+		vma_iter_store_attached(vmg->vmi, vmg->vma);
 
 	if (adj_start) {
 		adjust->vm_start += adj_start;
 		adjust->vm_pgoff += PHYS_PFN(adj_start);
 		if (adj_start < 0) {
 			WARN_ON(expanded);
-			vma_iter_store(vmg->vmi, adjust);
+			vma_iter_store_attached(vmg->vmi, adjust);
 		}
 	}
 
@@ -2845,7 +2845,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
 				anon_vma_interval_tree_pre_update_vma(vma);
 				vma->vm_end = address;
 				/* Overwrite old entry in mtree. */
-				vma_iter_store(&vmi, vma);
+				vma_iter_store_attached(&vmi, vma);
 				anon_vma_interval_tree_post_update_vma(vma);
 
 				perf_event_mmap(vma);
@@ -2925,7 +2925,7 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
 				vma->vm_start = address;
 				vma->vm_pgoff -= grow;
 				/* Overwrite old entry in mtree. */
-				vma_iter_store(&vmi, vma);
+				vma_iter_store_attached(&vmi, vma);
 				anon_vma_interval_tree_post_update_vma(vma);
 
 				perf_event_mmap(vma);
diff --git a/mm/vma.h b/mm/vma.h
index 2a2668de8d2c..63dd38d5230c 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -365,9 +365,10 @@ static inline struct vm_area_struct *vma_iter_load(struct vma_iterator *vmi)
 }
 
 /* Store a VMA with preallocated memory */
-static inline void vma_iter_store(struct vma_iterator *vmi,
-				  struct vm_area_struct *vma)
+static inline void vma_iter_store_attached(struct vma_iterator *vmi,
+					   struct vm_area_struct *vma)
 {
+	vma_assert_attached(vma);
 
 #if defined(CONFIG_DEBUG_VM_MAPLE_TREE)
 	if (MAS_WARN_ON(&vmi->mas, vmi->mas.status != ma_start &&
@@ -390,7 +391,13 @@ static inline void vma_iter_store(struct vma_iterator *vmi,
 
 	__mas_set_range(&vmi->mas, vma->vm_start, vma->vm_end - 1);
 	mas_store_prealloc(&vmi->mas, vma);
+}
+
+static inline void vma_iter_store(struct vma_iterator *vmi,
+				  struct vm_area_struct *vma)
+{
 	vma_mark_attached(vma);
+	vma_iter_store_attached(vmi, vma);
 }
 
 static inline unsigned long vma_iter_addr(struct vma_iterator *vmi)
-- 
2.47.1.613.gc27f4b7a9f-goog
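The split between the two store paths can be sketched in the same user-space style. In this hypothetical model a fixed-size array stands in for the maple tree. toy_store() plays the role of vma_iter_store() for a newly added VMA. toy_store_attached() plays the role of vma_iter_store_attached() for an entry that is already in the tree, such as one just grown by expand_upwards(). None of these names are kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_vma {
	unsigned long start, end;
	bool detached;
};

/* Hypothetical fixed-size table standing in for the maple tree. */
#define TOY_SLOTS 8

struct toy_tree {
	struct toy_vma *slot[TOY_SLOTS];
};

/* Overwrite the entry for a VMA that must already be attached.
 * The first assert is the analogue of vma_assert_attached(). */
static void toy_store_attached(struct toy_tree *tree, size_t idx,
			       struct toy_vma *vma)
{
	assert(!vma->detached);
	assert(idx < TOY_SLOTS);
	tree->slot[idx] = vma;
}

/* Store a new VMA: mark it attached first, then take the attached path,
 * in the same order as the patched vma_iter_store(). */
static void toy_store(struct toy_tree *tree, size_t idx, struct toy_vma *vma)
{
	vma->detached = false;
	toy_store_attached(tree, idx, vma);
}
```

With this split, a caller that passes a still-detached VMA to the attached-only path trips the assertion instead of silently attaching it.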
Date: Fri, 10 Jan 2025 20:25:52 -0800
In-Reply-To: <20250111042604.3230628-1-surenb@google.com>
Message-ID: <20250111042604.3230628-6-surenb@google.com>
Subject: [PATCH v9 05/17] mm: mark vmas detached upon exit
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
 lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
 vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
 mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
 oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
 dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
 lokeshgidra@google.com, minchan@google.com, jannh@google.com,
 shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
 klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
 linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 kernel-team@android.com, surenb@google.com

When exit_mmap() removes vmas belonging to an exiting task, it does not
mark them as detached, since they can't be reached by other tasks and will
be freed shortly. Once vma reuse is introduced, all vmas will have to be in
the detached state before they are freed, to ensure that a reused vma is in
a consistent state. Add the missing vma_mark_detached() before freeing the
vma.

Signed-off-by: Suren Baghdasaryan
Reviewed-by: Vlastimil Babka
Reviewed-by: Lorenzo Stoakes
Tested-by: Shivank Garg
---
 mm/vma.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/vma.c b/mm/vma.c
index b9cf552e120c..93ff42ac2002 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -413,10 +413,12 @@ void remove_vma(struct vm_area_struct *vma, bool unreachable)
 	if (vma->vm_file)
 		fput(vma->vm_file);
 	mpol_put(vma_policy(vma));
-	if (unreachable)
+	if (unreachable) {
+		vma_mark_detached(vma);
 		__vm_area_free(vma);
-	else
+	} else {
 		vm_area_free(vma);
+	}
 }
 
 /*
-- 
2.47.1.613.gc27f4b7a9f-goog
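A reuse cache makes it clearer why even unreachable VMAs must be detached before they are freed. The sketch below uses a hypothetical one-entry cache (toy_cache) in place of the vma reuse the commit message anticipates. It shows only why an object has to go back in the detached state. Locking is omitted, and none of the names are kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_vma {
	bool detached;
};

/* Hypothetical one-entry cache standing in for object reuse. */
static struct toy_vma *toy_cache;

static struct toy_vma *toy_vma_alloc(void)
{
	struct toy_vma *vma = toy_cache;

	if (vma) {
		toy_cache = NULL;
		/* A reused object must already be in the fresh (detached) state. */
		assert(vma->detached);
		return vma;
	}
	vma = malloc(sizeof(*vma));
	if (vma)
		vma->detached = true;
	return vma;
}

static void toy_vma_free(struct toy_vma *vma)
{
	/* Caching an attached VMA would hand stale state to the next user. */
	assert(vma->detached);
	if (!toy_cache)
		toy_cache = vma;
	else
		free(vma);
}

/* Like remove_vma(vma, true) after this patch: mark the VMA detached
 * even though no other task can reach it any more. */
static void toy_remove_unreachable(struct toy_vma *vma)
{
	vma->detached = true;
	toy_vma_free(vma);
}
```

Without the explicit detach in toy_remove_unreachable(), the assertion in toy_vma_free() would fire for any VMA that had been attached.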
Date: Fri, 10 Jan 2025 20:25:53 -0800
In-Reply-To: <20250111042604.3230628-1-surenb@google.com>
Message-ID: <20250111042604.3230628-7-surenb@google.com>
Subject: [PATCH v9 06/17] types: move struct rcuwait into types.h
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
 lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
 vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
 mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
 oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
 dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
 lokeshgidra@google.com, minchan@google.com, jannh@google.com,
 shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
 klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
 linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 kernel-team@android.com, surenb@google.com, "Liam R. Howlett"

Move the rcuwait struct definition into types.h so that rcuwait can be used
without including rcuwait.h, which includes other headers. Without this
change, mm_types.h can't use rcuwait due to the following circular
dependency:

    mm_types.h -> rcuwait.h -> signal.h -> mm_types.h

Suggested-by: Matthew Wilcox
Signed-off-by: Suren Baghdasaryan
Acked-by: Davidlohr Bueso
Acked-by: Liam R. Howlett
Acked-by: Lorenzo Stoakes
Tested-by: Shivank Garg
---
 include/linux/rcuwait.h | 13 +------------
 include/linux/types.h   | 12 ++++++++++++
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/linux/rcuwait.h b/include/linux/rcuwait.h
index 27343424225c..9ad134a04b41 100644
--- a/include/linux/rcuwait.h
+++ b/include/linux/rcuwait.h
@@ -4,18 +4,7 @@
 
 #include 
 #include 
-
-/*
- * rcuwait provides a way of blocking and waking up a single
- * task in an rcu-safe manner.
- *
- * The only time @task is non-nil is when a user is blocked (or
- * checking if it needs to) on a condition, and reset as soon as we
- * know that the condition has succeeded and are awoken.
- */
-struct rcuwait {
-	struct task_struct __rcu *task;
-};
+#include 
 
 #define __RCUWAIT_INITIALIZER(name)		\
 	{ .task = NULL, }
diff --git a/include/linux/types.h b/include/linux/types.h
index 2d7b9ae8714c..f1356a9a5730 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -248,5 +248,17 @@ typedef void (*swap_func_t)(void *a, void *b, int size);
 typedef int (*cmp_r_func_t)(const void *a, const void *b, const void *priv);
 typedef int (*cmp_func_t)(const void *a, const void *b);
 
+/*
+ * rcuwait provides a way of blocking and waking up a single
+ * task in an rcu-safe manner.
+ *
+ * The only time @task is non-nil is when a user is blocked (or
+ * checking if it needs to) on a condition, and reset as soon as we
+ * know that the condition has succeeded and are awoken.
+ */
+struct rcuwait {
+	struct task_struct __rcu *task;
+};
+
 #endif /* __ASSEMBLY__ */
 #endif /* _LINUX_TYPES_H */
-- 
2.47.1.613.gc27f4b7a9f-goog
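The move works because the definition of struct rcuwait needs only a pointer to struct task_struct, and an incomplete type is enough for that. The standalone sketch below illustrates this under one assumption: that the user in mm_types.h embeds the waiter by value, which requires the complete definition rather than a forward declaration. The toy_* names are hypothetical, and the __rcu annotation is dropped.

```c
#include <assert.h>
#include <stddef.h>

/* An incomplete type is enough for a pointer member. */
struct toy_task;

/* Stand-in for the definition moved into types.h: it depends on no other
 * header, so any header can include it without creating a cycle. */
struct toy_rcuwait {
	struct toy_task *task;
};

/* Stand-in for a structure in mm_types.h that embeds the waiter by value.
 * This needs the complete definition above; with only a forward
 * declaration ("struct toy_rcuwait;") this member would not compile. */
struct toy_mm {
	int map_count;
	struct toy_rcuwait writer_wait;
};

/* Mirrors __RCUWAIT_INITIALIZER, which sets .task to NULL. */
static void toy_rcuwait_init(struct toy_rcuwait *w)
{
	w->task = NULL;
}
```

The helpers in rcuwait.h, such as the initializer macro, stay where they are. Only the type needs to be visible earlier in the include order.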
10 Jan 2025 20:26:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1736569583; x=1737174383; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=7fOHsmOPwrKtQ1EzbWANyGDHIimQCM8l7vZM7q2M3lw=; b=wAypSoEDpEaoDCUnIRT8RsQ65BeO5tVZBgYysFtNh62/V6ObPFscpK26rsnX7XLiGh lG8AnDPzMe4GIUGY7YAIsa59L9fBzWXyhw0g3kjuI4+al8WLpd22u9XZrX8TSeAiZQFy gCubBlSUUcdg0DZMj0J5S18kz81ViUfpJeGRYmLg550MKIjglsG9zX9DyREXc/7+ejpW BdZTRx/zlLt3RuhM1yUcrFew+bM5SkgnrX9UzwJ+/OaLN9th53Khb5e5wwqNclfUKG2/ WHgB/HAACd/uyChATLRpy5NPtw4CucHA4LoUJ61OplGeCSVWFREBgM4nxnZzJqCnJ3V+ 8qVQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736569583; x=1737174383; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=7fOHsmOPwrKtQ1EzbWANyGDHIimQCM8l7vZM7q2M3lw=; b=vTihw1OPtOG9n91BmJvHTM8MObgev2xrUfH1LsB2paLuObYNPva5dgQGjKg68uMwRs /1ezLC+V/2aDNJ6InAOACMDWORfYBItTvRmUGTUBr3t7CKB92uV5ae7L4f4AWZ0s9ET6 zTQdwPa5GgkDCV5gWhl9S7OtbbWe0H+iHOg2t23aSpOk3R7Wt1TQBn5Hp5lpDWZ16E/y IaEioAalru/v4VmQn0qT63I65alqI/Y1VzVQ30Jz/X+Snv392JRkUD9fThAC710zKOlt q4dGJAguc889Bmp1bRu9URavATlA/s86b4HyUJ6VcItinzIhdkadPdESO7spCWw/rKjk YukQ== X-Forwarded-Encrypted: i=1; AJvYcCWb/XYeTDBxlMP6Qoya6GNCQ0W3q/MVZmqFSkzRhyE6FaETzYFSnDljpEc/yjbEdSfbl11s4NM6CxpJRMs=@vger.kernel.org X-Gm-Message-State: AOJu0YyKRKj273azfQSIRqmcdOR8iMi5uN2jBYikuJ9+7x/UIhJfq+wZ n4dL7NydLQ1uoPHAr1d2qaaBNnVbiTH2jnjOCl6cFSi9GXBcDNMZtzn/ITzQvPRe2bhXYGPz4xu q4w== X-Google-Smtp-Source: AGHT+IEri6oWvbwaR9NKB8OaH7pKRVw+NxMfmWZU4bRXuSK8tL89q2iubRxHeJuXJzsmqdnV3Y+AwYimngo= X-Received: from pjbsl14.prod.google.com ([2002:a17:90b:2e0e:b0:2ef:8f54:4254]) (user=surenb job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90a:da88:b0:2ee:5111:a54b with SMTP id 98e67ed59e1d1-2f548f424c7mr16967174a91.31.1736569582930; Fri, 10 Jan 2025 
Date: Fri, 10 Jan 2025 20:25:54 -0800
In-Reply-To: <20250111042604.3230628-1-surenb@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Message-ID: <20250111042604.3230628-8-surenb@google.com>
Subject: [PATCH v9 07/17] mm: allow vma_start_read_locked/vma_start_read_locked_nested to fail
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
	vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com,
	oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com,
	peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org,
	brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com,
	hughd@google.com, lokeshgidra@google.com, minchan@google.com,
	jannh@google.com, shakeel.butt@linux.dev, souravpanda@google.com,
	pasha.tatashin@soleen.com, klarasmodin@gmail.com,
	richard.weiyang@gmail.com, corbet@lwn.net, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@android.com,
	surenb@google.com

With the upcoming replacement of vm_lock with vm_refcnt, we need to handle
the possibility of vma_start_read_locked/vma_start_read_locked_nested
failing due to refcount overflow. Prepare for such a possibility by
changing these APIs and adjusting their users.
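A minimal userspace sketch of the caller-side pattern this patch introduces, written with C11 atomics rather than kernel primitives. The names struct obj, READER_LIMIT, obj_read_trylock(), obj_read_unlock() and lock_pair() are hypothetical and exist only for illustration; the tiny limit just makes the failure path easy to trigger. What it shows is the shape of the error handling in uffd_move_lock(): a failed read-lock attempt yields -EAGAIN, and when the second of two locks cannot be taken, the first is released so the caller is left holding nothing.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lockable object: only a reader count. */
struct obj {
	atomic_int readers;
};

/* Assumption: a tiny limit so the "overflow" failure is easy to hit. */
#define READER_LIMIT 2

/* Take a read reference unless the limit is reached; may fail. */
static bool obj_read_trylock(struct obj *o)
{
	int old = atomic_load(&o->readers);

	do {
		if (old >= READER_LIMIT)
			return false;
	} while (!atomic_compare_exchange_weak(&o->readers, &old, old + 1));
	return true;
}

static void obj_read_unlock(struct obj *o)
{
	int old = atomic_fetch_sub(&o->readers, 1);

	assert(old > 0); /* catch unbalanced unlocks */
	(void)old;
}

/*
 * Same shape as the uffd_move_lock() change: lock dst, then src if it is
 * a different object; if src fails, drop dst so nothing stays locked.
 */
static int lock_pair(struct obj *dst, struct obj *src)
{
	if (!obj_read_trylock(dst))
		return -EAGAIN;
	if (dst != src && !obj_read_trylock(src)) {
		obj_read_unlock(dst);
		return -EAGAIN;
	}
	return 0;
}
```

Returning a failure instead of blocking preserves the "readers never wait" property; how to react to -EAGAIN is left to the caller.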

Signed-off-by: Suren Baghdasaryan
Acked-by: Vlastimil Babka
Cc: Lokesh Gidra
Tested-by: Shivank Garg
---
 include/linux/mm.h |  6 ++++--
 mm/userfaultfd.c   | 18 +++++++++++++-----
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2f805f1a0176..cbb4e3dbbaed 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -747,10 +747,11 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
  * not be used in such cases because it might fail due to mm_lock_seq overflow.
  * This functionality is used to obtain vma read lock and drop the mmap read lock.
  */
-static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int subclass)
+static inline bool vma_start_read_locked_nested(struct vm_area_struct *vma, int subclass)
 {
 	mmap_assert_locked(vma->vm_mm);
 	down_read_nested(&vma->vm_lock.lock, subclass);
+	return true;
 }
 
 /*
@@ -759,10 +760,11 @@ static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int
  * not be used in such cases because it might fail due to mm_lock_seq overflow.
  * This functionality is used to obtain vma read lock and drop the mmap read lock.
  */
-static inline void vma_start_read_locked(struct vm_area_struct *vma)
+static inline bool vma_start_read_locked(struct vm_area_struct *vma)
 {
 	mmap_assert_locked(vma->vm_mm);
 	down_read(&vma->vm_lock.lock);
+	return true;
 }
 
 static inline void vma_end_read(struct vm_area_struct *vma)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 4527c385935b..411a663932c4 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -85,7 +85,8 @@ static struct vm_area_struct *uffd_lock_vma(struct mm_struct *mm,
 	mmap_read_lock(mm);
 	vma = find_vma_and_prepare_anon(mm, address);
 	if (!IS_ERR(vma))
-		vma_start_read_locked(vma);
+		if (!vma_start_read_locked(vma))
+			vma = ERR_PTR(-EAGAIN);
 
 	mmap_read_unlock(mm);
 	return vma;
@@ -1483,10 +1484,17 @@ static int uffd_move_lock(struct mm_struct *mm,
 	mmap_read_lock(mm);
 	err = find_vmas_mm_locked(mm, dst_start, src_start, dst_vmap, src_vmap);
 	if (!err) {
-		vma_start_read_locked(*dst_vmap);
-		if (*dst_vmap != *src_vmap)
-			vma_start_read_locked_nested(*src_vmap,
-						SINGLE_DEPTH_NESTING);
+		if (vma_start_read_locked(*dst_vmap)) {
+			if (*dst_vmap != *src_vmap) {
+				if (!vma_start_read_locked_nested(*src_vmap,
+							SINGLE_DEPTH_NESTING)) {
+					vma_end_read(*dst_vmap);
+					err = -EAGAIN;
+				}
+			}
+		} else {
+			err = -EAGAIN;
+		}
 	}
 	mmap_read_unlock(mm);
 	return err;
-- 
2.47.1.613.gc27f4b7a9f-goog

From nobody Sat Feb 7 12:11:39 2026
Date: Fri, 10 Jan 2025 20:25:55 -0800
In-Reply-To: <20250111042604.3230628-1-surenb@google.com>
Message-ID: <20250111042604.3230628-9-surenb@google.com>
Subject: [PATCH v9 08/17] mm: move mmap_init_lock() out of the header file
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

mmap_init_lock() is used only from mm_init() in fork.c, therefore it does
not have to reside in the header file. This move lets us avoid including
additional headers in mmap_lock.h later, when mmap_init_lock() needs to
initialize an rcuwait object.

Signed-off-by: Suren Baghdasaryan
Reviewed-by: Vlastimil Babka
Reviewed-by: Lorenzo Stoakes
Tested-by: Shivank Garg
---
 include/linux/mmap_lock.h | 6 ------
 kernel/fork.c             | 6 ++++++
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 45a21faa3ff6..4706c6769902 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -122,12 +122,6 @@ static inline bool mmap_lock_speculate_retry(struct mm_struct *mm, unsigned int
 
 #endif /* CONFIG_PER_VMA_LOCK */
 
-static inline void mmap_init_lock(struct mm_struct *mm)
-{
-	init_rwsem(&mm->mmap_lock);
-	mm_lock_seqcount_init(mm);
-}
-
 static inline void mmap_write_lock(struct mm_struct *mm)
 {
 	__mmap_lock_trace_start_locking(mm, true);
diff --git a/kernel/fork.c b/kernel/fork.c
index f2f9e7b427ad..d4c75428ccaf 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1219,6 +1219,12 @@ static void mm_init_uprobes_state(struct mm_struct *mm)
 #endif
 }
 
+static inline void mmap_init_lock(struct mm_struct *mm)
+{
+	init_rwsem(&mm->mmap_lock);
+	mm_lock_seqcount_init(mm);
+}
+
 static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	struct user_namespace *user_ns)
 {
-- 
2.47.1.613.gc27f4b7a9f-goog

From nobody Sat Feb 7 12:11:39 2026
Date: Fri, 10 Jan 2025 20:25:56 -0800
In-Reply-To: <20250111042604.3230628-1-surenb@google.com>
Message-ID: <20250111042604.3230628-10-surenb@google.com>
Subject: [PATCH v9 09/17] mm: uninline the main body of vma_start_write()
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

vma_start_write() is used in many places and will grow in size very soon.
It is not used in performance-critical paths, and uninlining it should
limit future code size growth.
No functional changes.

Signed-off-by: Suren Baghdasaryan
Reviewed-by: Vlastimil Babka
Reviewed-by: Lorenzo Stoakes
Tested-by: Shivank Garg
---
 include/linux/mm.h | 12 +++---------
 mm/memory.c        | 14 ++++++++++++++
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index cbb4e3dbbaed..3432756d95e6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -787,6 +787,8 @@ static bool __is_vma_write_locked(struct vm_area_struct *vma, unsigned int *mm_l
 	return (vma->vm_lock_seq == *mm_lock_seq);
 }
 
+void __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq);
+
 /*
  * Begin writing to a VMA.
  * Exclude concurrent readers under the per-VMA lock until the currently
@@ -799,15 +801,7 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 	if (__is_vma_write_locked(vma, &mm_lock_seq))
 		return;
 
-	down_write(&vma->vm_lock.lock);
-	/*
-	 * We should use WRITE_ONCE() here because we can have concurrent reads
-	 * from the early lockless pessimistic check in vma_start_read().
-	 * We don't really care about the correctness of that early check, but
-	 * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy.
-	 */
-	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
-	up_write(&vma->vm_lock.lock);
+	__vma_start_write(vma, mm_lock_seq);
 }
 
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
diff --git a/mm/memory.c b/mm/memory.c
index d0dee2282325..236fdecd44d6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6328,6 +6328,20 @@ struct vm_area_struct *lock_mm_and_find_vma(struct mm_struct *mm,
 #endif
 
 #ifdef CONFIG_PER_VMA_LOCK
+void __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq)
+{
+	down_write(&vma->vm_lock.lock);
+	/*
+	 * We should use WRITE_ONCE() here because we can have concurrent reads
+	 * from the early lockless pessimistic check in vma_start_read().
+	 * We don't really care about the correctness of that early check, but
+	 * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy.
+	 */
+	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
+	up_write(&vma->vm_lock.lock);
+}
+EXPORT_SYMBOL_GPL(__vma_start_write);
+
 /*
  * Lookup and lock a VMA under RCU protection. Returned VMA is guaranteed to be
  * stable and not isolated. If the VMA is not found or is being modified the
-- 
2.47.1.613.gc27f4b7a9f-goog

From nobody Sat Feb 7 12:11:39 2026
Date: Fri, 10 Jan 2025 20:25:57 -0800
In-Reply-To: <20250111042604.3230628-1-surenb@google.com>
Message-ID: <20250111042604.3230628-11-surenb@google.com>
Subject: [PATCH v9 10/17] refcount: introduce __refcount_{add|inc}_not_zero_limited
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

Introduce functions to increase a refcount, but with a top limit above
which the increase fails (the limit is inclusive). Setting the limit to
INT_MAX indicates no limit.
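The inclusive-limit semantics can be made concrete with a userspace sketch that follows the compare-exchange loop of __refcount_add_not_zero_limited() in this patch, using C11 atomics in place of atomic_try_cmpxchg_relaxed(). add_not_zero_limited() is a hypothetical name, and the sketch omits the kernel's saturation warning. With limit = 2 and a current count of 2, adding 1 fails because 1 > 2 - 2; with a count of 1 it succeeds and the count becomes 2, which is what "inclusive" means here.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace analogue: add i to *refs unless it is zero or the
 * result would exceed limit (inclusive). limit == INT_MAX means "no limit".
 * *oldp (if non-NULL) receives the value observed before the add attempt.
 */
static bool add_not_zero_limited(int i, atomic_int *refs, int *oldp, int limit)
{
	int old = atomic_load_explicit(refs, memory_order_relaxed);

	assert(i > 0);
	do {
		if (!old)
			break; /* already released: never resurrect a zero count */
		if (limit != INT_MAX && i > limit - old) {
			if (oldp)
				*oldp = old;
			return false; /* would exceed the inclusive limit */
		}
		/* on CAS failure, old is reloaded with the current value */
	} while (!atomic_compare_exchange_weak_explicit(refs, &old, old + i,
							memory_order_relaxed,
							memory_order_relaxed));
	if (oldp)
		*oldp = old;
	return old != 0;
}
```

Writing the check as i > limit - old rather than old + i > limit avoids computing a sum that could overflow when old is already large.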

Signed-off-by: Suren Baghdasaryan
Tested-by: Shivank Garg
---
 include/linux/refcount.h | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index 35f039ecb272..5072ba99f05e 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -137,13 +137,23 @@ static inline unsigned int refcount_read(const refcount_t *r)
 }
 
 static inline __must_check __signed_wrap
-bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
+bool __refcount_add_not_zero_limited(int i, refcount_t *r, int *oldp,
+				     int limit)
 {
 	int old = refcount_read(r);
 
 	do {
 		if (!old)
 			break;
+
+		if (statically_true(limit == INT_MAX))
+			continue;
+
+		if (i > limit - old) {
+			if (oldp)
+				*oldp = old;
+			return false;
+		}
 	} while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
 
 	if (oldp)
@@ -155,6 +165,12 @@ bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
 	return old;
 }
 
+static inline __must_check __signed_wrap
+bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
+{
+	return __refcount_add_not_zero_limited(i, r, oldp, INT_MAX);
+}
+
 /**
  * refcount_add_not_zero - add a value to a refcount unless it is 0
  * @i: the value to add to the refcount
@@ -213,6 +229,12 @@ static inline void refcount_add(int i, refcount_t *r)
 	__refcount_add(i, r, NULL);
 }
 
+static inline __must_check bool __refcount_inc_not_zero_limited(refcount_t *r,
+								 int *oldp, int limit)
+{
+	return __refcount_add_not_zero_limited(1, r, oldp, limit);
+}
+
 static inline __must_check bool __refcount_inc_not_zero(refcount_t *r, int *oldp)
 {
 	return __refcount_add_not_zero(1, r, oldp);
-- 
2.47.1.613.gc27f4b7a9f-goog

From nobody Sat Feb 7 12:11:39 2026
Date: Fri, 10 Jan 2025 20:25:58 -0800
In-Reply-To: <20250111042604.3230628-1-surenb@google.com>
Message-ID: <20250111042604.3230628-12-surenb@google.com>
Subject: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

rw_semaphore is a sizable structure of 40 bytes and consumes considerable
space for each vm_area_struct. However, vma_lock has two important
properties that allow replacing rw_semaphore with a simpler structure:
1. Readers never wait. They try to take the vma_lock and fall back to
mmap_lock if that fails.
2. Only one writer at a time will ever try to write-lock a vma_lock
because writers first take mmap_lock in write mode.
Because of these properties, full rw_semaphore functionality is not
needed and we can replace rw_semaphore and the vma->detached flag with
a refcount (vm_refcnt).

When a vma is in the detached state, vm_refcnt is 0 and only a call to
vma_mark_attached() can take it out of this state. Note that, unlike
before, we now require both vma_mark_attached() and vma_mark_detached()
to be called only after the vma has been write-locked.
vma_mark_attached() changes vm_refcnt to 1 to indicate that the vma has
been attached to the vma tree.

When a reader takes a read lock, it increments vm_refcnt, unless the top
usable bit of vm_refcnt (0x40000000) is set, indicating the presence of a
writer. When a writer takes the write lock, it sets the top usable bit to
indicate its presence. If there are readers, the writer will wait using
the newly introduced mm->vma_writer_wait. Since all writers take mmap_lock
in write mode first, there can be only one writer at a time. The last
reader to release the lock will signal the writer to wake up.
vm_refcnt might overflow if there are many competing readers, in which
case read-locking will fail. Readers are expected to handle such failures.

In summary:
1. all readers increment the vm_refcnt;
2. the writer sets the top usable (writer) bit of vm_refcnt;
3. readers cannot increment the vm_refcnt if the writer bit is set;
4. in the presence of readers, the writer must wait for the vm_refcnt to
drop to 1 (ignoring the writer bit), indicating an attached vma with no
readers;
5. vm_refcnt overflow is handled by the readers.

While this vm_lock replacement does not yet result in a smaller
vm_area_struct (it stays at 256 bytes due to cacheline alignment), it
allows for further size optimization by structure member regrouping to
bring the size of vm_area_struct below 192 bytes.
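The counting protocol summarized above can be exercised in a single-threaded userspace sketch. It is an approximation under stated assumptions: C11 atomics replace refcount_t, LOCK_OFFSET is the 0x40000000 writer bit named in the text, REF_LIMIT is assumed to be LOCK_OFFSET - 1 (any value below LOCK_OFFSET satisfies the vma_start_read() comment in this patch), all names are hypothetical, and only one writer at a time is assumed, as mmap_lock guarantees in the kernel. The sketch leaves out the rcuwait sleeping on mm->vma_writer_wait, the vm_lock_seq check and the lockdep annotations; write_can_proceed() merely reports whether a writer would still have to wait, and read_unlock() reports whether the last reader should wake it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define LOCK_OFFSET 0x40000000          /* writer bit from the description */
#define REF_LIMIT   (LOCK_OFFSET - 1)   /* assumption: any value < LOCK_OFFSET */

/* Hypothetical stand-in: 0 = detached, 1 = attached, +1 per reader. */
struct fake_vma {
	atomic_int refcnt;
};

/* Detached -> attached (0 -> 1). */
static void mark_attached(struct fake_vma *v)
{
	int expected = 0;
	bool ok = atomic_compare_exchange_strong(&v->refcnt, &expected, 1);

	assert(ok);
	(void)ok;
}

/*
 * Reader: never waits. Fails if the vma is detached (0), if the writer bit
 * is set (count already above REF_LIMIT) or if one more reader would
 * exceed REF_LIMIT.
 */
static bool read_trylock(struct fake_vma *v)
{
	int old = atomic_load(&v->refcnt);

	do {
		if (!old || 1 > REF_LIMIT - old)
			return false;
	} while (!atomic_compare_exchange_weak(&v->refcnt, &old, old + 1));
	return true;
}

/* Reader release; returns true when a waiting writer should be woken. */
static bool read_unlock(struct fake_vma *v)
{
	int newcnt = atomic_fetch_sub(&v->refcnt, 1) - 1;

	/* writer bit set and at most the attach reference left */
	return (newcnt & LOCK_OFFSET) && newcnt <= LOCK_OFFSET + 1;
}

/* Writer: announce presence by setting the writer bit. */
static void write_enter(struct fake_vma *v)
{
	atomic_fetch_add(&v->refcnt, LOCK_OFFSET);
}

/* Attached vma with no readers: only writer bit + attach reference. */
static bool write_can_proceed(struct fake_vma *v)
{
	return atomic_load(&v->refcnt) == LOCK_OFFSET + 1;
}

static void write_exit(struct fake_vma *v)
{
	atomic_fetch_sub(&v->refcnt, LOCK_OFFSET);
}
```

Keeping the writer bit above REF_LIMIT is what lets the single bounds check in read_trylock() reject both overflow and a present writer without a separate flag.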
Suggested-by: Peter Zijlstra Suggested-by: Matthew Wilcox Signed-off-by: Suren Baghdasaryan Reviewed-by: Vlastimil Babka Tested-by: Shivank Garg --- include/linux/mm.h | 102 +++++++++++++++++++++---------- include/linux/mm_types.h | 22 +++---- kernel/fork.c | 13 ++-- mm/init-mm.c | 1 + mm/memory.c | 80 +++++++++++++++++++++--- tools/testing/vma/linux/atomic.h | 5 ++ tools/testing/vma/vma_internal.h | 66 +++++++++++--------- 7 files changed, 198 insertions(+), 91 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 3432756d95e6..a99b11ee1f66 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -32,6 +32,7 @@ #include #include #include +#include struct mempolicy; struct anon_vma; @@ -697,12 +698,43 @@ static inline void vma_numab_state_free(struct vm_area_struct *vma) {} #endif /* CONFIG_NUMA_BALANCING */ #ifdef CONFIG_PER_VMA_LOCK -static inline void vma_lock_init(struct vm_area_struct *vma) +static inline void vma_lock_init(struct vm_area_struct *vma, bool reset_refcnt) { - init_rwsem(&vma->vm_lock.lock); +#ifdef CONFIG_DEBUG_LOCK_ALLOC + static struct lock_class_key lockdep_key; + + lockdep_init_map(&vma->vmlock_dep_map, "vm_lock", &lockdep_key, 0); +#endif + if (reset_refcnt) + refcount_set(&vma->vm_refcnt, 0); vma->vm_lock_seq = UINT_MAX; } +static inline bool is_vma_writer_only(int refcnt) +{ + /* + * With a writer and no readers, refcnt is VMA_LOCK_OFFSET if the vma + * is detached and (VMA_LOCK_OFFSET + 1) if it is attached. Waiting on + * a detached vma happens only in vma_mark_detached() and is a rare + * case, therefore most of the time there will be no unnecessary wakeup.
+ */ + return refcnt & VMA_LOCK_OFFSET && refcnt <= VMA_LOCK_OFFSET + 1; +} + +static inline void vma_refcount_put(struct vm_area_struct *vma) +{ + /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */ + struct mm_struct *mm = vma->vm_mm; + int oldcnt; + + rwsem_release(&vma->vmlock_dep_map, _RET_IP_); + if (!__refcount_dec_and_test(&vma->vm_refcnt, &oldcnt)) { + + if (is_vma_writer_only(oldcnt - 1)) + rcuwait_wake_up(&mm->vma_writer_wait); + } +} + /* * Try to read-lock a vma. The function is allowed to occasionally yield false * locked result to avoid performance overhead, in which case we fall back to @@ -710,6 +742,8 @@ static inline void vma_lock_init(struct vm_area_struct *vma) */ static inline bool vma_start_read(struct vm_area_struct *vma) { + int oldcnt; + /* * Check before locking. A race might cause false locked result. * We can use READ_ONCE() for the mm_lock_seq here, and don't need @@ -720,13 +754,19 @@ static inline bool vma_start_read(struct vm_area_struct *vma) if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq.sequence)) return false; - if (unlikely(down_read_trylock(&vma->vm_lock.lock) == 0)) + /* + * If VMA_LOCK_OFFSET is set, __refcount_inc_not_zero_limited() will fail + * because VMA_REF_LIMIT is less than VMA_LOCK_OFFSET. + */ + if (unlikely(!__refcount_inc_not_zero_limited(&vma->vm_refcnt, &oldcnt, + VMA_REF_LIMIT))) return false; + rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_); /* - * Overflow might produce false locked result. + * Overflow of vm_lock_seq/mm_lock_seq might produce false locked result. * False unlocked result is impossible because we modify and check - * vma->vm_lock_seq under vma->vm_lock protection and mm->mm_lock_seq + * vma->vm_lock_seq under vma->vm_refcnt protection and mm->mm_lock_seq * modification invalidates all existing locks.
* * We must use ACQUIRE semantics for the mm_lock_seq so that if we are @@ -735,9 +775,10 @@ static inline bool vma_start_read(struct vm_area_struct *vma) * This pairs with RELEASE semantics in vma_end_write_all(). */ if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&vma->vm_mm->mm_lock_seq))) { - up_read(&vma->vm_lock.lock); + vma_refcount_put(vma); return false; } + return true; } @@ -749,8 +790,14 @@ static inline bool vma_start_read(struct vm_area_struct *vma) */ static inline bool vma_start_read_locked_nested(struct vm_area_struct *vma, int subclass) { + int oldcnt; + mmap_assert_locked(vma->vm_mm); - down_read_nested(&vma->vm_lock.lock, subclass); + if (unlikely(!__refcount_inc_not_zero_limited(&vma->vm_refcnt, &oldcnt, + VMA_REF_LIMIT))) + return false; + + rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_); return true; } @@ -762,16 +809,12 @@ static inline bool vma_start_read_locked_nested(struct vm_area_struct *vma, int */ static inline bool vma_start_read_locked(struct vm_area_struct *vma) { - mmap_assert_locked(vma->vm_mm); - down_read(&vma->vm_lock.lock); - return true; + return vma_start_read_locked_nested(vma, 0); } static inline void vma_end_read(struct vm_area_struct *vma) { - rcu_read_lock(); /* keeps vma alive till the end of up_read */ - up_read(&vma->vm_lock.lock); - rcu_read_unlock(); + vma_refcount_put(vma); } /* WARNING!
Can only be used if mmap_lock is expected to be write-locked */ @@ -813,36 +856,33 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma) static inline void vma_assert_locked(struct vm_area_struct *vma) { - if (!rwsem_is_locked(&vma->vm_lock.lock)) + if (refcount_read(&vma->vm_refcnt) <= 1) vma_assert_write_locked(vma); } +/* + * WARNING: to avoid racing with vma_mark_attached()/vma_mark_detached(), these + * assertions should be made either under mmap_write_lock or when the object + * has been isolated under mmap_write_lock, ensuring no competing writers. + */ static inline void vma_assert_attached(struct vm_area_struct *vma) { - VM_BUG_ON_VMA(vma->detached, vma); + VM_BUG_ON_VMA(!refcount_read(&vma->vm_refcnt), vma); } static inline void vma_assert_detached(struct vm_area_struct *vma) { - VM_BUG_ON_VMA(!vma->detached, vma); + VM_BUG_ON_VMA(refcount_read(&vma->vm_refcnt), vma); } static inline void vma_mark_attached(struct vm_area_struct *vma) { - vma->detached = false; -} - -static inline void vma_mark_detached(struct vm_area_struct *vma) -{ - /* When detaching vma should be write-locked */ vma_assert_write_locked(vma); - vma->detached = true; + vma_assert_detached(vma); + refcount_set(&vma->vm_refcnt, 1); } -static inline bool is_vma_detached(struct vm_area_struct *vma) -{ - return vma->detached; -} +void vma_mark_detached(struct vm_area_struct *vma); static inline void release_fault_lock(struct vm_fault *vmf) { @@ -865,7 +905,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, #else /* CONFIG_PER_VMA_LOCK */ -static inline void vma_lock_init(struct vm_area_struct *vma) {} +static inline void vma_lock_init(struct vm_area_struct *vma, bool reset_refcnt) {} static inline bool vma_start_read(struct vm_area_struct *vma) { return false; } static inline void vma_end_read(struct vm_area_struct *vma) {} @@ -908,12 +948,8 @@ static inline void vma_init(struct vm_area_struct *vma, struct
mm_struct *mm) vma->vm_mm = mm; vma->vm_ops = &vma_dummy_vm_ops; INIT_LIST_HEAD(&vma->anon_vma_chain); -#ifdef CONFIG_PER_VMA_LOCK - /* vma is not locked, can't use vma_mark_detached() */ - vma->detached = true; -#endif vma_numab_state_init(vma); - vma_lock_init(vma); + vma_lock_init(vma, false); } /* Use when VMA is not part of the VMA tree and needs no locking */ diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 6573d95f1d1e..9228d19662c6 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -19,6 +19,7 @@ #include #include #include +#include #include @@ -629,9 +630,8 @@ static inline struct anon_vma_name *anon_vma_name_alloc(const char *name) } #endif -struct vma_lock { - struct rw_semaphore lock; -}; +#define VMA_LOCK_OFFSET 0x40000000 +#define VMA_REF_LIMIT (VMA_LOCK_OFFSET - 1) struct vma_numab_state { /* @@ -709,19 +709,13 @@ struct vm_area_struct { }; #ifdef CONFIG_PER_VMA_LOCK - /* - * Flag to indicate areas detached from the mm->mm_mt tree. - * Unstable RCU readers are allowed to read this. - */ - bool detached; - /* * Can only be written (using WRITE_ONCE()) while holding both: * - mmap_lock (in write mode) - * - vm_lock->lock (in write mode) + * - vm_refcnt bit at VMA_LOCK_OFFSET is set * Can be read reliably while holding one of: * - mmap_lock (in read or write mode) - * - vm_lock->lock (in read or write mode) + * - vm_refcnt bit at VMA_LOCK_OFFSET is set or vm_refcnt > 1 * Can be read unreliably (using READ_ONCE()) for pessimistic bailout * while holding nothing (except RCU to keep the VMA struct allocated). * @@ -784,7 +778,10 @@ struct vm_area_struct { struct vm_userfaultfd_ctx vm_userfaultfd_ctx; #ifdef CONFIG_PER_VMA_LOCK /* Unstable RCU readers are allowed to read this.
*/ - struct vma_lock vm_lock ____cacheline_aligned_in_smp; + refcount_t vm_refcnt ____cacheline_aligned_in_smp; +#ifdef CONFIG_DEBUG_LOCK_ALLOC + struct lockdep_map vmlock_dep_map; +#endif #endif } __randomize_layout; @@ -919,6 +916,7 @@ struct mm_struct { * by mmlist_lock */ #ifdef CONFIG_PER_VMA_LOCK + struct rcuwait vma_writer_wait; /* * This field has lock-like semantics, meaning it is sometimes * accessed with ACQUIRE/RELEASE semantics. diff --git a/kernel/fork.c b/kernel/fork.c index d4c75428ccaf..9d9275783cf8 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -463,12 +463,8 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig) * will be reinitialized. */ data_race(memcpy(new, orig, sizeof(*new))); - vma_lock_init(new); + vma_lock_init(new, true); INIT_LIST_HEAD(&new->anon_vma_chain); -#ifdef CONFIG_PER_VMA_LOCK - /* vma is not locked, can't use vma_mark_detached() */ - new->detached = true; -#endif vma_numab_state_init(new); dup_anon_vma_name(orig, new); @@ -477,6 +473,8 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig) void __vm_area_free(struct vm_area_struct *vma) { + /* The vma should be detached while being destroyed. */ + vma_assert_detached(vma); vma_numab_state_free(vma); free_anon_vma_name(vma); kmem_cache_free(vm_area_cachep, vma); @@ -488,8 +486,6 @@ static void vm_area_free_rcu_cb(struct rcu_head *head) struct vm_area_struct *vma = container_of(head, struct vm_area_struct, vm_rcu); - /* The vma should not be locked while being destroyed.
*/ - VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock.lock), vma); __vm_area_free(vma); } #endif @@ -1223,6 +1219,9 @@ static inline void mmap_init_lock(struct mm_struct *mm) { init_rwsem(&mm->mmap_lock); mm_lock_seqcount_init(mm); +#ifdef CONFIG_PER_VMA_LOCK + rcuwait_init(&mm->vma_writer_wait); +#endif } static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, diff --git a/mm/init-mm.c b/mm/init-mm.c index 6af3ad675930..4600e7605cab 100644 --- a/mm/init-mm.c +++ b/mm/init-mm.c @@ -40,6 +40,7 @@ struct mm_struct init_mm = { .arg_lock = __SPIN_LOCK_UNLOCKED(init_mm.arg_lock), .mmlist = LIST_HEAD_INIT(init_mm.mmlist), #ifdef CONFIG_PER_VMA_LOCK + .vma_writer_wait = __RCUWAIT_INITIALIZER(init_mm.vma_writer_wait), .mm_lock_seq = SEQCNT_ZERO(init_mm.mm_lock_seq), #endif .user_ns = &init_user_ns, diff --git a/mm/memory.c b/mm/memory.c index 236fdecd44d6..dc16b67beefa 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6328,9 +6328,47 @@ struct vm_area_struct *lock_mm_and_find_vma(struct mm_struct *mm, #endif #ifdef CONFIG_PER_VMA_LOCK +static inline bool __vma_enter_locked(struct vm_area_struct *vma, bool detaching) +{ + unsigned int tgt_refcnt = VMA_LOCK_OFFSET; + + /* Additional refcnt if the vma is attached. */ + if (!detaching) + tgt_refcnt++; + + /* + * If vma is detached then only vma_mark_attached() can raise the + * vm_refcnt. mmap_write_lock prevents racing with vma_mark_attached().
+ */ + if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) + return false; + + rwsem_acquire(&vma->vmlock_dep_map, 0, 0, _RET_IP_); + rcuwait_wait_event(&vma->vm_mm->vma_writer_wait, + refcount_read(&vma->vm_refcnt) == tgt_refcnt, + TASK_UNINTERRUPTIBLE); + lock_acquired(&vma->vmlock_dep_map, _RET_IP_); + + return true; +} + +static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached) +{ + *detached = refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt); + rwsem_release(&vma->vmlock_dep_map, _RET_IP_); +} + void __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq) { - down_write(&vma->vm_lock.lock); + bool locked; + + /* + * __vma_enter_locked() returns false immediately if the vma is not + * attached, otherwise it waits until refcnt is indicating that vma + * is attached with no readers. + */ + locked = __vma_enter_locked(vma, false); + /* * We should use WRITE_ONCE() here because we can have concurrent reads * from the early lockless pessimistic check in vma_start_read(). @@ -6338,10 +6376,40 @@ void __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq) * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy. */ WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq); - up_write(&vma->vm_lock.lock); + + if (locked) { + bool detached; + + __vma_exit_locked(vma, &detached); + VM_BUG_ON_VMA(detached, vma); /* vma should remain attached */ + } } EXPORT_SYMBOL_GPL(__vma_start_write); +void vma_mark_detached(struct vm_area_struct *vma) +{ + vma_assert_write_locked(vma); + vma_assert_attached(vma); + + /* + * We are the only writer, so no need to use vma_refcount_put(). + * The condition below is unlikely because the vma has been already + * write-locked and readers can increment vm_refcnt only temporarily + * before they check vm_lock_seq, realize the vma is locked and drop + * back the vm_refcnt. That is a narrow window for observing a raised + * vm_refcnt.
+ */ + if (unlikely(!refcount_dec_and_test(&vma->vm_refcnt))) { + /* Wait until vma is detached with no readers. */ + if (__vma_enter_locked(vma, true)) { + bool detached; + + __vma_exit_locked(vma, &detached); + VM_BUG_ON_VMA(!detached, vma); + } + } +} + /* * Lookup and lock a VMA under RCU protection. Returned VMA is guaranteed to be * stable and not isolated. If the VMA is not found or is being modified the @@ -6354,7 +6422,6 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, struct vm_area_struct *vma; rcu_read_lock(); -retry: vma = mas_walk(&mas); if (!vma) goto inval; @@ -6362,13 +6429,6 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, if (!vma_start_read(vma)) goto inval; - /* Check if the VMA got isolated after we found it */ - if (is_vma_detached(vma)) { - vma_end_read(vma); - count_vm_vma_lock_event(VMA_LOCK_MISS); - /* The area was replaced with another one */ - goto retry; - } /* * At this point, we have a stable reference to a VMA: The VMA is * locked and we know it hasn't already been isolated.
diff --git a/tools/testing/vma/linux/atomic.h b/tools/testing/vma/linux/atomic.h index 3e1b6adc027b..788c597c4fde 100644 --- a/tools/testing/vma/linux/atomic.h +++ b/tools/testing/vma/linux/atomic.h @@ -9,4 +9,9 @@ #define atomic_set(x, y) uatomic_set(x, y) #define U8_MAX UCHAR_MAX +#ifndef atomic_cmpxchg_relaxed +#define atomic_cmpxchg_relaxed uatomic_cmpxchg +#define atomic_cmpxchg_release uatomic_cmpxchg +#endif /* atomic_cmpxchg_relaxed */ + #endif /* _LINUX_ATOMIC_H */ diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h index 47c8b03ffbbd..2ce032943861 100644 --- a/tools/testing/vma/vma_internal.h +++ b/tools/testing/vma/vma_internal.h @@ -25,7 +25,7 @@ #include #include #include -#include +#include extern unsigned long stack_guard_gap; #ifdef CONFIG_MMU @@ -134,10 +134,6 @@ typedef __bitwise unsigned int vm_fault_t; */ #define pr_warn_once pr_err -typedef struct refcount_struct { - atomic_t refs; -} refcount_t; - struct kref { refcount_t refcount; }; @@ -232,15 +228,12 @@ struct mm_struct { unsigned long flags; /* Must use atomic bitops to access */ }; -struct vma_lock { - struct rw_semaphore lock; -}; - - struct file { struct address_space *f_mapping; }; +#define VMA_LOCK_OFFSET 0x40000000 + struct vm_area_struct { /* The first cache line has the info for VMA tree walking.
*/ @@ -268,16 +261,13 @@ struct vm_area_struct { }; #ifdef CONFIG_PER_VMA_LOCK - /* Flag to indicate areas detached from the mm->mm_mt tree */ - bool detached; - /* * Can only be written (using WRITE_ONCE()) while holding both: * - mmap_lock (in write mode) - * - vm_lock.lock (in write mode) + * - vm_refcnt bit at VMA_LOCK_OFFSET is set * Can be read reliably while holding one of: * - mmap_lock (in read or write mode) - * - vm_lock.lock (in read or write mode) + * - vm_refcnt bit at VMA_LOCK_OFFSET is set or vm_refcnt > 1 * Can be read unreliably (using READ_ONCE()) for pessimistic bailout * while holding nothing (except RCU to keep the VMA struct allocated). * @@ -286,7 +276,6 @@ struct vm_area_struct { * slowpath. */ unsigned int vm_lock_seq; - struct vma_lock vm_lock; #endif /* @@ -339,6 +328,10 @@ struct vm_area_struct { struct vma_numab_state *numab_state; /* NUMA Balancing state */ #endif struct vm_userfaultfd_ctx vm_userfaultfd_ctx; +#ifdef CONFIG_PER_VMA_LOCK + /* Unstable RCU readers are allowed to read this. */ + refcount_t vm_refcnt; +#endif } __randomize_layout; struct vm_fault {}; @@ -463,23 +456,41 @@ static inline struct vm_area_struct *vma_next(struct vma_iterator *vmi) return mas_find(&vmi->mas, ULONG_MAX); } -static inline void vma_lock_init(struct vm_area_struct *vma) +/* + * WARNING: to avoid racing with vma_mark_attached()/vma_mark_detached(), these + * assertions should be made either under mmap_write_lock or when the object + * has been isolated under mmap_write_lock, ensuring no competing writers.
+ */ +static inline void vma_assert_attached(struct vm_area_struct *vma) { - init_rwsem(&vma->vm_lock.lock); - vma->vm_lock_seq = UINT_MAX; + VM_BUG_ON_VMA(!refcount_read(&vma->vm_refcnt), vma); } -static inline void vma_mark_attached(struct vm_area_struct *vma) +static inline void vma_assert_detached(struct vm_area_struct *vma) { - vma->detached = false; + VM_BUG_ON_VMA(refcount_read(&vma->vm_refcnt), vma); } static inline void vma_assert_write_locked(struct vm_area_struct *); +static inline void vma_mark_attached(struct vm_area_struct *vma) +{ + vma_assert_write_locked(vma); + vma_assert_detached(vma); + refcount_set(&vma->vm_refcnt, 1); +} + static inline void vma_mark_detached(struct vm_area_struct *vma) { - /* When detaching vma should be write-locked */ vma_assert_write_locked(vma); - vma->detached = true; + vma_assert_attached(vma); + + /* We are the only writer, so no need to use vma_refcount_put(). */ + if (unlikely(!refcount_dec_and_test(&vma->vm_refcnt))) { + /* + * Reader must have temporarily raised vm_refcnt but it will + * drop it without using the vma since vma is write-locked.
+ */ + } } extern const struct vm_operations_struct vma_dummy_vm_ops; @@ -492,9 +503,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm) vma->vm_mm = mm; vma->vm_ops = &vma_dummy_vm_ops; INIT_LIST_HEAD(&vma->anon_vma_chain); - /* vma is not locked, can't use vma_mark_detached() */ - vma->detached = true; - vma_lock_init(vma); + vma->vm_lock_seq = UINT_MAX; } static inline struct vm_area_struct *vm_area_alloc(struct mm_struct *mm) @@ -517,10 +526,9 @@ static inline struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig) return NULL; memcpy(new, orig, sizeof(*new)); - vma_lock_init(new); + refcount_set(&new->vm_refcnt, 0); + new->vm_lock_seq = UINT_MAX; INIT_LIST_HEAD(&new->anon_vma_chain); - /* vma is not locked, can't use vma_mark_detached() */ - new->detached = true; return new; } -- 2.47.1.613.gc27f4b7a9f-goog From nobody Sat Feb 7 12:11:39 2026 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6178F191F6A for ; Sat, 11 Jan 2025 04:26:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736569595; cv=none; b=mT+GTTIKd3ycQMsPJCs2GGwT7+E+rVVwNSkOumaQnOALlaQwH/MkyMfEjjT47hGA0tMRombS/6FCaHIJ1wviCFSf42Vpj6X5Pe7HXxpeJyJkj8wMsUKzWP708SD3BgmOo9JOSKdWMIORpZ9CxoxQ0UcE23JVwtPbB2bY6PKS9q4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736569595; c=relaxed/simple; bh=Y9D0p+k57aMethzqPjdkAQ2WOKHP/DFBI9vGZ9OYQL0=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type;
b=u2YfGbcJGzvAhwhh63GbA+2djJ54NvsPjvl6d/gY5qMwyQZaZwRUg9mmCJEU+bHUjwTcpRMEcbe9EtIZquZ7dtLvuoPYKNa62ww0JzCFhgQDpZ5ZUbjYA9l2bnmcpYP6n2XJ57FEkExEjpDcxL1WkuKRDxdj3GefeB7D2Mdbwoo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--surenb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=awrWwo2R; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--surenb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="awrWwo2R" Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-2ee86953aeaso4761791a91.2 for ; Fri, 10 Jan 2025 20:26:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1736569593; x=1737174393; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=2TrBali5XeUoD22YNdBhZEhX/KaryyxFeSYnw61ebAw=; b=awrWwo2ROfesPHK7P3iWIhaAq4rLCURYduyjQVhb91G6OjJfbghga20oScfIbO4224 /cNdNI7gC9wsKO5VbwTi9a+ee7q9gfUGVlLSNBe9/UH8EZ5Bb4oamULQt6QjFO4i+zWq RNIEAvNCzkL4Z6T/54kGF3YUozeBQgPYs1NEiuZwdEYJ4Af/dunYp2RPHdvMGT2PKf4S th5STfhJ310w9TQsC9xlu7jk4hTky8vJ92gkdih90f0efF0aoBVn0UWWUToVdN4voSdv A5uzRgrNnV8Qsjrp4Us7WqyubzdYvm1krl88qF6fN1N7k7aJxySISHwT8XdxsnC05iZ6 inwg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736569593; x=1737174393; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=2TrBali5XeUoD22YNdBhZEhX/KaryyxFeSYnw61ebAw=; b=rL5rmDuCSxqZG6w7yitLZchAE9rWNvvj5xuG+S3+fYVTnWSyOCncXmHasj3jj4cq0G 
bFTk+XJKQcG9roDcRpPCjXt5ppPzTjraFZbCJu6ykxCPYSOpbyf/3S1DHtNu6u6ZwiYc K1e0TGZXeNm1NcF/w6Z1GZf4dJoXCp+vFVbfbk/wEQjCihkxvagwZULJy0fXggOAzh0E wg591kG2RVO5cCKzvoHQja80GqCY6LSgYp7620IRSoLkeRWzI2qnHT5LdNNV0Q0uiCvJ pihYheV7YVwyiRhGVmaPFcXTWnpfQ7LrP1Co4P5V+l5kHfJtfslFih+/9oZtthqUMm6p wysA== X-Forwarded-Encrypted: i=1; AJvYcCUtOEQ+ymLvfnEKGDg+wMMqqGbZ6SGlGXi71M5iteMwmy7UmJx3uQ/VhVsJs1/2kAUruxd9a5+wS52yyqo=@vger.kernel.org X-Gm-Message-State: AOJu0YwuJD3aYRL384cAMVN6pTa4kp7ju8QBP6LDW21rmCiPt7MdhJ6j t+wrB7txaPuJebL8eQLF1D1GqGJGUminguoGg+3VmHHV3tWRnqM7c6tWjbudws6eYcaZB1Vm7gD 9Jg== X-Google-Smtp-Source: AGHT+IGO/d7KP6nJwj/5X1Bl/23roKwNny3F3/SdCo6a/vda+g/68kT+5jbEnMfYJNiY3Hot1V2n9uarLr4= X-Received: from pjbdj16.prod.google.com ([2002:a17:90a:d2d0:b0:2ee:4826:cae3]) (user=surenb job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:518b:b0:2ee:b2fe:eeee with SMTP id 98e67ed59e1d1-2f548eba7d0mr20657931a91.15.1736569593696; Fri, 10 Jan 2025 20:26:33 -0800 (PST) Date: Fri, 10 Jan 2025 20:25:59 -0800 In-Reply-To: <20250111042604.3230628-1-surenb@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250111042604.3230628-1-surenb@google.com> X-Mailer: git-send-email 2.47.1.613.gc27f4b7a9f-goog Message-ID: <20250111042604.3230628-13-surenb@google.com> Subject: [PATCH v9 12/17] mm: move lesser used vma_area_struct members into the last cacheline From: Suren Baghdasaryan To: akpm@linux-foundation.org Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com, hughd@google.com, lokeshgidra@google.com, minchan@google.com, jannh@google.com, 
shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com, klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@android.com, surenb@google.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

Move several vm_area_struct members that are rarely or never used during page fault handling into the last cacheline to pack vm_area_struct more tightly. As a result, vm_area_struct fits into 3 cachelines instead of 4.

New typical vm_area_struct layout:

struct vm_area_struct {
    union {
        struct {
            long unsigned int vm_start;                     /* 0 8 */
            long unsigned int vm_end;                       /* 8 8 */
        };                                                  /* 0 16 */
        freeptr_t vm_freeptr;                               /* 0 8 */
    };                                                      /* 0 16 */
    struct mm_struct * vm_mm;                               /* 16 8 */
    pgprot_t vm_page_prot;                                  /* 24 8 */
    union {
        const vm_flags_t vm_flags;                          /* 32 8 */
        vm_flags_t __vm_flags;                              /* 32 8 */
    };                                                      /* 32 8 */
    unsigned int vm_lock_seq;                               /* 40 4 */

    /* XXX 4 bytes hole, try to pack */

    struct list_head anon_vma_chain;                        /* 48 16 */
    /* --- cacheline 1 boundary (64 bytes) --- */
    struct anon_vma * anon_vma;                             /* 64 8 */
    const struct vm_operations_struct * vm_ops;             /* 72 8 */
    long unsigned int vm_pgoff;                             /* 80 8 */
    struct file * vm_file;                                  /* 88 8 */
    void * vm_private_data;                                 /* 96 8 */
    atomic_long_t swap_readahead_info;                      /* 104 8 */
    struct mempolicy * vm_policy;                           /* 112 8 */
    struct vma_numab_state * numab_state;                   /* 120 8 */
    /* --- cacheline 2 boundary (128 bytes) --- */
    refcount_t vm_refcnt (__aligned__(64));                 /* 128 4 */

    /* XXX 4 bytes hole, try to pack */

    struct {
        struct rb_node rb (__aligned__(8));                 /* 136 24 */
        long unsigned int rb_subtree_last;                  /* 160 8 */
    } __attribute__((__aligned__(8))) shared;               /* 136 32 */
    struct anon_vma_name * anon_name;                       /* 168 8 */
    struct vm_userfaultfd_ctx vm_userfaultfd_ctx;           /* 176 8 */

    /* size: 192, cachelines: 3, members: 18 */
    /* sum members: 176, holes: 2, sum holes: 8 */
    /* padding: 8 */
    /* forced alignments: 2, forced holes: 1, sum forced holes: 4 */
sum forced holes: 4 */ } __attribute__((__aligned__(64))); Memory consumption per 1000 VMAs becomes 48 pages: slabinfo after vm_area_struct changes: ... : ... vm_area_struct ... 192 42 2 : ... Signed-off-by: Suren Baghdasaryan Reviewed-by: Lorenzo Stoakes Tested-by: Shivank Garg --- include/linux/mm_types.h | 38 ++++++++++++++++++-------------------- 1 file changed, 18 insertions(+), 20 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 9228d19662c6..d902e6730654 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -725,17 +725,6 @@ struct vm_area_struct { */ unsigned int vm_lock_seq; #endif - - /* - * For areas with an address space and backing store, - * linkage into the address_space->i_mmap interval tree. - * - */ - struct { - struct rb_node rb; - unsigned long rb_subtree_last; - } shared; - /* * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma * list, after a COW of one of the file pages. A MAP_SHARED vma @@ -755,14 +744,6 @@ struct vm_area_struct { struct file * vm_file; /* File we map to (can be NULL). */ void * vm_private_data; /* was vm_pte (shared mem) */ =20 -#ifdef CONFIG_ANON_VMA_NAME - /* - * For private and shared anonymous mappings, a pointer to a null - * terminated string containing the name given to the vma, or NULL if - * unnamed. Serialized by mmap_lock. Use anon_vma_name to access. - */ - struct anon_vma_name *anon_name; -#endif #ifdef CONFIG_SWAP atomic_long_t swap_readahead_info; #endif @@ -775,7 +756,6 @@ struct vm_area_struct { #ifdef CONFIG_NUMA_BALANCING struct vma_numab_state *numab_state; /* NUMA Balancing state */ #endif - struct vm_userfaultfd_ctx vm_userfaultfd_ctx; #ifdef CONFIG_PER_VMA_LOCK /* Unstable RCU readers are allowed to read this. 
*/ refcount_t vm_refcnt ____cacheline_aligned_in_smp; @@ -783,6 +763,24 @@ struct vm_area_struct { struct lockdep_map vmlock_dep_map; #endif #endif + /* + * For areas with an address space and backing store, + * linkage into the address_space->i_mmap interval tree. + * + */ + struct { + struct rb_node rb; + unsigned long rb_subtree_last; + } shared; +#ifdef CONFIG_ANON_VMA_NAME + /* + * For private and shared anonymous mappings, a pointer to a null + * terminated string containing the name given to the vma, or NULL if + * unnamed. Serialized by mmap_lock. Use anon_vma_name to access. + */ + struct anon_vma_name *anon_name; +#endif + struct vm_userfaultfd_ctx vm_userfaultfd_ctx; } __randomize_layout; #ifdef CONFIG_NUMA -- 2.47.1.613.gc27f4b7a9f-goog From nobody Sat Feb 7 12:11:39 2026 Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 342B61922D8 for ; Sat, 11 Jan 2025 04:26:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736569597; cv=none; b=HBpRF4euterLaZL77SP8NXFqUDmYRQPRYZwMFFZf2QHPkqM0tlLTMqJsDal3k+72poV8/kNiLBe5PuRDyZ0jxnQUvwcCf1UU71caDEanO5VODQ9SpCKDoPSl2FwLkR7bX3N0clmtbM3q4//yzf/2sVlWIAdzls3GitX5jMnQXg4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736569597; c=relaxed/simple; bh=xrCJxJ8xCeea7DfOdcyGDHC3o5JGvEYZQC59p2slgWc=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=KkayI5jIhQUGj2q5VxTOKReGr8zDKWMUu1XytrW0HYq29TL8qEcGE0QAnFYaJrbcsXPNAinMyVXaP6JJ/SkRdGuyiMyog0NpEOQGv3sR1TMh5fPJ0R3KXHvAXUDR41lkhQOec/yCTnA5fwxOo+hu/keMXgog8ed9mJcBBPcZ1S4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass
smtp.mailfrom=flex--surenb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=gwlb02md; arc=none smtp.client-ip=209.85.216.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--surenb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="gwlb02md" Received: by mail-pj1-f73.google.com with SMTP id 98e67ed59e1d1-2ef79d9c692so7015761a91.0 for ; Fri, 10 Jan 2025 20:26:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1736569595; x=1737174395; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=hRZ4wMzIwoFaVH+QKckb3zqa5H5+gDzhM3t6myutkJ4=; b=gwlb02mdcWIs9LKyTkDgMcqq762wkPB14hosKiXqFgD4Di15uM1bykBm/llqeVhKsc Mr+n1hI4MZo74l41knoykRT8SnMZ3D/qb4t0X+Aep6ZNmDok7hwXyFFNqNjQaRTQZtt+ 6Ze3KkFuxYVtTmtzgptdE4AL+ZxnJVnzn8Pc6bvknSQ9LvNU4sO6zFlp+1yu2egABFLD lKE6ulZRC2Gb+eTxaKtpJxsA+H8Pfv7P7c9ei7EmFIbu+WQE3RQlvvPkogg62HF0kHSE 0HuURi9unvo79D0HDJgd89cb9gobjIu7xcjuPQi/5XDVkqf7l4N89dcYn7u79pglTFhc O/qg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736569595; x=1737174395; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=hRZ4wMzIwoFaVH+QKckb3zqa5H5+gDzhM3t6myutkJ4=; b=i3HgQmFscRAneIzF47TmpQND4H71qZpvjzoJ93vgNM6ddQCup8L/xX8HLe5w7n0Vgh NSbG2hpf9i3cRsfW15bdcvTcsp3yEK0CnlTUBGw1KIRRsBwVUa1k2URmkEHSfSlOKaXd FKxM5q6s+2RqCzEZ+WzSURBCmU1wtvHLyxbLEa32jiET2aa49dYHjCrd5jzwlIz8/7ww 88WqqkDCLNlByATcm4DV03qGD6TPcBYi8uVxfzLsYE5Pp1j+bMkBfjrvnXqROGVtdRh+ iixAs3GfnSlYJTzrH0dslagT61HZEpMJWrLsm2HSDApNdqJU5gzEXgN3FX8/oZXElEem wTaQ== X-Forwarded-Encrypted: i=1; 

Date: Fri, 10 Jan 2025 20:26:00 -0800
In-Reply-To: <20250111042604.3230628-1-surenb@google.com>
Message-ID: <20250111042604.3230628-14-surenb@google.com>
Subject: [PATCH v9 13/17] mm/debug: print vm_refcnt state when dumping the vma
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com,
 lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com,
 vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
 mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
 oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org,
 dhowells@redhat.com, hdanton@sina.com, hughd@google.com,
 lokeshgidra@google.com, minchan@google.com, jannh@google.com,
 shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com,
 klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net,
 linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 kernel-team@android.com, surenb@google.com

vm_refcnt encodes a number of useful states:
- whether vma is attached or detached
- the number of current vma readers
- presence of a vma writer
Let's include it in the vma dump.

Signed-off-by: Suren Baghdasaryan
Acked-by: Vlastimil Babka
Tested-by: Shivank Garg
---
 mm/debug.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/mm/debug.c b/mm/debug.c
index 8d2acf432385..325d7bf22038 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -178,6 +178,17 @@ EXPORT_SYMBOL(dump_page);
 
 void dump_vma(const struct vm_area_struct *vma)
 {
+#ifdef CONFIG_PER_VMA_LOCK
+	pr_emerg("vma %px start %px end %px mm %px\n"
+		"prot %lx anon_vma %px vm_ops %px\n"
+		"pgoff %lx file %px private_data %px\n"
+		"flags: %#lx(%pGv) refcnt %x\n",
+		vma, (void *)vma->vm_start, (void *)vma->vm_end, vma->vm_mm,
+		(unsigned long)pgprot_val(vma->vm_page_prot),
+		vma->anon_vma, vma->vm_ops, vma->vm_pgoff,
+		vma->vm_file, vma->vm_private_data,
+		vma->vm_flags, &vma->vm_flags, refcount_read(&vma->vm_refcnt));
+#else
 	pr_emerg("vma %px start %px end %px mm %px\n"
 		"prot %lx anon_vma %px vm_ops %px\n"
 		"pgoff %lx file %px private_data %px\n"
@@ -187,6 +198,7 @@ void dump_vma(const struct vm_area_struct *vma)
 		vma->anon_vma, vma->vm_ops, vma->vm_pgoff,
 		vma->vm_file, vma->vm_private_data,
 		vma->vm_flags, &vma->vm_flags);
+#endif
 }
 EXPORT_SYMBOL(dump_vma);
 
-- 
2.47.1.613.gc27f4b7a9f-goog
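
The commit message above lists three states packed into vm_refcnt. The sketch below shows one way a raw value printed by the new "refcnt %x" field could be decoded. It is a userspace illustration only. It assumes an encoding in which 0 means detached, attaching holds one reference, each reader adds one more, and a writer adds a large bias. DEMO_WRITER_OFFSET, struct demo_refcnt_state and demo_decode_refcnt() are hypothetical names, and the constant the series actually uses may differ.

```c
#include <stdbool.h>

/* Hypothetical writer bias, chosen for illustration only. */
#define DEMO_WRITER_OFFSET 0x40000000u

struct demo_refcnt_state {
	bool attached;          /* an attach reference is present */
	bool writer;            /* the writer bias is present */
	unsigned int readers;   /* references beyond the attach reference */
};

/* Decode a raw refcount under the assumed (hypothetical) encoding. */
static struct demo_refcnt_state demo_decode_refcnt(unsigned int refcnt)
{
	struct demo_refcnt_state s = { false, false, 0 };

	if (refcnt >= DEMO_WRITER_OFFSET) {
		s.writer = true;
		refcnt -= DEMO_WRITER_OFFSET;
	}
	if (refcnt == 0)
		return s;               /* no attach reference: detached */

	s.attached = true;
	s.readers = refcnt - 1;         /* one reference belongs to the attachment */
	return s;
}
```

Under these assumptions, a dump showing "refcnt 3" would describe an attached vma with two active readers and no writer.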

Date: Fri, 10 Jan 2025 20:26:01 -0800
In-Reply-To: <20250111042604.3230628-1-surenb@google.com>
Message-ID: <20250111042604.3230628-15-surenb@google.com>
Subject: [PATCH v9 14/17] mm: remove extra vma_numab_state_init() call
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

vma_init() already memsets the whole vm_area_struct to 0, so there is no
need for an additional vma_numab_state_init() call.

Signed-off-by: Suren Baghdasaryan
Reviewed-by: Vlastimil Babka
Reviewed-by: Lorenzo Stoakes
Tested-by: Shivank Garg
---
 include/linux/mm.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a99b11ee1f66..c8da64b114d1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -948,7 +948,6 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_mm = mm;
 	vma->vm_ops = &vma_dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
-	vma_numab_state_init(vma);
 	vma_lock_init(vma, false);
 }
 
-- 
2.47.1.613.gc27f4b7a9f-goog
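
This change relies on vma_init() clearing the whole structure with memset() before it sets individual fields. That makes an explicit NULL assignment of the NUMA balancing state redundant. A minimal userspace sketch of the pattern follows. struct demo_vma and demo_vma_init() are hypothetical stand-ins, not kernel definitions. Like the kernel, the sketch assumes that a null pointer is represented by all-zero bits.

```c
#include <stddef.h>
#include <string.h>

/* Opaque type standing in for struct vma_numab_state (hypothetical). */
struct demo_numab_state;

/* Hypothetical, heavily reduced stand-in for struct vm_area_struct. */
struct demo_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	struct demo_numab_state *numab_state;
};

/* Same shape as vma_init(): clear everything, then set what differs from 0. */
static void demo_vma_init(struct demo_vma *vma, unsigned long start,
			  unsigned long end)
{
	memset(vma, 0, sizeof(*vma));
	vma->vm_start = start;
	vma->vm_end = end;
	/* No "vma->numab_state = NULL;" here: memset() has already cleared it. */
}
```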

Date: Fri, 10 Jan 2025 20:26:02 -0800
In-Reply-To: <20250111042604.3230628-1-surenb@google.com>
Message-ID: <20250111042604.3230628-16-surenb@google.com>
Subject: [PATCH v9 15/17] mm: prepare lock_vma_under_rcu() for vma reuse possibility
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

Once we make the vma cache SLAB_TYPESAFE_BY_RCU, it will be possible for a
vma to be reused and attached to another mm after lock_vma_under_rcu()
locks the vma. lock_vma_under_rcu() should ensure that vma_start_read() is
using the original mm, and after locking the vma it should ensure that
vma->vm_mm has not changed from under us.

Signed-off-by: Suren Baghdasaryan
Reviewed-by: Vlastimil Babka
Tested-by: Shivank Garg
---
 include/linux/mm.h | 10 ++++++----
 mm/memory.c        |  7 ++++---
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index c8da64b114d1..cb29eb7360c5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -739,8 +739,10 @@ static inline void vma_refcount_put(struct vm_area_struct *vma)
  * Try to read-lock a vma. The function is allowed to occasionally yield false
  * locked result to avoid performance overhead, in which case we fall back to
  * using mmap_lock. The function should never yield false unlocked result.
+ * False locked result is possible if mm_lock_seq overflows or if vma gets
+ * reused and attached to a different mm before we lock it.
  */
-static inline bool vma_start_read(struct vm_area_struct *vma)
+static inline bool vma_start_read(struct mm_struct *mm, struct vm_area_struct *vma)
 {
 	int oldcnt;
 
@@ -751,7 +753,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	 * we don't rely on for anything - the mm_lock_seq read against which we
 	 * need ordering is below.
 	 */
-	if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq.sequence))
+	if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(mm->mm_lock_seq.sequence))
 		return false;
 
 	/*
@@ -774,7 +776,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	 * after it has been unlocked.
 	 * This pairs with RELEASE semantics in vma_end_write_all().
 	 */
-	if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&vma->vm_mm->mm_lock_seq))) {
+	if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&mm->mm_lock_seq))) {
 		vma_refcount_put(vma);
 		return false;
 	}
@@ -906,7 +908,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 #else /* CONFIG_PER_VMA_LOCK */
 
 static inline void vma_lock_init(struct vm_area_struct *vma, bool reset_refcnt) {}
-static inline bool vma_start_read(struct vm_area_struct *vma)
+static inline bool vma_start_read(struct mm_struct *mm, struct vm_area_struct *vma)
 		{ return false; }
 static inline void vma_end_read(struct vm_area_struct *vma) {}
 static inline void vma_start_write(struct vm_area_struct *vma) {}
diff --git a/mm/memory.c b/mm/memory.c
index dc16b67beefa..67cfcebb0f94 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6426,7 +6426,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	if (!vma)
 		goto inval;
 
-	if (!vma_start_read(vma))
+	if (!vma_start_read(mm, vma))
 		goto inval;
 
 	/*
@@ -6436,8 +6436,9 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	 * fields are accessible for RCU readers.
 	 */
 
-	/* Check since vm_start/vm_end might change before we lock the VMA */
-	if (unlikely(address < vma->vm_start || address >= vma->vm_end))
+	/* Check if the vma we locked is the right one. */
+	if (unlikely(vma->vm_mm != mm ||
+		     address < vma->vm_start || address >= vma->vm_end))
 		goto inval_end_read;
 
 	rcu_read_unlock();
-- 
2.47.1.613.gc27f4b7a9f-goog
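
The new vma->vm_mm check in lock_vma_under_rcu() follows a general rule for objects taken from a type-safe-by-RCU cache. After taking a reference, re-verify that the object still has the identity it was looked up for, because its memory may have been freed and reused in the meantime. The sketch below illustrates that rule in userspace with C11 atomics. It is not the kernel implementation: it omits RCU and the mm_lock_seq check, and it runs single-threaded. Every name in it (struct demo_vma, demo_tryget(), demo_lock_vma() and so on) is hypothetical. A refcount of 0 stands for a detached vma, which can never be taken.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mm_struct. */
struct demo_mm {
	int id;
};

/* Hypothetical stand-in for a vma allocated from a type-safe-by-RCU cache. */
struct demo_vma {
	atomic_uint refcnt;        /* 0 = detached, >= 1 = attached */
	struct demo_mm *vm_mm;
	unsigned long vm_start;
	unsigned long vm_end;
};

/* Take a reference only if the vma is attached (refcnt != 0). */
static bool demo_tryget(struct demo_vma *vma)
{
	unsigned int old = atomic_load(&vma->refcnt);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&vma->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

static void demo_put(struct demo_vma *vma)
{
	atomic_fetch_sub(&vma->refcnt, 1);
}

/*
 * Lock a vma found by a lockless lookup, then re-check that it still belongs
 * to @mm and still covers @address: between lookup and locking, its memory
 * may have been freed and reused for a different mm or range.
 */
static struct demo_vma *demo_lock_vma(struct demo_mm *mm, struct demo_vma *vma,
				      unsigned long address)
{
	if (!vma || !demo_tryget(vma))
		return NULL;
	if (vma->vm_mm != mm ||
	    address < vma->vm_start || address >= vma->vm_end) {
		demo_put(vma);
		return NULL;
	}
	return vma;
}
```

Checking identity only after the reference is held matters here. A check done before taking the reference could pass and then be invalidated by reuse before the lock is acquired.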

Date: Fri, 10 Jan 2025 20:26:03 -0800
In-Reply-To: <20250111042604.3230628-1-surenb@google.com>
Message-ID: <20250111042604.3230628-17-surenb@google.com>
Subject: [PATCH v9 16/17] mm: make vma cache SLAB_TYPESAFE_BY_RCU
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

To enable SLAB_TYPESAFE_BY_RCU for the vma cache, we need to ensure that
object reuse before the RCU grace period is over will be detected by
lock_vma_under_rcu(). The current checks are sufficient as long as the vma
is detached before it is freed. The only place where this is not currently
happening is exit_mmap(). Add the missing vma_mark_detached() in
exit_mmap().

Another issue that might trick lock_vma_under_rcu() during vma reuse is
vm_area_dup(), which copies the entire content of the vma into a new one,
overwriting the new vma's vm_refcnt and temporarily making it appear
attached. This might trick a racing lock_vma_under_rcu() into operating on
a reused vma if it found the vma before it got reused. To prevent this, we
should ensure that vm_refcnt stays in the detached state (0) when it is
copied and advances to the attached state only after the vma is added into
the vma tree. Introduce vm_area_init_from(), which preserves the new vma's
vm_refcnt, and use it in vm_area_dup(). Since all vmas are in the detached
state with no current readers when they are freed, lock_vma_under_rcu()
will not be able to take vm_refcnt after the vma got detached, even if the
vma is reused.

Finally, make vm_area_cachep SLAB_TYPESAFE_BY_RCU. This will facilitate
vm_area_struct reuse and will minimize the number of call_rcu() calls.

Signed-off-by: Suren Baghdasaryan
Reviewed-by: Vlastimil Babka
Tested-by: Shivank Garg
---
 include/linux/mm.h               |  2 -
 include/linux/mm_types.h         | 13 ++++--
 include/linux/slab.h             |  6 ---
 kernel/fork.c                    | 73 ++++++++++++++++++++------------
 mm/mmap.c                        |  3 +-
 mm/vma.c                         | 11 ++---
 mm/vma.h                         |  2 +-
 tools/testing/vma/vma_internal.h |  7 +--
 8 files changed, 63 insertions(+), 54 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index cb29eb7360c5..ac78425e9838 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -258,8 +258,6 @@ void setup_initial_init_mm(void *start_code, void *end_code,
 struct vm_area_struct *vm_area_alloc(struct mm_struct *);
 struct vm_area_struct *vm_area_dup(struct vm_area_struct *);
 void vm_area_free(struct vm_area_struct *);
-/* Use only if VMA has no other users */
-void __vm_area_free(struct vm_area_struct *vma);
 
 #ifndef CONFIG_MMU
 extern struct rb_root nommu_region_tree;
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index d902e6730654..d366ec6302e6 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -574,6 +574,12 @@ static inline void *folio_get_private(struct folio *folio)
 
 typedef unsigned long vm_flags_t;
 
+/*
+ * freeptr_t represents a SLUB freelist pointer, which might be encoded
+ * and not dereferenceable if CONFIG_SLAB_FREELIST_HARDENED is enabled.
+ */
+typedef struct { unsigned long v; } freeptr_t;
+
 /*
  * A region containing a mapping of a non-memory backed file under NOMMU
  * conditions. These are held in a global tree and are pinned by the VMAs that
@@ -677,6 +683,9 @@ struct vma_numab_state {
  *
  * Only explicitly marked struct members may be accessed by RCU readers before
  * getting a stable reference.
+ *
+ * WARNING: when adding new members, please update vm_area_init_from() to copy
+ * them during vm_area_struct content duplication.
  */
 struct vm_area_struct {
 	/* The first cache line has the info for VMA tree walking. */
@@ -687,9 +696,7 @@ struct vm_area_struct {
 			unsigned long vm_start;
 			unsigned long vm_end;
 		};
-#ifdef CONFIG_PER_VMA_LOCK
-		struct rcu_head vm_rcu;	/* Used for deferred freeing. */
-#endif
+		freeptr_t vm_freeptr; /* Pointer used by SLAB_TYPESAFE_BY_RCU */
 	};
 
 	/*
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 10a971c2bde3..681b685b6c4e 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -234,12 +234,6 @@ enum _slab_flag_bits {
 #define SLAB_NO_OBJ_EXT __SLAB_FLAG_UNUSED
 #endif
 
-/*
- * freeptr_t represents a SLUB freelist pointer, which might be encoded
- * and not dereferenceable if CONFIG_SLAB_FREELIST_HARDENED is enabled.
- */
-typedef struct { unsigned long v; } freeptr_t;
-
 /*
  * ZERO_SIZE_PTR will be returned for zero sized kmalloc requests.
  *
diff --git a/kernel/fork.c b/kernel/fork.c
index 9d9275783cf8..151b40627c14 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -449,6 +449,42 @@ struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
 	return vma;
 }
 
+static void vm_area_init_from(const struct vm_area_struct *src,
+			      struct vm_area_struct *dest)
+{
+	dest->vm_mm = src->vm_mm;
+	dest->vm_ops = src->vm_ops;
+	dest->vm_start = src->vm_start;
+	dest->vm_end = src->vm_end;
+	dest->anon_vma = src->anon_vma;
+	dest->vm_pgoff = src->vm_pgoff;
+	dest->vm_file = src->vm_file;
+	dest->vm_private_data = src->vm_private_data;
+	vm_flags_init(dest, src->vm_flags);
+	memcpy(&dest->vm_page_prot, &src->vm_page_prot,
+	       sizeof(dest->vm_page_prot));
+	/*
+	 * src->shared.rb may be modified concurrently when called from
+	 * dup_mmap(), but the clone will reinitialize it.
+	 */
+	data_race(memcpy(&dest->shared, &src->shared, sizeof(dest->shared)));
+	memcpy(&dest->vm_userfaultfd_ctx, &src->vm_userfaultfd_ctx,
+	       sizeof(dest->vm_userfaultfd_ctx));
+#ifdef CONFIG_ANON_VMA_NAME
+	dest->anon_name = src->anon_name;
+#endif
+#ifdef CONFIG_SWAP
+	memcpy(&dest->swap_readahead_info, &src->swap_readahead_info,
+	       sizeof(dest->swap_readahead_info));
+#endif
+#ifndef CONFIG_MMU
+	dest->vm_region = src->vm_region;
+#endif
+#ifdef CONFIG_NUMA
+	dest->vm_policy = src->vm_policy;
+#endif
+}
+
 struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 {
 	struct vm_area_struct *new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
@@ -458,11 +494,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 
 	ASSERT_EXCLUSIVE_WRITER(orig->vm_flags);
 	ASSERT_EXCLUSIVE_WRITER(orig->vm_file);
-	/*
-	 * orig->shared.rb may be modified concurrently, but the clone
-	 * will be reinitialized.
-	 */
-	data_race(memcpy(new, orig, sizeof(*new)));
+	vm_area_init_from(orig, new);
 	vma_lock_init(new, true);
 	INIT_LIST_HEAD(&new->anon_vma_chain);
 	vma_numab_state_init(new);
@@ -471,7 +503,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	return new;
 }
 
-void __vm_area_free(struct vm_area_struct *vma)
+void vm_area_free(struct vm_area_struct *vma)
 {
 	/* The vma should be detached while being destroyed. */
 	vma_assert_detached(vma);
@@ -480,25 +512,6 @@ void __vm_area_free(struct vm_area_struct *vma)
 	kmem_cache_free(vm_area_cachep, vma);
 }
 
-#ifdef CONFIG_PER_VMA_LOCK
-static void vm_area_free_rcu_cb(struct rcu_head *head)
-{
-	struct vm_area_struct *vma = container_of(head, struct vm_area_struct,
-						  vm_rcu);
-
-	__vm_area_free(vma);
-}
-#endif
-
-void vm_area_free(struct vm_area_struct *vma)
-{
-#ifdef CONFIG_PER_VMA_LOCK
-	call_rcu(&vma->vm_rcu, vm_area_free_rcu_cb);
-#else
-	__vm_area_free(vma);
-#endif
-}
-
 static void account_kernel_stack(struct task_struct *tsk, int account)
 {
 	if (IS_ENABLED(CONFIG_VMAP_STACK)) {
@@ -3144,6 +3157,11 @@ void __init mm_cache_init(void)
 
 void __init proc_caches_init(void)
 {
+	struct kmem_cache_args args = {
+		.use_freeptr_offset = true,
+		.freeptr_offset = offsetof(struct vm_area_struct, vm_freeptr),
+	};
+
 	sighand_cachep = kmem_cache_create("sighand_cache",
 			sizeof(struct sighand_struct), 0,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_TYPESAFE_BY_RCU|
@@ -3160,8 +3178,9 @@ void __init proc_caches_init(void)
 			sizeof(struct fs_struct), 0,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
 			NULL);
-	vm_area_cachep = KMEM_CACHE(vm_area_struct,
-			SLAB_HWCACHE_ALIGN|SLAB_NO_MERGE|SLAB_PANIC|
+	vm_area_cachep = kmem_cache_create("vm_area_struct",
+			sizeof(struct vm_area_struct), &args,
+			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_TYPESAFE_BY_RCU|
 			SLAB_ACCOUNT);
 	mmap_init();
 	nsproxy_cache_init();
diff --git a/mm/mmap.c b/mm/mmap.c
index cda01071c7b1..7aa36216ecc0 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1305,7 +1305,8 @@ void exit_mmap(struct mm_struct *mm)
 	do {
 		if (vma->vm_flags & VM_ACCOUNT)
 			nr_accounted += vma_pages(vma);
-		remove_vma(vma, /* unreachable = */ true);
+		vma_mark_detached(vma);
+		remove_vma(vma);
 		count++;
 		cond_resched();
 		vma = vma_next(&vmi);
diff --git a/mm/vma.c b/mm/vma.c
index 93ff42ac2002..0a5158d611e3 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -406,19 +406,14 @@ static bool can_vma_merge_right(struct vma_merge_struct *vmg,
 /*
  * Close a vm structure and free it.
  */
-void remove_vma(struct vm_area_struct *vma, bool unreachable)
+void remove_vma(struct vm_area_struct *vma)
 {
 	might_sleep();
 	vma_close(vma);
 	if (vma->vm_file)
 		fput(vma->vm_file);
 	mpol_put(vma_policy(vma));
-	if (unreachable) {
-		vma_mark_detached(vma);
-		__vm_area_free(vma);
-	} else {
-		vm_area_free(vma);
-	}
+	vm_area_free(vma);
 }
 
 /*
@@ -1201,7 +1196,7 @@ static void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
 	/* Remove and clean up vmas */
 	mas_set(mas_detach, 0);
 	mas_for_each(mas_detach, vma, ULONG_MAX)
-		remove_vma(vma, /* unreachable = */ false);
+		remove_vma(vma);
 
 	vm_unacct_memory(vms->nr_accounted);
 	validate_mm(mm);
diff --git a/mm/vma.h b/mm/vma.h
index 63dd38d5230c..f51005b95b39 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -170,7 +170,7 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
 		  unsigned long start, size_t len, struct list_head *uf,
 		  bool unlock);
 
-void remove_vma(struct vm_area_struct *vma, bool unreachable);
+void remove_vma(struct vm_area_struct *vma);
 
 void unmap_region(struct ma_state *mas, struct vm_area_struct *vma,
 		struct vm_area_struct *prev, struct vm_area_struct *next);
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index 2ce032943861..49a85ce0d45a 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -697,14 +697,9 @@ static inline void mpol_put(struct mempolicy *)
 {
 }
 
-static inline void __vm_area_free(struct vm_area_struct *vma)
-{
-	free(vma);
-}
-
 static inline void vm_area_free(struct vm_area_struct *vma)
 {
-	__vm_area_free(vma);
+	free(vma);
 }

 static inline void lru_add_drain(void)
-- 
2.47.1.613.gc27f4b7a9f-goog

From nobody Sat Feb 7 12:11:39 2026
Date: Fri, 10 Jan 2025 20:26:04 -0800
In-Reply-To: <20250111042604.3230628-1-surenb@google.com>
References: <20250111042604.3230628-1-surenb@google.com>
Message-ID: <20250111042604.3230628-18-surenb@google.com>
Subject: [PATCH v9 17/17] docs/mm: document latest changes to vm_lock
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: peterz@infradead.org, willy@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com, hughd@google.com, lokeshgidra@google.com, minchan@google.com, jannh@google.com, shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com, klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@android.com, surenb@google.com, "Liam R. Howlett"

Change the documentation to reflect that vm_lock is integrated into vma
and replaced with vm_refcnt.
Document newly introduced vma_start_read_locked{_nested} functions.

Signed-off-by: Suren Baghdasaryan
Reviewed-by: Liam R. Howlett
Reviewed-by: Lorenzo Stoakes
Tested-by: Shivank Garg
---
 Documentation/mm/process_addrs.rst | 44 ++++++++++++++++++------------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/Documentation/mm/process_addrs.rst b/Documentation/mm/process_addrs.rst
index 81417fa2ed20..f573de936b5d 100644
--- a/Documentation/mm/process_addrs.rst
+++ b/Documentation/mm/process_addrs.rst
@@ -716,9 +716,14 @@ calls :c:func:`!rcu_read_lock` to ensure that the VMA is looked up in an RCU
 critical section, then attempts to VMA lock it via :c:func:`!vma_start_read`,
 before releasing the RCU lock via :c:func:`!rcu_read_unlock`.

-VMA read locks hold the read lock on the :c:member:`!vma->vm_lock` semaphore for
-their duration and the caller of :c:func:`!lock_vma_under_rcu` must release it
-via :c:func:`!vma_end_read`.
+In cases when the user already holds the mmap read lock, :c:func:`!vma_start_read_locked`
+and :c:func:`!vma_start_read_locked_nested` can be used. These functions do not
+fail due to lock contention but the caller should still check their return values
+in case they fail for other reasons.
+
+VMA read locks increment the :c:member:`!vma.vm_refcnt` reference counter for their
+duration and the caller of :c:func:`!lock_vma_under_rcu` must drop it via
+:c:func:`!vma_end_read`.

 VMA **write** locks are acquired via :c:func:`!vma_start_write` in instances where a
 VMA is about to be modified, unlike :c:func:`!vma_start_read` the lock is always
@@ -726,9 +731,9 @@ acquired. An mmap write lock **must** be held for the duration of the VMA write
 lock, releasing or downgrading the mmap write lock also releases the VMA write
 lock so there is no :c:func:`!vma_end_write` function.

-Note that a semaphore write lock is not held across a VMA lock. Rather, a
-sequence number is used for serialisation, and the write semaphore is only
-acquired at the point of write lock to update this.
+Note that when write-locking a VMA lock, the :c:member:`!vma.vm_refcnt` is temporarily
+modified so that readers can detect the presence of a writer. The reference counter is
+restored once the VMA sequence number used for serialisation is updated.

 This ensures the semantics we require - VMA write locks provide exclusive write
 access to the VMA.
@@ -738,7 +743,7 @@ Implementation details

 The VMA lock mechanism is designed to be a lightweight means of avoiding the use
 of the heavily contended mmap lock. It is implemented using a combination of a
-read/write semaphore and sequence numbers belonging to the containing
+reference counter and sequence numbers belonging to the containing
 :c:struct:`!struct mm_struct` and the VMA.

 Read locks are acquired via :c:func:`!vma_start_read`, which is an optimistic
@@ -779,28 +784,31 @@ release of any VMA locks on its release makes sense, as you would never want to
 keep VMAs locked across entirely separate write operations. It also maintains
 correct lock ordering.

-Each time a VMA read lock is acquired, we acquire a read lock on the
-:c:member:`!vma->vm_lock` read/write semaphore and hold it, while checking that
-the sequence count of the VMA does not match that of the mm.
+Each time a VMA read lock is acquired, we increment the :c:member:`!vma.vm_refcnt`
+reference counter and check that the sequence count of the VMA does not match
+that of the mm.

-If it does, the read lock fails. If it does not, we hold the lock, excluding
-writers, but permitting other readers, who will also obtain this lock under RCU.
+If it does, the read lock fails and :c:member:`!vma.vm_refcnt` is dropped.
+If it does not, we keep the reference counter raised, excluding writers, but
+permitting other readers, who can also obtain this lock under RCU.

 Importantly, maple tree operations performed in :c:func:`!lock_vma_under_rcu`
 are also RCU safe, so the whole read lock operation is guaranteed to function
 correctly.

-On the write side, we acquire a write lock on the :c:member:`!vma->vm_lock`
-read/write semaphore, before setting the VMA's sequence number under this lock,
-also simultaneously holding the mmap write lock.
+On the write side, we set a bit in :c:member:`!vma.vm_refcnt` which can't be
+modified by readers and wait for all readers to drop their reference count.
+Once there are no readers, the VMA's sequence number is set to match that of the
+mm. During this entire operation the mmap write lock is held.

 This way, if any read locks are in effect, :c:func:`!vma_start_write` will
 sleep until these are finished and mutual exclusion is achieved.

-After setting the VMA's sequence number, the lock is released, avoiding
-complexity with a long-term held write lock.
+After setting the VMA's sequence number, the bit in :c:member:`!vma.vm_refcnt`
+indicating a writer is cleared. From this point on, the VMA's sequence number will
+indicate the VMA's write-locked state until the mmap write lock is dropped or downgraded.

-This clever combination of a read/write semaphore and sequence count allows for
+This clever combination of a reference counter and sequence count allows for
 fast RCU-based per-VMA lock acquisition (especially on page fault, though
 utilised elsewhere) with minimal complexity around lock ordering.

-- 
2.47.1.613.gc27f4b7a9f-goog
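
The updated documentation describes the reference-counter scheme in prose only. The block below is a minimal single-file userspace sketch of that scheme, assuming C11 atomics and POSIX sched_yield(). The names model_mm, model_vma, MODEL_WRITER_BIT, MODEL_READER_MAX and all model_* functions are hypothetical and are not kernel APIs. It keeps the two ideas from the text: a reader raises the counter and backs out if the VMA's sequence number matches the mm's, while a writer sets a bit readers cannot get past, waits for existing readers to drain, copies the mm's sequence number into the VMA and clears the bit. It does not model the RCU lookup in lock_vma_under_rcu, VMA detachment, or the kernel's actual counter layout and memory ordering.

```c
/*
 * Hypothetical userspace model of the per-VMA lock scheme described in
 * the documentation patch. Names are illustrative, not kernel APIs.
 */
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: a high bit of the reference counter marks a writer. */
#define MODEL_WRITER_BIT (1u << 30)
/* Assumption: readers fail rather than overflow into the writer bit. */
#define MODEL_READER_MAX (MODEL_WRITER_BIT - 1)

struct model_mm {
	atomic_uint mm_lock_seq; /* bumped when the mmap write lock is released */
};

struct model_vma {
	struct model_mm *mm;
	atomic_uint vm_refcnt;   /* number of readers, plus MODEL_WRITER_BIT */
	atomic_uint vm_lock_seq; /* equals mm->mm_lock_seq while write-locked */
};

static void model_vma_init(struct model_vma *vma, struct model_mm *mm)
{
	vma->mm = mm;
	atomic_init(&vma->vm_refcnt, 0);
	/* Any value other than mm_lock_seq means "not write-locked". */
	atomic_init(&vma->vm_lock_seq, atomic_load(&mm->mm_lock_seq) - 1);
}

/* Take a reader reference unless the writer bit is set or the limit is hit. */
static bool model_get_reader(struct model_vma *vma)
{
	unsigned int old = atomic_load(&vma->vm_refcnt);

	do {
		if (old >= MODEL_READER_MAX) /* also true when the writer bit is set */
			return false;
	} while (!atomic_compare_exchange_weak(&vma->vm_refcnt, &old, old + 1));
	return true;
}

/* Optimistic read lock; the caller must handle failure (e.g. fall back). */
static bool model_start_read(struct model_vma *vma)
{
	if (!model_get_reader(vma))
		return false;
	/* Matching sequence numbers mean the VMA is write-locked: back out. */
	if (atomic_load(&vma->vm_lock_seq) ==
	    atomic_load(&vma->mm->mm_lock_seq)) {
		atomic_fetch_sub(&vma->vm_refcnt, 1);
		return false;
	}
	return true;
}

/*
 * Variant for callers already holding the (modelled) mmap read lock: no
 * writer can be active, so only the hypothetical reader limit makes it fail.
 */
static bool model_start_read_locked(struct model_vma *vma)
{
	return model_get_reader(vma);
}

static void model_end_read(struct model_vma *vma)
{
	atomic_fetch_sub(&vma->vm_refcnt, 1);
}

/* Caller is assumed to hold the mmap write lock for the whole operation. */
static void model_start_write(struct model_vma *vma)
{
	unsigned int mm_seq = atomic_load(&vma->mm->mm_lock_seq);

	if (atomic_load(&vma->vm_lock_seq) == mm_seq)
		return; /* already write-locked under this mmap write lock */

	/* Announce the writer, then wait for existing readers to drain. */
	atomic_fetch_or(&vma->vm_refcnt, MODEL_WRITER_BIT);
	while (atomic_load(&vma->vm_refcnt) != MODEL_WRITER_BIT)
		sched_yield(); /* the kernel sleeps here instead of spinning */

	atomic_store(&vma->vm_lock_seq, mm_seq);
	atomic_fetch_and(&vma->vm_refcnt, ~MODEL_WRITER_BIT);
}

/* Dropping or downgrading the mmap write lock unlocks every VMA at once. */
static void model_mmap_write_unlock(struct model_mm *mm)
{
	atomic_fetch_add(&mm->mm_lock_seq, 1);
}
```

Because releasing the modelled mmap write lock only increments the mm's sequence number, every VMA write-locked under it stops matching at once, which is why the model needs no per-VMA end-write call. The reader limit merely stands in for the "other reasons" the text says vma_start_read_locked() may still fail for; the kernel's real failure conditions may differ.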