From nobody Wed Dec 17 21:39:47 2025
Date: Mon, 16 Dec 2024 11:24:04 -0800
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Subject: [PATCH v6 01/16] mm: introduce vma_start_read_locked{_nested} helpers
Message-ID: <20241216192419.2970941-2-surenb@google.com>
In-Reply-To: <20241216192419.2970941-1-surenb@google.com>

Introduce helper functions which can be used to read-lock a VMA when
holding mmap_lock for read. Replace direct accesses to vma->vm_lock
with these new helpers.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/mm.h | 24 ++++++++++++++++++++++++
 mm/userfaultfd.c   | 22 +++++-----------------
 2 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1352147a2648..3815a43ba504 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -735,6 +735,30 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	return true;
 }
 
+/*
+ * Use only while holding mmap read lock which guarantees that locking will not
+ * fail (nobody can concurrently write-lock the vma). vma_start_read() should
+ * not be used in such cases because it might fail due to mm_lock_seq overflow.
+ * This functionality is used to obtain vma read lock and drop the mmap read lock.
+ */
+static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int subclass)
+{
+	mmap_assert_locked(vma->vm_mm);
+	down_read_nested(&vma->vm_lock->lock, subclass);
+}
+
+/*
+ * Use only while holding mmap read lock which guarantees that locking will not
+ * fail (nobody can concurrently write-lock the vma). vma_start_read() should
+ * not be used in such cases because it might fail due to mm_lock_seq overflow.
+ * This functionality is used to obtain vma read lock and drop the mmap read lock.
+ */
+static inline void vma_start_read_locked(struct vm_area_struct *vma)
+{
+	mmap_assert_locked(vma->vm_mm);
+	down_read(&vma->vm_lock->lock);
+}
+
 static inline void vma_end_read(struct vm_area_struct *vma)
 {
 	rcu_read_lock(); /* keeps vma alive till the end of up_read */
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 8e16dc290ddf..bc9a66ec6a6e 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -84,16 +84,8 @@ static struct vm_area_struct *uffd_lock_vma(struct mm_struct *mm,
 
 	mmap_read_lock(mm);
 	vma = find_vma_and_prepare_anon(mm, address);
-	if (!IS_ERR(vma)) {
-		/*
-		 * We cannot use vma_start_read() as it may fail due to
-		 * false locked (see comment in vma_start_read()). We
-		 * can avoid that by directly locking vm_lock under
-		 * mmap_lock, which guarantees that nobody can lock the
-		 * vma for write (vma_start_write()) under us.
-		 */
-		down_read(&vma->vm_lock->lock);
-	}
+	if (!IS_ERR(vma))
+		vma_start_read_locked(vma);
 
 	mmap_read_unlock(mm);
 	return vma;
@@ -1491,14 +1483,10 @@ static int uffd_move_lock(struct mm_struct *mm,
 	mmap_read_lock(mm);
 	err = find_vmas_mm_locked(mm, dst_start, src_start, dst_vmap, src_vmap);
 	if (!err) {
-		/*
-		 * See comment in uffd_lock_vma() as to why not using
-		 * vma_start_read() here.
-		 */
-		down_read(&(*dst_vmap)->vm_lock->lock);
+		vma_start_read_locked(*dst_vmap);
 		if (*dst_vmap != *src_vmap)
-			down_read_nested(&(*src_vmap)->vm_lock->lock,
-					 SINGLE_DEPTH_NESTING);
+			vma_start_read_locked_nested(*src_vmap,
+						     SINGLE_DEPTH_NESTING);
 	}
 	mmap_read_unlock(mm);
 	return err;
-- 
2.47.1.613.gc27f4b7a9f-goog
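For context, a minimal sketch (illustrative only, not part of the patch;
the wrapper name is hypothetical and find_vma() stands in for uffd's
anon-vma helper) of the pattern these helpers support: take the VMA read
lock while mmap_lock is held for read, then drop mmap_lock and keep only
the per-VMA lock:

	static struct vm_area_struct *lock_vma_sketch(struct mm_struct *mm,
						      unsigned long address)
	{
		struct vm_area_struct *vma;

		mmap_read_lock(mm);
		vma = find_vma(mm, address);
		if (vma)
			/* cannot fail: mmap_lock excludes concurrent write-lockers */
			vma_start_read_locked(vma);
		mmap_read_unlock(mm);
		return vma;	/* caller releases it with vma_end_read() */
	}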
From nobody Wed Dec 17 21:39:47 2025
Date: Mon, 16 Dec 2024 11:24:05 -0800
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Subject: [PATCH v6 02/16] mm: move per-vma lock into vm_area_struct
Message-ID: <20241216192419.2970941-3-surenb@google.com>
In-Reply-To: <20241216192419.2970941-1-surenb@google.com>

Back when per-vma locks were introduced, vm_lock was moved out of
vm_area_struct in [1] because of the performance regression caused by
false cacheline sharing.
Recent investigation [2] revealed that the regression is limited to a
rather old Broadwell microarchitecture and even there it can be
mitigated by disabling adjacent cacheline prefetching, see [3].

Splitting a single logical structure into multiple ones leads to more
complicated management, extra pointer dereferences and overall less
maintainable code. When that split-away part is a lock, it complicates
things even further. With no performance benefits, there are no reasons
for this split.

Merging the vm_lock back into vm_area_struct also allows vm_area_struct
to use SLAB_TYPESAFE_BY_RCU later in this patchset.

Move vm_lock back into vm_area_struct, aligning it at the cacheline
boundary and changing the cache to be cacheline-aligned as well. With
the kernel compiled using defconfig, this causes VMA memory consumption
to grow from 160 (vm_area_struct) + 40 (vm_lock) bytes to 256 bytes:

    slabinfo before:
     ...
     vma_lock        ...  40 102 1 : ...
     vm_area_struct  ... 160  51 2 : ...

    slabinfo after moving vm_lock:
     ...
     vm_area_struct  ... 256  32 2 : ...

Aggregate VMA memory consumption per 1000 VMAs grows from 50 to 64
pages, which is 5.5MB per 100000 VMAs. Note that the size of this
structure is dependent on the kernel configuration and typically the
original size is higher than 160 bytes. Therefore these calculations
are close to the worst case scenario. A more realistic vm_area_struct
usage before this change is:

    slabinfo before:
     ...
     vma_lock        ...  40 102 1 : ...
     vm_area_struct  ... 176  46 2 : ...

Aggregate VMA memory consumption per 1000 VMAs grows from 54 to 64
pages, which is 3.9MB per 100000 VMAs. This memory consumption growth
can be addressed later by optimizing the vm_lock.

[1] https://lore.kernel.org/all/20230227173632.3292573-34-surenb@google.com/
[2] https://lore.kernel.org/all/ZsQyI%2F087V34JoIt@xsang-OptiPlex-9020/
[3] https://lore.kernel.org/all/CAJuCfpEisU8Lfe96AYJDZ+OM4NoPmnw9bP53cT_kbfP_pR+-2g@mail.gmail.com/

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/mm.h               | 28 ++++++++++--------
 include/linux/mm_types.h         |  6 ++--
 kernel/fork.c                    | 49 ++++----------------------
 tools/testing/vma/vma_internal.h | 33 +++++----------------
 4 files changed, 32 insertions(+), 84 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3815a43ba504..e1768a9395c9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -697,6 +697,12 @@ static inline void vma_numab_state_free(struct vm_area_struct *vma) {}
 #endif /* CONFIG_NUMA_BALANCING */
 
 #ifdef CONFIG_PER_VMA_LOCK
+static inline void vma_lock_init(struct vm_area_struct *vma)
+{
+	init_rwsem(&vma->vm_lock.lock);
+	vma->vm_lock_seq = UINT_MAX;
+}
+
 /*
  * Try to read-lock a vma. The function is allowed to occasionally yield false
  * locked result to avoid performance overhead, in which case we fall back to
@@ -714,7 +720,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq.sequence))
 		return false;
 
-	if (unlikely(down_read_trylock(&vma->vm_lock->lock) == 0))
+	if (unlikely(down_read_trylock(&vma->vm_lock.lock) == 0))
 		return false;
 
 	/*
@@ -729,7 +735,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	 * This pairs with RELEASE semantics in vma_end_write_all().
 	 */
 	if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&vma->vm_mm->mm_lock_seq))) {
-		up_read(&vma->vm_lock->lock);
+		up_read(&vma->vm_lock.lock);
 		return false;
 	}
 	return true;
@@ -744,7 +750,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int subclass)
 {
 	mmap_assert_locked(vma->vm_mm);
-	down_read_nested(&vma->vm_lock->lock, subclass);
+	down_read_nested(&vma->vm_lock.lock, subclass);
 }
 
 /*
@@ -756,13 +762,13 @@ static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int
 static inline void vma_start_read_locked(struct vm_area_struct *vma)
 {
 	mmap_assert_locked(vma->vm_mm);
-	down_read(&vma->vm_lock->lock);
+	down_read(&vma->vm_lock.lock);
 }
 
 static inline void vma_end_read(struct vm_area_struct *vma)
 {
 	rcu_read_lock(); /* keeps vma alive till the end of up_read */
-	up_read(&vma->vm_lock->lock);
+	up_read(&vma->vm_lock.lock);
 	rcu_read_unlock();
 }
 
@@ -791,7 +797,7 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 	if (__is_vma_write_locked(vma, &mm_lock_seq))
 		return;
 
-	down_write(&vma->vm_lock->lock);
+	down_write(&vma->vm_lock.lock);
 	/*
 	 * We should use WRITE_ONCE() here because we can have concurrent reads
 	 * from the early lockless pessimistic check in vma_start_read().
@@ -799,7 +805,7 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 	 * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy.
 	 */
 	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
-	up_write(&vma->vm_lock->lock);
+	up_write(&vma->vm_lock.lock);
 }
 
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
@@ -811,7 +817,7 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 
 static inline void vma_assert_locked(struct vm_area_struct *vma)
 {
-	if (!rwsem_is_locked(&vma->vm_lock->lock))
+	if (!rwsem_is_locked(&vma->vm_lock.lock))
 		vma_assert_write_locked(vma);
 }
 
@@ -844,6 +850,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 
 #else /* CONFIG_PER_VMA_LOCK */
 
+static inline void vma_lock_init(struct vm_area_struct *vma) {}
 static inline bool vma_start_read(struct vm_area_struct *vma) { return false; }
 static inline void vma_end_read(struct vm_area_struct *vma) {}
@@ -878,10 +885,6 @@ static inline void assert_fault_locked(struct vm_fault *vmf)
 
 extern const struct vm_operations_struct vma_dummy_vm_ops;
 
-/*
- * WARNING: vma_init does not initialize vma->vm_lock.
- * Use vm_area_alloc()/vm_area_free() if vma needs locking.
- */
 static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 {
 	memset(vma, 0, sizeof(*vma));
@@ -890,6 +893,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
 	vma_mark_detached(vma, false);
 	vma_numab_state_init(vma);
+	vma_lock_init(vma);
 }
 
 /* Use when VMA is not part of the VMA tree and needs no locking */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 266f53b2bb49..825f6328f9e5 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -700,8 +700,6 @@ struct vm_area_struct {
 	 * slowpath.
 	 */
 	unsigned int vm_lock_seq;
-	/* Unstable RCU readers are allowed to read this. */
-	struct vma_lock *vm_lock;
 #endif
 
 	/*
@@ -754,6 +752,10 @@ struct vm_area_struct {
 	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
 #endif
 	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
+#ifdef CONFIG_PER_VMA_LOCK
+	/* Unstable RCU readers are allowed to read this. */
+	struct vma_lock vm_lock ____cacheline_aligned_in_smp;
+#endif
 } __randomize_layout;
 
 #ifdef CONFIG_NUMA
diff --git a/kernel/fork.c b/kernel/fork.c
index 8dc670fe90d4..eb3e35d65e95 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -436,35 +436,6 @@ static struct kmem_cache *vm_area_cachep;
 /* SLAB cache for mm_struct structures (tsk->mm) */
 static struct kmem_cache *mm_cachep;
 
-#ifdef CONFIG_PER_VMA_LOCK
-
-/* SLAB cache for vm_area_struct.lock */
-static struct kmem_cache *vma_lock_cachep;
-
-static bool vma_lock_alloc(struct vm_area_struct *vma)
-{
-	vma->vm_lock = kmem_cache_alloc(vma_lock_cachep, GFP_KERNEL);
-	if (!vma->vm_lock)
-		return false;
-
-	init_rwsem(&vma->vm_lock->lock);
-	vma->vm_lock_seq = UINT_MAX;
-
-	return true;
-}
-
-static inline void vma_lock_free(struct vm_area_struct *vma)
-{
-	kmem_cache_free(vma_lock_cachep, vma->vm_lock);
-}
-
-#else /* CONFIG_PER_VMA_LOCK */
-
-static inline bool vma_lock_alloc(struct vm_area_struct *vma) { return true; }
-static inline void vma_lock_free(struct vm_area_struct *vma) {}
-
-#endif /* CONFIG_PER_VMA_LOCK */
-
 struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
 {
 	struct vm_area_struct *vma;
@@ -474,10 +445,6 @@ struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
 		return NULL;
 
 	vma_init(vma, mm);
-	if (!vma_lock_alloc(vma)) {
-		kmem_cache_free(vm_area_cachep, vma);
-		return NULL;
-	}
 
 	return vma;
 }
@@ -496,10 +463,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	 * will be reinitialized.
 	 */
 	data_race(memcpy(new, orig, sizeof(*new)));
-	if (!vma_lock_alloc(new)) {
-		kmem_cache_free(vm_area_cachep, new);
-		return NULL;
-	}
+	vma_lock_init(new);
 	INIT_LIST_HEAD(&new->anon_vma_chain);
 	vma_numab_state_init(new);
 	dup_anon_vma_name(orig, new);
@@ -511,7 +475,6 @@ void __vm_area_free(struct vm_area_struct *vma)
 {
 	vma_numab_state_free(vma);
 	free_anon_vma_name(vma);
-	vma_lock_free(vma);
 	kmem_cache_free(vm_area_cachep, vma);
 }
 
@@ -522,7 +485,7 @@ static void vm_area_free_rcu_cb(struct rcu_head *head)
 						  vm_rcu);
 
 	/* The vma should not be locked while being destroyed. */
-	VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock->lock), vma);
+	VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock.lock), vma);
 	__vm_area_free(vma);
 }
 #endif
@@ -3189,11 +3152,9 @@ void __init proc_caches_init(void)
 			sizeof(struct fs_struct), 0,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
 			NULL);
-
-	vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
-#ifdef CONFIG_PER_VMA_LOCK
-	vma_lock_cachep = KMEM_CACHE(vma_lock, SLAB_PANIC|SLAB_ACCOUNT);
-#endif
+	vm_area_cachep = KMEM_CACHE(vm_area_struct,
+			SLAB_HWCACHE_ALIGN|SLAB_NO_MERGE|SLAB_PANIC|
+			SLAB_ACCOUNT);
 	mmap_init();
 	nsproxy_cache_init();
 }
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index b973b3e41c83..568c18d24d53 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -270,10 +270,10 @@ struct vm_area_struct {
 	/*
 	 * Can only be written (using WRITE_ONCE()) while holding both:
 	 *  - mmap_lock (in write mode)
-	 *  - vm_lock->lock (in write mode)
+	 *  - vm_lock.lock (in write mode)
 	 * Can be read reliably while holding one of:
 	 *  - mmap_lock (in read or write mode)
-	 *  - vm_lock->lock (in read or write mode)
+	 *  - vm_lock.lock (in read or write mode)
 	 * Can be read unreliably (using READ_ONCE()) for pessimistic bailout
 	 * while holding nothing (except RCU to keep the VMA struct allocated).
 	 *
@@ -282,7 +282,7 @@ struct vm_area_struct {
 	 * slowpath.
 	 */
 	unsigned int vm_lock_seq;
-	struct vma_lock *vm_lock;
+	struct vma_lock vm_lock;
 #endif
 
 	/*
@@ -459,17 +459,10 @@ static inline struct vm_area_struct *vma_next(struct vma_iterator *vmi)
 	return mas_find(&vmi->mas, ULONG_MAX);
 }
 
-static inline bool vma_lock_alloc(struct vm_area_struct *vma)
+static inline void vma_lock_init(struct vm_area_struct *vma)
 {
-	vma->vm_lock = calloc(1, sizeof(struct vma_lock));
-
-	if (!vma->vm_lock)
-		return false;
-
-	init_rwsem(&vma->vm_lock->lock);
+	init_rwsem(&vma->vm_lock.lock);
 	vma->vm_lock_seq = UINT_MAX;
-
-	return true;
 }
 
 static inline void vma_assert_write_locked(struct vm_area_struct *);
@@ -492,6 +485,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_ops = &vma_dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
 	vma_mark_detached(vma, false);
+	vma_lock_init(vma);
 }
 
 static inline struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
@@ -502,10 +496,6 @@ static inline struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
 		return NULL;
 
 	vma_init(vma, mm);
-	if (!vma_lock_alloc(vma)) {
-		free(vma);
-		return NULL;
-	}
 
 	return vma;
 }
@@ -518,10 +508,7 @@ static inline struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 		return NULL;
 
 	memcpy(new, orig, sizeof(*new));
-	if (!vma_lock_alloc(new)) {
-		free(new);
-		return NULL;
-	}
+	vma_lock_init(new);
 	INIT_LIST_HEAD(&new->anon_vma_chain);
 
 	return new;
 }
@@ -691,14 +678,8 @@ static inline void mpol_put(struct mempolicy *)
 {
 }
 
-static inline void vma_lock_free(struct vm_area_struct *vma)
-{
-	free(vma->vm_lock);
-}
-
 static inline void __vm_area_free(struct vm_area_struct *vma)
 {
-	vma_lock_free(vma);
 	free(vma);
 }
 
-- 
2.47.1.613.gc27f4b7a9f-goog
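For reference, the page arithmetic behind the numbers in the commit
message above (assuming 4KiB pages and the slab object counts quoted):
before the change, 1000 VMAs take about (1000/51) x 2 = 40 pages of
vm_area_struct slabs plus 1000/102 = 10 pages of vma_lock slabs, roughly
50 pages total; after it, (1000/32) x 2 = 63 pages, rounded up to 64.
The delta of 14 pages per 1000 VMAs is 14 x 4KiB x 100 = 5.6MB per
100000 VMAs, in line with the quoted 5.5MB figure.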
From nobody Wed Dec 17 21:39:47 2025
Date: Mon, 16 Dec 2024 11:24:06 -0800
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Subject: [PATCH v6 03/16] mm: mark vma as detached until it's added into vma tree
Message-ID: <20241216192419.2970941-4-surenb@google.com>
In-Reply-To: <20241216192419.2970941-1-surenb@google.com>

The current implementation does not set the detached flag when a VMA is
first allocated. This does not represent the real state of the VMA,
which is detached until it is added into the mm's VMA tree. Fix this by
marking new VMAs as detached and resetting the detached flag only after
the VMA is added into a tree.

Introduce vma_mark_attached() to make the API more readable and to
simplify a possible future cleanup in which vma->vm_mm might be used to
indicate a detached vma, at which point vma_mark_attached() will need an
additional mm parameter.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/mm.h               | 27 +++++++++++++++++++-------
 kernel/fork.c                    |  4 ++++
 mm/memory.c                      |  2 +-
 mm/vma.c                         |  6 +++---
 mm/vma.h                         |  2 ++
 tools/testing/vma/vma_internal.h | 17 ++++++++++++-----
 6 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e1768a9395c9..689f5a1e2181 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -821,12 +821,21 @@ static inline void vma_assert_locked(struct vm_area_struct *vma)
 		vma_assert_write_locked(vma);
 }
 
-static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached)
+static inline void vma_mark_attached(struct vm_area_struct *vma)
+{
+	vma->detached = false;
+}
+
+static inline void vma_mark_detached(struct vm_area_struct *vma)
 {
 	/* When detaching vma should be write-locked */
-	if (detached)
-		vma_assert_write_locked(vma);
-	vma->detached = detached;
+	vma_assert_write_locked(vma);
+	vma->detached = true;
+}
+
+static inline bool is_vma_detached(struct vm_area_struct *vma)
+{
+	return vma->detached;
 }
 
 static inline void release_fault_lock(struct vm_fault *vmf)
@@ -857,8 +866,8 @@ static inline void vma_end_read(struct vm_area_struct *vma) {}
 static inline void vma_start_write(struct vm_area_struct *vma) {}
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 		{ mmap_assert_write_locked(vma->vm_mm); }
-static inline void vma_mark_detached(struct vm_area_struct *vma,
-				     bool detached) {}
+static inline void vma_mark_attached(struct vm_area_struct *vma) {}
+static inline void vma_mark_detached(struct vm_area_struct *vma) {}
 
 static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
							 unsigned long address)
@@ -891,7 +900,10 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_mm = mm;
 	vma->vm_ops = &vma_dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
-	vma_mark_detached(vma, false);
+#ifdef CONFIG_PER_VMA_LOCK
+	/* vma is not locked, can't use vma_mark_detached() */
+	vma->detached = true;
+#endif
 	vma_numab_state_init(vma);
 	vma_lock_init(vma);
 }
@@ -1086,6 +1098,7 @@ static inline int vma_iter_bulk_store(struct vma_iterator *vmi,
 	if (unlikely(mas_is_err(&vmi->mas)))
 		return -ENOMEM;
 
+	vma_mark_attached(vma);
 	return 0;
 }
 
diff --git a/kernel/fork.c b/kernel/fork.c
index eb3e35d65e95..57dc5b935f79 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -465,6 +465,10 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	data_race(memcpy(new, orig, sizeof(*new)));
 	vma_lock_init(new);
 	INIT_LIST_HEAD(&new->anon_vma_chain);
+#ifdef CONFIG_PER_VMA_LOCK
+	/* vma is not locked, can't use vma_mark_detached() */
+	new->detached = true;
+#endif
 	vma_numab_state_init(new);
 	dup_anon_vma_name(orig, new);
 
diff --git a/mm/memory.c b/mm/memory.c
index 2d97a17dd3ba..cc7159aef918 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6350,7 +6350,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 		goto inval;
 
 	/* Check if the VMA got isolated after we found it */
-	if (vma->detached) {
+	if (is_vma_detached(vma)) {
 		vma_end_read(vma);
 		count_vm_vma_lock_event(VMA_LOCK_MISS);
 		/* The area was replaced with another one */
diff --git a/mm/vma.c b/mm/vma.c
index 6fa240e5b0c5..fbd7254517d6 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -327,7 +327,7 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
 
 	if (vp->remove) {
 again:
-		vma_mark_detached(vp->remove, true);
+		vma_mark_detached(vp->remove);
 		if (vp->file) {
 			uprobe_munmap(vp->remove, vp->remove->vm_start,
 				      vp->remove->vm_end);
@@ -1222,7 +1222,7 @@ static void reattach_vmas(struct ma_state *mas_detach)
 
 	mas_set(mas_detach, 0);
 	mas_for_each(mas_detach, vma, ULONG_MAX)
-		vma_mark_detached(vma, false);
+		vma_mark_attached(vma);
 
 	__mt_destroy(mas_detach->tree);
 }
@@ -1297,7 +1297,7 @@ static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
 		if (error)
 			goto munmap_gather_failed;
 
-		vma_mark_detached(next, true);
+		vma_mark_detached(next);
 		nrpages = vma_pages(next);
 
 		vms->nr_pages += nrpages;
diff --git a/mm/vma.h b/mm/vma.h
index 61ed044b6145..24636a2b0acf 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -157,6 +157,7 @@ static inline int vma_iter_store_gfp(struct vma_iterator *vmi,
 	if (unlikely(mas_is_err(&vmi->mas)))
 		return -ENOMEM;
 
+	vma_mark_attached(vma);
 	return 0;
 }
 
@@ -389,6 +390,7 @@ static inline void vma_iter_store(struct vma_iterator *vmi,
 
 	__mas_set_range(&vmi->mas, vma->vm_start, vma->vm_end - 1);
 	mas_store_prealloc(&vmi->mas, vma);
+	vma_mark_attached(vma);
 }
 
 static inline unsigned long vma_iter_addr(struct vma_iterator *vmi)
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index 568c18d24d53..0cdc5f8c3d60 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -465,13 +465,17 @@ static inline void vma_lock_init(struct vm_area_struct *vma)
 	vma->vm_lock_seq = UINT_MAX;
 }
 
+static inline void vma_mark_attached(struct vm_area_struct *vma)
+{
+	vma->detached = false;
+}
+
 static inline void vma_assert_write_locked(struct vm_area_struct *);
-static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached)
+static inline void vma_mark_detached(struct vm_area_struct *vma)
 {
 	/* When detaching vma should be write-locked */
-	if (detached)
-		vma_assert_write_locked(vma);
-	vma->detached = detached;
+	vma_assert_write_locked(vma);
+	vma->detached = true;
 }
 
 extern const struct vm_operations_struct vma_dummy_vm_ops;
@@ -484,7 +488,8 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_mm = mm;
 	vma->vm_ops = &vma_dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
-	vma_mark_detached(vma, false);
+	/* vma is not locked, can't use vma_mark_detached() */
+	vma->detached = true;
 	vma_lock_init(vma);
 }
 
@@ -510,6 +515,8 @@ static inline struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	memcpy(new, orig, sizeof(*new));
 	vma_lock_init(new);
 	INIT_LIST_HEAD(&new->anon_vma_chain);
+	/* vma is not locked, can't use vma_mark_detached() */
+	new->detached = true;
 
 	return new;
 }
-- 
2.47.1.613.gc27f4b7a9f-goog
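For illustration, a minimal sketch (not part of the patch) of the
attach/detach lifecycle the commit message describes, using the helpers
as changed above:

	struct vm_area_struct *vma = vm_area_alloc(mm);	/* born detached */
	...
	vma_iter_store(&vmi, vma);	/* stored in the tree: vma_mark_attached() */
	...
	vma_start_write(vma);		/* write-lock required before detaching */
	vma_mark_detached(vma);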
From nobody Wed Dec 17 21:39:47 2025
Date: Mon, 16 Dec 2024 11:24:07 -0800
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Subject: [PATCH v6 04/16] mm/nommu: fix the last places where vma is not locked before being attached
Message-ID: <20241216192419.2970941-5-surenb@google.com>
In-Reply-To: <20241216192419.2970941-1-surenb@google.com>

The nommu configuration has two places where a vma gets attached to the
vma tree without being write-locked. Add the missing locks to ensure
the vma is always locked before it is attached.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 mm/nommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/nommu.c b/mm/nommu.c
index 9cb6e99215e2..248392ef4048 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1189,6 +1189,7 @@ unsigned long do_mmap(struct file *file,
 		goto error_just_free;
 
 	setup_vma_to_mm(vma, current->mm);
+	vma_start_write(vma);
 	current->mm->map_count++;
 	/* add the VMA to the tree */
 	vma_iter_store(&vmi, vma);
@@ -1356,6 +1357,7 @@ static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
 
 	setup_vma_to_mm(vma, mm);
 	setup_vma_to_mm(new, mm);
+	vma_start_write(new);
 	vma_iter_store(vmi, new);
 	mm->map_count++;
 	return 0;
-- 
2.47.1.613.gc27f4b7a9f-goog
From nobody Wed Dec 17 21:39:47 2025
Date: Mon, 16 Dec 2024 11:24:08 -0800
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Subject: [PATCH v6 05/16] types: move struct rcuwait into types.h
Message-ID: <20241216192419.2970941-6-surenb@google.com>
In-Reply-To: <20241216192419.2970941-1-surenb@google.com>

Move the rcuwait struct definition into types.h so that rcuwait can be
used without including rcuwait.h, which includes other headers. Without
this change mm_types.h can't use rcuwait due to the following circular
dependency:

mm_types.h -> rcuwait.h -> signal.h -> mm_types.h

Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/rcuwait.h | 13 +------------
 include/linux/types.h   | 12 ++++++++++++
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/linux/rcuwait.h b/include/linux/rcuwait.h
index 27343424225c..9ad134a04b41 100644
--- a/include/linux/rcuwait.h
+++ b/include/linux/rcuwait.h
@@ -4,18 +4,7 @@
 
 #include <linux/rcupdate.h>
 #include <linux/sched/signal.h>
-
-/*
- * rcuwait provides a way of blocking and waking up a single
- * task in an rcu-safe manner.
- *
- * The only time @task is non-nil is when a user is blocked (or
- * checking if it needs to) on a condition, and reset as soon as we
- * know that the condition has succeeded and are awoken.
- */
-struct rcuwait {
-	struct task_struct __rcu *task;
-};
+#include <linux/types.h>
 
 #define __RCUWAIT_INITIALIZER(name)		\
 	{ .task = NULL, }
diff --git a/include/linux/types.h b/include/linux/types.h
index 2d7b9ae8714c..f1356a9a5730 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -248,5 +248,17 @@ typedef void (*swap_func_t)(void *a, void *b, int size);
 typedef int (*cmp_r_func_t)(const void *a, const void *b, const void *priv);
 typedef int (*cmp_func_t)(const void *a, const void *b);
 
+/*
+ * rcuwait provides a way of blocking and waking up a single
+ * task in an rcu-safe manner.
+ *
+ * The only time @task is non-nil is when a user is blocked (or
+ * checking if it needs to) on a condition, and reset as soon as we
+ * know that the condition has succeeded and are awoken.
+ */
+struct rcuwait {
+	struct task_struct __rcu *task;
+};
+
 #endif /* __ASSEMBLY__ */
 #endif /* _LINUX_TYPES_H */
-- 
2.47.1.613.gc27f4b7a9f-goog
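For illustration, a minimal sketch (an assumption, not part of this
patch: the structure and field names are hypothetical) of what the move
enables — a header such as mm_types.h can now embed the wait object with
only the lightweight types.h include, avoiding the signal.h cycle:

	#include <linux/types.h>	/* brings in struct rcuwait, not signal.h */

	struct some_mm_object {			/* hypothetical structure */
		struct rcuwait writer_wait;	/* hypothetical field */
	};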
From nobody Wed Dec 17 21:39:47 2025
Date: Mon, 16 Dec 2024 11:24:09 -0800
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Subject: [PATCH v6 06/16] mm: allow vma_start_read_locked/vma_start_read_locked_nested to fail
Message-ID: <20241216192419.2970941-7-surenb@google.com>
In-Reply-To: <20241216192419.2970941-1-surenb@google.com>

With the upcoming replacement of vm_lock with vm_refcnt, we need to
handle the possibility of vma_start_read_locked and
vma_start_read_locked_nested failing due to refcount overflow. Prepare
for such a possibility by changing these APIs to return a boolean and
adjusting their users.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
---
 include/linux/mm.h |  6 ++++--
 mm/userfaultfd.c   | 17 ++++++++++++-----
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 689f5a1e2181..0ecd321c50b7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -747,10 +747,11 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
  * not be used in such cases because it might fail due to mm_lock_seq overflow.
  * This functionality is used to obtain vma read lock and drop the mmap read lock.
Signed-off-by: Suren Baghdasaryan
Cc: Lokesh Gidra
---
 include/linux/mm.h |  6 ++++--
 mm/userfaultfd.c   | 17 ++++++++++++-----
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 689f5a1e2181..0ecd321c50b7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -747,10 +747,11 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 * not be used in such cases because it might fail due to mm_lock_seq overflow.
 * This functionality is used to obtain vma read lock and drop the mmap read lock.
 */
-static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int subclass)
+static inline bool vma_start_read_locked_nested(struct vm_area_struct *vma, int subclass)
 {
 	mmap_assert_locked(vma->vm_mm);
 	down_read_nested(&vma->vm_lock.lock, subclass);
+	return true;
 }
 
 /*
@@ -759,10 +760,11 @@ static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int
 * not be used in such cases because it might fail due to mm_lock_seq overflow.
 * This functionality is used to obtain vma read lock and drop the mmap read lock.
 */
-static inline void vma_start_read_locked(struct vm_area_struct *vma)
+static inline bool vma_start_read_locked(struct vm_area_struct *vma)
 {
 	mmap_assert_locked(vma->vm_mm);
 	down_read(&vma->vm_lock.lock);
+	return true;
 }
 
 static inline void vma_end_read(struct vm_area_struct *vma)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index bc9a66ec6a6e..79e8ae676f75 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -85,7 +85,8 @@ static struct vm_area_struct *uffd_lock_vma(struct mm_struct *mm,
 	mmap_read_lock(mm);
 	vma = find_vma_and_prepare_anon(mm, address);
 	if (!IS_ERR(vma))
-		vma_start_read_locked(vma);
+		if (!vma_start_read_locked(vma))
+			vma = ERR_PTR(-EAGAIN);
 
 	mmap_read_unlock(mm);
 	return vma;
@@ -1483,10 +1484,16 @@ static int uffd_move_lock(struct mm_struct *mm,
 	mmap_read_lock(mm);
 	err = find_vmas_mm_locked(mm, dst_start, src_start, dst_vmap, src_vmap);
 	if (!err) {
-		vma_start_read_locked(*dst_vmap);
-		if (*dst_vmap != *src_vmap)
-			vma_start_read_locked_nested(*src_vmap,
-						     SINGLE_DEPTH_NESTING);
+		if (vma_start_read_locked(*dst_vmap)) {
+			if (*dst_vmap != *src_vmap) {
+				if (!vma_start_read_locked_nested(*src_vmap,
+							SINGLE_DEPTH_NESTING)) {
+					vma_end_read(*dst_vmap);
+					err = -EAGAIN;
+				}
+			}
+		} else
+			err = -EAGAIN;
 	}
 	mmap_read_unlock(mm);
 	return err;
-- 
2.47.1.613.gc27f4b7a9f-goog

From nobody Wed Dec 17 21:39:47 2025
Date: Mon, 16 Dec 2024 11:24:10 -0800
In-Reply-To: <20241216192419.2970941-1-surenb@google.com>
References: <20241216192419.2970941-1-surenb@google.com>
Message-ID: <20241216192419.2970941-8-surenb@google.com>
Subject: [PATCH v6 07/16] mm: move mmap_init_lock() out of the header file
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

mmap_init_lock() is used only
from mm_init() in fork.c, therefore it does not have to reside in the
header file. This move lets us avoid including additional headers in
mmap_lock.h later, when mmap_init_lock() needs to initialize an rcuwait
object.

Signed-off-by: Suren Baghdasaryan
---
 include/linux/mmap_lock.h | 6 ------
 kernel/fork.c             | 6 ++++++
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 45a21faa3ff6..4706c6769902 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -122,12 +122,6 @@ static inline bool mmap_lock_speculate_retry(struct mm_struct *mm, unsigned int
 
 #endif /* CONFIG_PER_VMA_LOCK */
 
-static inline void mmap_init_lock(struct mm_struct *mm)
-{
-	init_rwsem(&mm->mmap_lock);
-	mm_lock_seqcount_init(mm);
-}
-
 static inline void mmap_write_lock(struct mm_struct *mm)
 {
 	__mmap_lock_trace_start_locking(mm, true);
diff --git a/kernel/fork.c b/kernel/fork.c
index 57dc5b935f79..8cb19c23e892 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1224,6 +1224,12 @@ static void mm_init_uprobes_state(struct mm_struct *mm)
 #endif
 }
 
+static inline void mmap_init_lock(struct mm_struct *mm)
+{
+	init_rwsem(&mm->mmap_lock);
+	mm_lock_seqcount_init(mm);
+}
+
 static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
	struct user_namespace *user_ns)
 {
-- 
2.47.1.613.gc27f4b7a9f-goog

From nobody Wed Dec 17 21:39:47 2025
Date: Mon, 16 Dec 2024 11:24:11 -0800
In-Reply-To: <20241216192419.2970941-1-surenb@google.com>
References: <20241216192419.2970941-1-surenb@google.com>
Message-ID: <20241216192419.2970941-9-surenb@google.com>
Subject: [PATCH v6 08/16] mm: uninline the main body of vma_start_write()
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

vma_start_write() is used in many places and will grow in size very soon.
It is not used in performance critical paths and uninlining it should
limit the future code size growth. No functional changes.
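The shape of the change, in miniature (a sketch using names from this
patch, with the function body simplified): the cheap early-exit test
stays inline at every call site while the heavyweight locking sequence
moves out of line into mm/memory.c.

	static inline void vma_start_write_sketch(struct vm_area_struct *vma)
	{
		unsigned int mm_lock_seq;

		/* common case: already write-locked, stays inline */
		if (__is_vma_write_locked(vma, &mm_lock_seq))
			return;

		/* rare slow path, compiled once, out of line */
		__vma_start_write(vma, mm_lock_seq);
	}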
Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm.h | 12 +++---------
 mm/memory.c        | 14 ++++++++++++++
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0ecd321c50b7..ccb8f2afeca8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -787,6 +787,8 @@ static bool __is_vma_write_locked(struct vm_area_struct *vma, unsigned int *mm_l
 	return (vma->vm_lock_seq == *mm_lock_seq);
 }
 
+void __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq);
+
 /*
 * Begin writing to a VMA.
 * Exclude concurrent readers under the per-VMA lock until the currently
@@ -799,15 +801,7 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 	if (__is_vma_write_locked(vma, &mm_lock_seq))
 		return;
 
-	down_write(&vma->vm_lock.lock);
-	/*
-	 * We should use WRITE_ONCE() here because we can have concurrent reads
-	 * from the early lockless pessimistic check in vma_start_read().
-	 * We don't really care about the correctness of that early check, but
-	 * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy.
-	 */
-	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
-	up_write(&vma->vm_lock.lock);
+	__vma_start_write(vma, mm_lock_seq);
 }
 
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
diff --git a/mm/memory.c b/mm/memory.c
index cc7159aef918..c6356ea703d8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6329,6 +6329,20 @@ struct vm_area_struct *lock_mm_and_find_vma(struct mm_struct *mm,
 #endif
 
 #ifdef CONFIG_PER_VMA_LOCK
+void __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq)
+{
+	down_write(&vma->vm_lock.lock);
+	/*
+	 * We should use WRITE_ONCE() here because we can have concurrent reads
+	 * from the early lockless pessimistic check in vma_start_read().
+	 * We don't really care about the correctness of that early check, but
+	 * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy.
+	 */
+	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
+	up_write(&vma->vm_lock.lock);
+}
+EXPORT_SYMBOL_GPL(__vma_start_write);
+
 /*
 * Lookup and lock a VMA under RCU protection. Returned VMA is guaranteed to be
 * stable and not isolated. If the VMA is not found or is being modified the
-- 
2.47.1.613.gc27f4b7a9f-goog

From nobody Wed Dec 17 21:39:47 2025
Date: Mon, 16 Dec 2024 11:24:12 -0800
In-Reply-To: <20241216192419.2970941-1-surenb@google.com>
References: <20241216192419.2970941-1-surenb@google.com>
Message-ID: <20241216192419.2970941-10-surenb@google.com>
Subject: [PATCH v6 09/16] refcount: introduce __refcount_{add|inc}_not_zero_limited
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

Introduce functions that increase a refcount but fail once a given top
limit would be exceeded. Setting the limit to 0 indicates no limit.
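A minimal usage sketch (the helper is introduced by this patch; the
object and the ceiling value are illustrative assumptions):

	#include <linux/refcount.h>

	struct my_obj {
		refcount_t ref;		/* hypothetical refcounted object */
	};

	static bool my_obj_tryget(struct my_obj *obj)
	{
		/*
		 * Fails if the object is already dead (count 0) or if
		 * taking a reference would push the count above 0x1000.
		 */
		return __refcount_inc_not_zero_limited(&obj->ref, NULL, 0x1000);
	}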
Signed-off-by: Suren Baghdasaryan
---
 include/linux/refcount.h | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index 35f039ecb272..e51a49179307 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -137,13 +137,19 @@ static inline unsigned int refcount_read(const refcount_t *r)
 }
 
 static inline __must_check __signed_wrap
-bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
+bool __refcount_add_not_zero_limited(int i, refcount_t *r, int *oldp,
+				     int limit)
 {
 	int old = refcount_read(r);
 
 	do {
 		if (!old)
 			break;
+		if (limit && old + i > limit) {
+			if (oldp)
+				*oldp = old;
+			return false;
+		}
 	} while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
 
 	if (oldp)
@@ -155,6 +161,12 @@ bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
 	return old;
 }
 
+static inline __must_check __signed_wrap
+bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
+{
+	return __refcount_add_not_zero_limited(i, r, oldp, 0);
+}
+
 /**
 * refcount_add_not_zero - add a value to a refcount unless it is 0
 * @i: the value to add to the refcount
@@ -213,6 +225,12 @@ static inline void refcount_add(int i, refcount_t *r)
 	__refcount_add(i, r, NULL);
 }
 
+static inline __must_check bool __refcount_inc_not_zero_limited(refcount_t *r,
+								int *oldp, int limit)
+{
+	return __refcount_add_not_zero_limited(1, r, oldp, limit);
+}
+
 static inline __must_check bool __refcount_inc_not_zero(refcount_t *r, int *oldp)
 {
 	return __refcount_add_not_zero(1, r, oldp);
-- 
2.47.1.613.gc27f4b7a9f-goog

From nobody Wed Dec 17 21:39:47 2025
Date: Mon, 16 Dec 2024 11:24:13 -0800
In-Reply-To: <20241216192419.2970941-1-surenb@google.com>
References: <20241216192419.2970941-1-surenb@google.com>
Message-ID: <20241216192419.2970941-11-surenb@google.com>
Subject: [PATCH v6 10/16] mm: replace vm_lock and detached flag with a reference count
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

rw_semaphore is a sizable structure of 40 bytes and consumes considerable
space in each vm_area_struct. However, the vma_lock has two important
specifics which can be used to replace the rw_semaphore with a simpler
structure:

1. Readers never wait. They try to take the vma_lock and fall back to
   mmap_lock if that fails.
2. Only one writer at a time will ever try to write-lock a vma_lock,
   because writers first take mmap_lock in write mode.

Because of these requirements, full rw_semaphore functionality is not
needed and we can replace the rw_semaphore and the vma->detached flag
with a refcount (vm_refcnt).

When a vma is in detached state, vm_refcnt is 0 and only a call to
vma_mark_attached() can take it out of this state. Note that, unlike
before, we now enforce that both vma_mark_attached() and
vma_mark_detached() are done only after the vma has been write-locked.
vma_mark_attached() changes vm_refcnt to 1 to indicate that the vma has
been attached to the vma tree. When a reader takes the read lock, it
increments vm_refcnt, unless the top usable bit of vm_refcnt
(0x40000000) is set, indicating the presence of a writer. When a writer
takes the write lock, it both increments vm_refcnt and sets the top
usable bit to indicate its presence. If there are readers, the writer
will wait using the newly introduced mm->vma_writer_wait. Since all
writers take mmap_lock in write mode first, there can be only one writer
at a time. The last reader to release the lock will signal the writer to
wake up. vm_refcnt might overflow if there are many competing readers,
in which case read-locking will fail; readers are expected to handle
such failures. The reader fast path is sketched below.
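Reduced to its refcount core (lockdep annotations and the mm_lock_seq
recheck of the real vma_start_read() in the diff below are omitted),
the reader protocol looks roughly like this sketch:

	static bool vma_read_trylock_sketch(struct vm_area_struct *vma)
	{
		int oldcnt;

		/*
		 * A detached vma (count 0) must stay dead, and the limit
		 * leaves room for a writer below the VMA_STATE_LOCKED bit.
		 */
		if (!__refcount_inc_not_zero_limited(&vma->vm_refcnt, &oldcnt,
						     VMA_STATE_LOCKED - 2))
			return false;

		if (oldcnt & VMA_STATE_LOCKED) {
			/* A writer holds the vma: undo; the last put wakes it. */
			vma_refcount_put(vma);
			return false;
		}
		return true;
	}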
Suggested-by: Peter Zijlstra
Suggested-by: Matthew Wilcox
Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm.h               | 95 ++++++++++++++++++++++++--------
 include/linux/mm_types.h         | 23 ++++----
 kernel/fork.c                    |  9 +--
 mm/init-mm.c                     |  1 +
 mm/memory.c                      | 33 +++++++----
 tools/testing/vma/linux/atomic.h |  5 ++
 tools/testing/vma/vma_internal.h | 57 ++++++++++---------
 7 files changed, 147 insertions(+), 76 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ccb8f2afeca8..d9edabc385b3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -32,6 +32,7 @@
 #include
 #include
 #include
+#include
 
 struct mempolicy;
 struct anon_vma;
@@ -699,10 +700,27 @@ static inline void vma_numab_state_free(struct vm_area_struct *vma) {}
 #ifdef CONFIG_PER_VMA_LOCK
 static inline void vma_lock_init(struct vm_area_struct *vma)
 {
-	init_rwsem(&vma->vm_lock.lock);
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	static struct lock_class_key lockdep_key;
+
+	lockdep_init_map(&vma->vmlock_dep_map, "vm_lock", &lockdep_key, 0);
+#endif
+	refcount_set(&vma->vm_refcnt, VMA_STATE_DETACHED);
 	vma->vm_lock_seq = UINT_MAX;
 }
 
+static inline void vma_refcount_put(struct vm_area_struct *vma)
+{
+	int refcnt;
+
+	if (!__refcount_dec_and_test(&vma->vm_refcnt, &refcnt)) {
+		rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
+
+		if (refcnt & VMA_STATE_LOCKED)
+			rcuwait_wake_up(&vma->vm_mm->vma_writer_wait);
+	}
+}
+
 /*
 * Try to read-lock a vma. The function is allowed to occasionally yield false
 * locked result to avoid performance overhead, in which case we fall back to
@@ -710,6 +728,8 @@ static inline void vma_lock_init(struct vm_area_struct *vma)
 */
 static inline bool vma_start_read(struct vm_area_struct *vma)
 {
+	int oldcnt;
+
 	/*
 	 * Check before locking. A race might cause false locked result.
 	 * We can use READ_ONCE() for the mm_lock_seq here, and don't need
@@ -720,13 +740,20 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq.sequence))
 		return false;
 
-	if (unlikely(down_read_trylock(&vma->vm_lock.lock) == 0))
+
+	rwsem_acquire_read(&vma->vmlock_dep_map, 0, 0, _RET_IP_);
+	/* Limit at VMA_STATE_LOCKED - 2 to leave one count for a writer */
+	if (unlikely(!__refcount_inc_not_zero_limited(&vma->vm_refcnt, &oldcnt,
+						      VMA_STATE_LOCKED - 2))) {
+		rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
 		return false;
+	}
+	lock_acquired(&vma->vmlock_dep_map, _RET_IP_);
 
 	/*
-	 * Overflow might produce false locked result.
+	 * Overflow of vm_lock_seq/mm_lock_seq might produce false locked result.
 	 * False unlocked result is impossible because we modify and check
-	 * vma->vm_lock_seq under vma->vm_lock protection and mm->mm_lock_seq
+	 * vma->vm_lock_seq under vma->vm_refcnt protection and mm->mm_lock_seq
 	 * modification invalidates all existing locks.
 	 *
 	 * We must use ACQUIRE semantics for the mm_lock_seq so that if we are
@@ -734,10 +761,12 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	 * after it has been unlocked.
 	 * This pairs with RELEASE semantics in vma_end_write_all().
 	 */
-	if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&vma->vm_mm->mm_lock_seq))) {
-		up_read(&vma->vm_lock.lock);
+	if (oldcnt & VMA_STATE_LOCKED ||
+	    unlikely(vma->vm_lock_seq == raw_read_seqcount(&vma->vm_mm->mm_lock_seq))) {
+		vma_refcount_put(vma);
 		return false;
 	}
+
 	return true;
 }
 
@@ -749,8 +778,17 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 */
 static inline bool vma_start_read_locked_nested(struct vm_area_struct *vma, int subclass)
 {
+	int oldcnt;
+
 	mmap_assert_locked(vma->vm_mm);
-	down_read_nested(&vma->vm_lock.lock, subclass);
+	rwsem_acquire_read(&vma->vmlock_dep_map, subclass, 0, _RET_IP_);
+	/* Limit at VMA_STATE_LOCKED - 2 to leave one count for a writer */
+	if (unlikely(!__refcount_inc_not_zero_limited(&vma->vm_refcnt, &oldcnt,
+						      VMA_STATE_LOCKED - 2))) {
+		rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
+		return false;
+	}
+	lock_acquired(&vma->vmlock_dep_map, _RET_IP_);
 	return true;
 }
 
@@ -762,15 +800,13 @@ static inline bool vma_start_read_locked_nested(struct vm_area_struct *vma, int
 */
 static inline bool vma_start_read_locked(struct vm_area_struct *vma)
 {
-	mmap_assert_locked(vma->vm_mm);
-	down_read(&vma->vm_lock.lock);
-	return true;
+	return vma_start_read_locked_nested(vma, 0);
 }
 
 static inline void vma_end_read(struct vm_area_struct *vma)
 {
 	rcu_read_lock(); /* keeps vma alive till the end of up_read */
-	up_read(&vma->vm_lock.lock);
+	vma_refcount_put(vma);
 	rcu_read_unlock();
 }
 
@@ -813,25 +849,42 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 
 static inline void vma_assert_locked(struct vm_area_struct *vma)
 {
-	if (!rwsem_is_locked(&vma->vm_lock.lock))
+	if (refcount_read(&vma->vm_refcnt) <= VMA_STATE_ATTACHED)
 		vma_assert_write_locked(vma);
 }
 
-static inline void vma_mark_attached(struct vm_area_struct *vma)
+/*
+ * WARNING: to avoid racing with vma_mark_attached(), should be called either
+ * under mmap_write_lock or when the object has been isolated under
+ * mmap_write_lock, ensuring no competing writers.
+ */
+static inline bool is_vma_detached(struct vm_area_struct *vma)
 {
-	vma->detached = false;
+	return refcount_read(&vma->vm_refcnt) == VMA_STATE_DETACHED;
 }
 
-static inline void vma_mark_detached(struct vm_area_struct *vma)
+static inline void vma_mark_attached(struct vm_area_struct *vma)
 {
-	/* When detaching vma should be write-locked */
 	vma_assert_write_locked(vma);
-	vma->detached = true;
+
+	if (is_vma_detached(vma))
+		refcount_set(&vma->vm_refcnt, VMA_STATE_ATTACHED);
 }
 
-static inline bool is_vma_detached(struct vm_area_struct *vma)
+static inline void vma_mark_detached(struct vm_area_struct *vma)
 {
-	return vma->detached;
+	vma_assert_write_locked(vma);
+
+	if (is_vma_detached(vma))
+		return;
+
+	/* We are the only writer, so no need to use vma_refcount_put(). */
+	if (!refcount_dec_and_test(&vma->vm_refcnt)) {
+		/*
+		 * Reader must have temporarily raised vm_refcnt but it will
+		 * drop it without using the vma since vma is write-locked.
+		 */
+	}
 }
 
 static inline void release_fault_lock(struct vm_fault *vmf)
@@ -896,10 +949,6 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_mm = mm;
 	vma->vm_ops = &vma_dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
-#ifdef CONFIG_PER_VMA_LOCK
-	/* vma is not locked, can't use vma_mark_detached() */
-	vma->detached = true;
-#endif
 	vma_numab_state_init(vma);
 	vma_lock_init(vma);
 }
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 825f6328f9e5..803f718c007c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -599,9 +600,9 @@ static inline struct anon_vma_name *anon_vma_name_alloc(const char *name)
 }
 #endif
 
-struct vma_lock {
-	struct rw_semaphore lock;
-};
+#define VMA_STATE_DETACHED	0x0
+#define VMA_STATE_ATTACHED	0x1
+#define VMA_STATE_LOCKED	0x40000000
 
 struct vma_numab_state {
 	/*
@@ -679,19 +680,13 @@ struct vm_area_struct {
 	};
 
 #ifdef CONFIG_PER_VMA_LOCK
-	/*
-	 * Flag to indicate areas detached from the mm->mm_mt tree.
-	 * Unstable RCU readers are allowed to read this.
-	 */
-	bool detached;
-
 	/*
 	 * Can only be written (using WRITE_ONCE()) while holding both:
 	 * - mmap_lock (in write mode)
-	 * - vm_lock->lock (in write mode)
+	 * - vm_refcnt VMA_STATE_LOCKED is set
 	 * Can be read reliably while holding one of:
 	 * - mmap_lock (in read or write mode)
-	 * - vm_lock->lock (in read or write mode)
+	 * - vm_refcnt VMA_STATE_LOCKED is set or vm_refcnt > VMA_STATE_ATTACHED
 	 * Can be read unreliably (using READ_ONCE()) for pessimistic bailout
 	 * while holding nothing (except RCU to keep the VMA struct allocated).
 	 *
@@ -754,7 +749,10 @@ struct vm_area_struct {
 	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
 #ifdef CONFIG_PER_VMA_LOCK
 	/* Unstable RCU readers are allowed to read this. */
-	struct vma_lock vm_lock ____cacheline_aligned_in_smp;
+	refcount_t vm_refcnt ____cacheline_aligned_in_smp;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	struct lockdep_map vmlock_dep_map;
+#endif
 #endif
 } __randomize_layout;
 
@@ -889,6 +887,7 @@ struct mm_struct {
 					 * by mmlist_lock
 					 */
 #ifdef CONFIG_PER_VMA_LOCK
+	struct rcuwait vma_writer_wait;
 	/*
 	 * This field has lock-like semantics, meaning it is sometimes
 	 * accessed with ACQUIRE/RELEASE semantics.
diff --git a/kernel/fork.c b/kernel/fork.c
index 8cb19c23e892..283909d082cb 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -465,10 +465,6 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	data_race(memcpy(new, orig, sizeof(*new)));
 	vma_lock_init(new);
 	INIT_LIST_HEAD(&new->anon_vma_chain);
-#ifdef CONFIG_PER_VMA_LOCK
-	/* vma is not locked, can't use vma_mark_detached() */
-	new->detached = true;
-#endif
 	vma_numab_state_init(new);
 	dup_anon_vma_name(orig, new);
 
@@ -488,8 +484,6 @@ static void vm_area_free_rcu_cb(struct rcu_head *head)
 	struct vm_area_struct *vma = container_of(head, struct vm_area_struct,
 						  vm_rcu);
 
-	/* The vma should not be locked while being destroyed. */
-	VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock.lock), vma);
 	__vm_area_free(vma);
 }
 #endif
@@ -1228,6 +1222,9 @@ static inline void mmap_init_lock(struct mm_struct *mm)
 {
 	init_rwsem(&mm->mmap_lock);
 	mm_lock_seqcount_init(mm);
+#ifdef CONFIG_PER_VMA_LOCK
+	rcuwait_init(&mm->vma_writer_wait);
+#endif
 }
 
 static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
	struct user_namespace *user_ns)
diff --git a/mm/init-mm.c b/mm/init-mm.c
index 6af3ad675930..4600e7605cab 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -40,6 +40,7 @@ struct mm_struct init_mm = {
 	.arg_lock	=  __SPIN_LOCK_UNLOCKED(init_mm.arg_lock),
 	.mmlist		= LIST_HEAD_INIT(init_mm.mmlist),
 #ifdef CONFIG_PER_VMA_LOCK
+	.vma_writer_wait = __RCUWAIT_INITIALIZER(init_mm.vma_writer_wait),
 	.mm_lock_seq	= SEQCNT_ZERO(init_mm.mm_lock_seq),
 #endif
 	.user_ns	= &init_user_ns,
diff --git a/mm/memory.c b/mm/memory.c
index c6356ea703d8..cff132003e24 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6331,7 +6331,25 @@ struct vm_area_struct *lock_mm_and_find_vma(struct mm_struct *mm,
 #ifdef CONFIG_PER_VMA_LOCK
 void __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq)
 {
-	down_write(&vma->vm_lock.lock);
+	bool detached;
+
+	/*
+	 * If vma is detached then only vma_mark_attached() can raise the
+	 * vm_refcnt. mmap_write_lock prevents racing with vma_mark_attached().
+	 */
+	if (!refcount_inc_not_zero(&vma->vm_refcnt)) {
+		WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
+		return;
+	}
+
+	rwsem_acquire(&vma->vmlock_dep_map, 0, 0, _RET_IP_);
+	/* vma is attached, set the writer present bit */
+	refcount_add(VMA_STATE_LOCKED, &vma->vm_refcnt);
+	/* wait until state is VMA_STATE_ATTACHED + (VMA_STATE_LOCKED + 1) */
+	rcuwait_wait_event(&vma->vm_mm->vma_writer_wait,
+		   refcount_read(&vma->vm_refcnt) == VMA_STATE_ATTACHED + (VMA_STATE_LOCKED + 1),
+		   TASK_UNINTERRUPTIBLE);
+	lock_acquired(&vma->vmlock_dep_map, _RET_IP_);
 	/*
 	 * We should use WRITE_ONCE() here because we can have concurrent reads
 	 * from the early lockless pessimistic check in vma_start_read().
@@ -6339,7 +6357,10 @@ void __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq)
 	 * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy.
 	 */
 	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
-	up_write(&vma->vm_lock.lock);
+	detached = refcount_sub_and_test(VMA_STATE_LOCKED + 1,
+					 &vma->vm_refcnt);
+	rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
+	VM_BUG_ON_VMA(detached, vma); /* vma should remain attached */
 }
 EXPORT_SYMBOL_GPL(__vma_start_write);
 
@@ -6355,7 +6376,6 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	struct vm_area_struct *vma;
 
 	rcu_read_lock();
-retry:
 	vma = mas_walk(&mas);
 	if (!vma)
 		goto inval;
@@ -6363,13 +6383,6 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	if (!vma_start_read(vma))
 		goto inval;
 
-	/* Check if the VMA got isolated after we found it */
-	if (is_vma_detached(vma)) {
-		vma_end_read(vma);
-		count_vm_vma_lock_event(VMA_LOCK_MISS);
-		/* The area was replaced with another one */
-		goto retry;
-	}
 	/*
 	 * At this point, we have a stable reference to a VMA: The VMA is
 	 * locked and we know it hasn't already been isolated.
diff --git a/tools/testing/vma/linux/atomic.h b/tools/testing/vma/linux/atomic.h
index e01f66f98982..2e2021553196 100644
--- a/tools/testing/vma/linux/atomic.h
+++ b/tools/testing/vma/linux/atomic.h
@@ -9,4 +9,9 @@
 #define atomic_set(x, y) do {} while (0)
 #define U8_MAX UCHAR_MAX
 
+#ifndef atomic_cmpxchg_relaxed
+#define atomic_cmpxchg_relaxed uatomic_cmpxchg
+#define atomic_cmpxchg_release uatomic_cmpxchg
+#endif /* atomic_cmpxchg_relaxed */
+
 #endif /* _LINUX_ATOMIC_H */
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index 0cdc5f8c3d60..b55556b16060 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -25,7 +25,7 @@
 #include
 #include
 #include
-#include
+#include
 
 extern unsigned long stack_guard_gap;
 #ifdef CONFIG_MMU
@@ -132,10 +132,6 @@ typedef __bitwise unsigned int vm_fault_t;
 */
 #define pr_warn_once pr_err
 
-typedef struct refcount_struct {
-	atomic_t refs;
-} refcount_t;
-
 struct kref {
 	refcount_t refcount;
 };
@@ -228,15 +224,14 @@ struct mm_struct {
 	unsigned long def_flags;
 };
 
-struct vma_lock {
-	struct rw_semaphore lock;
-};
-
-
 struct file {
 	struct address_space *f_mapping;
 };
 
+#define VMA_STATE_DETACHED	0x0
+#define VMA_STATE_ATTACHED	0x1
+#define VMA_STATE_LOCKED	0x40000000
+
 struct vm_area_struct {
 	/* The first cache line has the info for VMA tree walking. */
 
@@ -264,16 +259,13 @@ struct vm_area_struct {
 	};
 
 #ifdef CONFIG_PER_VMA_LOCK
-	/* Flag to indicate areas detached from the mm->mm_mt tree */
-	bool detached;
-
 	/*
 	 * Can only be written (using WRITE_ONCE()) while holding both:
 	 * - mmap_lock (in write mode)
-	 * - vm_lock.lock (in write mode)
+	 * - vm_refcnt VMA_STATE_LOCKED is set
 	 * Can be read reliably while holding one of:
 	 * - mmap_lock (in read or write mode)
-	 * - vm_lock.lock (in read or write mode)
+	 * - vm_refcnt VMA_STATE_LOCKED is set or vm_refcnt > VMA_STATE_ATTACHED
 	 * Can be read unreliably (using READ_ONCE()) for pessimistic bailout
 	 * while holding nothing (except RCU to keep the VMA struct allocated).
 	 *
@@ -282,7 +274,6 @@ struct vm_area_struct {
 	 * slowpath.
 	 */
 	unsigned int vm_lock_seq;
-	struct vma_lock vm_lock;
 #endif
 
 	/*
@@ -335,6 +326,10 @@ struct vm_area_struct {
 	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
 #endif
 	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
+#ifdef CONFIG_PER_VMA_LOCK
+	/* Unstable RCU readers are allowed to read this. */
+	refcount_t vm_refcnt;
+#endif
 } __randomize_layout;
 
 struct vm_fault {};
@@ -461,21 +456,37 @@ static inline struct vm_area_struct *vma_next(struct vma_iterator *vmi)
 
 static inline void vma_lock_init(struct vm_area_struct *vma)
 {
-	init_rwsem(&vma->vm_lock.lock);
+	refcount_set(&vma->vm_refcnt, VMA_STATE_DETACHED);
 	vma->vm_lock_seq = UINT_MAX;
 }
 
-static inline void vma_mark_attached(struct vm_area_struct *vma)
+static inline bool is_vma_detached(struct vm_area_struct *vma)
 {
-	vma->detached = false;
+	return refcount_read(&vma->vm_refcnt) == VMA_STATE_DETACHED;
 }
 
 static inline void vma_assert_write_locked(struct vm_area_struct *);
+static inline void vma_mark_attached(struct vm_area_struct *vma)
+{
+	vma_assert_write_locked(vma);
+
+	if (is_vma_detached(vma))
+		refcount_set(&vma->vm_refcnt, VMA_STATE_ATTACHED);
+}
+
 static inline void vma_mark_detached(struct vm_area_struct *vma)
 {
-	/* When detaching vma should be write-locked */
 	vma_assert_write_locked(vma);
-	vma->detached = true;
+
+	if (is_vma_detached(vma))
+		return;
+
+	if (!refcount_dec_and_test(&vma->vm_refcnt)) {
+		/*
+		 * Reader must have temporarily raised vm_refcnt but it will
+		 * drop it without using the vma since vma is write-locked.
+		 */
+	}
 }
 
 extern const struct vm_operations_struct vma_dummy_vm_ops;
@@ -488,8 +499,6 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_mm = mm;
 	vma->vm_ops = &vma_dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
-	/* vma is not locked, can't use vma_mark_detached() */
-	vma->detached = true;
 	vma_lock_init(vma);
 }
 
@@ -515,8 +524,6 @@ static inline struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	memcpy(new, orig, sizeof(*new));
 	vma_lock_init(new);
 	INIT_LIST_HEAD(&new->anon_vma_chain);
-	/* vma is not locked, can't use vma_mark_detached() */
-	new->detached = true;
 
 	return new;
 }
-- 
2.47.1.613.gc27f4b7a9f-goog

From nobody Wed Dec 17 21:39:47 2025
Date: Mon, 16 Dec 2024 11:24:14 -0800
In-Reply-To: <20241216192419.2970941-1-surenb@google.com>
References: <20241216192419.2970941-1-surenb@google.com>
Message-ID: <20241216192419.2970941-12-surenb@google.com>
Subject: [PATCH v6 11/16] mm: enforce vma to be in detached state before freeing
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

exit_mmap() frees vmas without detaching them. This will become a problem
when we introduce vma reuse. Ensure that vmas are always detached before
being freed.
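The invariant in miniature (a sketch simplified from the diff below,
mirroring the unreachable-vma branch of remove_vma()): any path that
frees a vma detaches it first, and the free path asserts this.

	static void free_unreachable_vma_sketch(struct vm_area_struct *vma)
	{
		if (!is_vma_detached(vma)) {
			vma_start_write(vma);	/* detaching requires the vma write lock */
			vma_mark_detached(vma);
		}
		/* would now trip VM_BUG_ON_VMA() if the vma were still attached */
		__vm_area_free(vma);
	}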
Content-Type: text/plain; charset="utf-8" exit_mmap() frees vmas without detaching them. This will become a problem when we introduce vma reuse. Ensure that vmas are always detached before being freed. Signed-off-by: Suren Baghdasaryan --- kernel/fork.c | 4 ++++ mm/vma.c | 10 ++++++++-- 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index 283909d082cb..f1ddfc7b3b48 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -473,6 +473,10 @@ struct vm_area_struct *vm_area_dup(struct vm_area_stru= ct *orig) =20 void __vm_area_free(struct vm_area_struct *vma) { +#ifdef CONFIG_PER_VMA_LOCK + /* The vma should be detached while being destroyed. */ + VM_BUG_ON_VMA(!is_vma_detached(vma), vma); +#endif vma_numab_state_free(vma); free_anon_vma_name(vma); kmem_cache_free(vm_area_cachep, vma); diff --git a/mm/vma.c b/mm/vma.c index fbd7254517d6..0436a7d21e01 100644 --- a/mm/vma.c +++ b/mm/vma.c @@ -413,9 +413,15 @@ void remove_vma(struct vm_area_struct *vma, bool unrea= chable) if (vma->vm_file) fput(vma->vm_file); mpol_put(vma_policy(vma)); - if (unreachable) + if (unreachable) { +#ifdef CONFIG_PER_VMA_LOCK + if (!is_vma_detached(vma)) { + vma_start_write(vma); + vma_mark_detached(vma); + } +#endif __vm_area_free(vma); - else + } else vm_area_free(vma); } =20 --=20 2.47.1.613.gc27f4b7a9f-goog From nobody Wed Dec 17 21:39:47 2025 Received: from mail-pf1-f202.google.com (mail-pf1-f202.google.com [209.85.210.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A30EE20C48A for ; Mon, 16 Dec 2024 19:24:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734377090; cv=none; b=nR0on2jfAAjWdbq+63pWkOh6BIgWfOogZ3MUr1eT3sVpUx/mG4MqAM2aOMI/vAxxGRHXwMKpzKG9C9TcYdWX7QuWOuJ6Atcf7HGoi3/jg2AtyQnJjyJ34OlPbcZHmAWahsluYAmb8rCqmA6ocEcv7enDriO5gMJ9gzRs4QrF/Ig= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734377090; c=relaxed/simple; bh=meptwdKdQlQfWYK30tfEjRISvkkyMSxm37GbBTGDJNk=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=ln7jXfrFaLox0GcD93L1Zl1DqF+TbOHy1eWPOa7T6QGeN1NCv7rZtinP96slPvKyw2VXovDGOAAzO+QMAQ2Qi+y1juP3maTTnpoCOM2RvSMAXMFY+BplthHugH5KcN/p6h6F9hqREhDckrtazu50CMWDgWbZTdYdqzxxmzbHe3M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--surenb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=aepi4bD+; arc=none smtp.client-ip=209.85.210.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--surenb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="aepi4bD+" Received: by mail-pf1-f202.google.com with SMTP id d2e1a72fcca58-725c882576aso3306955b3a.3 for ; Mon, 16 Dec 2024 11:24:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1734377088; x=1734981888; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=wSQ3mT4u4m7ofM+stpYMvUJpeXhYChT7fq9/vjKfHLA=; 
Date: Mon, 16 Dec 2024 11:24:15 -0800
In-Reply-To: <20241216192419.2970941-1-surenb@google.com>
References: <20241216192419.2970941-1-surenb@google.com>
Message-ID: <20241216192419.2970941-13-surenb@google.com>
Subject: [PATCH v6 12/16] mm: remove extra vma_numab_state_init() call
From: Suren Baghdasaryan
To: akpm@linux-foundation.org

vma_init() already memsets the whole vm_area_struct to 0, so there is no
need for an additional vma_numab_state_init() call.
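For context, the redundancy in miniature (an assumption about the
helper's body, which is not shown in this excerpt): the helper is just
a NULL store that the earlier memset already performs.

	static inline void vma_numab_state_init(struct vm_area_struct *vma)
	{
		vma->numab_state = NULL;	/* already covered by memset(vma, 0, ...) */
	}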
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/mm.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d9edabc385b3..b73cf64233a4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -949,7 +949,6 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_mm = mm;
 	vma->vm_ops = &vma_dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
-	vma_numab_state_init(vma);
 	vma_lock_init(vma);
 }
-- 
2.47.1.613.gc27f4b7a9f-goog

From nobody Wed Dec 17 21:39:47 2025
Date: Mon, 16 Dec 2024 11:24:16 -0800
In-Reply-To: <20241216192419.2970941-1-surenb@google.com>
References: <20241216192419.2970941-1-surenb@google.com>
Message-ID: <20241216192419.2970941-14-surenb@google.com>
Subject: [PATCH v6 13/16] mm: introduce vma_ensure_detached()
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Content-Type: text/plain; charset="utf-8"

vma_start_read() can temporarily raise the vm_refcnt of a write-locked
and detached vma:

                                      // vm_refcnt==1 (attached)
          vma_start_write()
            vma->vm_lock_seq = mm->mm_lock_seq
  vma_start_read()
    vm_refcnt++;                      // vm_refcnt==2
                                      vma_mark_detached()
                                        vm_refcnt--;  // vm_refcnt==1
                                      // vma is detached but vm_refcnt!=0
                                      // temporarily
    if (vma->vm_lock_seq == mm->mm_lock_seq)
      vma_refcount_put()
        vm_refcnt--;                  // vm_refcnt==0

This is currently not a problem when freeing the vma because an RCU
grace period must pass before kmem_cache_free(vma) gets called, and by
that time vma_start_read() is done and vm_refcnt is back to 0. However,
once we introduce the possibility of vma reuse before the RCU grace
period is over, this will become a problem (the reused vma might be in
a non-detached state). Introduce vma_ensure_detached() for the writer
to wait for readers until they exit vma_start_read().
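The effect of the new helper can be modelled in stand-alone user-space C
(C11 atomics and a spin-wait stand in for refcount_t and rcuwait here;
this is only a sketch of the idea, not the kernel implementation below):

	#include <stdatomic.h>

	struct vma_model {
		atomic_int vm_refcnt;		/* 0 means detached */
		atomic_int vm_lock_seq;
	};

	/* Reader, as in vma_start_read(): optimistically take a reference,
	 * then back off if the vma turns out to be write-locked. */
	static void reader(struct vma_model *vma, int mm_lock_seq)
	{
		atomic_fetch_add(&vma->vm_refcnt, 1);
		if (atomic_load(&vma->vm_lock_seq) == mm_lock_seq)
			atomic_fetch_sub(&vma->vm_refcnt, 1); /* transient ref gone */
	}

	/* Writer, as in vma_ensure_detached(): once the vma is marked
	 * detached, wait out any transient reader references. The kernel
	 * sleeps on an rcuwait instead of spinning. */
	static void ensure_detached(struct vma_model *vma)
	{
		while (atomic_load(&vma->vm_refcnt) != 0)
			;
	}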
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/mm.h               |  9 ++++++
 mm/memory.c                      | 55 +++++++++++++++++++++++---------
 tools/testing/vma/vma_internal.h |  8 +++++
 3 files changed, 57 insertions(+), 15 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b73cf64233a4..361f26dedab1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -863,6 +863,15 @@ static inline bool is_vma_detached(struct vm_area_struct *vma)
 	return refcount_read(&vma->vm_refcnt) == VMA_STATE_DETACHED;
 }
 
+/*
+ * WARNING: to avoid racing with vma_mark_attached(), should be called either
+ * under mmap_write_lock or when the object has been isolated under
+ * mmap_write_lock, ensuring no competing writers.
+ * Should be called after marking vma as detached to wait for possible
+ * readers which temporarily raised vm_refcnt to drop it back and exit.
+ */
+void vma_ensure_detached(struct vm_area_struct *vma);
+
 static inline void vma_mark_attached(struct vm_area_struct *vma)
 {
 	vma_assert_write_locked(vma);
diff --git a/mm/memory.c b/mm/memory.c
index cff132003e24..534e279f98c1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6329,18 +6329,10 @@ struct vm_area_struct *lock_mm_and_find_vma(struct mm_struct *mm,
 #endif
 
 #ifdef CONFIG_PER_VMA_LOCK
-void __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq)
+static inline bool __vma_enter_locked(struct vm_area_struct *vma)
 {
-	bool detached;
-
-	/*
-	 * If vma is detached then only vma_mark_attached() can raise the
-	 * vm_refcnt. mmap_write_lock prevents racing with vma_mark_attached().
-	 */
-	if (!refcount_inc_not_zero(&vma->vm_refcnt)) {
-		WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
-		return;
-	}
+	if (!refcount_inc_not_zero(&vma->vm_refcnt))
+		return false;
 
 	rwsem_acquire(&vma->vmlock_dep_map, 0, 0, _RET_IP_);
 	/* vma is attached, set the writer present bit */
@@ -6350,6 +6342,22 @@ void __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq)
 		   refcount_read(&vma->vm_refcnt) == VMA_STATE_ATTACHED + (VMA_STATE_LOCKED + 1),
 		   TASK_UNINTERRUPTIBLE);
 	lock_acquired(&vma->vmlock_dep_map, _RET_IP_);
+
+	return true;
+}
+
+static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *is_detached)
+{
+	*is_detached = refcount_sub_and_test(VMA_STATE_LOCKED + 1,
+					     &vma->vm_refcnt);
+	rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
+}
+
+void __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq)
+{
+	bool locked;
+
+	locked = __vma_enter_locked(vma);
 	/*
 	 * We should use WRITE_ONCE() here because we can have concurrent reads
 	 * from the early lockless pessimistic check in vma_start_read().
@@ -6357,13 +6365,30 @@ void __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq)
 	 * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy.
 	 */
 	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
-	detached = refcount_sub_and_test(VMA_STATE_LOCKED + 1,
-					 &vma->vm_refcnt);
-	rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
-	VM_BUG_ON_VMA(detached, vma); /* vma should remain attached */
+	if (locked) {
+		bool detached;
+
+		__vma_exit_locked(vma, &detached);
+		/* vma was originally attached and should remain so */
+		VM_BUG_ON_VMA(detached, vma);
+	}
 }
 EXPORT_SYMBOL_GPL(__vma_start_write);
 
+void vma_ensure_detached(struct vm_area_struct *vma)
+{
+	if (is_vma_detached(vma))
+		return;
+
+	if (__vma_enter_locked(vma)) {
+		bool detached;
+
+		/* Wait for temporary readers to drop the vm_refcnt */
+		__vma_exit_locked(vma, &detached);
+		VM_BUG_ON_VMA(!detached, vma);
+	}
+}
+
 /*
  * Lookup and lock a VMA under RCU protection. Returned VMA is guaranteed to be
  * stable and not isolated. If the VMA is not found or is being modified the
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index b55556b16060..ac0a59906fea 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -465,6 +465,14 @@ static inline bool is_vma_detached(struct vm_area_struct *vma)
 	return refcount_read(&vma->vm_refcnt) == VMA_STATE_DETACHED;
 }
 
+static inline void vma_ensure_detached(struct vm_area_struct *vma)
+{
+	if (is_vma_detached(vma))
+		return;
+
+	refcount_set(&vma->vm_refcnt, VMA_STATE_DETACHED);
+}
+
 static inline void vma_assert_write_locked(struct vm_area_struct *);
 static inline void vma_mark_attached(struct vm_area_struct *vma)
 {
-- 
2.47.1.613.gc27f4b7a9f-goog

From nobody Wed Dec 17 21:39:47 2025
Date: Mon, 16 Dec 2024 11:24:17 -0800
In-Reply-To: <20241216192419.2970941-1-surenb@google.com>
References: <20241216192419.2970941-1-surenb@google.com>
Message-ID: <20241216192419.2970941-15-surenb@google.com>
Subject: [PATCH v6 14/16] mm: prepare lock_vma_under_rcu() for vma reuse possibility
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Content-Type: text/plain; charset="utf-8"

Once we make the vma cache SLAB_TYPESAFE_BY_RCU, it will be possible
for a vma to be reused and attached to another mm after
lock_vma_under_rcu() locks the vma. lock_vma_under_rcu() should ensure
that vma_start_read() is using the original mm, and after locking the
vma it should ensure that vma->vm_mm has not changed from under us.
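The window being closed looks roughly like this (an illustrative
interleaving; the reuse only becomes possible once the vma cache is made
SLAB_TYPESAFE_BY_RCU in the next patch, and mm2 is a hypothetical second
process):

  lock_vma_under_rcu(mm, address)
    rcu_read_lock()
    vma = mas_walk(&mas)
                                    vma freed and reused, now attached
                                    to mm2
    vma_start_read(mm, vma)         // may succeed against the reused vma
    // so recheck vma->vm_mm == mm and that address is within
    // [vma->vm_start, vma->vm_end) before trusting the lock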
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/mm.h | 10 ++++++----
 mm/memory.c        |  7 ++++---
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 361f26dedab1..bfd01ae07660 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -725,8 +725,10 @@ static inline void vma_refcount_put(struct vm_area_struct *vma)
  * Try to read-lock a vma. The function is allowed to occasionally yield false
  * locked result to avoid performance overhead, in which case we fall back to
  * using mmap_lock. The function should never yield false unlocked result.
+ * False locked result is possible if mm_lock_seq overflows or if vma gets
+ * reused and attached to a different mm before we lock it.
  */
-static inline bool vma_start_read(struct vm_area_struct *vma)
+static inline bool vma_start_read(struct mm_struct *mm, struct vm_area_struct *vma)
 {
 	int oldcnt;
 
@@ -737,7 +739,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	 * we don't rely on for anything - the mm_lock_seq read against which we
 	 * need ordering is below.
 	 */
-	if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq.sequence))
+	if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(mm->mm_lock_seq.sequence))
 		return false;
 
 
@@ -762,7 +764,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	 * This pairs with RELEASE semantics in vma_end_write_all().
 	 */
 	if (oldcnt & VMA_STATE_LOCKED ||
-	    unlikely(vma->vm_lock_seq == raw_read_seqcount(&vma->vm_mm->mm_lock_seq))) {
+	    unlikely(vma->vm_lock_seq == raw_read_seqcount(&mm->mm_lock_seq))) {
 		vma_refcount_put(vma);
 		return false;
 	}
@@ -918,7 +920,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 #else /* CONFIG_PER_VMA_LOCK */
 
 static inline void vma_lock_init(struct vm_area_struct *vma) {}
-static inline bool vma_start_read(struct vm_area_struct *vma)
+static inline bool vma_start_read(struct mm_struct *mm, struct vm_area_struct *vma)
 		{ return false; }
 static inline void vma_end_read(struct vm_area_struct *vma) {}
 static inline void vma_start_write(struct vm_area_struct *vma) {}
diff --git a/mm/memory.c b/mm/memory.c
index 534e279f98c1..2131d9769bb4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6405,7 +6405,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	if (!vma)
 		goto inval;
 
-	if (!vma_start_read(vma))
+	if (!vma_start_read(mm, vma))
 		goto inval;
 
 	/*
@@ -6415,8 +6415,9 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	 * fields are accessible for RCU readers.
 	 */
 
-	/* Check since vm_start/vm_end might change before we lock the VMA */
-	if (unlikely(address < vma->vm_start || address >= vma->vm_end))
+	/* Check if the vma we locked is the right one. */
+	if (unlikely(vma->vm_mm != mm ||
+		     address < vma->vm_start || address >= vma->vm_end))
 		goto inval_end_read;
 
 	rcu_read_unlock();
-- 
2.47.1.613.gc27f4b7a9f-goog

From nobody Wed Dec 17 21:39:47 2025
Date: Mon, 16 Dec 2024 11:24:18 -0800
In-Reply-To: <20241216192419.2970941-1-surenb@google.com>
References: <20241216192419.2970941-1-surenb@google.com>
Message-ID: <20241216192419.2970941-16-surenb@google.com>
Subject: [PATCH v6 15/16] mm: make vma cache SLAB_TYPESAFE_BY_RCU
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Content-Type: text/plain; charset="utf-8"

To enable SLAB_TYPESAFE_BY_RCU for the vma cache we need to ensure that
object reuse before the RCU grace period is over will be detected by
lock_vma_under_rcu(). The current checks are sufficient as long as the
vma is detached before it is freed. Implement this guarantee by calling
vma_ensure_detached() before the vma is freed, and make vm_area_cachep
SLAB_TYPESAFE_BY_RCU. This will facilitate vm_area_struct reuse and
will minimize the number of call_rcu() calls.
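For reference, the reader-side discipline that SLAB_TYPESAFE_BY_RCU
requires is roughly the following (a generic sketch of the pattern;
lookup(), get_ref(), identity_matches() and put_ref() are placeholders,
not the vma code):

	rcu_read_lock();
	obj = lookup(key);
	if (obj && !get_ref(obj))	/* e.g. refcount_inc_not_zero() */
		obj = NULL;		/* freed; may be reused at any time */
	if (obj && !identity_matches(obj, key)) {
		put_ref(obj);		/* reused for another object */
		obj = NULL;
	}
	rcu_read_unlock();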
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/mm.h               |  2 --
 include/linux/mm_types.h         | 10 +++++++---
 include/linux/slab.h             |  6 ------
 kernel/fork.c                    | 34 ++++++++++----------------------
 mm/mmap.c                        |  8 +++++++-
 mm/vma.c                         | 15 +++------------
 mm/vma.h                         |  2 +-
 tools/testing/vma/vma_internal.h |  7 +------
 8 files changed, 29 insertions(+), 55 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bfd01ae07660..da773302af70 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -258,8 +258,6 @@ void setup_initial_init_mm(void *start_code, void *end_code,
 struct vm_area_struct *vm_area_alloc(struct mm_struct *);
 struct vm_area_struct *vm_area_dup(struct vm_area_struct *);
 void vm_area_free(struct vm_area_struct *);
-/* Use only if VMA has no other users */
-void __vm_area_free(struct vm_area_struct *vma);
 
 #ifndef CONFIG_MMU
 extern struct rb_root nommu_region_tree;
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 803f718c007c..a720f7383dd8 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -544,6 +544,12 @@ static inline void *folio_get_private(struct folio *folio)
 
 typedef unsigned long vm_flags_t;
 
+/*
+ * freeptr_t represents a SLUB freelist pointer, which might be encoded
+ * and not dereferenceable if CONFIG_SLAB_FREELIST_HARDENED is enabled.
+ */
+typedef struct { unsigned long v; } freeptr_t;
+
 /*
  * A region containing a mapping of a non-memory backed file under NOMMU
  * conditions. These are held in a global tree and are pinned by the VMAs that
@@ -658,9 +664,7 @@ struct vm_area_struct {
 			unsigned long vm_start;
 			unsigned long vm_end;
 		};
-#ifdef CONFIG_PER_VMA_LOCK
-		struct rcu_head vm_rcu;	/* Used for deferred freeing. */
-#endif
+		freeptr_t vm_freeptr;	/* Pointer used by SLAB_TYPESAFE_BY_RCU */
 	};
 
 /*
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 10a971c2bde3..681b685b6c4e 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -234,12 +234,6 @@ enum _slab_flag_bits {
 #define SLAB_NO_OBJ_EXT __SLAB_FLAG_UNUSED
 #endif
 
-/*
- * freeptr_t represents a SLUB freelist pointer, which might be encoded
- * and not dereferenceable if CONFIG_SLAB_FREELIST_HARDENED is enabled.
- */
-typedef struct { unsigned long v; } freeptr_t;
-
 /*
  * ZERO_SIZE_PTR will be returned for zero sized kmalloc requests.
  *
diff --git a/kernel/fork.c b/kernel/fork.c
index f1ddfc7b3b48..7affb9245f64 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -471,36 +471,16 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	return new;
 }
 
-void __vm_area_free(struct vm_area_struct *vma)
+void vm_area_free(struct vm_area_struct *vma)
 {
 #ifdef CONFIG_PER_VMA_LOCK
-	/* The vma should be detached while being destroyed. */
-	VM_BUG_ON_VMA(!is_vma_detached(vma), vma);
+	vma_ensure_detached(vma);
 #endif
 	vma_numab_state_free(vma);
 	free_anon_vma_name(vma);
 	kmem_cache_free(vm_area_cachep, vma);
 }
 
-#ifdef CONFIG_PER_VMA_LOCK
-static void vm_area_free_rcu_cb(struct rcu_head *head)
-{
-	struct vm_area_struct *vma = container_of(head, struct vm_area_struct,
-						  vm_rcu);
-
-	__vm_area_free(vma);
-}
-#endif
-
-void vm_area_free(struct vm_area_struct *vma)
-{
-#ifdef CONFIG_PER_VMA_LOCK
-	call_rcu(&vma->vm_rcu, vm_area_free_rcu_cb);
-#else
-	__vm_area_free(vma);
-#endif
-}
-
 static void account_kernel_stack(struct task_struct *tsk, int account)
 {
 	if (IS_ENABLED(CONFIG_VMAP_STACK)) {
@@ -3147,6 +3127,11 @@ void __init mm_cache_init(void)
 
 void __init proc_caches_init(void)
 {
+	struct kmem_cache_args args = {
+		.use_freeptr_offset = true,
+		.freeptr_offset = offsetof(struct vm_area_struct, vm_freeptr),
+	};
+
 	sighand_cachep = kmem_cache_create("sighand_cache",
 			sizeof(struct sighand_struct), 0,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_TYPESAFE_BY_RCU|
@@ -3163,8 +3148,9 @@ void __init proc_caches_init(void)
 			sizeof(struct fs_struct), 0,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
 			NULL);
-	vm_area_cachep = KMEM_CACHE(vm_area_struct,
-			SLAB_HWCACHE_ALIGN|SLAB_NO_MERGE|SLAB_PANIC|
+	vm_area_cachep = kmem_cache_create("vm_area_struct",
+			sizeof(struct vm_area_struct), &args,
+			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_TYPESAFE_BY_RCU|
 			SLAB_ACCOUNT);
 	mmap_init();
 	nsproxy_cache_init();
diff --git a/mm/mmap.c b/mm/mmap.c
index df9154b15ef9..c848f6d645e9 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1284,7 +1284,13 @@ void exit_mmap(struct mm_struct *mm)
 	do {
 		if (vma->vm_flags & VM_ACCOUNT)
 			nr_accounted += vma_pages(vma);
-		remove_vma(vma, /* unreachable = */ true);
+#ifdef CONFIG_PER_VMA_LOCK
+		if (!is_vma_detached(vma)) {
+			vma_start_write(vma);
+			vma_mark_detached(vma);
+		}
+#endif
+		remove_vma(vma);
 		count++;
 		cond_resched();
 		vma = vma_next(&vmi);
diff --git a/mm/vma.c b/mm/vma.c
index 0436a7d21e01..1b46b92b2d4d 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -406,23 +406,14 @@ static bool can_vma_merge_right(struct vma_merge_struct *vmg,
 /*
  * Close a vm structure and free it.
  */
-void remove_vma(struct vm_area_struct *vma, bool unreachable)
+void remove_vma(struct vm_area_struct *vma)
 {
 	might_sleep();
 	vma_close(vma);
 	if (vma->vm_file)
 		fput(vma->vm_file);
 	mpol_put(vma_policy(vma));
-	if (unreachable) {
-#ifdef CONFIG_PER_VMA_LOCK
-		if (!is_vma_detached(vma)) {
-			vma_start_write(vma);
-			vma_mark_detached(vma);
-		}
-#endif
-		__vm_area_free(vma);
-	} else
-		vm_area_free(vma);
+	vm_area_free(vma);
 }
 
 /*
@@ -1206,7 +1197,7 @@ static void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
 	/* Remove and clean up vmas */
 	mas_set(mas_detach, 0);
 	mas_for_each(mas_detach, vma, ULONG_MAX)
-		remove_vma(vma, /* unreachable = */ false);
+		remove_vma(vma);
 
 	vm_unacct_memory(vms->nr_accounted);
 	validate_mm(mm);
diff --git a/mm/vma.h b/mm/vma.h
index 24636a2b0acf..3e6c14a748c2 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -170,7 +170,7 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
 		  unsigned long start, size_t len, struct list_head *uf,
 		  bool unlock);
 
-void remove_vma(struct vm_area_struct *vma, bool unreachable);
+void remove_vma(struct vm_area_struct *vma);
 
 void unmap_region(struct ma_state *mas, struct vm_area_struct *vma,
 		struct vm_area_struct *prev, struct vm_area_struct *next);
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index ac0a59906fea..3342cad87ece 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -700,14 +700,9 @@ static inline void mpol_put(struct mempolicy *)
 {
 }
 
-static inline void __vm_area_free(struct vm_area_struct *vma)
-{
-	free(vma);
-}
-
 static inline void vm_area_free(struct vm_area_struct *vma)
 {
-	__vm_area_free(vma);
+	free(vma);
 }
 
 static inline void lru_add_drain(void)
-- 
2.47.1.613.gc27f4b7a9f-goog

From nobody Wed Dec 17 21:39:47 2025
Date: Mon, 16 Dec 2024 11:24:19 -0800
In-Reply-To: <20241216192419.2970941-1-surenb@google.com>
References: <20241216192419.2970941-1-surenb@google.com>
Message-ID: <20241216192419.2970941-17-surenb@google.com>
Subject: [PATCH v6 16/16] docs/mm: document latest changes to vm_lock
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Content-Type: text/plain; charset="utf-8"

Change the documentation to reflect that vm_lock is integrated into the
vma and replaced with vm_refcnt.
Document the newly introduced vma_start_read_locked{_nested} functions.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 Documentation/mm/process_addrs.rst | 44 ++++++++++++++++++------------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/Documentation/mm/process_addrs.rst b/Documentation/mm/process_addrs.rst
index 81417fa2ed20..f573de936b5d 100644
--- a/Documentation/mm/process_addrs.rst
+++ b/Documentation/mm/process_addrs.rst
@@ -716,9 +716,14 @@ calls :c:func:`!rcu_read_lock` to ensure that the VMA is looked up in an RCU
 critical section, then attempts to VMA lock it via :c:func:`!vma_start_read`,
 before releasing the RCU lock via :c:func:`!rcu_read_unlock`.
 
-VMA read locks hold the read lock on the :c:member:`!vma->vm_lock` semaphore for
-their duration and the caller of :c:func:`!lock_vma_under_rcu` must release it
-via :c:func:`!vma_end_read`.
+In cases when the user already holds mmap read lock, :c:func:`!vma_start_read_locked`
+and :c:func:`!vma_start_read_locked_nested` can be used. These functions do not
+fail due to lock contention but the caller should still check their return values
+in case they fail for other reasons.
+
+VMA read locks increment :c:member:`!vma.vm_refcnt` reference counter for their
+duration and the caller of :c:func:`!lock_vma_under_rcu` must drop it via
+:c:func:`!vma_end_read`.
 
 VMA **write** locks are acquired via :c:func:`!vma_start_write` in instances where a
 VMA is about to be modified, unlike :c:func:`!vma_start_read` the lock is always
@@ -726,9 +731,9 @@ acquired. An mmap write lock **must** be held for the duration of the VMA write
 lock, releasing or downgrading the mmap write lock also releases the VMA write
 lock so there is no :c:func:`!vma_end_write` function.
 
-Note that a semaphore write lock is not held across a VMA lock. Rather, a
-sequence number is used for serialisation, and the write semaphore is only
-acquired at the point of write lock to update this.
+Note that when write-locking a VMA lock, the :c:member:`!vma.vm_refcnt` is temporarily
+modified so that readers can detect the presence of a writer. The reference counter is
+restored once the vma sequence number used for serialisation is updated.
 
 This ensures the semantics we require - VMA write locks provide exclusive write
 access to the VMA.
@@ -738,7 +743,7 @@ Implementation details
 
 The VMA lock mechanism is designed to be a lightweight means of avoiding the use
 of the heavily contended mmap lock. It is implemented using a combination of a
-read/write semaphore and sequence numbers belonging to the containing
+reference counter and sequence numbers belonging to the containing
 :c:struct:`!struct mm_struct` and the VMA.
 
 Read locks are acquired via :c:func:`!vma_start_read`, which is an optimistic
@@ -779,28 +784,31 @@ release of any VMA locks on its release makes sense, as you would never want to
 keep VMAs locked across entirely separate write operations. It also maintains
 correct lock ordering.
 
-Each time a VMA read lock is acquired, we acquire a read lock on the
-:c:member:`!vma->vm_lock` read/write semaphore and hold it, while checking that
-the sequence count of the VMA does not match that of the mm.
+Each time a VMA read lock is acquired, we increment :c:member:`!vma.vm_refcnt`
+reference counter and check that the sequence count of the VMA does not match
+that of the mm.
 
-If it does, the read lock fails. If it does not, we hold the lock, excluding
-writers, but permitting other readers, who will also obtain this lock under RCU.
+If it does, the read lock fails and :c:member:`!vma.vm_refcnt` is dropped.
+If it does not, we keep the reference counter raised, excluding writers, but
+permitting other readers, who can also obtain this lock under RCU.
 
 Importantly, maple tree operations performed in :c:func:`!lock_vma_under_rcu`
 are also RCU safe, so the whole read lock operation is guaranteed to function
 correctly.
 
-On the write side, we acquire a write lock on the :c:member:`!vma->vm_lock`
-read/write semaphore, before setting the VMA's sequence number under this lock,
-also simultaneously holding the mmap write lock.
+On the write side, we set a bit in :c:member:`!vma.vm_refcnt` which can't be
+modified by readers and wait for all readers to drop their reference count.
+Once there are no readers, VMA's sequence number is set to match that of the
+mm. During this entire operation mmap write lock is held.
 
 This way, if any read locks are in effect, :c:func:`!vma_start_write` will sleep
 until these are finished and mutual exclusion is achieved.
 
-After setting the VMA's sequence number, the lock is released, avoiding
-complexity with a long-term held write lock.
+After setting the VMA's sequence number, the bit in :c:member:`!vma.vm_refcnt`
+indicating a writer is cleared. From this point on, VMA's sequence number will
+indicate VMA's write-locked state until mmap write lock is dropped or downgraded.
 
-This clever combination of a read/write semaphore and sequence count allows for
+This clever combination of a reference counter and sequence count allows for
 fast RCU-based per-VMA lock acquisition (especially on page fault, though
 utilised elsewhere) with minimal complexity around lock ordering.
 
-- 
2.47.1.613.gc27f4b7a9f-goog