From nobody Thu Apr 2 23:53:37 2026 Received: from mail-dy1-f201.google.com (mail-dy1-f201.google.com [74.125.82.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BFF6838A70F for ; Thu, 26 Mar 2026 08:08:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=74.125.82.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774512529; cv=none; b=QoAWFRV7atUQMboY0IA+yl9JupuokesYQSnDAbSuy2go76lkne3G3PEpRB3aCqayHaWIAtR0oO399YBOyP2tRod2Sa5/vUngNtId9v4/8tVSC5i7zA0eATWvMUOvk6aIbRF5fXcUckVQvMvSqkNLZvuEMtY4yIobbvTitJhm+bI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774512529; c=relaxed/simple; bh=t74dnI+rxOISLYbAQ9wo5lkenKT6OQ51Z5+uKLX6Lxw=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=jwRSUVOFKFXybbtnVM9Z2UzCvUrLxW5kIvCAuVFsZp1+zlJo9iHfoy/4SlojWKocMK9DGa9pqx2vAiL6nn2UQ/Y4+h/s1QXfvSmeofddrWZ7ExrUx3amAS6d00VjzOhs1beOUOyPTk3cLGAVv5j8R5MthP/nzJ8dQH0uhXj9U8E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--surenb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=tLQSdwKL; arc=none smtp.client-ip=74.125.82.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--surenb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="tLQSdwKL" Received: by mail-dy1-f201.google.com with SMTP id 5a478bee46e88-2c0ffce2570so4640850eec.1 for ; Thu, 26 Mar 2026 01:08:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20251104; t=1774512526; x=1775117326; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=C+yXmhx3uKy+6G9q4kJM+us9QDWjdFJDAOgDq56Y4dE=; b=tLQSdwKLC8kDAOHCsUi0vAe3124R4x7w9bZHzZR2Ql1ymRZVUdLhQ9/xDBLplca+c5 RmkYV2q2HljCwsCZXYnWYep/Bh4rywBiYX5rC3RXHyWv/CFmY8Kf39SEPKpmiEnpDCf4 PWuNfqQNfJoeLFAEqhtr2PyD6Fl2lMmrTXO+CwILorDYmGk9QzG47XCJ61ARvu/Fu0xw nAG7+hcUzV0r/nIfLBzzwDAMqzgbamw2mMJk2W3WGJGGnF20srT0XNG3DTq7k46XACQC /GJWaLfB2A8NPL9R9SmCnF1pZIV9Km7WDWkZ14GJn7lQVgyfVwAOmyLqjmW4TkczCN2z cQOQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1774512526; x=1775117326; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=C+yXmhx3uKy+6G9q4kJM+us9QDWjdFJDAOgDq56Y4dE=; b=CycIheEylQtFPgeH2zBnTpVlS3mfepb7q6j1AVwDJO6hgNydr5+kE2ymwFp82O2zGX 1neZPtJgycoq3K8/+Gi562fuH+EElC5xL1RnuQhFe25RUDbA5Z7J3E+VO2NGpCOyzs5Y g+JEVLXqRyr7mmznJKW68YIN4ZmjDxXD4/DM12rM5q7na65rc+fC+EIWqIJF+JGEsCOr Z4pbPjlVynMJqTnkn4xkeNCto6UEdQiQ4/Zv6dm+tEP9mYQQ9Lcwh2ssdE9S4HoR5U1g oA7MIioKX6De65E+aDLr/3niyNxVGP/QAAeOj+IJigEks+yOuBstq0KPjFPgSjgDFrAx LtgQ== X-Forwarded-Encrypted: i=1; AJvYcCVgPhO0Y6W0aR3G1gzt0cVrE9IEZCiG75/PNlHvjG6r0nE+5hz5zb1HOf3OGZV6uy7Oe8UcOzKXNKHSafw=@vger.kernel.org X-Gm-Message-State: AOJu0Yyy2LeQxTSJLDm+Cp5lyXoCWPQnnAOC5B7QteO69hKLDnh7vxS3 iAK3F3czCB9s6QQ9AhVC//Yklarj0FVZAvPXmf9zCkA+VO/UrK5zfvydqo/ekMUorKU7TCan4Wg rU2FmYQ== X-Received: from dlbvs2.prod.google.com ([2002:a05:7022:3f02:b0:128:d36e:eb6f]) (user=surenb job=prod-delivery.src-stubby-dispatcher) by 2002:a05:7022:402:b0:11b:c1ab:bdd4 with SMTP id a92af1059eb24-12a96f04f69mr3338876c88.38.1774512525642; Thu, 26 Mar 2026 01:08:45 -0700 (PDT) Date: Thu, 26 Mar 2026 01:08:32 -0700 In-Reply-To: <20260326080836.695207-1-surenb@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260326080836.695207-1-surenb@google.com> X-Mailer: git-send-email 2.53.0.1018.g2bb0e51243-goog Message-ID: <20260326080836.695207-3-surenb@google.com> Subject: [PATCH v5 2/6] mm: use vma_start_write_killable() in mm syscalls From: Suren Baghdasaryan To: akpm@linux-foundation.org Cc: willy@infradead.org, david@kernel.org, ziy@nvidia.com, matthew.brost@intel.com, joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net, ying.huang@linux.alibaba.com, apopple@nvidia.com, lorenzo.stoakes@oracle.com, baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org, lance.yang@linux.dev, vbabka@suse.cz, jannh@google.com, rppt@kernel.org, mhocko@suse.com, pfalcato@suse.de, kees@kernel.org, maddy@linux.ibm.com, npiggin@gmail.com, mpe@ellerman.id.au, chleroy@kernel.org, borntraeger@linux.ibm.com, frankja@linux.ibm.com, imbrenda@linux.ibm.com, hca@linux.ibm.com, gor@linux.ibm.com, agordeev@linux.ibm.com, svens@linux.ibm.com, gerald.schaefer@linux.ibm.com, linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, surenb@google.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Replace vma_start_write() with vma_start_write_killable() in syscalls, improving reaction time to the kill signal. In a number of places we now lock VMA earlier than before to avoid doing work and undoing it later if a fatal signal is pending. This is safe because the moves are happening within sections where we already hold the mmap_write_lock, so the moves do not change the locking order relative to other kernel locks. Suggested-by: Matthew Wilcox Signed-off-by: Suren Baghdasaryan Reviewed-by: Lorenzo Stoakes (Oracle) --- mm/madvise.c | 4 +++- mm/memory.c | 2 ++ mm/mempolicy.c | 11 +++++++++-- mm/mlock.c | 28 ++++++++++++++++++++++------ mm/mprotect.c | 5 ++++- mm/mremap.c | 8 +++++--- mm/mseal.c | 5 ++++- 7 files changed, 49 insertions(+), 14 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index 69708e953cf5..feaa16b0e1dc 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -175,7 +175,9 @@ static int madvise_update_vma(vm_flags_t new_flags, madv_behavior->vma =3D vma; =20 /* vm_flags is protected by the mmap_lock held in write mode. */ - vma_start_write(vma); + if (vma_start_write_killable(vma)) + return -EINTR; + vma->flags =3D new_vma_flags; if (set_new_anon_name) return replace_anon_vma_name(vma, anon_name); diff --git a/mm/memory.c b/mm/memory.c index e44469f9cf65..9f99ec634831 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -366,6 +366,8 @@ void free_pgd_range(struct mmu_gather *tlb, * page tables that should be removed. This can differ from the vma mappi= ngs on * some archs that may have mappings that need to be removed outside the v= mas. * Note that the prev->vm_end and next->vm_start are often used. + * We don't use vma_start_write_killable() because page tables should be f= reed + * even if the task is being killed. * * The vma_end differs from the pg_end when a dup_mmap() failed and the tr= ee has * unrelated data to the mm_struct being torn down. diff --git a/mm/mempolicy.c b/mm/mempolicy.c index fd08771e2057..c38a90487531 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1784,7 +1784,8 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned lon= g, start, unsigned long, le return -EINVAL; if (end =3D=3D start) return 0; - mmap_write_lock(mm); + if (mmap_write_lock_killable(mm)) + return -EINTR; prev =3D vma_prev(&vmi); for_each_vma_range(vmi, vma, end) { /* @@ -1801,13 +1802,19 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned l= ong, start, unsigned long, le err =3D -EOPNOTSUPP; break; } + /* + * Lock the VMA early to avoid extra work if fatal signal + * is pending. + */ + err =3D vma_start_write_killable(vma); + if (err) + break; new =3D mpol_dup(old); if (IS_ERR(new)) { err =3D PTR_ERR(new); break; } =20 - vma_start_write(vma); new->home_node =3D home_node; err =3D mbind_range(&vmi, vma, &prev, start, end, new); mpol_put(new); diff --git a/mm/mlock.c b/mm/mlock.c index 8c227fefa2df..3d9147f3d404 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -419,8 +419,10 @@ static int mlock_pte_range(pmd_t *pmd, unsigned long a= ddr, * * Called for mlock(), mlock2() and mlockall(), to set @vma VM_LOCKED; * called for munlock() and munlockall(), to clear VM_LOCKED from @vma. + * + * Return: 0 on success, -EINTR if fatal signal is pending. */ -static void mlock_vma_pages_range(struct vm_area_struct *vma, +static int mlock_vma_pages_range(struct vm_area_struct *vma, unsigned long start, unsigned long end, vma_flags_t *new_vma_flags) { @@ -442,7 +444,9 @@ static void mlock_vma_pages_range(struct vm_area_struct= *vma, */ if (vma_flags_test(new_vma_flags, VMA_LOCKED_BIT)) vma_flags_set(new_vma_flags, VMA_IO_BIT); - vma_start_write(vma); + if (vma_start_write_killable(vma)) + return -EINTR; + vma_flags_reset_once(vma, new_vma_flags); =20 lru_add_drain(); @@ -453,6 +457,7 @@ static void mlock_vma_pages_range(struct vm_area_struct= *vma, vma_flags_clear(new_vma_flags, VMA_IO_BIT); vma_flags_reset_once(vma, new_vma_flags); } + return 0; } =20 /* @@ -506,11 +511,13 @@ static int mlock_fixup(struct vma_iterator *vmi, stru= ct vm_area_struct *vma, */ if (vma_flags_test(&new_vma_flags, VMA_LOCKED_BIT) && vma_flags_test(&old_vma_flags, VMA_LOCKED_BIT)) { + ret =3D vma_start_write_killable(vma); + if (ret) + goto out; /* mm->locked_vm is fine as nr_pages =3D=3D 0 */ /* No work to do, and mlocking twice would be wrong */ - vma_start_write(vma); vma->flags =3D new_vma_flags; } else { - mlock_vma_pages_range(vma, start, end, &new_vma_flags); + ret =3D mlock_vma_pages_range(vma, start, end, &new_vma_flags); } out: *prev =3D vma; @@ -739,9 +746,18 @@ static int apply_mlockall_flags(int flags) =20 error =3D mlock_fixup(&vmi, vma, &prev, vma->vm_start, vma->vm_end, newflags); - /* Ignore errors, but prev needs fixing up. */ - if (error) + if (error) { + /* + * If we failed due to a pending fatal signal, return + * now. If we locked the vma before signal arrived, it + * will be unlocked when we drop mmap_write_lock. + */ + if (fatal_signal_pending(current)) + return -EINTR; + + /* Ignore errors, but prev needs fixing up. */ prev =3D vma; + } cond_resched(); } out: diff --git a/mm/mprotect.c b/mm/mprotect.c index 110d47a36d4b..ae6ed882b600 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -768,7 +768,10 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_ga= ther *tlb, * vm_flags and vm_page_prot are protected by the mmap_lock * held in write mode. */ - vma_start_write(vma); + error =3D vma_start_write_killable(vma); + if (error) + goto fail; + vma_flags_reset_once(vma, &new_vma_flags); if (vma_wants_manual_pte_write_upgrade(vma)) mm_cp_flags |=3D MM_CP_TRY_CHANGE_WRITABLE; diff --git a/mm/mremap.c b/mm/mremap.c index e9c8b1d05832..0860102bddab 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -1348,6 +1348,11 @@ static unsigned long move_vma(struct vma_remap_struc= t *vrm) if (err) return err; =20 + /* We don't want racing faults. */ + err =3D vma_start_write_killable(vrm->vma); + if (err) + return err; + /* * If accounted, determine the number of bytes the operation will * charge. @@ -1355,9 +1360,6 @@ static unsigned long move_vma(struct vma_remap_struct= *vrm) if (!vrm_calc_charge(vrm)) return -ENOMEM; =20 - /* We don't want racing faults. */ - vma_start_write(vrm->vma); - /* Perform copy step. */ err =3D copy_vma_and_data(vrm, &new_vma); /* diff --git a/mm/mseal.c b/mm/mseal.c index 603df53ad267..3b7737ba7524 100644 --- a/mm/mseal.c +++ b/mm/mseal.c @@ -70,6 +70,7 @@ static int mseal_apply(struct mm_struct *mm, =20 if (!vma_test(vma, VMA_SEALED_BIT)) { vma_flags_t vma_flags =3D vma->flags; + int err; =20 vma_flags_set(&vma_flags, VMA_SEALED_BIT); =20 @@ -77,7 +78,9 @@ static int mseal_apply(struct mm_struct *mm, curr_end, &vma_flags); if (IS_ERR(vma)) return PTR_ERR(vma); - vma_start_write(vma); + err =3D vma_start_write_killable(vma); + if (err) + return err; vma_set_flags(vma, VMA_SEALED_BIT); } =20 --=20 2.53.0.1018.g2bb0e51243-goog