From nobody Sat Feb 7 18:20:32 2026
Date: Wed, 26 Feb 2025 10:55:08 -0800
In-Reply-To: <20250226185510.2732648-1-surenb@google.com>
References: <20250226185510.2732648-1-surenb@google.com>
Message-ID: <20250226185510.2732648-2-surenb@google.com>
Subject: [PATCH 1/2] userfaultfd: do not block on locking a large folio with
 raised refcount
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: lokeshgidra@google.com, aarcange@redhat.com, 21cnbao@gmail.com,
 v-songbaohua@oppo.com, david@redhat.com, peterx@redhat.com,
 willy@infradead.org, Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com,
 hughd@google.com, jannh@google.com, kaleshsingh@google.com,
 surenb@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 stable@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Lokesh recently raised an issue about UFFDIO_MOVE getting into a deadlock
state when it goes into split_folio() with a raised folio refcount.
split_folio() expects the reference count to be exactly
mapcount + num_pages_in_folio + 1 (see can_split_folio()) and fails with
EAGAIN otherwise.

If multiple processes are trying to move the same large folio, they all
succeed in raising the refcount, then one of them succeeds in locking the
folio, while the others block in folio_lock() with their refcounts still
raised. The winner of this race proceeds to call split_folio(), fails
because of the extra references, returns EAGAIN to the caller and unlocks
the folio. The next competing process then gets the folio locked and goes
through the same flow. In the meantime the original winner is retried and
blocks in folio_lock(), joining the queue of waiting processes only to
repeat the same path. All this results in a livelock.

An easy fix would be to avoid waiting for the folio lock while holding the
folio refcount, similar to madvise_free_huge_pmd(), where the folio lock is
acquired before raising the folio refcount.
Since we lock and take a refcount of the folio while holding the PTE lock,
changing the order of these operations should not break anything.

Modify move_pages_pte() to try locking the folio first, and if that fails
and the folio is large, return EAGAIN without touching the folio refcount.
If the folio is single-page then split_folio() is not called, so we don't
have this issue.

Lokesh has a reproducer [1] and I verified that this change fixes the
issue.

[1] https://github.com/lokeshgidra/uffd_move_ioctl_deadlock

Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Reported-by: Lokesh Gidra
Signed-off-by: Suren Baghdasaryan
Reviewed-by: Peter Xu
Cc: stable@vger.kernel.org
Acked-by: Liam R. Howlett
---
Note this patch is v2 of [2] but I did not bump up the version because now
it's part of the patchset which is at its v1. Hopefully that's not too
confusing.

Changes since v1 [2]:
- Rebased over mm-hotfixes-unstable to avoid merge conflicts with [3]
- Added Reviewed-by, per Peter Xu
- Added a note about the PTL lock in the changelog, per Liam R. Howlett
- CC'ed stable

[2] https://lore.kernel.org/all/20250225204613.2316092-1-surenb@google.com/
[3] https://lore.kernel.org/all/20250226003234.0B98FC4CEDD@smtp.kernel.org/

 mm/userfaultfd.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 8eae4ea3cafd..e0f1e38ac5d8 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1250,6 +1250,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 	 */
 	if (!src_folio) {
 		struct folio *folio;
+		bool locked;
 
 		/*
 		 * Pin the page while holding the lock to be sure the
@@ -1269,12 +1270,26 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 			goto out;
 		}
 
+		locked = folio_trylock(folio);
+		/*
+		 * We avoid waiting for folio lock with a raised refcount
+		 * for large folios because extra refcounts will result in
+		 * split_folio() failing later and retrying. If multiple
+		 * tasks are trying to move a large folio we can end up
+		 * livelocking.
+		 */
+		if (!locked && folio_test_large(folio)) {
+			spin_unlock(src_ptl);
+			err = -EAGAIN;
+			goto out;
+		}
+
 		folio_get(folio);
 		src_folio = folio;
 		src_folio_pte = orig_src_pte;
 		spin_unlock(src_ptl);
 
-		if (!folio_trylock(src_folio)) {
+		if (!locked) {
 			pte_unmap(&orig_src_pte);
 			pte_unmap(&orig_dst_pte);
 			src_pte = dst_pte = NULL;
-- 
2.48.1.658.g4767266eb4-goog

From nobody Sat Feb 7 18:20:32 2026
Date: Wed, 26 Feb 2025 10:55:09 -0800
In-Reply-To: <20250226185510.2732648-1-surenb@google.com>
References: <20250226185510.2732648-1-surenb@google.com>
Message-ID: <20250226185510.2732648-3-surenb@google.com>
Subject: [PATCH 2/2] userfaultfd: fix PTE unmapping stack-allocated PTE copies
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: lokeshgidra@google.com, aarcange@redhat.com, 21cnbao@gmail.com,
 v-songbaohua@oppo.com, david@redhat.com, peterx@redhat.com,
 willy@infradead.org, Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com,
 hughd@google.com, jannh@google.com, kaleshsingh@google.com,
 surenb@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 stable@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

The current implementation of move_pages_pte() copies the source and
destination PTEs in order to detect concurrent changes to the PTEs involved
in the move. However, these copies are also passed to pte_unmap(), which
will fail if CONFIG_HIGHPTE is enabled because the copies are allocated on
the stack rather than inside the kernel's mapped window.

Fix this by unmapping the actual PTE pointers which were kmap()ed.
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Reported-by: Peter Xu
Signed-off-by: Suren Baghdasaryan
Cc: stable@vger.kernel.org
Reviewed-by: Peter Xu
---
 mm/userfaultfd.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index e0f1e38ac5d8..dda1c9a3662a 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1290,8 +1290,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 		spin_unlock(src_ptl);
 
 		if (!locked) {
-			pte_unmap(&orig_src_pte);
-			pte_unmap(&orig_dst_pte);
+			pte_unmap(src_pte);
+			pte_unmap(dst_pte);
 			src_pte = dst_pte = NULL;
 			/* now we can block and wait */
 			folio_lock(src_folio);
@@ -1307,8 +1307,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 	/* at this point we have src_folio locked */
 	if (folio_test_large(src_folio)) {
 		/* split_folio() can block */
-		pte_unmap(&orig_src_pte);
-		pte_unmap(&orig_dst_pte);
+		pte_unmap(src_pte);
+		pte_unmap(dst_pte);
 		src_pte = dst_pte = NULL;
 		err = split_folio(src_folio);
 		if (err)
@@ -1333,8 +1333,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 			goto out;
 		}
 		if (!anon_vma_trylock_write(src_anon_vma)) {
-			pte_unmap(&orig_src_pte);
-			pte_unmap(&orig_dst_pte);
+			pte_unmap(src_pte);
+			pte_unmap(dst_pte);
 			src_pte = dst_pte = NULL;
 			/* now we can block and wait */
 			anon_vma_lock_write(src_anon_vma);
@@ -1352,8 +1352,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 		entry = pte_to_swp_entry(orig_src_pte);
 		if (non_swap_entry(entry)) {
 			if (is_migration_entry(entry)) {
-				pte_unmap(&orig_src_pte);
-				pte_unmap(&orig_dst_pte);
+				pte_unmap(src_pte);
+				pte_unmap(dst_pte);
 				src_pte = dst_pte = NULL;
 				migration_entry_wait(mm, src_pmd, src_addr);
 				err = -EAGAIN;
@@ -1396,8 +1396,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 		src_folio = folio;
 		src_folio_pte = orig_src_pte;
 		if (!folio_trylock(src_folio)) {
-			pte_unmap(&orig_src_pte);
-			pte_unmap(&orig_dst_pte);
+			pte_unmap(src_pte);
+			pte_unmap(dst_pte);
 			src_pte = dst_pte = NULL;
 			/* now we can block and wait */
 			folio_lock(src_folio);
-- 
2.48.1.658.g4767266eb4-goog