From: Qi Zheng
To: brauner@kernel.org, willy@infradead.org, ziy@nvidia.com,
	quwenruo.btrfs@gmx.com, david@redhat.com, jannh@google.com,
	akpm@linux-foundation.org, david@fromorbit.com, djwong@kernel.org,
	muchun.song@linux.dev
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, Qi Zheng, stable@vger.kernel.org
Subject: [PATCH] mm: pgtable: fix incorrect reclaim of non-empty PTE pages
Date: Tue, 11 Feb 2025 15:26:25 +0800
Message-Id: <20250211072625.89188-1-zhengqi.arch@bytedance.com>
X-Mailer: git-send-email 2.24.3 (Apple Git-128)
X-Mailing-List: linux-kernel@vger.kernel.org

In zap_pte_range(), if the pte lock was released midway, the pte entries
may be refilled with physical pages by another thread, which may cause a
non-empty PTE page to be reclaimed and eventually cause the system to
crash.

To fix it, fall back to the slow path in this case to recheck if all pte
entries are still none.

Fixes: 6375e95f381e ("mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED)")
Reported-by: Christian Brauner
Closes: https://lore.kernel.org/all/20250207-anbot-bankfilialen-acce9d79a2c7@brauner/
Reported-by: Qu Wenruo
Closes: https://lore.kernel.org/all/152296f3-5c81-4a94-97f3-004108fba7be@gmx.com/
Tested-by: Zi Yan
Cc: stable@vger.kernel.org
Signed-off-by: Qi Zheng
---
 mm/memory.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index a8196ae72e9ae..7c7193cb21248 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1721,7 +1721,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 	pmd_t pmdval;
 	unsigned long start = addr;
 	bool can_reclaim_pt = reclaim_pt_is_enabled(start, end, details);
-	bool direct_reclaim = false;
+	bool direct_reclaim = true;
 	int nr;
 
 retry:
@@ -1736,8 +1736,10 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 	do {
 		bool any_skipped = false;
 
-		if (need_resched())
+		if (need_resched()) {
+			direct_reclaim = false;
 			break;
+		}
 
 		nr = do_zap_pte_range(tlb, vma, pte, addr, end, details, rss,
 				      &force_flush, &force_break, &any_skipped);
@@ -1745,11 +1747,20 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			can_reclaim_pt = false;
 		if (unlikely(force_break)) {
 			addr += nr * PAGE_SIZE;
+			direct_reclaim = false;
 			break;
 		}
 	} while (pte += nr, addr += PAGE_SIZE * nr, addr != end);
 
-	if (can_reclaim_pt && addr == end)
+	/*
+	 * Fast path: try to hold the pmd lock and unmap the PTE page.
+	 *
+	 * If the pte lock was released midway (retry case), or if the attempt
+	 * to hold the pmd lock failed, then we need to recheck all pte entries
+	 * to ensure they are still none, thereby preventing the pte entries
+	 * from being repopulated by another thread.
+	 */
+	if (can_reclaim_pt && direct_reclaim && addr == end)
 		direct_reclaim = try_get_and_clear_pmd(mm, pmd, &pmdval);
 
 	add_mm_rss_vec(mm, rss);
-- 
2.20.1
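
For readers following the flag handling, here is a minimal userspace sketch of the decision the patch introduces. It is not kernel code: choose_path(), enum reclaim_path, and the three boolean parameters are hypothetical names made up for illustration, and the final dispatch between a fast path and a rechecking slow path is inferred from the comment added in the diff rather than taken from the rest of zap_pte_range().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the outcomes described in the patch comment. */
enum reclaim_path { PATH_NONE, PATH_FAST, PATH_SLOW_RECHECK };

/*
 * Toy model of the direct_reclaim flag (assumption: simplified to one call).
 *
 * can_reclaim_pt: PTE page reclaim is permitted for this range.
 * interrupted:    the pte lock was dropped at least once before the range
 *                 was finished (need_resched() or force_break in the diff).
 * pmd_locked:     stand-in for the result of try_get_and_clear_pmd().
 */
static enum reclaim_path choose_path(bool can_reclaim_pt, bool interrupted,
				     bool pmd_locked)
{
	/* Starts true and is only ever cleared, so a retry cannot re-arm it. */
	bool direct_reclaim = true;

	if (interrupted)
		direct_reclaim = false;

	/* The fast path is attempted only if the lock was never released. */
	if (can_reclaim_pt && direct_reclaim)
		direct_reclaim = pmd_locked;

	if (!can_reclaim_pt)
		return PATH_NONE;

	/* Otherwise every pte entry must be rechecked before freeing. */
	return direct_reclaim ? PATH_FAST : PATH_SLOW_RECHECK;
}

/* Usage: the case the patch fixes is the second assertion. */
static void self_check(void)
{
	assert(choose_path(true, false, true) == PATH_FAST);
	assert(choose_path(true, true, true) == PATH_SLOW_RECHECK);
	assert(choose_path(true, false, false) == PATH_SLOW_RECHECK);
	assert(choose_path(false, false, true) == PATH_NONE);
}
```

In this model, the pre-patch behaviour corresponds to ignoring the interrupted argument, which is how a PTE page refilled by another thread could still reach the fast path.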