From nobody Sun May 24 22:35:55 2026
From: Tal Zussman
Date: Wed, 20 May 2026 16:48:52 -0400
Subject: [PATCH RFC 01/11] mm: add folio_wake_writeback() helper
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Message-Id: <20260520-filemap-split-v1-1-c36ddc2b6cf2@columbia.edu>
References: <20260520-filemap-split-v1-0-c36ddc2b6cf2@columbia.edu>
In-Reply-To: <20260520-filemap-split-v1-0-c36ddc2b6cf2@columbia.edu>
To: "Matthew Wilcox (Oracle)", Jan Kara, Andrew Morton, David Hildenbrand,
 Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka, Mike Rapoport,
 Suren Baghdasaryan, Michal Hocko, Alexander Viro, Christian Brauner,
 Jens Axboe
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, io-uring@vger.kernel.org, Tal Zussman
X-Mailer: b4 0.14.3-dev-d7477

Add a folio_wake_writeback() wrapper for folio_wake_bit() for use in
folio_end_writeback_no_dropbehind(), in preparation for moving the folio
bit lock and wait queue code to a separate file.

No functional change.

Signed-off-by: Tal Zussman
---
 mm/filemap.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 5aaba0d3e81d..567742fbaff0 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1209,6 +1209,14 @@ static void folio_wake_bit(struct folio *folio, int bit_nr)
 	spin_unlock_irqrestore(&q->lock, flags);
 }
 
+/*
+ * Wake waiters on PG_writeback for @folio.
+ */
+static void folio_wake_writeback(struct folio *folio)
+{
+	folio_wake_bit(folio, PG_writeback);
+}
+
 /*
  * A choice of three behaviors for folio_wait_bit_common():
  */
@@ -1664,7 +1672,7 @@ void folio_end_writeback_no_dropbehind(struct folio *folio)
 	}
 
 	if (__folio_end_writeback(folio))
-		folio_wake_bit(folio, PG_writeback);
+		folio_wake_writeback(folio);
 
 	acct_reclaim_writeback(folio);
 }
-- 
2.39.5

From nobody Sun May 24 22:35:55 2026
From: Tal Zussman
Date: Wed, 20 May 2026 16:48:53 -0400
Subject: [PATCH RFC 02/11] folio_wait: move folio bit-lock and wait
 implementation to mm/folio_wait.c
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Message-Id: <20260520-filemap-split-v1-2-c36ddc2b6cf2@columbia.edu>
References: <20260520-filemap-split-v1-0-c36ddc2b6cf2@columbia.edu>
In-Reply-To: <20260520-filemap-split-v1-0-c36ddc2b6cf2@columbia.edu>
To: "Matthew Wilcox (Oracle)", Jan Kara, Andrew Morton, David Hildenbrand,
 Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka, Mike Rapoport,
 Suren Baghdasaryan, Michal Hocko, Alexander Viro, Christian Brauner,
 Jens Axboe
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, io-uring@vger.kernel.org, Tal Zussman
X-Mailer: b4 0.14.3-dev-d7477

mm/filemap.c contains ~600 lines of folio bit-lock and wait queue
infrastructure that is logically separate from the page cache. Move it
into a new file, mm/folio_wait.c.

folio_wake_writeback(), folio_put_wait_locked(), and __folio_lock_async()
are made non-static and declared in mm/internal.h, as they are still used
in filemap.c.

pagecache_init() is refactored to call folio_wait_init(), which
initializes the wait queue table and page_lock_unfairness sysctl.
filemap_sysctl_table is renamed to folio_wait_sysctl_table.

Signed-off-by: Tal Zussman --- mm/Makefile | 2 +- mm/filemap.c | 640 +---------------------------------------------------= -- mm/folio_wait.c | 662 ++++++++++++++++++++++++++++++++++++++++++++++++++++= ++++ mm/internal.h | 4 + 4 files changed, 668 insertions(+), 640 deletions(-) diff --git a/mm/Makefile b/mm/Makefile index 8ad2ab08244e..65ce5afe7692 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -52,7 +52,7 @@ obj-y :=3D filemap.o mempool.o oom_kill.o fadvise.o \ maccess.o page-writeback.o folio-compat.o \ readahead.o swap.o truncate.o vmscan.o shrinker.o \ shmem.o util.o mmzone.o vmstat.o backing-dev.o \ - mm_init.o percpu.o slab_common.o \ + mm_init.o percpu.o slab_common.o folio_wait.o \ compaction.o show_mem.o \ interval_tree.o list_lru.o workingset.o \ debug.o gup.o mmap_lock.o vma_init.o $(mmu-y) diff --git a/mm/filemap.c b/mm/filemap.c index 567742fbaff0..079f9c3ac8a2 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1053,561 +1053,12 @@ void filemap_invalidate_unlock_two(struct address_= space *mapping1, } EXPORT_SYMBOL(filemap_invalidate_unlock_two); =20 -/* - * In order to wait for pages to become available there must be - * waitqueues associated with pages. By using a hash table of - * waitqueues where the bucket discipline is to maintain all - * waiters on the same queue and wake all when any of the pages - * become available, and for the woken contexts to check to be - * sure the appropriate page became available, this saves space - * at a cost of "thundering herd" phenomena during rare hash - * collisions. - */ -#define PAGE_WAIT_TABLE_BITS 8 -#define PAGE_WAIT_TABLE_SIZE (1 << PAGE_WAIT_TABLE_BITS) -static wait_queue_head_t folio_wait_table[PAGE_WAIT_TABLE_SIZE] __cachelin= e_aligned; - -static wait_queue_head_t *folio_waitqueue(struct folio *folio) -{ - return &folio_wait_table[hash_ptr(folio, PAGE_WAIT_TABLE_BITS)]; -} - -/* How many times do we accept lock stealing from under a waiter? 
*/ -static int sysctl_page_lock_unfairness =3D 5; -static const struct ctl_table filemap_sysctl_table[] =3D { - { - .procname =3D "page_lock_unfairness", - .data =3D &sysctl_page_lock_unfairness, - .maxlen =3D sizeof(sysctl_page_lock_unfairness), - .mode =3D 0644, - .proc_handler =3D proc_dointvec_minmax, - .extra1 =3D SYSCTL_ZERO, - } -}; - void __init pagecache_init(void) { - int i; - - for (i =3D 0; i < PAGE_WAIT_TABLE_SIZE; i++) - init_waitqueue_head(&folio_wait_table[i]); - + folio_wait_init(); page_writeback_init(); - register_sysctl_init("vm", filemap_sysctl_table); -} - -/* - * The page wait code treats the "wait->flags" somewhat unusually, because - * we have multiple different kinds of waits, not just the usual "exclusiv= e" - * one. - * - * We have: - * - * (a) no special bits set: - * - * We're just waiting for the bit to be released, and when a waker - * calls the wakeup function, we set WQ_FLAG_WOKEN and wake it up, - * and remove it from the wait queue. - * - * Simple and straightforward. - * - * (b) WQ_FLAG_EXCLUSIVE: - * - * The waiter is waiting to get the lock, and only one waiter should - * be woken up to avoid any thundering herd behavior. We'll set the - * WQ_FLAG_WOKEN bit, wake it up, and remove it from the wait queue. - * - * This is the traditional exclusive wait. - * - * (c) WQ_FLAG_EXCLUSIVE | WQ_FLAG_CUSTOM: - * - * The waiter is waiting to get the bit, and additionally wants the - * lock to be transferred to it for fair lock behavior. If the lock - * cannot be taken, we stop walking the wait queue without waking - * the waiter. - * - * This is the "fair lock handoff" case, and in addition to setting - * WQ_FLAG_WOKEN, we set WQ_FLAG_DONE to let the waiter easily see - * that it now has the lock. 
- */ -static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int= sync, void *arg) -{ - unsigned int flags; - struct wait_page_key *key =3D arg; - struct wait_page_queue *wait_page - =3D container_of(wait, struct wait_page_queue, wait); - - if (!wake_page_match(wait_page, key)) - return 0; - - /* - * If it's a lock handoff wait, we get the bit for it, and - * stop walking (and do not wake it up) if we can't. - */ - flags =3D wait->flags; - if (flags & WQ_FLAG_EXCLUSIVE) { - if (test_bit(key->bit_nr, &key->folio->flags.f)) - return -1; - if (flags & WQ_FLAG_CUSTOM) { - if (test_and_set_bit(key->bit_nr, &key->folio->flags.f)) - return -1; - flags |=3D WQ_FLAG_DONE; - } - } - - /* - * We are holding the wait-queue lock, but the waiter that - * is waiting for this will be checking the flags without - * any locking. - * - * So update the flags atomically, and wake up the waiter - * afterwards to avoid any races. This store-release pairs - * with the load-acquire in folio_wait_bit_common(). - */ - smp_store_release(&wait->flags, flags | WQ_FLAG_WOKEN); - wake_up_state(wait->private, mode); - - /* - * Ok, we have successfully done what we're waiting for, - * and we can unconditionally remove the wait entry. - * - * Note that this pairs with the "finish_wait()" in the - * waiter, and has to be the absolute last thing we do. - * After this list_del_init(&wait->entry) the wait entry - * might be de-allocated and the process might even have - * exited. 
- */ - list_del_init_careful(&wait->entry); - return (flags & WQ_FLAG_EXCLUSIVE) !=3D 0; -} - -static void folio_wake_bit(struct folio *folio, int bit_nr) -{ - wait_queue_head_t *q =3D folio_waitqueue(folio); - struct wait_page_key key; - unsigned long flags; - - key.folio =3D folio; - key.bit_nr =3D bit_nr; - key.page_match =3D 0; - - spin_lock_irqsave(&q->lock, flags); - __wake_up_locked_key(q, TASK_NORMAL, &key); - - /* - * It's possible to miss clearing waiters here, when we woke our page - * waiters, but the hashed waitqueue has waiters for other pages on it. - * That's okay, it's a rare case. The next waker will clear it. - * - * Note that, depending on the page pool (buddy, hugetlb, ZONE_DEVICE, - * other), the flag may be cleared in the course of freeing the page; - * but that is not required for correctness. - */ - if (!waitqueue_active(q) || !key.page_match) - folio_clear_waiters(folio); - - spin_unlock_irqrestore(&q->lock, flags); -} - -/* - * Wake waiters on PG_writeback for @folio. - */ -static void folio_wake_writeback(struct folio *folio) -{ - folio_wake_bit(folio, PG_writeback); -} - -/* - * A choice of three behaviors for folio_wait_bit_common(): - */ -enum behavior { - EXCLUSIVE, /* Hold ref to page and take the bit when woken, like - * __folio_lock() waiting on then setting PG_locked. - */ - SHARED, /* Hold ref to page and check the bit when woken, like - * folio_wait_writeback() waiting on PG_writeback. - */ - DROP, /* Drop ref to page before wait, no check when woken, - * like folio_put_wait_locked() on PG_locked. - */ -}; - -/* - * Attempt to check (or get) the folio flag, and mark us done - * if successful. 
- */ -static inline bool folio_trylock_flag(struct folio *folio, int bit_nr, - struct wait_queue_entry *wait) -{ - if (wait->flags & WQ_FLAG_EXCLUSIVE) { - if (test_and_set_bit(bit_nr, &folio->flags.f)) - return false; - } else if (test_bit(bit_nr, &folio->flags.f)) - return false; - - wait->flags |=3D WQ_FLAG_WOKEN | WQ_FLAG_DONE; - return true; -} - -static inline int folio_wait_bit_common(struct folio *folio, int bit_nr, - int state, enum behavior behavior) -{ - wait_queue_head_t *q =3D folio_waitqueue(folio); - int unfairness =3D sysctl_page_lock_unfairness; - struct wait_page_queue wait_page; - wait_queue_entry_t *wait =3D &wait_page.wait; - bool thrashing =3D false; - unsigned long pflags; - bool in_thrashing; - - if (bit_nr =3D=3D PG_locked && - !folio_test_uptodate(folio) && folio_test_workingset(folio)) { - delayacct_thrashing_start(&in_thrashing); - psi_memstall_enter(&pflags); - thrashing =3D true; - } - - init_wait(wait); - wait->func =3D wake_page_function; - wait_page.folio =3D folio; - wait_page.bit_nr =3D bit_nr; - -repeat: - wait->flags =3D 0; - if (behavior =3D=3D EXCLUSIVE) { - wait->flags =3D WQ_FLAG_EXCLUSIVE; - if (--unfairness < 0) - wait->flags |=3D WQ_FLAG_CUSTOM; - } - - /* - * Do one last check whether we can get the - * page bit synchronously. - * - * Do the folio_set_waiters() marking before that - * to let any waker we _just_ missed know they - * need to wake us up (otherwise they'll never - * even go to the slow case that looks at the - * page queue), and add ourselves to the wait - * queue if we need to sleep. - * - * This part needs to be done under the queue - * lock to avoid races. 
- */ - spin_lock_irq(&q->lock); - folio_set_waiters(folio); - if (!folio_trylock_flag(folio, bit_nr, wait)) - __add_wait_queue_entry_tail(q, wait); - spin_unlock_irq(&q->lock); - - /* - * From now on, all the logic will be based on - * the WQ_FLAG_WOKEN and WQ_FLAG_DONE flag, to - * see whether the page bit testing has already - * been done by the wake function. - * - * We can drop our reference to the folio. - */ - if (behavior =3D=3D DROP) - folio_put(folio); - - /* - * Note that until the "finish_wait()", or until - * we see the WQ_FLAG_WOKEN flag, we need to - * be very careful with the 'wait->flags', because - * we may race with a waker that sets them. - */ - for (;;) { - unsigned int flags; - - set_current_state(state); - - /* Loop until we've been woken or interrupted */ - flags =3D smp_load_acquire(&wait->flags); - if (!(flags & WQ_FLAG_WOKEN)) { - if (signal_pending_state(state, current)) - break; - - io_schedule(); - continue; - } - - /* If we were non-exclusive, we're done */ - if (behavior !=3D EXCLUSIVE) - break; - - /* If the waker got the lock for us, we're done */ - if (flags & WQ_FLAG_DONE) - break; - - /* - * Otherwise, if we're getting the lock, we need to - * try to get it ourselves. - * - * And if that fails, we'll have to retry this all. - */ - if (unlikely(test_and_set_bit(bit_nr, folio_flags(folio, 0)))) - goto repeat; - - wait->flags |=3D WQ_FLAG_DONE; - break; - } - - /* - * If a signal happened, this 'finish_wait()' may remove the last - * waiter from the wait-queues, but the folio waiters bit will remain - * set. That's ok. The next wakeup will take care of it, and trying - * to do it here would be difficult and prone to races. - */ - finish_wait(q, wait); - - if (thrashing) { - delayacct_thrashing_end(&in_thrashing); - psi_memstall_leave(&pflags); - } - - /* - * NOTE! 
The wait->flags weren't stable until we've done the - * 'finish_wait()', and we could have exited the loop above due - * to a signal, and had a wakeup event happen after the signal - * test but before the 'finish_wait()'. - * - * So only after the finish_wait() can we reliably determine - * if we got woken up or not, so we can now figure out the final - * return value based on that state without races. - * - * Also note that WQ_FLAG_WOKEN is sufficient for a non-exclusive - * waiter, but an exclusive one requires WQ_FLAG_DONE. - */ - if (behavior =3D=3D EXCLUSIVE) - return wait->flags & WQ_FLAG_DONE ? 0 : -EINTR; - - return wait->flags & WQ_FLAG_WOKEN ? 0 : -EINTR; } =20 -#ifdef CONFIG_MIGRATION -/** - * softleaf_entry_wait_on_locked - Wait for a migration entry or - * device_private entry to be removed. - * @entry: migration or device_private swap entry. - * @ptl: already locked ptl. This function will drop the lock. - * - * Wait for a migration entry referencing the given page, or device_private - * entry referencing a dvice_private page to be unlocked. This is - * equivalent to folio_put_wait_locked(folio, TASK_UNINTERRUPTIBLE) except - * this can be called without taking a reference on the page. Instead this - * should be called while holding the ptl for @entry referencing - * the page. - * - * Returns after unlocking the ptl. - * - * This follows the same logic as folio_wait_bit_common() so see the comme= nts - * there. 
- */ -void softleaf_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl) - __releases(ptl) -{ - struct wait_page_queue wait_page; - wait_queue_entry_t *wait =3D &wait_page.wait; - bool thrashing =3D false; - unsigned long pflags; - bool in_thrashing; - wait_queue_head_t *q; - struct folio *folio =3D softleaf_to_folio(entry); - - q =3D folio_waitqueue(folio); - if (!folio_test_uptodate(folio) && folio_test_workingset(folio)) { - delayacct_thrashing_start(&in_thrashing); - psi_memstall_enter(&pflags); - thrashing =3D true; - } - - init_wait(wait); - wait->func =3D wake_page_function; - wait_page.folio =3D folio; - wait_page.bit_nr =3D PG_locked; - wait->flags =3D 0; - - spin_lock_irq(&q->lock); - folio_set_waiters(folio); - if (!folio_trylock_flag(folio, PG_locked, wait)) - __add_wait_queue_entry_tail(q, wait); - spin_unlock_irq(&q->lock); - - /* - * If a migration entry exists for the page the migration path must hold - * a valid reference to the page, and it must take the ptl to remove the - * migration entry. So the page is valid until the ptl is dropped. - * Similarly any path attempting to drop the last reference to a - * device-private page needs to grab the ptl to remove the device-private - * entry. 
- */ - spin_unlock(ptl); - - for (;;) { - unsigned int flags; - - set_current_state(TASK_UNINTERRUPTIBLE); - - /* Loop until we've been woken or interrupted */ - flags =3D smp_load_acquire(&wait->flags); - if (!(flags & WQ_FLAG_WOKEN)) { - if (signal_pending_state(TASK_UNINTERRUPTIBLE, current)) - break; - - io_schedule(); - continue; - } - break; - } - - finish_wait(q, wait); - - if (thrashing) { - delayacct_thrashing_end(&in_thrashing); - psi_memstall_leave(&pflags); - } -} -#endif - -void folio_wait_bit(struct folio *folio, int bit_nr) -{ - folio_wait_bit_common(folio, bit_nr, TASK_UNINTERRUPTIBLE, SHARED); -} -EXPORT_SYMBOL(folio_wait_bit); - -int folio_wait_bit_killable(struct folio *folio, int bit_nr) -{ - return folio_wait_bit_common(folio, bit_nr, TASK_KILLABLE, SHARED); -} -EXPORT_SYMBOL(folio_wait_bit_killable); - -/** - * folio_put_wait_locked - Drop a reference and wait for it to be unlocked - * @folio: The folio to wait for. - * @state: The sleep state (TASK_KILLABLE, TASK_UNINTERRUPTIBLE, etc). - * - * The caller should hold a reference on @folio. They expect the page to - * become unlocked relatively soon, but do not wish to hold up migration - * (for example) by holding the reference while waiting for the folio to - * come unlocked. After this function returns, the caller should not - * dereference @folio. - * - * Return: 0 if the folio was unlocked or -EINTR if interrupted by a signa= l. - */ -static int folio_put_wait_locked(struct folio *folio, int state) -{ - return folio_wait_bit_common(folio, PG_locked, state, DROP); -} - -/** - * folio_unlock - Unlock a locked folio. - * @folio: The folio. - * - * Unlocks the folio and wakes up any thread sleeping on the page lock. - * - * Context: May be called from interrupt or process context. May not be - * called from NMI context. 
- */
-void folio_unlock(struct folio *folio)
-{
-	/* Bit 7 allows x86 to check the byte's sign bit */
-	BUILD_BUG_ON(PG_waiters != 7);
-	BUILD_BUG_ON(PG_locked > 7);
-	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
-	if (folio_xor_flags_has_waiters(folio, 1 << PG_locked))
-		folio_wake_bit(folio, PG_locked);
-}
-EXPORT_SYMBOL(folio_unlock);
-
-/**
- * folio_end_read - End read on a folio.
- * @folio: The folio.
- * @success: True if all reads completed successfully.
- *
- * When all reads against a folio have completed, filesystems should
- * call this function to let the pagecache know that no more reads
- * are outstanding. This will unlock the folio and wake up any thread
- * sleeping on the lock. The folio will also be marked uptodate if all
- * reads succeeded.
- *
- * Context: May be called from interrupt or process context. May not be
- * called from NMI context.
- */
-void folio_end_read(struct folio *folio, bool success)
-{
-	unsigned long mask = 1 << PG_locked;
-
-	/* Must be in bottom byte for x86 to work */
-	BUILD_BUG_ON(PG_uptodate > 7);
-	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
-	VM_BUG_ON_FOLIO(success && folio_test_uptodate(folio), folio);
-
-	if (likely(success))
-		mask |= 1 << PG_uptodate;
-	if (folio_xor_flags_has_waiters(folio, mask))
-		folio_wake_bit(folio, PG_locked);
-}
-EXPORT_SYMBOL(folio_end_read);
-
-/**
- * folio_end_private_2 - Clear PG_private_2 and wake any waiters.
- * @folio: The folio.
- *
- * Clear the PG_private_2 bit on a folio and wake up any sleepers waiting for
- * it. The folio reference held for PG_private_2 being set is released.
- *
- * This is, for example, used when a netfs folio is being written to a local
- * disk cache, thereby allowing writes to the cache for the same folio to be
- * serialised.
- */
-void folio_end_private_2(struct folio *folio)
-{
-	VM_BUG_ON_FOLIO(!folio_test_private_2(folio), folio);
-	clear_bit_unlock(PG_private_2, folio_flags(folio, 0));
-	folio_wake_bit(folio, PG_private_2);
-	folio_put(folio);
-}
-EXPORT_SYMBOL(folio_end_private_2);
-
-/**
- * folio_wait_private_2 - Wait for PG_private_2 to be cleared on a folio.
- * @folio: The folio to wait on.
- *
- * Wait for PG_private_2 to be cleared on a folio.
- */
-void folio_wait_private_2(struct folio *folio)
-{
-	while (folio_test_private_2(folio))
-		folio_wait_bit(folio, PG_private_2);
-}
-EXPORT_SYMBOL(folio_wait_private_2);
-
-/**
- * folio_wait_private_2_killable - Wait for PG_private_2 to be cleared on a folio.
- * @folio: The folio to wait on.
- *
- * Wait for PG_private_2 to be cleared on a folio or until a fatal signal is
- * received by the calling task.
- *
- * Return:
- * - 0 if successful.
- * - -EINTR if a fatal signal was encountered.
- */
-int folio_wait_private_2_killable(struct folio *folio)
-{
-	int ret = 0;
-
-	while (folio_test_private_2(folio)) {
-		ret = folio_wait_bit_killable(folio, PG_private_2);
-		if (ret < 0)
-			break;
-	}
-
-	return ret;
-}
-EXPORT_SYMBOL(folio_wait_private_2_killable);
-
 static void filemap_end_dropbehind(struct folio *folio)
 {
 	struct address_space *mapping = folio->mapping;
@@ -1703,95 +1154,6 @@ void folio_end_writeback(struct folio *folio)
 }
 EXPORT_SYMBOL(folio_end_writeback);
 
-/**
- * __folio_lock - Get a lock on the folio, assuming we need to sleep to get it.
- * @folio: The folio to lock
- */
-void __folio_lock(struct folio *folio)
-{
-	folio_wait_bit_common(folio, PG_locked, TASK_UNINTERRUPTIBLE,
-				EXCLUSIVE);
-}
-EXPORT_SYMBOL(__folio_lock);
-
-int __folio_lock_killable(struct folio *folio)
-{
-	return folio_wait_bit_common(folio, PG_locked, TASK_KILLABLE,
-					EXCLUSIVE);
-}
-EXPORT_SYMBOL_GPL(__folio_lock_killable);
-
-static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
-{
-	struct wait_queue_head *q = folio_waitqueue(folio);
-	int ret;
-
-	wait->folio = folio;
-	wait->bit_nr = PG_locked;
-
-	spin_lock_irq(&q->lock);
-	__add_wait_queue_entry_tail(q, &wait->wait);
-	folio_set_waiters(folio);
-	ret = !folio_trylock(folio);
-	/*
-	 * If we were successful now, we know we're still on the
-	 * waitqueue as we're still under the lock. This means it's
-	 * safe to remove and return success, we know the callback
-	 * isn't going to trigger.
-	 */
-	if (!ret)
-		__remove_wait_queue(q, &wait->wait);
-	else
-		ret = -EIOCBQUEUED;
-	spin_unlock_irq(&q->lock);
-	return ret;
-}
-
-/*
- * Return values:
- * 0 - folio is locked.
- * non-zero - folio is not locked.
- * mmap_lock or per-VMA lock has been released (mmap_read_unlock() or
- * vma_end_read()), unless flags had both FAULT_FLAG_ALLOW_RETRY and
- * FAULT_FLAG_RETRY_NOWAIT set, in which case the lock is still held.
- *
- * If neither ALLOW_RETRY nor KILLABLE are set, will always return 0
- * with the folio locked and the mmap_lock/per-VMA lock is left unperturbed.
- */
-vm_fault_t __folio_lock_or_retry(struct folio *folio, struct vm_fault *vmf)
-{
-	unsigned int flags = vmf->flags;
-
-	if (fault_flag_allow_retry_first(flags)) {
-		/*
-		 * CAUTION! In this case, mmap_lock/per-VMA lock is not
-		 * released even though returning VM_FAULT_RETRY.
-		 */
-		if (flags & FAULT_FLAG_RETRY_NOWAIT)
-			return VM_FAULT_RETRY;
-
-		release_fault_lock(vmf);
-		if (flags & FAULT_FLAG_KILLABLE)
-			folio_wait_locked_killable(folio);
-		else
-			folio_wait_locked(folio);
-		return VM_FAULT_RETRY;
-	}
-	if (flags & FAULT_FLAG_KILLABLE) {
-		bool ret;
-
-		ret = __folio_lock_killable(folio);
-		if (ret) {
-			release_fault_lock(vmf);
-			return VM_FAULT_RETRY;
-		}
-	} else {
-		__folio_lock(folio);
-	}
-
-	return 0;
-}
-
 /**
  * page_cache_next_miss() - Find the next gap in the page cache.
  * @mapping: Mapping.
diff --git a/mm/folio_wait.c b/mm/folio_wait.c
new file mode 100644
index 000000000000..18b42488ce37
--- /dev/null
+++ b/mm/folio_wait.c
@@ -0,0 +1,662 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Folio bit-lock and wait-queue infrastructure.
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "internal.h"
+
+/*
+ * In order to wait for pages to become available there must be
+ * waitqueues associated with pages. By using a hash table of
+ * waitqueues where the bucket discipline is to maintain all
+ * waiters on the same queue and wake all when any of the pages
+ * become available, and for the woken contexts to check to be
+ * sure the appropriate page became available, this saves space
+ * at a cost of "thundering herd" phenomena during rare hash
+ * collisions.
+ */
+#define PAGE_WAIT_TABLE_BITS 8
+#define PAGE_WAIT_TABLE_SIZE (1 << PAGE_WAIT_TABLE_BITS)
+static wait_queue_head_t folio_wait_table[PAGE_WAIT_TABLE_SIZE] __cacheline_aligned;
+
+static wait_queue_head_t *folio_waitqueue(struct folio *folio)
+{
+	return &folio_wait_table[hash_ptr(folio, PAGE_WAIT_TABLE_BITS)];
+}
+
+/* How many times do we accept lock stealing from under a waiter? */
+static int sysctl_page_lock_unfairness = 5;
+static const struct ctl_table folio_wait_sysctl_table[] = {
+	{
+		.procname = "page_lock_unfairness",
+		.data = &sysctl_page_lock_unfairness,
+		.maxlen = sizeof(sysctl_page_lock_unfairness),
+		.mode = 0644,
+		.proc_handler = proc_dointvec_minmax,
+		.extra1 = SYSCTL_ZERO,
+	}
+};
+
+void __init folio_wait_init(void)
+{
+	int i;
+
+	for (i = 0; i < PAGE_WAIT_TABLE_SIZE; i++)
+		init_waitqueue_head(&folio_wait_table[i]);
+
+	register_sysctl_init("vm", folio_wait_sysctl_table);
+}
+
+/*
+ * The page wait code treats the "wait->flags" somewhat unusually, because
+ * we have multiple different kinds of waits, not just the usual "exclusive"
+ * one.
+ *
+ * We have:
+ *
+ * (a) no special bits set:
+ *
+ * We're just waiting for the bit to be released, and when a waker
+ * calls the wakeup function, we set WQ_FLAG_WOKEN and wake it up,
+ * and remove it from the wait queue.
+ *
+ * Simple and straightforward.
+ *
+ * (b) WQ_FLAG_EXCLUSIVE:
+ *
+ * The waiter is waiting to get the lock, and only one waiter should
+ * be woken up to avoid any thundering herd behavior. We'll set the
+ * WQ_FLAG_WOKEN bit, wake it up, and remove it from the wait queue.
+ *
+ * This is the traditional exclusive wait.
+ *
+ * (c) WQ_FLAG_EXCLUSIVE | WQ_FLAG_CUSTOM:
+ *
+ * The waiter is waiting to get the bit, and additionally wants the
+ * lock to be transferred to it for fair lock behavior. If the lock
+ * cannot be taken, we stop walking the wait queue without waking
+ * the waiter.
+ *
+ * This is the "fair lock handoff" case, and in addition to setting
+ * WQ_FLAG_WOKEN, we set WQ_FLAG_DONE to let the waiter easily see
+ * that it now has the lock.
+ */
+static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync, void *arg)
+{
+	unsigned int flags;
+	struct wait_page_key *key = arg;
+	struct wait_page_queue *wait_page
+		= container_of(wait, struct wait_page_queue, wait);
+
+	if (!wake_page_match(wait_page, key))
+		return 0;
+
+	/*
+	 * If it's a lock handoff wait, we get the bit for it, and
+	 * stop walking (and do not wake it up) if we can't.
+	 */
+	flags = wait->flags;
+	if (flags & WQ_FLAG_EXCLUSIVE) {
+		if (test_bit(key->bit_nr, &key->folio->flags.f))
+			return -1;
+		if (flags & WQ_FLAG_CUSTOM) {
+			if (test_and_set_bit(key->bit_nr, &key->folio->flags.f))
+				return -1;
+			flags |= WQ_FLAG_DONE;
+		}
+	}
+
+	/*
+	 * We are holding the wait-queue lock, but the waiter that
+	 * is waiting for this will be checking the flags without
+	 * any locking.
+	 *
+	 * So update the flags atomically, and wake up the waiter
+	 * afterwards to avoid any races. This store-release pairs
+	 * with the load-acquire in folio_wait_bit_common().
+	 */
+	smp_store_release(&wait->flags, flags | WQ_FLAG_WOKEN);
+	wake_up_state(wait->private, mode);
+
+	/*
+	 * Ok, we have successfully done what we're waiting for,
+	 * and we can unconditionally remove the wait entry.
+	 *
+	 * Note that this pairs with the "finish_wait()" in the
+	 * waiter, and has to be the absolute last thing we do.
+	 * After this list_del_init(&wait->entry) the wait entry
+	 * might be de-allocated and the process might even have
+	 * exited.
+	 */
+	list_del_init_careful(&wait->entry);
+	return (flags & WQ_FLAG_EXCLUSIVE) != 0;
+}
+
+static void folio_wake_bit(struct folio *folio, int bit_nr)
+{
+	wait_queue_head_t *q = folio_waitqueue(folio);
+	struct wait_page_key key;
+	unsigned long flags;
+
+	key.folio = folio;
+	key.bit_nr = bit_nr;
+	key.page_match = 0;
+
+	spin_lock_irqsave(&q->lock, flags);
+	__wake_up_locked_key(q, TASK_NORMAL, &key);
+
+	/*
+	 * It's possible to miss clearing waiters here, when we woke our page
+	 * waiters, but the hashed waitqueue has waiters for other pages on it.
+	 * That's okay, it's a rare case. The next waker will clear it.
+	 *
+	 * Note that, depending on the page pool (buddy, hugetlb, ZONE_DEVICE,
+	 * other), the flag may be cleared in the course of freeing the page;
+	 * but that is not required for correctness.
+	 */
+	if (!waitqueue_active(q) || !key.page_match)
+		folio_clear_waiters(folio);
+
+	spin_unlock_irqrestore(&q->lock, flags);
+}
+
+/*
+ * Wake waiters on PG_writeback for @folio.
+ */
+void folio_wake_writeback(struct folio *folio)
+{
+	folio_wake_bit(folio, PG_writeback);
+}
+
+/*
+ * A choice of three behaviors for folio_wait_bit_common():
+ */
+enum behavior {
+	EXCLUSIVE,	/* Hold ref to page and take the bit when woken, like
+			 * __folio_lock() waiting on then setting PG_locked.
+			 */
+	SHARED,		/* Hold ref to page and check the bit when woken, like
+			 * folio_wait_writeback() waiting on PG_writeback.
+			 */
+	DROP,		/* Drop ref to page before wait, no check when woken,
+			 * like folio_put_wait_locked() on PG_locked.
+			 */
+};
+
+/*
+ * Attempt to check (or get) the folio flag, and mark us done
+ * if successful.
+ */
+static inline bool folio_trylock_flag(struct folio *folio, int bit_nr,
+					struct wait_queue_entry *wait)
+{
+	if (wait->flags & WQ_FLAG_EXCLUSIVE) {
+		if (test_and_set_bit(bit_nr, &folio->flags.f))
+			return false;
+	} else if (test_bit(bit_nr, &folio->flags.f))
+		return false;
+
+	wait->flags |= WQ_FLAG_WOKEN | WQ_FLAG_DONE;
+	return true;
+}
+
+static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
+		int state, enum behavior behavior)
+{
+	wait_queue_head_t *q = folio_waitqueue(folio);
+	int unfairness = sysctl_page_lock_unfairness;
+	struct wait_page_queue wait_page;
+	wait_queue_entry_t *wait = &wait_page.wait;
+	bool thrashing = false;
+	unsigned long pflags;
+	bool in_thrashing;
+
+	if (bit_nr == PG_locked &&
+	    !folio_test_uptodate(folio) && folio_test_workingset(folio)) {
+		delayacct_thrashing_start(&in_thrashing);
+		psi_memstall_enter(&pflags);
+		thrashing = true;
+	}
+
+	init_wait(wait);
+	wait->func = wake_page_function;
+	wait_page.folio = folio;
+	wait_page.bit_nr = bit_nr;
+
+repeat:
+	wait->flags = 0;
+	if (behavior == EXCLUSIVE) {
+		wait->flags = WQ_FLAG_EXCLUSIVE;
+		if (--unfairness < 0)
+			wait->flags |= WQ_FLAG_CUSTOM;
+	}
+
+	/*
+	 * Do one last check whether we can get the
+	 * page bit synchronously.
+	 *
+	 * Do the folio_set_waiters() marking before that
+	 * to let any waker we _just_ missed know they
+	 * need to wake us up (otherwise they'll never
+	 * even go to the slow case that looks at the
+	 * page queue), and add ourselves to the wait
+	 * queue if we need to sleep.
+	 *
+	 * This part needs to be done under the queue
+	 * lock to avoid races.
+	 */
+	spin_lock_irq(&q->lock);
+	folio_set_waiters(folio);
+	if (!folio_trylock_flag(folio, bit_nr, wait))
+		__add_wait_queue_entry_tail(q, wait);
+	spin_unlock_irq(&q->lock);
+
+	/*
+	 * From now on, all the logic will be based on
+	 * the WQ_FLAG_WOKEN and WQ_FLAG_DONE flag, to
+	 * see whether the page bit testing has already
+	 * been done by the wake function.
+	 *
+	 * We can drop our reference to the folio.
+	 */
+	if (behavior == DROP)
+		folio_put(folio);
+
+	/*
+	 * Note that until the "finish_wait()", or until
+	 * we see the WQ_FLAG_WOKEN flag, we need to
+	 * be very careful with the 'wait->flags', because
+	 * we may race with a waker that sets them.
+	 */
+	for (;;) {
+		unsigned int flags;
+
+		set_current_state(state);
+
+		/* Loop until we've been woken or interrupted */
+		flags = smp_load_acquire(&wait->flags);
+		if (!(flags & WQ_FLAG_WOKEN)) {
+			if (signal_pending_state(state, current))
+				break;
+
+			io_schedule();
+			continue;
+		}
+
+		/* If we were non-exclusive, we're done */
+		if (behavior != EXCLUSIVE)
+			break;
+
+		/* If the waker got the lock for us, we're done */
+		if (flags & WQ_FLAG_DONE)
+			break;
+
+		/*
+		 * Otherwise, if we're getting the lock, we need to
+		 * try to get it ourselves.
+		 *
+		 * And if that fails, we'll have to retry this all.
+		 */
+		if (unlikely(test_and_set_bit(bit_nr, folio_flags(folio, 0))))
+			goto repeat;
+
+		wait->flags |= WQ_FLAG_DONE;
+		break;
+	}
+
+	/*
+	 * If a signal happened, this 'finish_wait()' may remove the last
+	 * waiter from the wait-queues, but the folio waiters bit will remain
+	 * set. That's ok. The next wakeup will take care of it, and trying
+	 * to do it here would be difficult and prone to races.
+	 */
+	finish_wait(q, wait);
+
+	if (thrashing) {
+		delayacct_thrashing_end(&in_thrashing);
+		psi_memstall_leave(&pflags);
+	}
+
+	/*
+	 * NOTE! The wait->flags weren't stable until we've done the
+	 * 'finish_wait()', and we could have exited the loop above due
+	 * to a signal, and had a wakeup event happen after the signal
+	 * test but before the 'finish_wait()'.
+	 *
+	 * So only after the finish_wait() can we reliably determine
+	 * if we got woken up or not, so we can now figure out the final
+	 * return value based on that state without races.
+	 *
+	 * Also note that WQ_FLAG_WOKEN is sufficient for a non-exclusive
+	 * waiter, but an exclusive one requires WQ_FLAG_DONE.
+	 */
+	if (behavior == EXCLUSIVE)
+		return wait->flags & WQ_FLAG_DONE ? 0 : -EINTR;
+
+	return wait->flags & WQ_FLAG_WOKEN ? 0 : -EINTR;
+}
+
+#ifdef CONFIG_MIGRATION
+/**
+ * softleaf_entry_wait_on_locked - Wait for a migration entry or
+ * device_private entry to be removed.
+ * @entry: migration or device_private swap entry.
+ * @ptl: already locked ptl. This function will drop the lock.
+ *
+ * Wait for a migration entry referencing the given page, or device_private
+ * entry referencing a dvice_private page to be unlocked. This is
+ * equivalent to folio_put_wait_locked(folio, TASK_UNINTERRUPTIBLE) except
+ * this can be called without taking a reference on the page. Instead this
+ * should be called while holding the ptl for @entry referencing
+ * the page.
+ *
+ * Returns after unlocking the ptl.
+ *
+ * This follows the same logic as folio_wait_bit_common() so see the comments
+ * there.
+ */
+void softleaf_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
+	__releases(ptl)
+{
+	struct wait_page_queue wait_page;
+	wait_queue_entry_t *wait = &wait_page.wait;
+	bool thrashing = false;
+	unsigned long pflags;
+	bool in_thrashing;
+	wait_queue_head_t *q;
+	struct folio *folio = softleaf_to_folio(entry);
+
+	q = folio_waitqueue(folio);
+	if (!folio_test_uptodate(folio) && folio_test_workingset(folio)) {
+		delayacct_thrashing_start(&in_thrashing);
+		psi_memstall_enter(&pflags);
+		thrashing = true;
+	}
+
+	init_wait(wait);
+	wait->func = wake_page_function;
+	wait_page.folio = folio;
+	wait_page.bit_nr = PG_locked;
+	wait->flags = 0;
+
+	spin_lock_irq(&q->lock);
+	folio_set_waiters(folio);
+	if (!folio_trylock_flag(folio, PG_locked, wait))
+		__add_wait_queue_entry_tail(q, wait);
+	spin_unlock_irq(&q->lock);
+
+	/*
+	 * If a migration entry exists for the page the migration path must hold
+	 * a valid reference to the page, and it must take the ptl to remove the
+	 * migration entry. So the page is valid until the ptl is dropped.
+	 * Similarly any path attempting to drop the last reference to a
+	 * device-private page needs to grab the ptl to remove the device-private
+	 * entry.
+	 */
+	spin_unlock(ptl);
+
+	for (;;) {
+		unsigned int flags;
+
+		set_current_state(TASK_UNINTERRUPTIBLE);
+
+		/* Loop until we've been woken or interrupted */
+		flags = smp_load_acquire(&wait->flags);
+		if (!(flags & WQ_FLAG_WOKEN)) {
+			if (signal_pending_state(TASK_UNINTERRUPTIBLE, current))
+				break;
+
+			io_schedule();
+			continue;
+		}
+		break;
+	}
+
+	finish_wait(q, wait);
+
+	if (thrashing) {
+		delayacct_thrashing_end(&in_thrashing);
+		psi_memstall_leave(&pflags);
+	}
+}
+#endif
+
+void folio_wait_bit(struct folio *folio, int bit_nr)
+{
+	folio_wait_bit_common(folio, bit_nr, TASK_UNINTERRUPTIBLE, SHARED);
+}
+EXPORT_SYMBOL(folio_wait_bit);
+
+int folio_wait_bit_killable(struct folio *folio, int bit_nr)
+{
+	return folio_wait_bit_common(folio, bit_nr, TASK_KILLABLE, SHARED);
+}
+EXPORT_SYMBOL(folio_wait_bit_killable);
+
+/**
+ * folio_put_wait_locked - Drop a reference and wait for it to be unlocked
+ * @folio: The folio to wait for.
+ * @state: The sleep state (TASK_KILLABLE, TASK_UNINTERRUPTIBLE, etc).
+ *
+ * The caller should hold a reference on @folio. They expect the page to
+ * become unlocked relatively soon, but do not wish to hold up migration
+ * (for example) by holding the reference while waiting for the folio to
+ * come unlocked. After this function returns, the caller should not
+ * dereference @folio.
+ *
+ * Return: 0 if the folio was unlocked or -EINTR if interrupted by a signal.
+ */
+int folio_put_wait_locked(struct folio *folio, int state)
+{
+	return folio_wait_bit_common(folio, PG_locked, state, DROP);
+}
+
+/**
+ * folio_unlock - Unlock a locked folio.
+ * @folio: The folio.
+ *
+ * Unlocks the folio and wakes up any thread sleeping on the page lock.
+ *
+ * Context: May be called from interrupt or process context. May not be
+ * called from NMI context.
+ */
+void folio_unlock(struct folio *folio)
+{
+	/* Bit 7 allows x86 to check the byte's sign bit */
+	BUILD_BUG_ON(PG_waiters != 7);
+	BUILD_BUG_ON(PG_locked > 7);
+	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
+	if (folio_xor_flags_has_waiters(folio, 1 << PG_locked))
+		folio_wake_bit(folio, PG_locked);
+}
+EXPORT_SYMBOL(folio_unlock);
+
+/**
+ * folio_end_read - End read on a folio.
+ * @folio: The folio.
+ * @success: True if all reads completed successfully.
+ *
+ * When all reads against a folio have completed, filesystems should
+ * call this function to let the pagecache know that no more reads
+ * are outstanding. This will unlock the folio and wake up any thread
+ * sleeping on the lock. The folio will also be marked uptodate if all
+ * reads succeeded.
+ *
+ * Context: May be called from interrupt or process context. May not be
+ * called from NMI context.
+ */
+void folio_end_read(struct folio *folio, bool success)
+{
+	unsigned long mask = 1 << PG_locked;
+
+	/* Must be in bottom byte for x86 to work */
+	BUILD_BUG_ON(PG_uptodate > 7);
+	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
+	VM_BUG_ON_FOLIO(success && folio_test_uptodate(folio), folio);
+
+	if (likely(success))
+		mask |= 1 << PG_uptodate;
+	if (folio_xor_flags_has_waiters(folio, mask))
+		folio_wake_bit(folio, PG_locked);
+}
+EXPORT_SYMBOL(folio_end_read);
+
+/**
+ * folio_end_private_2 - Clear PG_private_2 and wake any waiters.
+ * @folio: The folio.
+ *
+ * Clear the PG_private_2 bit on a folio and wake up any sleepers waiting for
+ * it. The folio reference held for PG_private_2 being set is released.
+ *
+ * This is, for example, used when a netfs folio is being written to a local
+ * disk cache, thereby allowing writes to the cache for the same folio to be
+ * serialised.
+ */
+void folio_end_private_2(struct folio *folio)
+{
+	VM_BUG_ON_FOLIO(!folio_test_private_2(folio), folio);
+	clear_bit_unlock(PG_private_2, folio_flags(folio, 0));
+	folio_wake_bit(folio, PG_private_2);
+	folio_put(folio);
+}
+EXPORT_SYMBOL(folio_end_private_2);
+
+/**
+ * folio_wait_private_2 - Wait for PG_private_2 to be cleared on a folio.
+ * @folio: The folio to wait on.
+ *
+ * Wait for PG_private_2 to be cleared on a folio.
+ */
+void folio_wait_private_2(struct folio *folio)
+{
+	while (folio_test_private_2(folio))
+		folio_wait_bit(folio, PG_private_2);
+}
+EXPORT_SYMBOL(folio_wait_private_2);
+
+/**
+ * folio_wait_private_2_killable - Wait for PG_private_2 to be cleared on a folio.
+ * @folio: The folio to wait on.
+ *
+ * Wait for PG_private_2 to be cleared on a folio or until a fatal signal is
+ * received by the calling task.
+ *
+ * Return:
+ * - 0 if successful.
+ * - -EINTR if a fatal signal was encountered.
+ */
+int folio_wait_private_2_killable(struct folio *folio)
+{
+	int ret = 0;
+
+	while (folio_test_private_2(folio)) {
+		ret = folio_wait_bit_killable(folio, PG_private_2);
+		if (ret < 0)
+			break;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(folio_wait_private_2_killable);
+
+/**
+ * __folio_lock - Get a lock on the folio, assuming we need to sleep to get it.
+ * @folio: The folio to lock
+ */
+void __folio_lock(struct folio *folio)
+{
+	folio_wait_bit_common(folio, PG_locked, TASK_UNINTERRUPTIBLE,
+				EXCLUSIVE);
+}
+EXPORT_SYMBOL(__folio_lock);
+
+int __folio_lock_killable(struct folio *folio)
+{
+	return folio_wait_bit_common(folio, PG_locked, TASK_KILLABLE,
+					EXCLUSIVE);
+}
+EXPORT_SYMBOL_GPL(__folio_lock_killable);
+
+int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
+{
+	struct wait_queue_head *q = folio_waitqueue(folio);
+	int ret;
+
+	wait->folio = folio;
+	wait->bit_nr = PG_locked;
+
+	spin_lock_irq(&q->lock);
+	__add_wait_queue_entry_tail(q, &wait->wait);
+	folio_set_waiters(folio);
+	ret = !folio_trylock(folio);
+	/*
+	 * If we were successful now, we know we're still on the
+	 * waitqueue as we're still under the lock. This means it's
+	 * safe to remove and return success, we know the callback
+	 * isn't going to trigger.
+	 */
+	if (!ret)
+		__remove_wait_queue(q, &wait->wait);
+	else
+		ret = -EIOCBQUEUED;
+	spin_unlock_irq(&q->lock);
+	return ret;
+}
+
+/*
+ * Return values:
+ * 0 - folio is locked.
+ * non-zero - folio is not locked.
+ * mmap_lock or per-VMA lock has been released (mmap_read_unlock() or
+ * vma_end_read()), unless flags had both FAULT_FLAG_ALLOW_RETRY and
+ * FAULT_FLAG_RETRY_NOWAIT set, in which case the lock is still held.
+ *
+ * If neither ALLOW_RETRY nor KILLABLE are set, will always return 0
+ * with the folio locked and the mmap_lock/per-VMA lock is left unperturbed.
+ */
+vm_fault_t __folio_lock_or_retry(struct folio *folio, struct vm_fault *vmf)
+{
+	unsigned int flags = vmf->flags;
+
+	if (fault_flag_allow_retry_first(flags)) {
+		/*
+		 * CAUTION! In this case, mmap_lock/per-VMA lock is not
+		 * released even though returning VM_FAULT_RETRY.
+		 */
+		if (flags & FAULT_FLAG_RETRY_NOWAIT)
+			return VM_FAULT_RETRY;
+
+		release_fault_lock(vmf);
+		if (flags & FAULT_FLAG_KILLABLE)
+			folio_wait_locked_killable(folio);
+		else
+			folio_wait_locked(folio);
+		return VM_FAULT_RETRY;
+	}
+	if (flags & FAULT_FLAG_KILLABLE) {
+		bool ret;
+
+		ret = __folio_lock_killable(folio);
+		if (ret) {
+			release_fault_lock(vmf);
+			return VM_FAULT_RETRY;
+		}
+	} else {
+		__folio_lock(folio);
+	}
+
+	return 0;
+}
diff --git a/mm/internal.h b/mm/internal.h
index 09931b1e535f..a121ca07f75c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -102,6 +102,10 @@ struct pagetable_move_control {
 })
 
 void page_writeback_init(void);
+void folio_wait_init(void);
+void folio_wake_writeback(struct folio *folio);
+int folio_put_wait_locked(struct folio *folio, int state);
+int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait);
 
 /*
  * If a 16GB hugetlb folio were mapped by PTEs of all of its 4kB pages,
-- 
2.39.5

From nobody Sun May 24 22:35:55 2026
From: Tal Zussman
Date: Wed, 20 May 2026 16:48:54 -0400
Subject: [PATCH RFC 03/11] folio_wait: move folio bit-lock and wait declarations to include/linux/folio_wait.h
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260520-filemap-split-v1-3-c36ddc2b6cf2@columbia.edu>
References: <20260520-filemap-split-v1-0-c36ddc2b6cf2@columbia.edu>
In-Reply-To: <20260520-filemap-split-v1-0-c36ddc2b6cf2@columbia.edu>
To: "Matthew Wilcox (Oracle)", Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Alexander Viro, Christian Brauner, Jens Axboe
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, io-uring@vger.kernel.org, Tal Zussman
X-Proofpoint-ORIG-GUID: Hnp0DO0LOvK1tg2ege5V53pUYyeuVtgO X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNTIwMDIwMyBTYWx0ZWRfXxXo6/SAnVxyx 9MWm4W6Efin4y4TafxX3uGC7Mi7JdjKSEHUFRiEnSIYTJ6M0tMsB26w4duG3XBdk4P4/AfHjCM+ Kgg9S/tLmVOnbIDywH+lDXK/vYg1RT5S32jjYOXeSMmoH4cB17VotxQLvJ8G2DYqN6sknTkTp2n IY1QF531xawkwWIEybgcCpAhvto9nDU5UXZwi8zteiMpX7jtAY/i8zJpkLkR8/2oxUjy6w371bO WzrJRedE6Y9rIFQ4ErpqJ4AZVGcaTMRB3C7EPd6lWnhlMAMpAjdQMP+n57U3W0+iL7dOOqQenat /e4xTWbqMXeSb/q9lv+E3q2164NWXpacdPfL6x/sKajub6Oq2XteF325Ytf2F4ouy65vmyGGMB9 j38VilD1svZKEK37PHgvrFENB1JpK+u4hXqjlZ3xzrMK2MRyQ0pihn58GZg/w1miwyVetFw9w89 fGC80VhyvUCPNcrr4TA== X-Proofpoint-Virus-Version: vendor=nai engine=6900 definitions=11792 signatures=596817 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 suspectscore=0 impostorscore=10 malwarescore=0 phishscore=0 adultscore=0 lowpriorityscore=10 spamscore=0 priorityscore=1501 bulkscore=10 classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0 reason=mlx scancount=1 engine=8.22.0-2605130000 definitions=main-2605200203 Move ~150 lines of folio bit-lock and wait queue infrastructure from pagemap.h to folio_wait.h. pagemap.h includes the new header so existing users don't break. 
Signed-off-by: Tal Zussman --- include/linux/folio_wait.h | 181 +++++++++++++++++++++++++++++++++++++++++= ++++ include/linux/pagemap.h | 172 +----------------------------------------- mm/folio_wait.c | 2 +- 3 files changed, 183 insertions(+), 172 deletions(-) diff --git a/include/linux/folio_wait.h b/include/linux/folio_wait.h new file mode 100644 index 000000000000..80ddf1ffcae4 --- /dev/null +++ b/include/linux/folio_wait.h @@ -0,0 +1,181 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_FOLIO_WAIT_H +#define _LINUX_FOLIO_WAIT_H + +#include +#include +#include + +struct wait_page_key { + struct folio *folio; + int bit_nr; + int page_match; +}; + +struct wait_page_queue { + struct folio *folio; + int bit_nr; + wait_queue_entry_t wait; +}; + +static inline bool wake_page_match(struct wait_page_queue *wait_page, + struct wait_page_key *key) +{ + if (wait_page->folio !=3D key->folio) + return false; + key->page_match =3D 1; + + if (wait_page->bit_nr !=3D key->bit_nr) + return false; + + return true; +} + +void __folio_lock(struct folio *folio); +int __folio_lock_killable(struct folio *folio); +vm_fault_t __folio_lock_or_retry(struct folio *folio, struct vm_fault *vmf= ); +void unlock_page(struct page *page); +void folio_unlock(struct folio *folio); + +/** + * folio_trylock() - Attempt to lock a folio. + * @folio: The folio to attempt to lock. + * + * Sometimes it is undesirable to wait for a folio to be unlocked (eg + * when the locks are being taken in the wrong order, or if making + * progress through a batch of folios is more important than processing + * them in order). Usually folio_lock() is the correct function to call. + * + * Context: Any context. + * Return: Whether the lock was successfully acquired. 
+ */ +static inline bool folio_trylock(struct folio *folio) +{ + return likely(!test_and_set_bit_lock(PG_locked, folio_flags(folio, 0))); +} + +/* + * Return true if the page was successfully locked + */ +static inline bool trylock_page(struct page *page) +{ + return folio_trylock(page_folio(page)); +} + +/** + * folio_lock() - Lock this folio. + * @folio: The folio to lock. + * + * The folio lock protects against many things, probably more than it + * should. It is primarily held while a folio is being brought uptodate, + * either from its backing file or from swap. It is also held while a + * folio is being truncated from its address_space, so holding the lock + * is sufficient to keep folio->mapping stable. + * + * The folio lock is also held while write() is modifying the page to + * provide POSIX atomicity guarantees (as long as the write does not + * cross a page boundary). Other modifications to the data in the folio + * do not hold the folio lock and can race with writes, eg DMA and stores + * to mapped pages. + * + * Context: May sleep. If you need to acquire the locks of two or + * more folios, they must be in order of ascending index, if they are + * in the same address_space. If they are in different address_spaces, + * acquire the lock of the folio which belongs to the address_space which + * has the lowest address in memory first. + */ +static inline void folio_lock(struct folio *folio) +{ + might_sleep(); + if (!folio_trylock(folio)) + __folio_lock(folio); +} + +/** + * lock_page() - Lock the folio containing this page. + * @page: The page to lock. + * + * See folio_lock() for a description of what the lock protects. + * This is a legacy function and new code should probably use folio_lock() + * instead. + * + * Context: May sleep. Pages in the same folio share a lock, so do not + * attempt to lock two pages which share a folio. 
+ */ +static inline void lock_page(struct page *page) +{ + struct folio *folio; + might_sleep(); + + folio =3D page_folio(page); + if (!folio_trylock(folio)) + __folio_lock(folio); +} + +/** + * folio_lock_killable() - Lock this folio, interruptible by a fatal signa= l. + * @folio: The folio to lock. + * + * Attempts to lock the folio, like folio_lock(), except that the sleep + * to acquire the lock is interruptible by a fatal signal. + * + * Context: May sleep; see folio_lock(). + * Return: 0 if the lock was acquired; -EINTR if a fatal signal was receiv= ed. + */ +static inline int folio_lock_killable(struct folio *folio) +{ + might_sleep(); + if (!folio_trylock(folio)) + return __folio_lock_killable(folio); + return 0; +} + +/* + * folio_lock_or_retry - Lock the folio, unless this would block and the + * caller indicated that it can handle a retry. + * + * Return value and mmap_lock implications depend on flags; see + * __folio_lock_or_retry(). + */ +static inline vm_fault_t folio_lock_or_retry(struct folio *folio, + struct vm_fault *vmf) +{ + might_sleep(); + if (!folio_trylock(folio)) + return __folio_lock_or_retry(folio, vmf); + return 0; +} + +/* + * This is exported only for folio_wait_locked/folio_wait_writeback, etc., + * and should not be used directly. + */ +void folio_wait_bit(struct folio *folio, int bit_nr); +int folio_wait_bit_killable(struct folio *folio, int bit_nr); + +/* + * Wait for a folio to be unlocked. + * + * This must be called with the caller "holding" the folio, + * ie with increased folio reference count so that the folio won't + * go away during the wait. 
+ */ +static inline void folio_wait_locked(struct folio *folio) +{ + if (folio_test_locked(folio)) + folio_wait_bit(folio, PG_locked); +} + +static inline int folio_wait_locked_killable(struct folio *folio) +{ + if (!folio_test_locked(folio)) + return 0; + return folio_wait_bit_killable(folio, PG_locked); +} + +void folio_end_read(struct folio *folio, bool success); +void folio_end_private_2(struct folio *folio); +void folio_wait_private_2(struct folio *folio); +int folio_wait_private_2_killable(struct folio *folio); + +#endif /* _LINUX_FOLIO_WAIT_H */ diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 627771e82eb1..7f65c2b0097b 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -15,6 +15,7 @@ #include #include /* for in_interrupt() */ #include +#include =20 struct folio_batch; =20 @@ -1072,174 +1073,6 @@ static inline pgoff_t linear_page_index(const struc= t vm_area_struct *vma, return pgoff; } =20 -struct wait_page_key { - struct folio *folio; - int bit_nr; - int page_match; -}; - -struct wait_page_queue { - struct folio *folio; - int bit_nr; - wait_queue_entry_t wait; -}; - -static inline bool wake_page_match(struct wait_page_queue *wait_page, - struct wait_page_key *key) -{ - if (wait_page->folio !=3D key->folio) - return false; - key->page_match =3D 1; - - if (wait_page->bit_nr !=3D key->bit_nr) - return false; - - return true; -} - -void __folio_lock(struct folio *folio); -int __folio_lock_killable(struct folio *folio); -vm_fault_t __folio_lock_or_retry(struct folio *folio, struct vm_fault *vmf= ); -void unlock_page(struct page *page); -void folio_unlock(struct folio *folio); - -/** - * folio_trylock() - Attempt to lock a folio. - * @folio: The folio to attempt to lock. - * - * Sometimes it is undesirable to wait for a folio to be unlocked (eg - * when the locks are being taken in the wrong order, or if making - * progress through a batch of folios is more important than processing - * them in order). 
Usually folio_lock() is the correct function to call. - * - * Context: Any context. - * Return: Whether the lock was successfully acquired. - */ -static inline bool folio_trylock(struct folio *folio) -{ - return likely(!test_and_set_bit_lock(PG_locked, folio_flags(folio, 0))); -} - -/* - * Return true if the page was successfully locked - */ -static inline bool trylock_page(struct page *page) -{ - return folio_trylock(page_folio(page)); -} - -/** - * folio_lock() - Lock this folio. - * @folio: The folio to lock. - * - * The folio lock protects against many things, probably more than it - * should. It is primarily held while a folio is being brought uptodate, - * either from its backing file or from swap. It is also held while a - * folio is being truncated from its address_space, so holding the lock - * is sufficient to keep folio->mapping stable. - * - * The folio lock is also held while write() is modifying the page to - * provide POSIX atomicity guarantees (as long as the write does not - * cross a page boundary). Other modifications to the data in the folio - * do not hold the folio lock and can race with writes, eg DMA and stores - * to mapped pages. - * - * Context: May sleep. If you need to acquire the locks of two or - * more folios, they must be in order of ascending index, if they are - * in the same address_space. If they are in different address_spaces, - * acquire the lock of the folio which belongs to the address_space which - * has the lowest address in memory first. - */ -static inline void folio_lock(struct folio *folio) -{ - might_sleep(); - if (!folio_trylock(folio)) - __folio_lock(folio); -} - -/** - * lock_page() - Lock the folio containing this page. - * @page: The page to lock. - * - * See folio_lock() for a description of what the lock protects. - * This is a legacy function and new code should probably use folio_lock() - * instead. - * - * Context: May sleep. 
Pages in the same folio share a lock, so do not - * attempt to lock two pages which share a folio. - */ -static inline void lock_page(struct page *page) -{ - struct folio *folio; - might_sleep(); - - folio =3D page_folio(page); - if (!folio_trylock(folio)) - __folio_lock(folio); -} - -/** - * folio_lock_killable() - Lock this folio, interruptible by a fatal signa= l. - * @folio: The folio to lock. - * - * Attempts to lock the folio, like folio_lock(), except that the sleep - * to acquire the lock is interruptible by a fatal signal. - * - * Context: May sleep; see folio_lock(). - * Return: 0 if the lock was acquired; -EINTR if a fatal signal was receiv= ed. - */ -static inline int folio_lock_killable(struct folio *folio) -{ - might_sleep(); - if (!folio_trylock(folio)) - return __folio_lock_killable(folio); - return 0; -} - -/* - * folio_lock_or_retry - Lock the folio, unless this would block and the - * caller indicated that it can handle a retry. - * - * Return value and mmap_lock implications depend on flags; see - * __folio_lock_or_retry(). - */ -static inline vm_fault_t folio_lock_or_retry(struct folio *folio, - struct vm_fault *vmf) -{ - might_sleep(); - if (!folio_trylock(folio)) - return __folio_lock_or_retry(folio, vmf); - return 0; -} - -/* - * This is exported only for folio_wait_locked/folio_wait_writeback, etc., - * and should not be used directly. - */ -void folio_wait_bit(struct folio *folio, int bit_nr); -int folio_wait_bit_killable(struct folio *folio, int bit_nr); - -/*=20 - * Wait for a folio to be unlocked. - * - * This must be called with the caller "holding" the folio, - * ie with increased folio reference count so that the folio won't - * go away during the wait. 
- */ -static inline void folio_wait_locked(struct folio *folio) -{ - if (folio_test_locked(folio)) - folio_wait_bit(folio, PG_locked); -} - -static inline int folio_wait_locked_killable(struct folio *folio) -{ - if (!folio_test_locked(folio)) - return 0; - return folio_wait_bit_killable(folio, PG_locked); -} - -void folio_end_read(struct folio *folio, bool success); void wait_on_page_writeback(struct page *page); void folio_wait_writeback(struct folio *folio); int folio_wait_writeback_killable(struct folio *folio); @@ -1268,9 +1101,6 @@ int filemap_migrate_folio(struct address_space *mappi= ng, struct folio *dst, #else #define filemap_migrate_folio NULL #endif -void folio_end_private_2(struct folio *folio); -void folio_wait_private_2(struct folio *folio); -int folio_wait_private_2_killable(struct folio *folio); =20 /* * Fault in userspace address range. diff --git a/mm/folio_wait.c b/mm/folio_wait.c index 18b42488ce37..06156e138c09 100644 --- a/mm/folio_wait.c +++ b/mm/folio_wait.c @@ -8,7 +8,7 @@ #include #include #include -#include +#include #include #include #include --=20 2.39.5 From nobody Sun May 24 22:35:55 2026 Received: from mx0b-00364e01.pphosted.com (mx0b-00364e01.pphosted.com [148.163.139.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F0592376BEA for ; Wed, 20 May 2026 20:50:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.163.139.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779310245; cv=none; b=FOTrLGoBgHSciZzKVKX+KLdP5ULLQkGN/KS5H8KvhbbIouY1ebNzn0vCgeGDoahYPArHPx9A/dAWoybf5XiTzbrUP8SiZm+Qx1aAh/28B8rpzg2g0GPRKsrbHbpTyOacL0lE8OvV989/9z9nGMnz7CuFPIKILNyLYSohXrjYodk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779310245; c=relaxed/simple; bh=MdnqSRc2FJHBhrARlYsy3cs/9UevS5fEoCvn0pE0sDw=; 
h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=KGXkKoci9IIIXfw4mVQa+WP9VH42IYi+tqt7ztwd9z8nIzm+yewQymKuLZVTJ7qOUlgvtPXqqnAHIRUeHYD9RGSeD4WGunqUFDU20LPVYAQk50vPRDjIO3hZ2/xMN2iwN+b9Szmh51UXBOlD4gSKFf6CaP2gS2W/ISyV9Xwzmoc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=columbia.edu; spf=pass smtp.mailfrom=columbia.edu; dkim=pass (2048-bit key) header.d=columbia.edu header.i=@columbia.edu header.b=CxuJGpWL; arc=none smtp.client-ip=148.163.139.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=columbia.edu Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=columbia.edu Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=columbia.edu header.i=@columbia.edu header.b="CxuJGpWL" Received: from pps.filterd (m0499198.ppops.net [127.0.0.1]) by mx0b-00364e01.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 64KKOEv32013624 for ; Wed, 20 May 2026 16:50:36 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=columbia.edu; h= cc:content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=pps01; bh=y+gj tILqDLOQsRHv/xOkAmWQ2cCtCFJ6veJFyW78pUo=; b=CxuJGpWL+kS84wunAt9h U/LMKliWuP4nB71UXrm5g2rej0P5N7Idq6BmiNfKDH80LLeVfW44ieNa0/BkXXsi +L9TIuteWD2e2V/oKCwp2pBm8/BevB3hAA68MBGegIkV4f6lcMhzEeWtaWIZ/onj AS5ayl3chCL64tG3bR5sORVXXnch8+A3TP6eyrsP+oZjdBtNP6ga3aSMnRwc0p6/ BYV5Iilum2otM/TkKY8H/+zERGVJ7saJ/nflnYcYRup6RXfZY6tt6SAiOZhn8cYW OdnmFxoq6yyFwNO6fg+uSBMB8Kt/ndgipMDZ0G1TcX+iqata3s5oeMrGTtuoq5nS pg== Received: from mail-qk1-f199.google.com (mail-qk1-f199.google.com [209.85.222.199]) by mx0b-00364e01.pphosted.com (PPS) with ESMTPS id 4e9fdn2mud-1 (version=TLSv1.3 cipher=TLS_AES_128_GCM_SHA256 bits=128 verify=NOT) for ; Wed, 20 May 2026 16:50:36 -0400 (EDT) Received: by mail-qk1-f199.google.com with SMTP id af79cd13be357-90fb4c8390aso556567485a.3 
for ; Wed, 20 May 2026 13:50:35 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779310235; x=1779915035; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-gg:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=y+gjtILqDLOQsRHv/xOkAmWQ2cCtCFJ6veJFyW78pUo=; b=nCsIFnM6rVkiLUjSFaWU+O1CYcVnV+9r8HjDDjkxISNTtfI2Wq4Ke93kXB5a1iEjck 9+uVLZcadzkgNnlsMYyEJZ4HGZIYdyO/8FgMOcMRUlBPA217351q2GVBv4aS1WIoyXbn Xkwt9kqwRTZHea3j99QwQszMqUhoeSRwlkZte07bCn0ZNt/vOpbDOcF2xjsThhgSMCT/ Kb0e6ZWHTvlrlH48BSyinlKwDHEWddDUznN+78fdacLt9LiIxGJDAWM3yqKZOBzi0Jar t12hU51dY6mGCcPC0yJKacfTR1fSbbmzBTTx4qZ0MEic+U80RcRlhpW1OUOu5FZtMiXT udfA== X-Forwarded-Encrypted: i=1; AFNElJ8RMhDYp3Ax2ne6P5qngn3p/RfRtjnansv6BBr1lNHlRoN65MqgANDJwPSHLTQEj/l3Jvxdx4oVi6iAGOc=@vger.kernel.org X-Gm-Message-State: AOJu0YyYjXbs3VcjctLg8HtQi69q0005R0786rzJI1LeXijLwo/IKjja OwajmDkAIBc3fx3ldOJ9zj6818egdTjv7E4ZfkeJE622gwvMQugwZ7Sq4B3y7yQsvyK88KRt3v2 x3Yen4+OAAKCSamy/cOuECvBOHTjBxDJvhX8QAWrxh0+G1ufDO3oSW/mV/D8xoQ== X-Gm-Gg: Acq92OGWbsTVkxTG7C1Sr21Z/eQuHq8WWFw9a5X5jecStsLvOawvngGQm78vjk0ZQOC WteHMgUY7G3ht0y5QWToQX324J2Xi5piqe4qDDTsAUkIUEmNDKaMe3qLPolIKZnp2/NDPsBdSQ6 jjIjYKjSJN6BFTAnSC9SkalhdA2GRm1gWUByItQp/mnAXeVvtppi8Oa/norYq0pUeaPBwcdHZ+6 lo4xTWE/GKj7CmH0ef76OXwkBA1uSU3ePi2fmV4E4KhSwSVdQkzGisZ+DYUolaIM0clErICLeFo WhgGJVoEkyezoRh2trBxSrNRwp4TeDijx1Xvr1Nrpf/zecarvYCi8CITzjNk2ci1G+bkrkdJ21M aOURAfzPk+zO5OnDbzFCL3szPN3H0iCwAJ1htz2gGNkNXl+1n6ORTL3V2I65VEolQG5s= X-Received: by 2002:a05:620a:4542:b0:8ee:dc47:3b70 with SMTP id af79cd13be357-911cef062demr3495260085a.39.1779310235408; Wed, 20 May 2026 13:50:35 -0700 (PDT) X-Received: by 2002:a05:620a:4542:b0:8ee:dc47:3b70 with SMTP id af79cd13be357-911cef062demr3495254485a.39.1779310234746; Wed, 20 May 2026 13:50:34 -0700 (PDT) Received: from [127.0.1.1] (dyn-160-39-33-242.dyn.columbia.edu. 
[160.39.33.242]) by smtp.gmail.com with ESMTPSA id af79cd13be357-910bcf37274sm2232692085a.37.2026.05.20.13.50.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 20 May 2026 13:50:34 -0700 (PDT) From: Tal Zussman Date: Wed, 20 May 2026 16:48:55 -0400 Subject: [PATCH RFC 04/11] folio_wait: move folio_wait_writeback() family to mm/folio_wait.c Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260520-filemap-split-v1-4-c36ddc2b6cf2@columbia.edu> References: <20260520-filemap-split-v1-0-c36ddc2b6cf2@columbia.edu> In-Reply-To: <20260520-filemap-split-v1-0-c36ddc2b6cf2@columbia.edu> To: "Matthew Wilcox (Oracle)" , Jan Kara , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , "Liam R. Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Alexander Viro , Christian Brauner , Jens Axboe Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, io-uring@vger.kernel.org, Tal Zussman X-Mailer: b4 0.14.3-dev-d7477 X-Developer-Signature: v=1; a=ed25519-sha256; t=1779310229; l=7803; i=tz2294@columbia.edu; s=20250528; h=from:subject:message-id; bh=MdnqSRc2FJHBhrARlYsy3cs/9UevS5fEoCvn0pE0sDw=; b=rkMG4EwgIAXfDMJhpCN/bphiVIHAebicxGt8s33PBw5nTp2DJPdH7PqP6gawcVTAxyPkOdXvw WsorbnVQOqHC4h13v+JdnPOZqm2wLrEZ0r4jbfVOfOBi4GxGI+ybNHp X-Developer-Key: i=tz2294@columbia.edu; a=ed25519; pk=BIj5KdACscEOyAC0oIkeZqLB3L94fzBnDccEooxeM5Y= X-Proofpoint-GUID: QLBRdYToT3csAHx6Jd1Ul8tSeZxV-zzY X-Authority-Analysis: v=2.4 cv=P/4KQCAu c=1 sm=1 tr=0 ts=6a0e1e9c cx=c_pps a=HLyN3IcIa5EE8TELMZ618Q==:117 a=GaPK54s0Se3oFqK5NkZy0g==:17 a=IkcTkHD0fZMA:10 a=NGcC8JguVDcA:10 a=x7bEGLp0ZPQA:10 a=VkNPw1HP01LnGYTKEx00:22 a=Da8U98TiO7q1upZEImrf:22 a=BpGzv1V74M3SfeTrGa8v:22 a=oq273MN9QaURjshXV2UA:9 a=QEXdDO2ut3YA:10 a=bTQJ7kPSJx9SKPbeHEYW:22 X-Proofpoint-ORIG-GUID: 
QLBRdYToT3csAHx6Jd1Ul8tSeZxV-zzY X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNTIwMDIwMyBTYWx0ZWRfX2HDATFqcw/1z cgs2ffymzg/CgetAFCI50WZ2IEs7IepcymH/HJqSiL5JSs2ykfQYc13HXAyYLK+id/Hq5yCmBzH ZhnWAUEGT/ELre7dO76KPwfbk59UBNfAgmwGlVCarU4uerYQ2IRB6MK/wC0jvnbPU8VZGl0gX6t 94NNBs4t7Uu0lz2SJjVjUqkDT6afvdupvEc1ydaKQmQiIilOAL7+2Jdgjklmgv4LuDU+oWjLQ7D bF7wByOw04DU24yL5ydvX0d0M5TryVcd50uAQCSmgoYZJNhCZnwqC4xTTHLArP75R0z92SE4mtZ q3aepU+S/69OWxBU71zr7lz0BrPWDy06OMCNEhJ2tnQKkjczlTbOXOjdZ+sbcLdQo5F2kpYFeWV jGkQ6teiCcDFugmD/sDjFE0aHAonqVKmC6BwqHTpv3ocUvY7TrboHOT+xrmtu4WHRumtmSeg0gf WRfmE9kwO+Ulz8KYpYQ== X-Proofpoint-Virus-Version: vendor=nai engine=6900 definitions=11792 signatures=596817 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 suspectscore=0 impostorscore=10 malwarescore=0 phishscore=0 adultscore=0 lowpriorityscore=10 spamscore=0 priorityscore=1501 bulkscore=10 classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0 reason=mlx scancount=1 engine=8.22.0-2605130000 definitions=main-2605200203 folio_wait_writeback(), folio_wait_writeback_killable() and folio_wait_stable() are thin wrappers around folio_wait_bit() and folio_wait_bit_killable() on PG_writeback. Move them to mm/folio_wait.c, next to the rest of the folio bit-wait infrastructure. The legacy wait_on_page_writeback() wrapper stays in folio-compat.c, as its days are numbered, and it will be deleted once the remaining callers are converted. 
Signed-off-by: Tal Zussman --- include/linux/folio_wait.h | 4 +++ include/linux/pagemap.h | 3 --- mm/folio_wait.c | 67 ++++++++++++++++++++++++++++++++++++++++++= ++++ mm/page-writeback.c | 66 ------------------------------------------= --- 4 files changed, 71 insertions(+), 69 deletions(-) diff --git a/include/linux/folio_wait.h b/include/linux/folio_wait.h index 80ddf1ffcae4..4a5cb2fcf046 100644 --- a/include/linux/folio_wait.h +++ b/include/linux/folio_wait.h @@ -178,4 +178,8 @@ void folio_end_private_2(struct folio *folio); void folio_wait_private_2(struct folio *folio); int folio_wait_private_2_killable(struct folio *folio); =20 +void folio_wait_writeback(struct folio *folio); +int folio_wait_writeback_killable(struct folio *folio); +void folio_wait_stable(struct folio *folio); + #endif /* _LINUX_FOLIO_WAIT_H */ diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 7f65c2b0097b..84ccb682cca8 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -1074,13 +1074,10 @@ static inline pgoff_t linear_page_index(const struc= t vm_area_struct *vma, } =20 void wait_on_page_writeback(struct page *page); -void folio_wait_writeback(struct folio *folio); -int folio_wait_writeback_killable(struct folio *folio); void end_page_writeback(struct page *page); void folio_end_writeback(struct folio *folio); void folio_end_writeback_no_dropbehind(struct folio *folio); void folio_end_dropbehind(struct folio *folio); -void folio_wait_stable(struct folio *folio); void __folio_mark_dirty(struct folio *folio, struct address_space *, int w= arn); void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb); void __folio_cancel_dirty(struct folio *folio); diff --git a/mm/folio_wait.c b/mm/folio_wait.c index 06156e138c09..9d3328717bb3 100644 --- a/mm/folio_wait.c +++ b/mm/folio_wait.c @@ -15,6 +15,7 @@ #include #include #include +#include =20 #include "internal.h" =20 @@ -572,6 +573,72 @@ int folio_wait_private_2_killable(struct folio *folio) } 
EXPORT_SYMBOL(folio_wait_private_2_killable); =20 +/** + * folio_wait_writeback - Wait for a folio to finish writeback. + * @folio: The folio to wait for. + * + * If the folio is currently being written back to storage, wait for the + * I/O to complete. + * + * Context: Sleeps. Must be called in process context and with + * no spinlocks held. Caller should hold a reference on the folio. + * If the folio is not locked, writeback may start again after writeback + * has finished. + */ +void folio_wait_writeback(struct folio *folio) +{ + while (folio_test_writeback(folio)) { + trace_folio_wait_writeback(folio, folio_mapping(folio)); + folio_wait_bit(folio, PG_writeback); + } +} +EXPORT_SYMBOL_GPL(folio_wait_writeback); + +/** + * folio_wait_writeback_killable - Wait for a folio to finish writeback. + * @folio: The folio to wait for. + * + * If the folio is currently being written back to storage, wait for the + * I/O to complete or a fatal signal to arrive. + * + * Context: Sleeps. Must be called in process context and with + * no spinlocks held. Caller should hold a reference on the folio. + * If the folio is not locked, writeback may start again after writeback + * has finished. + * Return: 0 on success, -EINTR if we get a fatal signal while waiting. + */ +int folio_wait_writeback_killable(struct folio *folio) +{ + while (folio_test_writeback(folio)) { + trace_folio_wait_writeback(folio, folio_mapping(folio)); + if (folio_wait_bit_killable(folio, PG_writeback)) + return -EINTR; + } + + return 0; +} +EXPORT_SYMBOL_GPL(folio_wait_writeback_killable); + +/** + * folio_wait_stable() - wait for writeback to finish, if necessary. + * @folio: The folio to wait on. + * + * This function determines if the given folio is related to a backing + * device that requires folio contents to be held stable during writeback. + * If so, then it will wait for any pending writeback to complete. + * + * Context: Sleeps. Must be called in process context and with + * no spinlocks held. 
Caller should hold a reference on the folio. + * If the folio is not locked, writeback may start again after writeback + * has finished. + */ +void folio_wait_stable(struct folio *folio) +{ + if (mapping_stable_writes(folio_mapping(folio))) + folio_wait_writeback(folio); +} +EXPORT_SYMBOL_GPL(folio_wait_stable); + /** * __folio_lock - Get a lock on the folio, assuming we need to sleep to ge= t it. * @folio: The folio to lock diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 833f743f309f..50f548bbb375 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -3042,69 +3042,3 @@ void __folio_start_writeback(struct folio *folio, bo= ol keep_write) VM_BUG_ON_FOLIO(access_ret !=3D 0, folio); } EXPORT_SYMBOL(__folio_start_writeback); - -/** - * folio_wait_writeback - Wait for a folio to finish writeback. - * @folio: The folio to wait for. - * - * If the folio is currently being written back to storage, wait for the - * I/O to complete. - * - * Context: Sleeps. Must be called in process context and with - * no spinlocks held. Caller should hold a reference on the folio. - * If the folio is not locked, writeback may start again after writeback - * has finished. - */ -void folio_wait_writeback(struct folio *folio) -{ - while (folio_test_writeback(folio)) { - trace_folio_wait_writeback(folio, folio_mapping(folio)); - folio_wait_bit(folio, PG_writeback); - } -} -EXPORT_SYMBOL_GPL(folio_wait_writeback); - -/** - * folio_wait_writeback_killable - Wait for a folio to finish writeback. - * @folio: The folio to wait for. - * - * If the folio is currently being written back to storage, wait for the - * I/O to complete or a fatal signal to arrive. - * - * Context: Sleeps. Must be called in process context and with - * no spinlocks held. Caller should hold a reference on the folio. - * If the folio is not locked, writeback may start again after writeback - * has finished. - * Return: 0 on success, -EINTR if we get a fatal signal while waiting. 
- */ -int folio_wait_writeback_killable(struct folio *folio) -{ - while (folio_test_writeback(folio)) { - trace_folio_wait_writeback(folio, folio_mapping(folio)); - if (folio_wait_bit_killable(folio, PG_writeback)) - return -EINTR; - } - - return 0; -} -EXPORT_SYMBOL_GPL(folio_wait_writeback_killable); - -/** - * folio_wait_stable() - wait for writeback to finish, if necessary. - * @folio: The folio to wait on. - * - * This function determines if the given folio is related to a backing - * device that requires folio contents to be held stable during writeback. - * If so, then it will wait for any pending writeback to complete. - * - * Context: Sleeps. Must be called in process context and with - * no spinlocks held. Caller should hold a reference on the folio. - * If the folio is not locked, writeback may start again after writeback - * has finished. - */ -void folio_wait_stable(struct folio *folio) -{ - if (mapping_stable_writes(folio_mapping(folio))) - folio_wait_writeback(folio); -} -EXPORT_SYMBOL_GPL(folio_wait_stable); --=20 2.39.5 From nobody Sun May 24 22:35:55 2026 Received: from mx0b-00364e01.pphosted.com (mx0b-00364e01.pphosted.com [148.163.139.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4F410375ADC for ; Wed, 20 May 2026 20:50:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.163.139.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779310242; cv=none; b=VW/oY0iavEH5ULRdATeoyV22jRzAXb2XAJCR+0wBOxPj83UPN5P3R/1mJwywZY6Pd1GO8CNTLz0OgiAdkBBZyqR0TFFfraZwnfbdzAmi6Zvs0JtVib/PYq6Oy7d+6GG8Xc4Pw/uaBDXbxumKT2uXZ3rA5G/B29+s/7VfS9QIUlQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779310242; c=relaxed/simple; bh=69FMnFpCCkSEBZ7/2YJYSjRd1MbsazhKzcMtt9GaMPw=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; 
b=t8rlDKy2BkwTyivlCeQZsDq6Us6n+01iB0ve4HecjFiTdvHtq6crU3PgGGLPnggujvPMXPx6OMpo6XKSFjuMUS7JRbNPyV/iiZGLpDoyIj3srOsrs9um5TPvmrgh15qNZlyFKw9/jkGf7mBc6xXWmenm+npGmvQtjzEQbTudG1A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=columbia.edu; spf=pass smtp.mailfrom=columbia.edu; dkim=pass (2048-bit key) header.d=columbia.edu header.i=@columbia.edu header.b=qsUbckLp; arc=none smtp.client-ip=148.163.139.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=columbia.edu Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=columbia.edu Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=columbia.edu header.i=@columbia.edu header.b="qsUbckLp" Received: from pps.filterd (m0167075.ppops.net [127.0.0.1]) by mx0b-00364e01.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 64KKOHog3139226 for ; Wed, 20 May 2026 16:50:38 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=columbia.edu; h= cc:content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=pps01; bh=QM60 WXSBSIQYQWVtbfbyKvBF8T4H0CKsBjV6qw2v1YM=; b=qsUbckLpGyB+A7AmVhc9 P/EmpJQBFEfgFQ6Udn4ZjA6omKFZKlDMSjeq+f1ohfc78WtsZMLDzvRM9bgXvo2S MBCu5l/GGRXt2PE4FwGczaZWrztMv9TpwKXvnjuzF2MwdCpiVEWZN2c298ivtzBY QGeU+gxRvkX2o9DO0ASI46sJppe6M7llf5MTsfrdKQi0QBe1i7SrgiWLC9uBdx8A Otr7Czvz4SEuwUdiNhvICi3uaD1ReLSTI35+Pzo+fB/osgj0Tkc39UFscfFF/RbB gOXVynu+j3mGFznKdiOcXCL3AYt9RyTRfgkMl31LbUIKv7ZIFGqgCOcdTZBhY7UM vw== Received: from mail-qk1-f198.google.com (mail-qk1-f198.google.com [209.85.222.198]) by mx0b-00364e01.pphosted.com (PPS) with ESMTPS id 4e9d3hv055-1 (version=TLSv1.3 cipher=TLS_AES_128_GCM_SHA256 bits=128 verify=NOT) for ; Wed, 20 May 2026 16:50:37 -0400 (EDT) Received: by mail-qk1-f198.google.com with SMTP id af79cd13be357-90d6fe98316so1206825885a.3 for ; Wed, 20 May 2026 13:50:37 -0700 (PDT) X-Google-DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779310237; x=1779915037; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-gg:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=QM60WXSBSIQYQWVtbfbyKvBF8T4H0CKsBjV6qw2v1YM=; b=YwQHeZfW1snWW+q2Hx11uYOzDDmCZBtYztNDScgl0uatZrAXoq+vC1wP0U/V7KLBMd 7BwQoL4IlZsTJiOVXGAAW1iHH9coZBqAOMiWeXfZG9TGTzrM0MaIAe1ZWJHfaD8BuMnW yzSbioqED/9EJpaB1W27QkjAFZb5vbY35kDV36Btrr7qRy5YSOyYZt4V+Kt3hcf5B5SM hquSFj0knw3MLjGA7mvOH+nQqSCb+wdylEK+tu0BBqg+I88fVoWuxPMcM+VMD+ulUYVG +mAuD0ifg7vAk+UId2LmNCfu/1v9kkwIDJ34XSgqubI7z9R5Tl52zvuRrzynzP3myblI ulnA== X-Forwarded-Encrypted: i=1; AFNElJ8Xq5YUy1HUz4QrL/kz4drgqx7nK7HG6G48eBEP+liM+LR/QsD3Ykv572Z9JVCG+MVAYjsCZUe+LrYDLzE=@vger.kernel.org X-Gm-Message-State: AOJu0YxHotQOAhb5uwYlaipXbBLopF1bKW2y74GBN8XT+XlMsPFjRvXX gx3ReMPA4LXz52JYydamgTeThmJE14RPOJLLxIke4YIStIfzuY/cO+5EJAtrvkcckkcKRap0yCc 81fswXFVokwk25X8y+cBgsYSh8xJCx9E7S/9SwHSbhgrdkQXcYEeWaCMfM7C/SQ== X-Gm-Gg: Acq92OHnMsJ4JhnpaH18mguVxglisKB4uR8eJiHcyhA8z7omERwFYoGaGfkgWE630x5 TTrNnpSkB9Shqrm3bZZ2ygLnMxN/p5btsQi907JSXMUu6VbL4FSW4+LqBm0UN4WygcxmxZdDNXz 0jW+QWlPBPdZSYeRtUAE93Foqruu4XxWMkD8a+NCghWCtlSTDjKv9G9ReNRPgZlfyxaaim3yjhR vrc17MdLgfgkaxt/fdJYpYSxDHm/8/BNcmKHnTPVrRClc4eb9bEVgM4kB2gR7yz4FsyXm0RAkPK wC2toY8vnA3SDxEY5NUB5e5lTZugC0CSnJa89quzdYrCYYb6cUM4hMYWUyR02duPTyvjqGiggMQ T8P2YiKj7iNnzLiQz9Jlwz5YDU/c6BF9FYwUBIe6bicMLI24SUWmxUQbtzeGV1eIV4GU= X-Received: by 2002:a05:620a:cfb:b0:911:fc2c:c078 with SMTP id af79cd13be357-911fc2cd1c1mr3069129585a.1.1779310237095; Wed, 20 May 2026 13:50:37 -0700 (PDT) X-Received: by 2002:a05:620a:cfb:b0:911:fc2c:c078 with SMTP id af79cd13be357-911fc2cd1c1mr3069124685a.1.1779310236449; Wed, 20 May 2026 13:50:36 -0700 (PDT) Received: from [127.0.1.1] (dyn-160-39-33-242.dyn.columbia.edu. 
[160.39.33.242]) by smtp.gmail.com with ESMTPSA id af79cd13be357-910bcf37274sm2232692085a.37.2026.05.20.13.50.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 20 May 2026 13:50:35 -0700 (PDT) From: Tal Zussman Date: Wed, 20 May 2026 16:48:56 -0400 Subject: [PATCH RFC 05/11] folio_wait: reformat comments and fix alignment Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260520-filemap-split-v1-5-c36ddc2b6cf2@columbia.edu> References: <20260520-filemap-split-v1-0-c36ddc2b6cf2@columbia.edu> In-Reply-To: <20260520-filemap-split-v1-0-c36ddc2b6cf2@columbia.edu> To: "Matthew Wilcox (Oracle)" , Jan Kara , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , "Liam R. Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Alexander Viro , Christian Brauner , Jens Axboe Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, io-uring@vger.kernel.org, Tal Zussman X-Mailer: b4 0.14.3-dev-d7477 X-Developer-Signature: v=1; a=ed25519-sha256; t=1779310229; l=21905; i=tz2294@columbia.edu; s=20250528; h=from:subject:message-id; bh=69FMnFpCCkSEBZ7/2YJYSjRd1MbsazhKzcMtt9GaMPw=; b=/LS05GFRr0I0dRdpIdwuCT27Qutf2Js6Cv+yKW0yfgbbJSYuTkFWu6MOkmEqTdIvlw3VRr5oV smS0ff/auk2DI+45+Yy/pAXtlKe+W4MsmUEGzwoOkLRM2bxQ/nX4MKZ X-Developer-Key: i=tz2294@columbia.edu; a=ed25519; pk=BIj5KdACscEOyAC0oIkeZqLB3L94fzBnDccEooxeM5Y= X-Authority-Analysis: v=2.4 cv=fsvsol4f c=1 sm=1 tr=0 ts=6a0e1e9e cx=c_pps a=qKBjSQ1v91RyAK45QCPf5w==:117 a=GaPK54s0Se3oFqK5NkZy0g==:17 a=IkcTkHD0fZMA:10 a=NGcC8JguVDcA:10 a=x7bEGLp0ZPQA:10 a=VkNPw1HP01LnGYTKEx00:22 a=Da8U98TiO7q1upZEImrf:22 a=HpS3TJQ9O3Ob1ozEcmik:22 a=ItFntl4SG6S2Zc4VtyUA:9 a=QEXdDO2ut3YA:10 a=O8hF6Hzn-FEA:10 a=NFOGd7dJGGMPyQGDc5-O:22 X-Proofpoint-GUID: Km2HWV7jka5yWfK0i-VaRSAElyQcUrP8 X-Proofpoint-Spam-Details-Enc: 
AW1haW4tMjYwNTIwMDIwMyBTYWx0ZWRfX/+HuXT/qG6TH 76aLyVY4ZeKMzJD34NJ6Y/wQihztTHtXXjLg6gQC8oR+Rr3e9mH65Hqazp6lcueE0v8iEYAbmp9 K5JNe+TH7Xaxfsg7/5QeqDkbA3psulQ2zUP6K6lwakV1fGCcPKhxt8eH1blYcLO4A0chTyTGQIs OvfOOIk7mIiSnxu8DPGQWBLNFBP4ynpFRLUVyQCYPYX2jYX43cfzJLXJvKfNqfDusYbljt/Fomr fsC2TkTooEWX32zXd5K4i9yXa8vPqcr2bWpEAteOvVxUJc6RgoYN4FyWOhxz9TzZkIrTFi5GAHX UQFDSYmpQS39FTE864Xe5Cz2OdMyg1af5T+9PXXiFKGwqWkykNJiJS68+nRxYhxVfF8dTw542uG dLoTp6+EPdwe/CPEvfdg7+rMlZoPS3oVI78PU3dDneQJc6gq5bAthi4ZtyA4k4IW2WGKB/wb49R y89MrPC7hhpNDijh75Q== X-Proofpoint-ORIG-GUID: Km2HWV7jka5yWfK0i-VaRSAElyQcUrP8 X-Proofpoint-Virus-Version: vendor=nai engine=6900 definitions=11792 signatures=596817 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 phishscore=0 lowpriorityscore=10 suspectscore=0 spamscore=0 priorityscore=1501 adultscore=0 clxscore=1015 impostorscore=10 bulkscore=10 malwarescore=0 classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0 reason=mlx scancount=1 engine=8.22.0-2605130000 definitions=main-2605200203 Reflow comments to fill 80 columns and fix indentation issues carried over from the original locations in pagemap.h and filemap.c. Signed-off-by: Tal Zussman --- include/linux/folio_wait.h | 62 ++++++++------- mm/folio_wait.c | 185 ++++++++++++++++++++---------------------= ---- 2 files changed, 113 insertions(+), 134 deletions(-) diff --git a/include/linux/folio_wait.h b/include/linux/folio_wait.h index 4a5cb2fcf046..57ccf9ffd243 100644 --- a/include/linux/folio_wait.h +++ b/include/linux/folio_wait.h @@ -19,10 +19,10 @@ struct wait_page_queue { }; =20 static inline bool wake_page_match(struct wait_page_queue *wait_page, - struct wait_page_key *key) + struct wait_page_key *key) { if (wait_page->folio !=3D key->folio) - return false; + return false; key->page_match =3D 1; =20 if (wait_page->bit_nr !=3D key->bit_nr) @@ -41,10 +41,10 @@ void folio_unlock(struct folio *folio); * folio_trylock() - Attempt to lock a folio. 
* @folio: The folio to attempt to lock. * - * Sometimes it is undesirable to wait for a folio to be unlocked (eg - * when the locks are being taken in the wrong order, or if making - * progress through a batch of folios is more important than processing - * them in order). Usually folio_lock() is the correct function to call. + * Sometimes it is undesirable to wait for a folio to be unlocked (e.g. wh= en + * the locks are being taken in the wrong order, or if making progress thr= ough + * a batch of folios is more important than processing them in order). Usu= ally + * folio_lock() is the correct function to call. * * Context: Any context. * Return: Whether the lock was successfully acquired. @@ -66,23 +66,22 @@ static inline bool trylock_page(struct page *page) * folio_lock() - Lock this folio. * @folio: The folio to lock. * - * The folio lock protects against many things, probably more than it - * should. It is primarily held while a folio is being brought uptodate, - * either from its backing file or from swap. It is also held while a - * folio is being truncated from its address_space, so holding the lock - * is sufficient to keep folio->mapping stable. + * The folio lock protects against many things, probably more than it shou= ld. + * It is primarily held while a folio is being brought uptodate, either fr= om + * its backing file or from swap. It is also held while a folio is being + * truncated from its address_space, so holding the lock is sufficient to = keep + * folio->mapping stable. * - * The folio lock is also held while write() is modifying the page to - * provide POSIX atomicity guarantees (as long as the write does not - * cross a page boundary). Other modifications to the data in the folio - * do not hold the folio lock and can race with writes, eg DMA and stores - * to mapped pages. 
+ * The folio lock is also held while write() is modifying the folio to pro= vide + * POSIX atomicity guarantees (as long as the write does not cross a page + * boundary). Other modifications to the data in the folio do not hold the + * folio lock and can race with writes, e.g. DMA and stores to mapped page= s. * - * Context: May sleep. If you need to acquire the locks of two or - * more folios, they must be in order of ascending index, if they are - * in the same address_space. If they are in different address_spaces, - * acquire the lock of the folio which belongs to the address_space which - * has the lowest address in memory first. + * Context: May sleep. If you need to acquire the locks of two or more fol= ios, + * they must be in order of ascending index, if they are in the same + * address_space. If they are in different address_spaces, acquire the loc= k of + * the folio which belongs to the address_space which has the lowest addre= ss in + * memory first. */ static inline void folio_lock(struct folio *folio) { @@ -99,8 +98,8 @@ static inline void folio_lock(struct folio *folio) * This is a legacy function and new code should probably use folio_lock() * instead. * - * Context: May sleep. Pages in the same folio share a lock, so do not - * attempt to lock two pages which share a folio. + * Context: May sleep. Pages in the same folio share a lock, so do not att= empt + * to lock two pages which share a folio. */ static inline void lock_page(struct page *page) { @@ -116,8 +115,8 @@ static inline void lock_page(struct page *page) * folio_lock_killable() - Lock this folio, interruptible by a fatal signa= l. * @folio: The folio to lock. * - * Attempts to lock the folio, like folio_lock(), except that the sleep - * to acquire the lock is interruptible by a fatal signal. + * Attempts to lock the folio, like folio_lock(), except that the sleep to + * acquire the lock is interruptible by a fatal signal. * * Context: May sleep; see folio_lock(). 
* Return: 0 if the lock was acquired; -EINTR if a fatal signal was receiv= ed. @@ -131,8 +130,8 @@ static inline int folio_lock_killable(struct folio *fol= io) } =20 /* - * folio_lock_or_retry - Lock the folio, unless this would block and the - * caller indicated that it can handle a retry. + * folio_lock_or_retry - Lock the folio, unless this would block and the c= aller + * indicated that it can handle a retry. * * Return value and mmap_lock implications depend on flags; see * __folio_lock_or_retry(). @@ -147,8 +146,8 @@ static inline vm_fault_t folio_lock_or_retry(struct fol= io *folio, } =20 /* - * This is exported only for folio_wait_locked/folio_wait_writeback, etc., - * and should not be used directly. + * This is exported only for folio_wait_locked/folio_wait_writeback, etc.,= and + * should not be used directly. */ void folio_wait_bit(struct folio *folio, int bit_nr); int folio_wait_bit_killable(struct folio *folio, int bit_nr); @@ -156,9 +155,8 @@ int folio_wait_bit_killable(struct folio *folio, int bi= t_nr); /* * Wait for a folio to be unlocked. * - * This must be called with the caller "holding" the folio, - * ie with increased folio reference count so that the folio won't - * go away during the wait. + * This must be called with the caller "holding" the folio, i.e. with incr= eased + * folio reference count so that the folio won't go away during the wait. */ static inline void folio_wait_locked(struct folio *folio) { diff --git a/mm/folio_wait.c b/mm/folio_wait.c index 9d3328717bb3..8d8237cdd73b 100644 --- a/mm/folio_wait.c +++ b/mm/folio_wait.c @@ -20,14 +20,12 @@ #include "internal.h" =20 /* - * In order to wait for pages to become available there must be - * waitqueues associated with pages. 
By using a hash table of - * waitqueues where the bucket discipline is to maintain all - * waiters on the same queue and wake all when any of the pages - * become available, and for the woken contexts to check to be - * sure the appropriate page became available, this saves space - * at a cost of "thundering herd" phenomena during rare hash - * collisions. + * In order to wait for pages to become available there must be waitqueues + * associated with pages. By using a hash table of waitqueues where the bu= cket + * discipline is to maintain all waiters on the same queue and wake all wh= en any + * of the pages become available, and for the woken contexts to check to be + * sure the appropriate page became available, this saves space at a cost = of + * "thundering herd" phenomena during rare hash collisions. */ #define PAGE_WAIT_TABLE_BITS 8 #define PAGE_WAIT_TABLE_SIZE (1 << PAGE_WAIT_TABLE_BITS) @@ -70,44 +68,42 @@ void __init folio_wait_init(void) * * (a) no special bits set: * - * We're just waiting for the bit to be released, and when a waker - * calls the wakeup function, we set WQ_FLAG_WOKEN and wake it up, - * and remove it from the wait queue. + * We're just waiting for the bit to be released, and when a waker calls + * the wakeup function, we set WQ_FLAG_WOKEN and wake it up, and remove + * it from the wait queue. * * Simple and straightforward. * * (b) WQ_FLAG_EXCLUSIVE: * - * The waiter is waiting to get the lock, and only one waiter should - * be woken up to avoid any thundering herd behavior. We'll set the + * The waiter is waiting to get the lock, and only one waiter should be + * woken up to avoid any thundering herd behavior. We'll set the * WQ_FLAG_WOKEN bit, wake it up, and remove it from the wait queue. * * This is the traditional exclusive wait. * * (c) WQ_FLAG_EXCLUSIVE | WQ_FLAG_CUSTOM: * - * The waiter is waiting to get the bit, and additionally wants the - * lock to be transferred to it for fair lock behavior. 
If the lock - * cannot be taken, we stop walking the wait queue without waking - * the waiter. + * The waiter is waiting to get the bit, and additionally wants the lock + * to be transferred to it for fair lock behavior. If the lock cannot be + * taken, we stop walking the wait queue without waking the waiter. * * This is the "fair lock handoff" case, and in addition to setting - * WQ_FLAG_WOKEN, we set WQ_FLAG_DONE to let the waiter easily see - * that it now has the lock. + * WQ_FLAG_WOKEN, we set WQ_FLAG_DONE to let the waiter easily see that + * it now has the lock. */ -static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int= sync, void *arg) +static int wake_page_function(wait_queue_entry_t *wait, unsigned int mode,= int sync, void *arg) { unsigned int flags; struct wait_page_key *key =3D arg; - struct wait_page_queue *wait_page - =3D container_of(wait, struct wait_page_queue, wait); + struct wait_page_queue *wait_page =3D container_of(wait, struct wait_page= _queue, wait); =20 if (!wake_page_match(wait_page, key)) return 0; =20 /* - * If it's a lock handoff wait, we get the bit for it, and - * stop walking (and do not wake it up) if we can't. + * If it's a lock handoff wait, we get the bit for it, and stop walking + * (and do not wake it up) if we can't. */ flags =3D wait->flags; if (flags & WQ_FLAG_EXCLUSIVE) { @@ -121,26 +117,24 @@ static int wake_page_function(wait_queue_entry_t *wai= t, unsigned mode, int sync, } =20 /* - * We are holding the wait-queue lock, but the waiter that - * is waiting for this will be checking the flags without - * any locking. + * We are holding the wait-queue lock, but the waiter that is waiting + * for this will be checking the flags without any locking. * - * So update the flags atomically, and wake up the waiter - * afterwards to avoid any races. This store-release pairs - * with the load-acquire in folio_wait_bit_common(). 
+ * So update the flags atomically, and wake up the waiter afterwards to + * avoid any races. This store-release pairs with the load-acquire in + * folio_wait_bit_common(). */ smp_store_release(&wait->flags, flags | WQ_FLAG_WOKEN); wake_up_state(wait->private, mode); =20 /* - * Ok, we have successfully done what we're waiting for, - * and we can unconditionally remove the wait entry. + * Ok, we have successfully done what we're waiting for, and we can + * unconditionally remove the wait entry. * - * Note that this pairs with the "finish_wait()" in the - * waiter, and has to be the absolute last thing we do. - * After this list_del_init(&wait->entry) the wait entry - * might be de-allocated and the process might even have - * exited. + * Note that this pairs with the "finish_wait()" in the waiter, and has + * to be the absolute last thing we do. After this + * list_del_init(&wait->entry) the wait entry might be de-allocated and + * the process might even have exited. */ list_del_init_careful(&wait->entry); return (flags & WQ_FLAG_EXCLUSIVE) !=3D 0; @@ -198,11 +192,10 @@ enum behavior { }; =20 /* - * Attempt to check (or get) the folio flag, and mark us done - * if successful. + * Attempt to check (or get) the folio flag, and mark as done if successfu= l. */ static inline bool folio_trylock_flag(struct folio *folio, int bit_nr, - struct wait_queue_entry *wait) + struct wait_queue_entry *wait) { if (wait->flags & WQ_FLAG_EXCLUSIVE) { if (test_and_set_bit(bit_nr, &folio->flags.f)) @@ -246,18 +239,14 @@ static inline int folio_wait_bit_common(struct folio = *folio, int bit_nr, } =20 /* - * Do one last check whether we can get the - * page bit synchronously. + * Do one last check whether we can get the page bit synchronously. 
* - * Do the folio_set_waiters() marking before that - * to let any waker we _just_ missed know they - * need to wake us up (otherwise they'll never - * even go to the slow case that looks at the - * page queue), and add ourselves to the wait - * queue if we need to sleep. + * Do the folio_set_waiters() marking before that to let any waker we + * _just_ missed know they need to wake us up (otherwise they'll never + * even go to the slow case that looks at the wait queue), and add + * ourselves to the wait queue if we need to sleep. * - * This part needs to be done under the queue - * lock to avoid races. + * This part needs to be done under the queue lock to avoid races. */ spin_lock_irq(&q->lock); folio_set_waiters(folio); @@ -266,9 +255,8 @@ static inline int folio_wait_bit_common(struct folio *f= olio, int bit_nr, spin_unlock_irq(&q->lock); =20 /* - * From now on, all the logic will be based on - * the WQ_FLAG_WOKEN and WQ_FLAG_DONE flag, to - * see whether the page bit testing has already + * From now on, all the logic will be based on the WQ_FLAG_WOKEN and + * WQ_FLAG_DONE flag, to see whether the page bit testing has already * been done by the wake function. * * We can drop our reference to the folio. @@ -277,10 +265,9 @@ static inline int folio_wait_bit_common(struct folio *= folio, int bit_nr, folio_put(folio); =20 /* - * Note that until the "finish_wait()", or until - * we see the WQ_FLAG_WOKEN flag, we need to - * be very careful with the 'wait->flags', because - * we may race with a waker that sets them. + * Note that until the "finish_wait()", or until we see the + * WQ_FLAG_WOKEN flag, we need to be very careful with the + * 'wait->flags', because we may race with a waker that sets them. */ for (;;) { unsigned int flags; @@ -306,8 +293,8 @@ static inline int folio_wait_bit_common(struct folio *f= olio, int bit_nr, break; =20 /* - * Otherwise, if we're getting the lock, we need to - * try to get it ourselves. 
+ * Otherwise, if we're getting the lock, we need to try to get + * it ourselves. * * And if that fails, we'll have to retry this all. */ @@ -333,13 +320,13 @@ static inline int folio_wait_bit_common(struct folio = *folio, int bit_nr, =20 /* * NOTE! The wait->flags weren't stable until we've done the - * 'finish_wait()', and we could have exited the loop above due - * to a signal, and had a wakeup event happen after the signal - * test but before the 'finish_wait()'. + * 'finish_wait()', and we could have exited the loop above due to a + * signal, and had a wakeup event happen after the signal test but + * before the 'finish_wait()'. * - * So only after the finish_wait() can we reliably determine - * if we got woken up or not, so we can now figure out the final - * return value based on that state without races. + * So only after the finish_wait() can we reliably determine if we got + * woken up or not, so we can now figure out the final return value + * based on that state without races. * * Also note that WQ_FLAG_WOKEN is sufficient for a non-exclusive * waiter, but an exclusive one requires WQ_FLAG_DONE. @@ -452,11 +439,10 @@ EXPORT_SYMBOL(folio_wait_bit_killable); * @folio: The folio to wait for. * @state: The sleep state (TASK_KILLABLE, TASK_UNINTERRUPTIBLE, etc). * - * The caller should hold a reference on @folio. They expect the page to - * become unlocked relatively soon, but do not wish to hold up migration - * (for example) by holding the reference while waiting for the folio to - * come unlocked. After this function returns, the caller should not - * dereference @folio. + * The caller should hold a reference on @folio. They expect the page to b= ecome + * unlocked relatively soon, but do not wish to hold up migration (for exa= mple) + * by holding the reference while waiting for the folio to come unlocked. = After + * this function returns, the caller should not dereference @folio. 
* * Return: 0 if the folio was unlocked or -EINTR if interrupted by a signa= l. */ @@ -471,8 +457,8 @@ int folio_put_wait_locked(struct folio *folio, int stat= e) * * Unlocks the folio and wakes up any thread sleeping on the page lock. * - * Context: May be called from interrupt or process context. May not be - * called from NMI context. + * Context: May be called from interrupt or process context. May not be ca= lled + * from NMI context. */ void folio_unlock(struct folio *folio) { @@ -490,14 +476,13 @@ EXPORT_SYMBOL(folio_unlock); * @folio: The folio. * @success: True if all reads completed successfully. * - * When all reads against a folio have completed, filesystems should - * call this function to let the pagecache know that no more reads - * are outstanding. This will unlock the folio and wake up any thread - * sleeping on the lock. The folio will also be marked uptodate if all - * reads succeeded. + * When all reads against a folio have completed, filesystems should call = this + * function to let the pagecache know that no more reads are outstanding. = This + * will unlock the folio and wake up any thread sleeping on the lock. The = folio + * will also be marked uptodate if all reads succeeded. * - * Context: May be called from interrupt or process context. May not be - * called from NMI context. + * Context: May be called from interrupt or process context. May not be ca= lled + * from NMI context. */ void folio_end_read(struct folio *folio, bool success) { @@ -577,13 +562,12 @@ EXPORT_SYMBOL(folio_wait_private_2_killable); * folio_wait_writeback - Wait for a folio to finish writeback. * @folio: The folio to wait for. * - * If the folio is currently being written back to storage, wait for the - * I/O to complete. + * If the folio is currently being written back to storage, wait for the I= /O to + * complete. * - * Context: Sleeps. Must be called in process context and with - * no spinlocks held. Caller should hold a reference on the folio. 
- * If the folio is not locked, writeback may start again after writeback - * has finished. + * Context: Sleeps. Must be called in process context and with no spinlocks + * held. Caller should hold a reference on the folio. If the folio is not + * locked, writeback may start again after writeback has finished. */ void folio_wait_writeback(struct folio *folio) { @@ -598,13 +582,12 @@ EXPORT_SYMBOL_GPL(folio_wait_writeback); * folio_wait_writeback_killable - Wait for a folio to finish writeback. * @folio: The folio to wait for. * - * If the folio is currently being written back to storage, wait for the - * I/O to complete or a fatal signal to arrive. + * If the folio is currently being written back to storage, wait for the I= /O to + * complete or a fatal signal to arrive. * - * Context: Sleeps. Must be called in process context and with - * no spinlocks held. Caller should hold a reference on the folio. - * If the folio is not locked, writeback may start again after writeback - * has finished. + * Context: Sleeps. Must be called in process context and with no spinlocks + * held. Caller should hold a reference on the folio. If the folio is not + * locked, writeback may start again after writeback has finished. * Return: 0 on success, -EINTR if we get a fatal signal while waiting. */ int folio_wait_writeback_killable(struct folio *folio) @@ -623,14 +606,13 @@ EXPORT_SYMBOL_GPL(folio_wait_writeback_killable); * folio_wait_stable() - wait for writeback to finish, if necessary. * @folio: The folio to wait on. * - * This function determines if the given folio is related to a backing - * device that requires folio contents to be held stable during writeback. - * If so, then it will wait for any pending writeback to complete. + * This function determines if the given folio is related to a backing dev= ice + * that requires folio contents to be held stable during writeback. If so,= then + * it will wait for any pending writeback to complete. * - * Context: Sleeps. 
Must be called in process context and with - * no spinlocks held. Caller should hold a reference on the folio. - * If the folio is not locked, writeback may start again after writeback - * has finished. + * Context: Sleeps. Must be called in process context and with no spinlocks + * held. Caller should hold a reference on the folio. If the folio is not + * locked, writeback may start again after writeback has finished. */ void folio_wait_stable(struct folio *folio) { @@ -670,10 +652,9 @@ int __folio_lock_async(struct folio *folio, struct wai= t_page_queue *wait) folio_set_waiters(folio); ret =3D !folio_trylock(folio); /* - * If we were successful now, we know we're still on the - * waitqueue as we're still under the lock. This means it's - * safe to remove and return success, we know the callback - * isn't going to trigger. + * If we were successful now, we know we're still on the waitqueue as + * we're still under the lock. This means it's safe to remove and + * return success, we know the callback isn't going to trigger. 
*/ if (!ret) __remove_wait_queue(q, &wait->wait); --=20 2.39.5 From nobody Sun May 24 22:35:55 2026 Received: from mx0a-00364e01.pphosted.com (mx0a-00364e01.pphosted.com [148.163.135.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 76353374E6F for ; Wed, 20 May 2026 20:50:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.163.135.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779310244; cv=none; b=uO//d7W3Mf5U78PrUyW0rggeDARr8JF/F76Aj+93Am7DMpx5cZ7/rDmz4wd11DtMIfwsLxK/EiuuNnRA+zhSZhlkZE/mu8eSOaP8wuZPCZhYfhoAseq4y85oj01Rj1tVmooJI3PkRh8CDjCkqzEIIMkadmc3LNVAZf+MF2nEgnY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779310244; c=relaxed/simple; bh=aDZzeo5Mi+iWDTXHP87oOV9B/u/ZkW223gf3svvT4DU=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=qNwoO7ZeqM7B1jb7+t2OCGDh09W5/u8dgM3qxngAR0Jbo/if1rC2EjN2p7U6tZJHRn6md/ibpgdDN8xUp7xWC0sRZ8vMS3exLPJBVtYd9+RmLkacWwpOQL4whtraCKAvBXvLHFCnUdhG9dXDjSJSoSSZHOVDdiKNYE9iXCFe/zU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=columbia.edu; spf=pass smtp.mailfrom=columbia.edu; dkim=pass (2048-bit key) header.d=columbia.edu header.i=@columbia.edu header.b=D/wYsKnw; arc=none smtp.client-ip=148.163.135.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=columbia.edu Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=columbia.edu Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=columbia.edu header.i=@columbia.edu header.b="D/wYsKnw" Received: from pps.filterd (m0167068.ppops.net [127.0.0.1]) by mx0a-00364e01.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 64KKc24K1686574 for ; Wed, 20 May 2026 16:50:40 -0400 DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=columbia.edu; h= cc:content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=pps01; bh=ODDt SnyLEU7DCkKG4fZ6y/+r7g+oLCksOr+Ja3KX/vg=; b=D/wYsKnwKqHAu/G4yEwE 8v/whTz+u7J7NgdYQLUlE5hyc9ncjQhJd+eQXOKZHnG1LzEj4ykTArcvKLJxRFsG z/H5zchIlX2LTAJpm/Xi+/AptxRnQWB+4SOi+Nt+k5TWUsF18zx3lYhTNcuC058u f7ZXc7Q7zWVSCkgOiTiHDTh2xRRlKSaYdADvw5BX2qYd+DTtGen+4pB4rgWPOKXE H0R38nvO7Mk/FsZhPlRHCIiXTM3RHW9PFpK3CXj/tAmSE4lazH9AnEkX8jwKnzEI KrdLVsx3zKFHrc0Ni0MqDTA/oTHN3EWh1r+lmDz7BnBVwGqQzIBf6ptulthHRJe8 Sg== Received: from mail-qk1-f198.google.com (mail-qk1-f198.google.com [209.85.222.198]) by mx0a-00364e01.pphosted.com (PPS) with ESMTPS id 4e9m5n02fd-1 (version=TLSv1.3 cipher=TLS_AES_128_GCM_SHA256 bits=128 verify=NOT) for ; Wed, 20 May 2026 16:50:39 -0400 (EDT) Received: by mail-qk1-f198.google.com with SMTP id af79cd13be357-913dbb7a318so1319443985a.1 for ; Wed, 20 May 2026 13:50:39 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779310239; x=1779915039; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-gg:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=ODDtSnyLEU7DCkKG4fZ6y/+r7g+oLCksOr+Ja3KX/vg=; b=rVTHFZmz3RyJJikhn3EkXQ1k9D5VkfEZTd24FnSvUj/LJshg2IuI6c/78Wvi28pN8T 2+Fp4u0pUqTQFg0QTI046Ldz6CN6KT77snlmozpeeuHgYp+/098VpF5rRptiH2QGpuH+ GYGhcjc11JWzuChY24pPnPUH1DL7x2tRNbNDGChPlan542myJTulweDIMrXNLg6aDvLx pe0V7d010kbu7Rv1s1TUCoaOqGfjQ0doS/qtwaVNf63jZqpBgyZ/x3MD7M7ahRb1y4xB TeM52znLDPa0u4n5ZWcXi5l6OlB6tL48MWmBfxkOzA07FXCOqi50wGE9IHTdb1kzNLeB Hb9w== X-Forwarded-Encrypted: i=1; AFNElJ8lfnFTd6u47jwjH9WOxUBMzmRNTwP5TKw7NB8bT28YV1Q9phL8q+fD4IFwAdr9PCthLEsEhrCdhmKyXt8=@vger.kernel.org X-Gm-Message-State: AOJu0Yw761ylyeFlS8xUaepYH4XOgSBSAg0pYz8qsdF51APlskMVa/8L 6I3haF9rqcOyZ36YIJxkiTwHYRRn3ofa3CRM6Yp5yNGQGhY7X4lGxv/Xm6OZT+nu48XeoE4Ag+y 
8AHouB44ySLo8+niQ84U3+cVQ71B90Si1cyoCAh2NR5mY/rR2ESvHv/79MA2Dcg== X-Gm-Gg: Acq92OGFPUUrd1BZQtTFhaG+uXuIphnXeaJnXXAw74J/pJ2BWZe2/socTN66Nlt+btc jRKaqyrXUyjU81mq2qsb3a+5EEfORWduXj6wla9KMr7w6G8MjirugjF4HvLYLP90vz/A6sFhMZI d7HMinvCCBeIzHmoSvT2q8/IpKRP5n94v1K6QS8Z0CDxtjQL9wgdq6K2VC1iphomNE2RXtptST0 uMhVulNt9laoLHb/ppkJtjwcpunLl+VASJiBvdnJ4tarrF/+aMemf9wnG/b/50AGrtbzfcmWZfo lHsGsjspCx8QW3nObTRx5b0lhHTHJsozv4sy7+lhtKmMMFGwvauOq21NJBxQKBGDm9EiHAQ3aqy zo3ju5PkPfHIVemjChfULxQy/NxOStht77aWUENeDRIxF+lWb7wly4AiBAzYiuA7QaxQ= X-Received: by 2002:a05:620a:7007:b0:910:1c85:4adb with SMTP id af79cd13be357-914a23e8b89mr7185285a.37.1779310238352; Wed, 20 May 2026 13:50:38 -0700 (PDT) X-Received: by 2002:a05:620a:7007:b0:910:1c85:4adb with SMTP id af79cd13be357-914a23e8b89mr7178985a.37.1779310237661; Wed, 20 May 2026 13:50:37 -0700 (PDT) Received: from [127.0.1.1] (dyn-160-39-33-242.dyn.columbia.edu. [160.39.33.242]) by smtp.gmail.com with ESMTPSA id af79cd13be357-910bcf37274sm2232692085a.37.2026.05.20.13.50.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 20 May 2026 13:50:37 -0700 (PDT) From: Tal Zussman Date: Wed, 20 May 2026 16:48:57 -0400 Subject: [PATCH RFC 06/11] folio_wait: rename wait_page_* infrastructure to wait_folio_* Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260520-filemap-split-v1-6-c36ddc2b6cf2@columbia.edu> References: <20260520-filemap-split-v1-0-c36ddc2b6cf2@columbia.edu> In-Reply-To: <20260520-filemap-split-v1-0-c36ddc2b6cf2@columbia.edu> To: "Matthew Wilcox (Oracle)" , Jan Kara , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , "Liam R. 
Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Alexander Viro , Christian Brauner , Jens Axboe Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, io-uring@vger.kernel.org, Tal Zussman X-Mailer: b4 0.14.3-dev-d7477 X-Developer-Signature: v=1; a=ed25519-sha256; t=1779310229; l=14769; i=tz2294@columbia.edu; s=20250528; h=from:subject:message-id; bh=aDZzeo5Mi+iWDTXHP87oOV9B/u/ZkW223gf3svvT4DU=; b=K/IvEz0ljqCKFsEXd6/c9iMRpBI68QFsqc4QJ8G5LG1XjBYRvOcrIrxvtfDjj0SmplV8GixLH XbNu0wXh//SDtN5tG2fTm2nmdnyfS+p/55kDFbKQu5TtGb6Uia74u6j X-Developer-Key: i=tz2294@columbia.edu; a=ed25519; pk=BIj5KdACscEOyAC0oIkeZqLB3L94fzBnDccEooxeM5Y= X-Proofpoint-ORIG-GUID: xwoUWBvezkqstxs_CCHuNZeUVyNHTp9l X-Authority-Analysis: v=2.4 cv=Pq6jqQM3 c=1 sm=1 tr=0 ts=6a0e1e9f cx=c_pps a=qKBjSQ1v91RyAK45QCPf5w==:117 a=GaPK54s0Se3oFqK5NkZy0g==:17 a=IkcTkHD0fZMA:10 a=NGcC8JguVDcA:10 a=x7bEGLp0ZPQA:10 a=VkNPw1HP01LnGYTKEx00:22 a=Da8U98TiO7q1upZEImrf:22 a=usPcmh10W0ubT8QP8_c3:22 a=ygHQXE4GxI9NjKAxWSYA:9 a=QEXdDO2ut3YA:10 a=O8hF6Hzn-FEA:10 a=NFOGd7dJGGMPyQGDc5-O:22 X-Proofpoint-GUID: xwoUWBvezkqstxs_CCHuNZeUVyNHTp9l X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNTIwMDIwMyBTYWx0ZWRfX5SsOMtYMjLEe 8rJlxxbp09lr2KfoMYAzgFi8jFCIcANga4qeC3rSgRIb6bneqmBXzAjMkqyUJ+dTEOuGrUhIYqA gzPsZtslDYRRmBuadlBKsngiPsBvklmL8wU3vOh1Wezb/zPUGzDVQdDRz0kgho9B2oWn9Mp75+f pY3tH18AO3+OuWBBcWaaud89EhLlY+FIepicoydTPwmuiViGCtHlxnX1IBhC2KvP5lcH0fc8NVT veUGZO0VxXD0hwiM346oH5W9U/0Ht8Eh5M9gqn6HFhK8pOztevHjiEa+o23NJH39yOtB+4npOkr 4a0gKV5EROitMM6HRKtSm4NIzUWAWJj86TS910kSBl0bTjhzCAG8ESMmJdVLeP1GikXIMk2Xe4U 0jEuKo7pbAvBCpvkuP2DL6fuKfJLe7QP9hRpAb4wuyGeoOWKEYTaqsijuq3OYt74mvHFSiQBadj jwX8ZV0Tit12Qiyjz+w== X-Proofpoint-Virus-Version: vendor=nai engine=6900 definitions=11792 signatures=596817 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 suspectscore=0 clxscore=1015 malwarescore=0 spamscore=0 bulkscore=10 lowpriorityscore=10 phishscore=0 adultscore=0 
 priorityscore=1501 impostorscore=10 classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0 reason=mlx scancount=1 engine=8.22.0-2605130000 definitions=main-2605200203

The folio bit-lock wait infrastructure still refers to "page" in the names
of its core types and helpers, even though it operates on folios. Rename
accordingly:

  struct wait_page_key -> struct wait_folio_key
  struct wait_page_queue -> struct wait_folio_queue
  wait_page_key.page_match -> wait_folio_key.folio_match
  wake_page_match() -> wake_folio_match()
  wake_page_function() -> wake_folio_function()
  PAGE_WAIT_TABLE_{BITS,SIZE} -> FOLIO_WAIT_TABLE_{BITS,SIZE}

Also rename local variables and field names, such as io_uring's wpq ->
wfq. Update relevant comments as well.

While at it, update io_uring/rw.h to include folio_wait.h rather than
pagemap.h.

Signed-off-by: Tal Zussman
---
 include/linux/folio_wait.h | 16 +++++-----
 include/linux/fs.h         |  2 +-
 io_uring/rw.c              | 14 ++++-----
 io_uring/rw.h              |  6 ++--
 mm/folio_wait.c            | 74 +++++++++++++++++++++++-----------------------
 mm/internal.h              |  2 +-
 6 files changed, 57 insertions(+), 57 deletions(-)

diff --git a/include/linux/folio_wait.h b/include/linux/folio_wait.h
index 57ccf9ffd243..1732df23d952 100644
--- a/include/linux/folio_wait.h
+++ b/include/linux/folio_wait.h
@@ -6,26 +6,26 @@
 #include
 #include
 
-struct wait_page_key {
+struct wait_folio_key {
 	struct folio *folio;
 	int bit_nr;
-	int page_match;
+	int folio_match;
 };
 
-struct wait_page_queue {
+struct wait_folio_queue {
 	struct folio *folio;
 	int bit_nr;
 	wait_queue_entry_t wait;
 };
 
-static inline bool wake_page_match(struct wait_page_queue *wait_page,
-				   struct wait_page_key *key)
+static inline bool wake_folio_match(struct wait_folio_queue *wait_folio,
+				    struct wait_folio_key *key)
 {
-	if (wait_page->folio != key->folio)
+	if (wait_folio->folio != key->folio)
 		return false;
-	key->page_match = 1;
+	key->folio_match = 1;
 
-	if (wait_page->bit_nr != key->bit_nr)
+	if (wait_folio->bit_nr != key->bit_nr)
 		return false;
 
 	return true;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index bb9cc4f7207c..cd5088dfe9a1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -390,7 +390,7 @@ struct kiocb {
 	 * waitqueue associated with completing the read.
 	 * Valid IFF IOCB_WAITQ is set.
 	 */
-	struct wait_page_queue *ki_waitq;
+	struct wait_folio_queue *ki_waitq;
 };
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 0c4834645279..fc87baac1911 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -763,14 +763,14 @@ static ssize_t loop_rw_iter(int ddir, struct io_rw *rw, struct iov_iter *iter)
 static int io_async_buf_func(struct wait_queue_entry *wait, unsigned mode,
 			     int sync, void *arg)
 {
-	struct wait_page_queue *wpq;
+	struct wait_folio_queue *wfq;
 	struct io_kiocb *req = wait->private;
 	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
-	struct wait_page_key *key = arg;
+	struct wait_folio_key *key = arg;
 
-	wpq = container_of(wait, struct wait_page_queue, wait);
+	wfq = container_of(wait, struct wait_folio_queue, wait);
 
-	if (!wake_page_match(wpq, key))
+	if (!wake_folio_match(wfq, key))
 		return 0;
 
 	rw->kiocb.ki_flags &= ~IOCB_WAITQ;
@@ -783,7 +783,7 @@ static int io_async_buf_func(struct wait_queue_entry *wait, unsigned mode,
  * This controls whether a given IO request should be armed for async page
  * based retry. If we return false here, the request is handed to the async
  * worker threads for retry. If we're doing buffered reads on a regular file,
- * we prepare a private wait_page_queue entry and retry the operation. This
+ * we prepare a private wait_folio_queue entry and retry the operation. This
  * will either succeed because the page is now uptodate and unlocked, or it
  * will register a callback when the page is unlocked at IO completion. Through
  * that callback, io_uring uses task_work to setup a retry of the operation.
@@ -794,7 +794,7 @@ static int io_async_buf_func(struct wait_queue_entry *wait, unsigned mode,
 static bool io_rw_should_retry(struct io_kiocb *req)
 {
 	struct io_async_rw *io = req->async_data;
-	struct wait_page_queue *wait = &io->wpq;
+	struct wait_folio_queue *wait = &io->wfq;
 	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
 	struct kiocb *kiocb = &rw->kiocb;
 
@@ -897,7 +897,7 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode, int rw_type)
 			return -EINVAL;
 
 		/*
-		 * We have a union of meta fields with wpq used for buffered-io
+		 * We have a union of meta fields with wfq used for buffered-io
 		 * in io_async_rw, so fail it here.
 		 */
 		if (!(file->f_flags & O_DIRECT))
diff --git a/io_uring/rw.h b/io_uring/rw.h
index 9bd7fbf70ea9..22e9f77c51d6 100644
--- a/io_uring/rw.h
+++ b/io_uring/rw.h
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 
+#include
 #include
-#include
 
 struct io_meta_state {
 	u32 seed;
@@ -19,11 +19,11 @@ struct io_async_rw {
 	unsigned buf_group;
 
 	/*
-	 * wpq is for buffered io, while meta fields are used with
+	 * wfq is for buffered io, while meta fields are used with
 	 * direct io
 	 */
 	union {
-		struct wait_page_queue wpq;
+		struct wait_folio_queue wfq;
 		struct {
 			struct uio_meta meta;
 			struct io_meta_state meta_state;
diff --git a/mm/folio_wait.c b/mm/folio_wait.c
index 8d8237cdd73b..70f808729f9c 100644
--- a/mm/folio_wait.c
+++ b/mm/folio_wait.c
@@ -20,20 +20,20 @@
 #include "internal.h"
 
 /*
- * In order to wait for pages to become available there must be waitqueues
- * associated with pages. By using a hash table of waitqueues where the bucket
+ * In order to wait for folios to become available there must be waitqueues
+ * associated with folios. By using a hash table of waitqueues where the bucket
  * discipline is to maintain all waiters on the same queue and wake all when any
- * of the pages become available, and for the woken contexts to check to be
- * sure the appropriate page became available, this saves space at a cost of
+ * of the folios become available, and for the woken contexts to check to be
+ * sure the appropriate folio became available, this saves space at a cost of
  * "thundering herd" phenomena during rare hash collisions.
  */
-#define PAGE_WAIT_TABLE_BITS 8
-#define PAGE_WAIT_TABLE_SIZE (1 << PAGE_WAIT_TABLE_BITS)
-static wait_queue_head_t folio_wait_table[PAGE_WAIT_TABLE_SIZE] __cacheline_aligned;
+#define FOLIO_WAIT_TABLE_BITS 8
+#define FOLIO_WAIT_TABLE_SIZE (1 << FOLIO_WAIT_TABLE_BITS)
+static wait_queue_head_t folio_wait_table[FOLIO_WAIT_TABLE_SIZE] __cacheline_aligned;
 
 static wait_queue_head_t *folio_waitqueue(struct folio *folio)
 {
-	return &folio_wait_table[hash_ptr(folio, PAGE_WAIT_TABLE_BITS)];
+	return &folio_wait_table[hash_ptr(folio, FOLIO_WAIT_TABLE_BITS)];
 }
 
 /* How many times do we accept lock stealing from under a waiter? */
@@ -53,14 +53,14 @@ void __init folio_wait_init(void)
 {
 	int i;
 
-	for (i = 0; i < PAGE_WAIT_TABLE_SIZE; i++)
+	for (i = 0; i < FOLIO_WAIT_TABLE_SIZE; i++)
 		init_waitqueue_head(&folio_wait_table[i]);
 
 	register_sysctl_init("vm", folio_wait_sysctl_table);
 }
 
 /*
- * The page wait code treats the "wait->flags" somewhat unusually, because
+ * The folio wait code treats the "wait->flags" somewhat unusually, because
  * we have multiple different kinds of waits, not just the usual "exclusive"
  * one.
  *
@@ -92,13 +92,13 @@ void __init folio_wait_init(void)
  * WQ_FLAG_WOKEN, we set WQ_FLAG_DONE to let the waiter easily see that
  * it now has the lock.
  */
-static int wake_page_function(wait_queue_entry_t *wait, unsigned int mode, int sync, void *arg)
+static int wake_folio_function(wait_queue_entry_t *wait, unsigned int mode, int sync, void *arg)
 {
 	unsigned int flags;
-	struct wait_page_key *key = arg;
-	struct wait_page_queue *wait_page = container_of(wait, struct wait_page_queue, wait);
+	struct wait_folio_key *key = arg;
+	struct wait_folio_queue *wait_folio = container_of(wait, struct wait_folio_queue, wait);
 
-	if (!wake_page_match(wait_page, key))
+	if (!wake_folio_match(wait_folio, key))
 		return 0;
 
 	/*
@@ -143,26 +143,26 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned int mode, int s
 static void folio_wake_bit(struct folio *folio, int bit_nr)
 {
 	wait_queue_head_t *q = folio_waitqueue(folio);
-	struct wait_page_key key;
+	struct wait_folio_key key;
 	unsigned long flags;
 
 	key.folio = folio;
 	key.bit_nr = bit_nr;
-	key.page_match = 0;
+	key.folio_match = 0;
 
 	spin_lock_irqsave(&q->lock, flags);
 	__wake_up_locked_key(q, TASK_NORMAL, &key);
 
 	/*
-	 * It's possible to miss clearing waiters here, when we woke our page
-	 * waiters, but the hashed waitqueue has waiters for other pages on it.
+	 * It's possible to miss clearing waiters here, when we woke our folio
+	 * waiters, but the hashed waitqueue has waiters for other folios on it.
 	 * That's okay, it's a rare case. The next waker will clear it.
 	 *
 	 * Note that, depending on the page pool (buddy, hugetlb, ZONE_DEVICE,
 	 * other), the flag may be cleared in the course of freeing the page;
 	 * but that is not required for correctness.
 	 */
-	if (!waitqueue_active(q) || !key.page_match)
+	if (!waitqueue_active(q) || !key.folio_match)
 		folio_clear_waiters(folio);
 
 	spin_unlock_irqrestore(&q->lock, flags);
@@ -180,13 +180,13 @@ void folio_wake_writeback(struct folio *folio)
  * A choice of three behaviors for folio_wait_bit_common():
  */
 enum behavior {
-	EXCLUSIVE,	/* Hold ref to page and take the bit when woken, like
+	EXCLUSIVE,	/* Hold ref to folio and take the bit when woken, like
 			 * __folio_lock() waiting on then setting PG_locked.
 			 */
-	SHARED,		/* Hold ref to page and check the bit when woken, like
+	SHARED,		/* Hold ref to folio and check the bit when woken, like
 			 * folio_wait_writeback() waiting on PG_writeback.
 			 */
-	DROP,		/* Drop ref to page before wait, no check when woken,
+	DROP,		/* Drop ref to folio before wait, no check when woken,
 			 * like folio_put_wait_locked() on PG_locked.
 			 */
 };
@@ -212,8 +212,8 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
 {
 	wait_queue_head_t *q = folio_waitqueue(folio);
 	int unfairness = sysctl_page_lock_unfairness;
-	struct wait_page_queue wait_page;
-	wait_queue_entry_t *wait = &wait_page.wait;
+	struct wait_folio_queue wait_folio;
+	wait_queue_entry_t *wait = &wait_folio.wait;
 	bool thrashing = false;
 	unsigned long pflags;
 	bool in_thrashing;
@@ -226,9 +226,9 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
 	}
 
 	init_wait(wait);
-	wait->func = wake_page_function;
-	wait_page.folio = folio;
-	wait_page.bit_nr = bit_nr;
+	wait->func = wake_folio_function;
+	wait_folio.folio = folio;
+	wait_folio.bit_nr = bit_nr;
 
 repeat:
 	wait->flags = 0;
@@ -239,7 +239,7 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
 	}
 
 	/*
-	 * Do one last check whether we can get the page bit synchronously.
+	 * Do one last check whether we can get the folio bit synchronously.
 	 *
 	 * Do the folio_set_waiters() marking before that to let any waker we
 	 * _just_ missed know they need to wake us up (otherwise they'll never
@@ -256,7 +256,7 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
 
 	/*
 	 * From now on, all the logic will be based on the WQ_FLAG_WOKEN and
-	 * WQ_FLAG_DONE flag, to see whether the page bit testing has already
+	 * WQ_FLAG_DONE flag, to see whether the folio bit testing has already
 	 * been done by the wake function.
 	 *
 	 * We can drop our reference to the folio.
@@ -359,8 +359,8 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
 void softleaf_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
 	__releases(ptl)
 {
-	struct wait_page_queue wait_page;
-	wait_queue_entry_t *wait = &wait_page.wait;
+	struct wait_folio_queue wait_folio;
+	wait_queue_entry_t *wait = &wait_folio.wait;
 	bool thrashing = false;
 	unsigned long pflags;
 	bool in_thrashing;
@@ -375,9 +375,9 @@ void softleaf_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
 	}
 
 	init_wait(wait);
-	wait->func = wake_page_function;
-	wait_page.folio = folio;
-	wait_page.bit_nr = PG_locked;
+	wait->func = wake_folio_function;
+	wait_folio.folio = folio;
+	wait_folio.bit_nr = PG_locked;
 	wait->flags = 0;
 
 	spin_lock_irq(&q->lock);
@@ -439,7 +439,7 @@ EXPORT_SYMBOL(folio_wait_bit_killable);
  * @folio: The folio to wait for.
  * @state: The sleep state (TASK_KILLABLE, TASK_UNINTERRUPTIBLE, etc).
  *
- * The caller should hold a reference on @folio. They expect the page to become
+ * The caller should hold a reference on @folio. They expect the folio to become
  * unlocked relatively soon, but do not wish to hold up migration (for example)
  * by holding the reference while waiting for the folio to come unlocked. After
  * this function returns, the caller should not dereference @folio.
@@ -455,7 +455,7 @@ int folio_put_wait_locked(struct folio *folio, int state)
  * folio_unlock - Unlock a locked folio.
  * @folio: The folio.
  *
- * Unlocks the folio and wakes up any thread sleeping on the page lock.
+ * Unlocks the folio and wakes up any thread sleeping on the folio lock.
  *
  * Context: May be called from interrupt or process context. May not be called
  * from NMI context.
@@ -639,7 +639,7 @@ int __folio_lock_killable(struct folio *folio)
 }
 EXPORT_SYMBOL_GPL(__folio_lock_killable);
 
-int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
+int __folio_lock_async(struct folio *folio, struct wait_folio_queue *wait)
 {
 	struct wait_queue_head *q = folio_waitqueue(folio);
 	int ret;
diff --git a/mm/internal.h b/mm/internal.h
index a121ca07f75c..21b0f4ec2478 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -105,7 +105,7 @@ void page_writeback_init(void);
 void folio_wait_init(void);
 void folio_wake_writeback(struct folio *folio);
 int folio_put_wait_locked(struct folio *folio, int state);
-int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait);
+int __folio_lock_async(struct folio *folio, struct wait_folio_queue *wait);
 
 /*
  * If a 16GB hugetlb folio were mapped by PTEs of all of its 4kB pages,
-- 
2.39.5

From nobody Sun May 24 22:35:55 2026
Received: from mx0a-00364e01.pphosted.com (mx0a-00364e01.pphosted.com [148.163.135.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EE67C376462 for ; Wed, 20 May 2026 20:50:40 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.163.135.74
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779310243; cv=none; b=S4TNVOp+vdFq0yh82kiQ5DU/oJ6aEIi4D8q2uhHufi6EpIgu0wyDh5v6y8S3qpbytfnyNYVr8NJw3z7Kzi35gAoXGAk4TfJTRmdxREqp+20hxDjDUIraYOSfocpTCEJrWXqMXIJm2ZVk7QKASP9t+5DOKui7wXRINcGl2RBInEc=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779310243; c=relaxed/simple;
bh=jJu/raWzDXHDrGY3OmHYJ5IRBMbRwIJjRLBSVuVduxw=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=sHANK79p5/Hprk9UytQJiMu0g+S+ac7ykjgPG3jJVqIrsPbQGX51E5Kuxpw3IZBw3yoPN+2mNe+lLYoJVs5UdFMy2ZNSzjfU8jiX+ABWZjr/GmpO30I6Yw2Zq2Uvt/Vngivgx5VXA8sG5tIHPYRKH5pv167bDX7ykY9YmZqrYYI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=columbia.edu; spf=pass smtp.mailfrom=columbia.edu; dkim=pass (2048-bit key) header.d=columbia.edu header.i=@columbia.edu header.b=B0T0AnEW; arc=none smtp.client-ip=148.163.135.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=columbia.edu Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=columbia.edu Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=columbia.edu header.i=@columbia.edu header.b="B0T0AnEW" Received: from pps.filterd (m0167068.ppops.net [127.0.0.1]) by mx0a-00364e01.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 64KKbg5a1685666 for ; Wed, 20 May 2026 16:50:40 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=columbia.edu; h= cc:content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=pps01; bh=sToO R/Pf9Se6n6q9d83MywFIBvA3eA1y47gZuXeAAaE=; b=B0T0AnEW8S1XAAxYRdDC Jd/njvke3xOxdIRYA722oTLR9w1rS83qi2Xp/XDadg2RZ+8QRAuRaruu3txhGiA3 WcAY2ChK5Axkc6i1dUYeDzaiPGtTmG6btqv4svi1AzfXKUxyfoU5EodM0i5uGh23 x2xzAkPIz87g6dPT0sjt9VI9w6jwiBO+VSPSwfPFtK9zKdtSPZ+yZ9vACywTq4JE YI48gAATuweguQyGWHYhDCRN2/puaW69TUVcCLHFylJ3q4EmA41rhDmy9uhUvA/M KwbYBh3TbT9t5d9IGqAdrbF3hRkt5T1kOdVsC2L5+Q0xHZz1vyoaL5P3LSXSqmpu 4g== Received: from mail-qk1-f199.google.com (mail-qk1-f199.google.com [209.85.222.199]) by mx0a-00364e01.pphosted.com (PPS) with ESMTPS id 4e9m5n02fg-1 (version=TLSv1.3 cipher=TLS_AES_128_GCM_SHA256 bits=128 verify=NOT) for ; Wed, 20 May 2026 16:50:40 -0400 (EDT) Received: by mail-qk1-f199.google.com 
with SMTP id af79cd13be357-90d02857cdfso1176532485a.2 for ; Wed, 20 May 2026 13:50:40 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779310239; x=1779915039; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-gg:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=sToOR/Pf9Se6n6q9d83MywFIBvA3eA1y47gZuXeAAaE=; b=SpAxW61aKHAQ/q0QpVPeY3bVX+vESqag0c5pIzNlO8XdspazkIuxnaQsbO09bENsuD +i9yASGwI9ql+Y+t4kRgW4/RK1xKxjGtiGNAc/E2tLw+XwuYm5MS2o7cn6erbQpPr/+D ytlz9jXorPFQ85YDYMbz7RhNpVr5b31wUjSPpJjqrpuw8iBMZCCY/TIjSr1pxLM0f5WV U/mnyKaycaAC9aUaI8YCM4IqWZQwh2TBrxwcWBqk3UBKrG7NJjDU2PGGUdRm5/SKO2hj 3tYMbsALtYA/iwBqhWvD0JlE9mdQIIbQ+zv7svJytOjC42gAJO2l2FyOovyV+uPf6ozC XNKA== X-Forwarded-Encrypted: i=1; AFNElJ+dr3FP2lai2npsJXOS0uxOUodrIIKahhNWzT6xyG2YmBwB7VTEBOQwtrO9BFVn+jjV2ImTZeIdYwXRCRI=@vger.kernel.org X-Gm-Message-State: AOJu0Yyckgkh6YKZ+717YuQwz+OhXYbgSLkClQ1CVLjaoLQ+rug/kmKy ANJ9Xoqf0kFBmTm+IQmUdrtmR140Fxhfd3nlGoWB97aeCKCvtKsBRmSC9cjjz7HVJTRmV0K0xEP VjnOcqc5ECkbYemWf4L3w5bcm19Qy5kHMd1+ts9gnYHoSawefTl6pAjswuSRk+w== X-Gm-Gg: Acq92OGLVWqj/pplSCEzh5/kPWZIgGBgG4ViEa+/ZSUPKO66xRzhjEZ57+NZthjXvGR kk5RuF5wBx8RFG+TxvzgLu0NcIKIfwz9RZ1ybFJclu9GTNQAyFNeZhwx0FfM+Lc2wC9l5Z3SYn8 DUJdzgQ7HDHif2wMDF/0ZjT5Kp6VdCHyEpYTWqOc61DOPTZ/c8zmUx9rFVi0y8A3gdzRHiLiEpI /WYXaLfXBZ1HjgzN8kL4/jRb46NkKMPsE03ZQ6YPRsWAcIEo8jeO54guRPA2NBymYh8ypvqfW18 O2voWOZVJuhh9QOhmNbZZHSv8lH6pqNfIn5S61LNs+4/8S45iWL/bE18YdR776B5cINcY9bGPkp e0Ouu+p8hcvy3CjcOEGtnxtpzO2atHgRvVHEl2a1XL2L6WTao7/9pL8Ycz+y4AjcGHU8= X-Received: by 2002:a05:620a:f15:b0:913:e19b:2f56 with SMTP id af79cd13be357-913e19b63d0mr2524459385a.10.1779310239195; Wed, 20 May 2026 13:50:39 -0700 (PDT) X-Received: by 2002:a05:620a:f15:b0:913:e19b:2f56 with SMTP id af79cd13be357-913e19b63d0mr2524452485a.10.1779310238647; Wed, 20 May 2026 13:50:38 -0700 (PDT) Received: from [127.0.1.1] (dyn-160-39-33-242.dyn.columbia.edu. 
[160.39.33.242]) by smtp.gmail.com with ESMTPSA id af79cd13be357-910bcf37274sm2232692085a.37.2026.05.20.13.50.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 20 May 2026 13:50:38 -0700 (PDT) From: Tal Zussman Date: Wed, 20 May 2026 16:48:58 -0400 Subject: [PATCH RFC 07/11] folio_wait: convert VM_BUG_ON_FOLIO() to VM_WARN_ON_ONCE_FOLIO() Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260520-filemap-split-v1-7-c36ddc2b6cf2@columbia.edu> References: <20260520-filemap-split-v1-0-c36ddc2b6cf2@columbia.edu> In-Reply-To: <20260520-filemap-split-v1-0-c36ddc2b6cf2@columbia.edu> To: "Matthew Wilcox (Oracle)" , Jan Kara , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , "Liam R. Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Alexander Viro , Christian Brauner , Jens Axboe Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, io-uring@vger.kernel.org, Tal Zussman X-Mailer: b4 0.14.3-dev-d7477 X-Developer-Signature: v=1; a=ed25519-sha256; t=1779310229; l=1934; i=tz2294@columbia.edu; s=20250528; h=from:subject:message-id; bh=jJu/raWzDXHDrGY3OmHYJ5IRBMbRwIJjRLBSVuVduxw=; b=aWk4flyPY3yiLsLPfjIZ128cQrOIzopXW9sJDfGsMadODQh9UbsXyooxIwmWJlV2xXqudRhGw BU3URuIRQ6lDLnXcm0mJlBPdMp+Mm+NXsFS8CdLq15K50p/AHbAIbje X-Developer-Key: i=tz2294@columbia.edu; a=ed25519; pk=BIj5KdACscEOyAC0oIkeZqLB3L94fzBnDccEooxeM5Y= X-Proofpoint-ORIG-GUID: PRC6qei3RIvR6imPuplsKQDCrNHXOF9P X-Authority-Analysis: v=2.4 cv=Pq6jqQM3 c=1 sm=1 tr=0 ts=6a0e1ea0 cx=c_pps a=HLyN3IcIa5EE8TELMZ618Q==:117 a=GaPK54s0Se3oFqK5NkZy0g==:17 a=IkcTkHD0fZMA:10 a=NGcC8JguVDcA:10 a=x7bEGLp0ZPQA:10 a=VkNPw1HP01LnGYTKEx00:22 a=Da8U98TiO7q1upZEImrf:22 a=usPcmh10W0ubT8QP8_c3:22 a=VwQbUJbxAAAA:8 a=M6LSvAv_FjuMSCFMCIwA:9 a=QEXdDO2ut3YA:10 a=bTQJ7kPSJx9SKPbeHEYW:22 
X-Proofpoint-GUID: PRC6qei3RIvR6imPuplsKQDCrNHXOF9P
X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNTIwMDIwMyBTYWx0ZWRfX7L6LHg5pLeCI T6kAryXBr5aZxI531AOzgyzKhKDb/cSJczGjqj9xjBlljMwDmCNmkBSf51FOI+B4lJhX7TaU8TQ T0kNiEPf4vI3Jyt5n7jkoak6yCX+WaHNIypFyuJW88xbxBVjFQxiffDXK05dKEKRwHNfvmUwlCp fkJ9i3bw/I+7dkTeyyNmqMQJLYVVIAwtpnVURJyNcyYT1ZEtPQKE4Tz2XDDGsshZ2CQx1uye45+ z8PH/rmp6aOs5jAbHlZUjaSAYZmrBZU8gokfXATzH5tVbFEMN7QQSIzIYzybLe7e9KxB/Rn+LWX V8F7awC6OmZtoPOfA7apuZussB6qHUwUo4LXeMLl3Fe254lRpSOih3kSTrwib7B5GPWzTISsKNI 01ZtHtLg98ATPgvGbscy8WSbY2NTtxLSJQfLBl46uEGcCuD05Wj49kSkG4FIqHViPjph+Wky1SF ozeBZ48jDOGfV35PH0w==
X-Proofpoint-Virus-Version: vendor=nai engine=6900 definitions=11792 signatures=596817
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 suspectscore=0 clxscore=1015 malwarescore=0 spamscore=0 bulkscore=10 lowpriorityscore=10 phishscore=0 adultscore=0 priorityscore=1501 impostorscore=10 classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0 reason=mlx scancount=1 engine=8.22.0-2605130000 definitions=main-2605200203

BUG_ON() is deprecated [1]. The VM_BUG_ON_FOLIO() assertions in
folio_unlock(), folio_end_read(), and folio_end_private_2() verify folio
state invariants and are already debug checks. There is no additional
benefit gained by crashing the system. Convert them to
VM_WARN_ON_ONCE_FOLIO(), as is now preferred for such checks.

[1] https://www.kernel.org/doc/html/latest/process/coding-style.html#use-warn-rather-than-bug

Signed-off-by: Tal Zussman
---
 mm/folio_wait.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/folio_wait.c b/mm/folio_wait.c
index 70f808729f9c..52d336bc7fe0 100644
--- a/mm/folio_wait.c
+++ b/mm/folio_wait.c
@@ -465,7 +465,7 @@ void folio_unlock(struct folio *folio)
 	/* Bit 7 allows x86 to check the byte's sign bit */
 	BUILD_BUG_ON(PG_waiters != 7);
 	BUILD_BUG_ON(PG_locked > 7);
-	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
+	VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
 	if (folio_xor_flags_has_waiters(folio, 1 << PG_locked))
 		folio_wake_bit(folio, PG_locked);
 }
@@ -490,8 +490,8 @@ void folio_end_read(struct folio *folio, bool success)
 
 	/* Must be in bottom byte for x86 to work */
 	BUILD_BUG_ON(PG_uptodate > 7);
-	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
-	VM_BUG_ON_FOLIO(success && folio_test_uptodate(folio), folio);
+	VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
+	VM_WARN_ON_ONCE_FOLIO(success && folio_test_uptodate(folio), folio);
 
 	if (likely(success))
 		mask |= 1 << PG_uptodate;
@@ -513,7 +513,7 @@ EXPORT_SYMBOL(folio_end_read);
  */
 void folio_end_private_2(struct folio *folio)
 {
-	VM_BUG_ON_FOLIO(!folio_test_private_2(folio), folio);
+	VM_WARN_ON_ONCE_FOLIO(!folio_test_private_2(folio), folio);
 	clear_bit_unlock(PG_private_2, folio_flags(folio, 0));
 	folio_wake_bit(folio, PG_private_2);
 	folio_put(folio);
-- 
2.39.5

From nobody Sun May 24 22:35:55 2026
Received: from mx0b-00364e01.pphosted.com (mx0b-00364e01.pphosted.com [148.163.139.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 83BB0376BD0 for ; Wed, 20 May 2026 20:50:42 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.163.139.74
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org;
s=arc-20240116; t=1779310245; cv=none; b=kuGlpEKWL2L/+nJvbtcnYrb6t2bbeXe3mlar1IY8naK4VmbVQMox/tr9yrGGnYxWeTolZT13jnEJYCRB4rwKYBxOzL3UqdmiBaJJdCqTm2EuRJ0IOuKiZrb5MVUS4XlkVaBwThkgsgnHadLsx0W2tktLNBR8mmJlBMfSjjW70HA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779310245; c=relaxed/simple; bh=dFyWvW6idGsgNVV3N8U6/Sf1EZqh/O1Nww0zGEVwRg8=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=HoanVsa4Z6f689z3EvCE2CsOyP6RDWruoyIZdPVhzrcto28NyaE/m1Lc7jCEy0F4Gvr8Nbf4+4lIxGGPlLzXNgbz5YYeW3knhgQxSMTJ3v770ete3pOyFXIkg1jfCqn3Zi2T8KCCgZTRCcL8OjFpnBWboTPV/FHFH99KUg4+aYY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=columbia.edu; spf=pass smtp.mailfrom=columbia.edu; dkim=pass (2048-bit key) header.d=columbia.edu header.i=@columbia.edu header.b=R6qzpP2I; arc=none smtp.client-ip=148.163.139.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=columbia.edu Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=columbia.edu Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=columbia.edu header.i=@columbia.edu header.b="R6qzpP2I" Received: from pps.filterd (m0167076.ppops.net [127.0.0.1]) by mx0b-00364e01.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 64KKNxvs506294 for ; Wed, 20 May 2026 16:50:41 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=columbia.edu; h= cc:content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=pps01; bh=avRQ NH2N900diQes/ogN4qxa19bmdOzpQ8cdXUSTkOY=; b=R6qzpP2I6TzJ9toROuZ4 25CAzSV0QypPvPhnHnRyFZnZ7ztKEuyAJKs24zra5vCMLYILNukkb/Tcsd217DS4 fggTihPdbZMn+yqz2B2JBLKJEc+41PZdlH++fP5e0SMaxYoHk97ZEfIr41vmfVO5 XXjqsssgJOCszcemHbfkWa25njREhNk0idIx4DAK/7F/BjCp3GKntlko+aGADs6u d6DefmlE/cw97ki7b5pBMrodg54Pz/HR+n/BZYql+ihGX5222U2LgO/KLY7N8m6F 
Up9pgnDgHsAJ2UD7LhXHFKJfSdOfrcpQEVqb5tVXZVQD1a5fjEHWUxupV7ijBmBb lA== Received: from mail-qv1-f70.google.com (mail-qv1-f70.google.com [209.85.219.70]) by mx0b-00364e01.pphosted.com (PPS) with ESMTPS id 4e9a0bd1t9-1 (version=TLSv1.3 cipher=TLS_AES_128_GCM_SHA256 bits=128 verify=NOT) for ; Wed, 20 May 2026 16:50:41 -0400 (EDT) Received: by mail-qv1-f70.google.com with SMTP id 6a1803df08f44-8ba8a1f3dd7so80035436d6.3 for ; Wed, 20 May 2026 13:50:41 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779310240; x=1779915040; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-gg:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=avRQNH2N900diQes/ogN4qxa19bmdOzpQ8cdXUSTkOY=; b=oDDVUDBjikaNpE07W1p/+tsK6GtcfyCJv5ZUeddKs8YGd78VnsbGS5M6Oc56aLnSiS 08GN6bl0+61EtDTuogcHU4kxx4sEfQct9d40uX9A+xXax4HDq8Thh92rYpJjp+qE9kHy jzoMUZnVjMKBO2BUonMAvOG+IYmaGt9mhcQqEjseFnAsUoNq4D+JvJAjQ3ItY6u1hmV4 8wwHy0dcl+Ql0mRv/jPunl8oYDhX9awVBs61zLWNG3qpSl7I7vBr0beL801ImA7d29YR dBHbguWpchahvq5ksmqlVrNAe3dzw79zxkhIHf0SJacWui5llIIMd7q4Gg2nSbcD8j/E NxAA== X-Forwarded-Encrypted: i=1; AFNElJ/Axgeq7D+7qmv6DqT9JQBqpv6QFLBGYwNIZWKm+GxOdpl0F1+yNhFqQbkXD6Hp50QYb+eyY++p34StxsU=@vger.kernel.org X-Gm-Message-State: AOJu0YxFIqBwXYIQhnHj5VTINVM2Ha7Bc1bT6w8VcFSKFfnnrJ30sHMQ 6cJzNy+lrJnviQI8uNtN73J5byZaQOBJkSypV3y38gG7QOijoLRUi7K9vd+DusmZPyouw5SR6SD aIMl69mZL6NrAMOxwQ2hmHhZPVA24QxohjUrTyJunBSp3sowrKvksMF2lULvDvA== X-Gm-Gg: Acq92OE1bB/PK7LoHi6fcph6sUMLr7D3SATlQZe4li4Wow0e+VLj62aQnLWXIAcI3Rw C/8PFwQq24U2e4i1o8CM+1vHGk5NtoXbG/IlMBjCJS3P/fT3oc7mSVIoZ3vU6EwjTl0nEtMh6b0 o1Y5SMyNiqZzFSAU7m0M+Ed+rn9oSOOXVxqvJONm1n0NVmJDYwL42RRe7RUUFdp940GrrAnK9LB UK+aQknhespLuHKXdCzLlD8qTPjxIWKB7WdLfr33tx/bDtW1R8zhKGO+jI6hTGjYG+cS9x1aCGS awYsOB+HQt3mgf3zfILcaomhvFPP8f+XVjKrNg8ZU3lUsFsBQlcdS3wL6IipscVkX1151RD65Vg Z5FKQNmIlQ4crul68qdIV6THnHf0aWRg6FvhC29kMHejzydwmw0UV32wnE3Twd2nPSWU= X-Received: by 
2002:a05:620a:28c2:b0:909:e4dc:fb32 with SMTP id af79cd13be357-911cdc429b7mr3837077385a.33.1779310240590; Wed, 20 May 2026 13:50:40 -0700 (PDT) X-Received: by 2002:a05:620a:28c2:b0:909:e4dc:fb32 with SMTP id af79cd13be357-911cdc429b7mr3837072885a.33.1779310240054; Wed, 20 May 2026 13:50:40 -0700 (PDT) Received: from [127.0.1.1] (dyn-160-39-33-242.dyn.columbia.edu. [160.39.33.242]) by smtp.gmail.com with ESMTPSA id af79cd13be357-910bcf37274sm2232692085a.37.2026.05.20.13.50.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 20 May 2026 13:50:39 -0700 (PDT) From: Tal Zussman Date: Wed, 20 May 2026 16:48:59 -0400 Subject: [PATCH RFC 08/11] MAINTAINERS: add folio_wait files to MEMORY MANAGEMENT - CORE Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260520-filemap-split-v1-8-c36ddc2b6cf2@columbia.edu> References: <20260520-filemap-split-v1-0-c36ddc2b6cf2@columbia.edu> In-Reply-To: <20260520-filemap-split-v1-0-c36ddc2b6cf2@columbia.edu> To: "Matthew Wilcox (Oracle)" , Jan Kara , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , "Liam R. 
Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Alexander Viro , Christian Brauner , Jens Axboe Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, io-uring@vger.kernel.org, Tal Zussman X-Mailer: b4 0.14.3-dev-d7477 X-Developer-Signature: v=1; a=ed25519-sha256; t=1779310229; l=850; i=tz2294@columbia.edu; s=20250528; h=from:subject:message-id; bh=dFyWvW6idGsgNVV3N8U6/Sf1EZqh/O1Nww0zGEVwRg8=; b=HPyU6CJCZ8kqU8alpJJp2HCe9isKSiBdanKUUwMFGvy0Q00sMV+3afcEBole6GJFz/iDcpiMr llz2jdaNdK3DhYhQLMgNYC2AXXFscR5FYH0b1/6uHxLX4UxQ5UB3BTb X-Developer-Key: i=tz2294@columbia.edu; a=ed25519; pk=BIj5KdACscEOyAC0oIkeZqLB3L94fzBnDccEooxeM5Y= X-Proofpoint-GUID: R2c9YbOCuotEfb3_KXrCWC1rS3FHZ5SG X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNTIwMDIwMyBTYWx0ZWRfXzCLOKPH9EG5Q AGt1rSBieVhzrk7vQG6fo6XCD/bSL9tQl7ro3HU6t34EU3hqsf8mr0As99ARiGEwAeOWEvlnoEU r5AxRKQMbrONOzVVd51W1aQuvGz6wVyqytuTL6tz85FnzSfE/r+CpI8QBtFDWh+Eqsiff6N2kp4 fmWHcctX8uJljY0zypGgUbW6YRrD1ETJssNBpb8HeI9lISc/StPAw89KZxP7Qsb+hpqPirm05zi wsvWzSjovV9X0W04ZsXREvnZwjeyb0Vc4lBh5KdquEcRBMwS+jvP4vfGlOsqLE1gpj7UUjg4ffI DQguDgETZVAIqnCOBFysx5Q6DgJ9bmsf5FLs1j9UZYUZL3LziOGBQ13qD01vdtFaVWv2kOao298 +55mwmi3oH2zul7M2kJOuXVttKLtZAEQB4Hw6FOoNtfFcAwfOgImFeLUEjyEKJt1VdlyVv4/BDO vPr3NkaphDG7cn1eAfA== X-Proofpoint-ORIG-GUID: R2c9YbOCuotEfb3_KXrCWC1rS3FHZ5SG X-Authority-Analysis: v=2.4 cv=KLJqylFo c=1 sm=1 tr=0 ts=6a0e1ea1 cx=c_pps a=oc9J++0uMp73DTRD5QyR2A==:117 a=GaPK54s0Se3oFqK5NkZy0g==:17 a=IkcTkHD0fZMA:10 a=NGcC8JguVDcA:10 a=x7bEGLp0ZPQA:10 a=VkNPw1HP01LnGYTKEx00:22 a=Da8U98TiO7q1upZEImrf:22 a=Qm0qsxP7aFY2tkT6R2MF:22 a=1-S1nHsFAAAA:8 a=VwQbUJbxAAAA:8 a=hdCJvu6vi-O6ykQZt0QA:9 a=QEXdDO2ut3YA:10 a=iYH6xdkBrDN1Jqds4HTS:22 a=gK44uIRsrOYWoX5St5dO:22 X-Proofpoint-Virus-Version: vendor=nai engine=6900 definitions=11792 signatures=596817 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=10 priorityscore=1501 malwarescore=0 adultscore=0 clxscore=1015 
 bulkscore=10 phishscore=0 suspectscore=0 lowpriorityscore=10 spamscore=0 classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0 reason=mlx scancount=1 engine=8.22.0-2605130000 definitions=main-2605200203

Add mm/folio_wait.c and include/linux/folio_wait.h after they were split
out from mm/filemap.c, mm/page-writeback.c, and include/linux/pagemap.h.

Signed-off-by: Tal Zussman
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8cf9ba51d981..bfe1488d9030 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16781,6 +16781,7 @@ S:	Maintained
 W:	http://www.linux-mm.org
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
 F:	include/linux/folio_batch.h
+F:	include/linux/folio_wait.h
 F:	include/linux/gfp.h
 F:	include/linux/gfp_types.h
 F:	include/linux/highmem.h
@@ -16802,6 +16803,7 @@ F:	kernel/fork.c
 F:	mm/Kconfig
 F:	mm/debug.c
 F:	mm/folio-compat.c
+F:	mm/folio_wait.c
 F:	mm/highmem.c
 F:	mm/init-mm.c
 F:	mm/internal.h
-- 
2.39.5

From nobody Sun May 24 22:35:55 2026
Received: from mx0b-00364e01.pphosted.com (mx0b-00364e01.pphosted.com [148.163.139.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A41FF377EA5 for ; Wed, 20 May 2026 20:50:44 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.163.139.74
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779310247; cv=none; b=ksYHZUa73eTEb4iDxP7T7UZV6QjlaNeFgyigb1KlrDmPOYCDRZtztxKzEHntZCYcRQM7CVtuKsvgQwfVCbhoUgeSCK7PS2Eit3nVdzd6+zwqaTeeCmdAWYJqAeeLiYrEYYAvuISf1f+LkyhQabpZcwltzivYx2NIAwIgb1ObXDo=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779310247; c=relaxed/simple; bh=rA3ee1bVN85AK8Xbqqgod0QTROlaUP31hnwYXO5EEik=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc;
b=o+XSYCIyBQdWK2MPEBBiF7NoV/nR9pCMCnyfvxYMpgBY0jONf4Yrz8lqScIJc8f744lcRoWCaoLUl8WPhdreDbJXKXDdDFS3gun0RnZzkofm24qSNUdAYoCK6W5/pooylOmLkBkPMqFPmQaXHIpdQucA9mIwI7aeOBVSPT7ChFU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=columbia.edu; spf=pass smtp.mailfrom=columbia.edu; dkim=pass (2048-bit key) header.d=columbia.edu header.i=@columbia.edu header.b=iHJauhNO; arc=none smtp.client-ip=148.163.139.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=columbia.edu Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=columbia.edu Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=columbia.edu header.i=@columbia.edu header.b="iHJauhNO" Received: from pps.filterd (m0167077.ppops.net [127.0.0.1]) by mx0b-00364e01.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 64KKOQv31271403 for ; Wed, 20 May 2026 16:50:42 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=columbia.edu; h= cc:content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=pps01; bh=yDlB tkSZ/QVPh/7V2LuCLj7vDoerhrZrGeObhOfbEDE=; b=iHJauhNOQfQGuWQ9PBMd 9498doyET3MaIYPPH6gAxhnfTbaNfiN2HpEWnhfQQydxrwb9YLjAr0f4Hzz9oE/O fgZj57LX1AT6vGPBI4TBSadyl2n+Ha4Tz+b8UVbJXRMfLIBkIuY/Ve9fGyVT4co8 CWfypz/xg3x2wPetKTOGO0gbdH/nfd8bk8oUJYPlHyI8bnWD4jXB1S/B9q8XPwVg a/q7T59AHt1VqQQ1Jq/DCsxu26G8aGUFIr5qhSBAsd9ZVD/1vrojSd1jGW4Sf5ya FNIQdmkD5+yJCvLbmPd1L6c+MvkJlF6HEAnpWHT38ExeFCzYB0ApWQz8kBWW88CI Fg== Received: from mail-qk1-f200.google.com (mail-qk1-f200.google.com [209.85.222.200]) by mx0b-00364e01.pphosted.com (PPS) with ESMTPS id 4e98j6wc8u-1 (version=TLSv1.3 cipher=TLS_AES_128_GCM_SHA256 bits=128 verify=NOT) for ; Wed, 20 May 2026 16:50:42 -0400 (EDT) Received: by mail-qk1-f200.google.com with SMTP id af79cd13be357-913fcc4c164so1255757685a.1 for ; Wed, 20 May 2026 13:50:42 -0700 (PDT) X-Google-DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779310241; x=1779915041; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-gg:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=yDlBtkSZ/QVPh/7V2LuCLj7vDoerhrZrGeObhOfbEDE=; b=dTzWE6ShHIfT3+GnWXp0XzumkQVQCh7wpMqdL9oTnlMTTLzNCXtopXzLu19PRn2Tfi CqgAH8+i61F5/0OkQ9u8Dc4Ucby1+v25nuVi5sMJm+kDZFwQg9tlFv2NiHi+Vqa1v4bs 7HEevNqsUSPGgUFspqKkY0NzDz0/RW8HrtOASYOjce56sNcwO5XuM0bNcL1dButgnymZ hL6NFFfpqZcBdXRpYRqtjBtz0Hmg7B8xdRgiSV4MX0MoOJ8X7qX9pMvoEMxiv9OSJ293 LrR4wMqEG2wooDGhQ7ec3MLgno5HMqxVsCLNJJphTPAw31VGhythE4toVghTLPkclpgC luCA== X-Forwarded-Encrypted: i=1; AFNElJ/wg0hf0xw8kj9nO7wuy9HovwyZfWxe/wW3R7FUMgaM6x37zC5vj5bLrHWKa9PLYX0U68JpPdqa8RN8Cgg=@vger.kernel.org X-Gm-Message-State: AOJu0YwmwyZM6LBqWnrmeFR5eati7ATSUPXxxuHVv3mSCvKQ/7DDd1MW cXeUf86suYqFMk8MMYCWZAs5naPjgMXY7tTus/i8uB+zCCYFISbRA1MgNq9Cwxx7MfeInS8voew 0Bbr+clDWTSzdJkV/4mMr/Af8KGRp9kx80dVB+/ai8E+H+Y9dLXmACcggG+Uc4w== X-Gm-Gg: Acq92OFyL+IOC4DdGZOZbRfFI2ulZbJ9O79GaKAU3HbWMUaqa4vw3ZwrKnWOpGYOuH5 gXI+qiBZqV5/YxKMPiaFFJDkllGxFE6KDn0EUbpJiJE9Sfb2eggVme69ycpCblsebzjyJ6X6a1E 2XoSrw1tqQo6LEFO7I3JiPkQW+BJcaA95V5MnJDLixeHCURV/jJTxEM+nlPGYLQNgKrSudYK2Q5 aMelPgsxD57mrZego0xuW13QJsuf5TZzrVIkWVJalP1h1ZpakGRxnwVgMaU+c5PQWC9Zq++edJu tbvn3JanXww67976cxfytdcrEbyIKE75gy3re+ljq2o1T7NlJYei0X6ImSUxXprky02scu1lSL6 O1XIQrbINSJBMgKDo8asxIJ2pIlVg571dOlI3HJy1x/Eg6TXuhxvVGoDVMs+ytIjB+AZDcU0c5q 6Bfg== X-Received: by 2002:a05:620a:1986:b0:911:ed:d285 with SMTP id af79cd13be357-911cf9e0c41mr3857316185a.62.1779310241573; Wed, 20 May 2026 13:50:41 -0700 (PDT) X-Received: by 2002:a05:620a:1986:b0:911:ed:d285 with SMTP id af79cd13be357-911cf9e0c41mr3857309585a.62.1779310241059; Wed, 20 May 2026 13:50:41 -0700 (PDT) Received: from [127.0.1.1] (dyn-160-39-33-242.dyn.columbia.edu. 
[160.39.33.242]) by smtp.gmail.com with ESMTPSA id af79cd13be357-910bcf37274sm2232692085a.37.2026.05.20.13.50.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 20 May 2026 13:50:40 -0700 (PDT) From: Tal Zussman Date: Wed, 20 May 2026 16:49:00 -0400 Subject: [PATCH RFC 09/11] fs: move dir_pages() from <linux/pagemap.h> to <linux/fs.h> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260520-filemap-split-v1-9-c36ddc2b6cf2@columbia.edu> References: <20260520-filemap-split-v1-0-c36ddc2b6cf2@columbia.edu> In-Reply-To: <20260520-filemap-split-v1-0-c36ddc2b6cf2@columbia.edu> To: "Matthew Wilcox (Oracle)" , Jan Kara , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , "Liam R. Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Alexander Viro , Christian Brauner , Jens Axboe Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, io-uring@vger.kernel.org, Tal Zussman X-Mailer: b4 0.14.3-dev-d7477 X-Developer-Signature: v=1; a=ed25519-sha256; t=1779310229; l=1369; i=tz2294@columbia.edu; s=20250528; h=from:subject:message-id; bh=rA3ee1bVN85AK8Xbqqgod0QTROlaUP31hnwYXO5EEik=; b=DCkZp86sCfuyRVBmRVmURwps78q7smwQhnDSiFfY5HPBUU3kJlF0UIoGU21dz9wQUTuZXJkNF UGqHDG78m7UBDAclN2Kez4FnTKNYWYrHvBGgRUTdHLcho2GLQ+5jpkl X-Developer-Key: i=tz2294@columbia.edu; a=ed25519; pk=BIj5KdACscEOyAC0oIkeZqLB3L94fzBnDccEooxeM5Y= X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNTIwMDIwMyBTYWx0ZWRfXyFU9i8CPK+vy aHBlrq83X1fdfCR9pKulgl6UtvQHclL+2iaDl/mIg4DHrT1RbFDen1o9UkSv86fdj728VojHIbY qXRpb6JLo4lPSUgGvmuFqp4Z0GQpID7hsk3A0Qz9jew8vD/02ekOJzYDFR6e5rLzJ9N444OChnB MudDeO2+Sg02L/GxNCLr6r1xiXxNL8rQn0bFxy3wk79t87jBUgyPjJDTMSPC4RDRIyr6FwIio3A irXvPjda9jzwJNCytm6YLzLCWvoV/hcAZv2QPIf0aNKk1knhClUnTYS25z9V1JPEDv3PvXQkeb0 OskQ/6a4PWvfuR+aX0z4bih3OhPV0X60V5qZLqs/OYUpBLiFmwr/Fr29/mBd1WPpoAFKVXvavBl
ew5QnA3IfG8cl3h4oe8WGo/iY3X92fvA6JOB74XsQRQFPlA/N3VBLEssFul1XWjUe0EWnpx0lv6 NQiiQyr6vnKRyThJf/g== X-Authority-Analysis: v=2.4 cv=TsDWQjXh c=1 sm=1 tr=0 ts=6a0e1ea2 cx=c_pps a=hnmNkyzTK/kJ09Xio7VxxA==:117 a=GaPK54s0Se3oFqK5NkZy0g==:17 a=IkcTkHD0fZMA:10 a=NGcC8JguVDcA:10 a=x7bEGLp0ZPQA:10 a=VkNPw1HP01LnGYTKEx00:22 a=Da8U98TiO7q1upZEImrf:22 a=QOCMdifcju39GKoXhKua:22 a=8LwvE8iKX1rSyJpzT4YA:9 a=QEXdDO2ut3YA:10 a=PEH46H7Ffwr30OY-TuGO:22 X-Proofpoint-GUID: 4C9KsNmECyf3RsheoxFqDKZxoSnmX__I X-Proofpoint-ORIG-GUID: 4C9KsNmECyf3RsheoxFqDKZxoSnmX__I X-Proofpoint-Virus-Version: vendor=nai engine=6900 definitions=11792 signatures=596817 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=10 suspectscore=0 phishscore=0 lowpriorityscore=10 adultscore=0 spamscore=0 clxscore=1015 malwarescore=0 priorityscore=1501 bulkscore=10 classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0 reason=mlx scancount=1 engine=8.22.0-2605130000 definitions=main-2605200203 This is an inode-based helper and should live with other inode helpers. 
Signed-off-by: Tal Zussman --- include/linux/fs.h | 6 ++++++ include/linux/pagemap.h | 6 ------ 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index cd5088dfe9a1..776cc82932a7 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1171,6 +1171,12 @@ static inline void i_size_write(struct inode *inode,= loff_t i_size) #endif } =20 +static inline unsigned long dir_pages(const struct inode *inode) +{ + return (unsigned long)(inode->i_size + PAGE_SIZE - 1) >> + PAGE_SHIFT; +} + static inline unsigned iminor(const struct inode *inode) { return MINOR(inode->i_rdev); diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 84ccb682cca8..f86a550ad516 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -1356,12 +1356,6 @@ static inline size_t readahead_batch_length(const st= ruct readahead_control *rac) return rac->_batch_count * PAGE_SIZE; } =20 -static inline unsigned long dir_pages(const struct inode *inode) -{ - return (unsigned long)(inode->i_size + PAGE_SIZE - 1) >> - PAGE_SHIFT; -} - /** * folio_mkwrite_check_truncate - check if folio was truncated * @folio: the folio to check --=20 2.39.5 From nobody Sun May 24 22:35:55 2026 Received: from mx0b-00364e01.pphosted.com (mx0b-00364e01.pphosted.com [148.163.139.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 322CD3769F3 for ; Wed, 20 May 2026 20:50:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.163.139.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779310248; cv=none; b=lxybQCG03pPo/kZr8N6NoFnp1B1haBGfatu2bQ5Lnc3LhrcZXxG7ftTtlXombk+CSkXMMfQshLHj+ci6VPcmRCN1RXTfIApc9LLmHh+Af8wa74HpfDrX5N2mPsfic2xkARJvCasDXyiKbQNr0yRyTSkTR7HfaomPeAX2Bw5y0U0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779310248; 
c=relaxed/simple; bh=sWXCb2voUNh/5RgXFZA32Pq1QXHhhJuRnYDZVKm8gu0=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=Ak9Rh5xZwl4n+iIzQMCdEOj3y4RUkb7Ir18ojVfafLbP+1cUTz11FgEfn0v5cb+2dkhYkfS+/LOqN8AAMtkVSDS3g14mmLHs6f0asbYg0VWZXvC4h+VXvDLJ6fpWlL4ivwPmdEz3TrN2SYoKmDb2ylM/RupN8y2lIb28r12d3sQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=columbia.edu; spf=pass smtp.mailfrom=columbia.edu; dkim=pass (2048-bit key) header.d=columbia.edu header.i=@columbia.edu header.b=RaMC69iL; arc=none smtp.client-ip=148.163.139.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=columbia.edu Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=columbia.edu Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=columbia.edu header.i=@columbia.edu header.b="RaMC69iL" Received: from pps.filterd (m0499198.ppops.net [127.0.0.1]) by mx0b-00364e01.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 64KKO5uN2013232 for ; Wed, 20 May 2026 16:50:43 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=columbia.edu; h= cc:content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=pps01; bh=NuYZ UEpkdlDPA/MoPFg1KYmaCTdNZxzaMhmmXnS7n5Y=; b=RaMC69iL96yzOK/lRDf+ 5JstGQzPJTY3tGXJTW72lq1IMm7QXNtfTzTcW2dJzro8iQN+IyGFfOMigEK5kXJC TkrRPLlM6LLPe9Qif+6bs3GDBj8mFTAx3sj1CYxGfo0lwJ12gOLSKkDssy69M3pY kO3SoZGr2BJkk0iYM1n0FLY1CbFqNx1ykfaoWTx5Gb/OxbWLZcWDyIt9kLP3mzNJ cSuJlBH9j4SixMkwpaTYSSyeuOORn178xTCgNO7LWnmh8yB6rbXufNbbbx689Wdj /ONqFgUH4ENnnf762oQfQfPGjfmIXn9BhZXTOIt21neOwtoYd6cH77WqV6mIabu4 Rg== Received: from mail-qv1-f70.google.com (mail-qv1-f70.google.com [209.85.219.70]) by mx0b-00364e01.pphosted.com (PPS) with ESMTPS id 4e9fdn2mvd-1 (version=TLSv1.3 cipher=TLS_AES_128_GCM_SHA256 bits=128 verify=NOT) for ; Wed, 20 May 2026 16:50:43 -0400 (EDT) Received: by 
mail-qv1-f70.google.com with SMTP id 6a1803df08f44-8b46c014a26so166734256d6.0 for ; Wed, 20 May 2026 13:50:43 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779310243; x=1779915043; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-gg:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=NuYZUEpkdlDPA/MoPFg1KYmaCTdNZxzaMhmmXnS7n5Y=; b=d8NJhct17bbdjW8S/rFO/Bn/lL7c5Jx/om3SNaJWbop9HOBT3ntMZfC8NG/G2ACV3A +BZ2Ag1Szvo3XOz/Zrma3wEwShCy1kR8Yn3W/mBfuqct4YZ4auNkiXvrVYK/rXU4VDU3 w1JU0PgdOjgdNl4Hc5+jfgfN1DtbuvsAh/hFEYZHQpemTkYkZwfUHqgZC7QNxKZHKXCb iy1z4bkt9enSnQqzAu23kO1q89Fpu2lfEQHZIVrZBTTWvVmX9uzO3GEinWFoQdfjQD0e l2+H5D+woscjCXyra/5TUEz3hdpRqW7gQqICslk/5c1/6WB+oE3a31lWr9D2MWzbudyJ 1Ntw== X-Forwarded-Encrypted: i=1; AFNElJ8Aa7HIEhLUoeu0IWsNaz1Wz27g4dDxF+h2HW/kFwoVOpiAiDtcZ7O7Lj44nkGQSdC9ONMIJY9wwzdAZyY=@vger.kernel.org X-Gm-Message-State: AOJu0YxJXLghZfiSEMTV1jkbL2Van7UvAW03+n1y9Kyci0kftWxRLyCd YC/Z0AWp8moamE1PEM/fBOHHnx3Cartadax5sBghHWuGSDFZ/L3N58+U+jVQgx4ckPbUF2TqhTC Sr4eKCaQgXYeLIC1Rei1i7I7MHXFiblRFBbkHGQPaA2I5dTZ0Xvj81I9vviaGmw== X-Gm-Gg: Acq92OEq2H3HFsXJGiHeoMiqVxgYcA3cNxaY1Hvgr0o23gdEQS6aZ5e0SogtLPNWCIE LGTBbDU4oDCEZU3+cpxzbihU1WI7WhZP4FMEm1YW3zrbrnCsf7IBk7b4nF3adv0B8WR+SOdD+6b xGBHak6oU6gXrIT0jJwn49MMv/GV5yfkv+0G9z4KUdFOqqyEvqH/a7qrEgXXNvXANFNT4a7a5Hp PPQiQEF4Bh4WOqUd8BSZBo23j+I9iLHHy457HfPCStbb+Ml8XNU7d/K8ZjHr2KNBKb8C49E1dk3 6/YeX7AKyv9BoAuM0zDRfUqQGxfXb5ay/lvcQNpvddh9cm94564pGOs7CQCq7IvupY09xQ029fD AM70OU62bQ0SDIH4ZR33xw5gHaZwNe92nryCNQMqc6mgctyVopBkksyiwqwrhV5vGP7SVJeMOfe qROg== X-Received: by 2002:a05:620a:258f:b0:8ed:dc5a:f668 with SMTP id af79cd13be357-911d00b7d27mr3703468085a.58.1779310242784; Wed, 20 May 2026 13:50:42 -0700 (PDT) X-Received: by 2002:a05:620a:258f:b0:8ed:dc5a:f668 with SMTP id af79cd13be357-911d00b7d27mr3703462585a.58.1779310242221; Wed, 20 May 2026 13:50:42 -0700 (PDT) Received: from [127.0.1.1] 
(dyn-160-39-33-242.dyn.columbia.edu. [160.39.33.242]) by smtp.gmail.com with ESMTPSA id af79cd13be357-910bcf37274sm2232692085a.37.2026.05.20.13.50.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 20 May 2026 13:50:41 -0700 (PDT) From: Tal Zussman Date: Wed, 20 May 2026 16:49:01 -0400 Subject: [PATCH RFC 10/11] fs: move generic_file_read_iter() to fs/read_write.c Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260520-filemap-split-v1-10-c36ddc2b6cf2@columbia.edu> References: <20260520-filemap-split-v1-0-c36ddc2b6cf2@columbia.edu> In-Reply-To: <20260520-filemap-split-v1-0-c36ddc2b6cf2@columbia.edu> To: "Matthew Wilcox (Oracle)" , Jan Kara , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , "Liam R. Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Alexander Viro , Christian Brauner , Jens Axboe Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, io-uring@vger.kernel.org, Tal Zussman X-Mailer: b4 0.14.3-dev-d7477 X-Developer-Signature: v=1; a=ed25519-sha256; t=1779310229; l=8508; i=tz2294@columbia.edu; s=20250528; h=from:subject:message-id; bh=sWXCb2voUNh/5RgXFZA32Pq1QXHhhJuRnYDZVKm8gu0=; b=c4SVT0/EKyeLALT++GoCtxlOHEW6EtdwGs6LfnqVwX3PEcOVGXnxyAlvuksJ5kGk//rUF733e +mKRneLOBHUCwxRiU2NqVzX9ZiVWHPnX8ZDbFnwJYl64fBDuIVt5eIF X-Developer-Key: i=tz2294@columbia.edu; a=ed25519; pk=BIj5KdACscEOyAC0oIkeZqLB3L94fzBnDccEooxeM5Y= X-Proofpoint-GUID: aeLt8iQgCiqoEy1ybaKCL8MCakhCun7H X-Authority-Analysis: v=2.4 cv=P/4KQCAu c=1 sm=1 tr=0 ts=6a0e1ea3 cx=c_pps a=oc9J++0uMp73DTRD5QyR2A==:117 a=GaPK54s0Se3oFqK5NkZy0g==:17 a=IkcTkHD0fZMA:10 a=NGcC8JguVDcA:10 a=x7bEGLp0ZPQA:10 a=VkNPw1HP01LnGYTKEx00:22 a=Da8U98TiO7q1upZEImrf:22 a=BpGzv1V74M3SfeTrGa8v:22 a=gH0tmTfNonLjgA-2hfQA:9 a=QEXdDO2ut3YA:10 a=iYH6xdkBrDN1Jqds4HTS:22 
X-Proofpoint-ORIG-GUID: aeLt8iQgCiqoEy1ybaKCL8MCakhCun7H X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNTIwMDIwMyBTYWx0ZWRfX2I+uxZFo1vDv 1oLcgqBRyIHSQUMd4HZhBecJAQTBmHqu++Tsbi41IbSM5F3oZf7cAwqaaQDmA1ZtFoQkWCvxIZz YKNXil22ySeIJJGu+0Do9aGZWk/eOCgy7GceT9nDQ4xz0zNxhdprHv/8MI0PAu21RzYY4mJK6TE Z+LwL70bW/junNkiXJdwW6gfNo1iyDj2knOJkGEy5MQkMDvyvwEDr1+vK9EIgzf+sYR8XLkqHSP gVdkEhJUqhuJW0Joor+IrJtdxZrhrP0Uw/YWHjVK3p/usm0T6hw+gFNlCaX4Uxpmt2VaC8LNoWc hGIlVxogqzBKwAzMUumkBrrLQ+skYsiRt+8V4PbrOcg7S1aNLu/lPGWrQxcsArSRGXTR4zgSscn c3bJ1i58Mn1yQIHBSVG2bOCAm2MhuAuyU8x5HIppCikvXyJQEU3yTBzMjIcx6IRMMQMVaBoBogh bMdRXDj0ofrPLigLmHg== X-Proofpoint-Virus-Version: vendor=nai engine=6900 definitions=11792 signatures=596817 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 suspectscore=0 impostorscore=10 malwarescore=0 phishscore=0 adultscore=0 lowpriorityscore=10 spamscore=0 priorityscore=1501 bulkscore=10 classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0 reason=mlx scancount=1 engine=8.22.0-2605130000 definitions=main-2605200203 generic_file_read_iter() and its kiocb_write_and_wait() helper are VFS-level read functions: Their callers are filesystems, and their job is to glue direct I/O or the page cache (filemap_read) to a struct kiocb and iov_iter caller. Move both to fs/read_write.c, alongside vfs_iter_read. Drop the extern from generic_file_read_iter()'s declaration and reflow the generic_file_read_iter() definition to fit on one line too. 
Signed-off-by: Tal Zussman --- fs/read_write.c | 82 +++++++++++++++++++++++++++++++++++++++++++++= ++++ include/linux/fs.h | 3 +- include/linux/pagemap.h | 1 - mm/filemap.c | 82 ---------------------------------------------= ---- 4 files changed, 84 insertions(+), 84 deletions(-) diff --git a/fs/read_write.c b/fs/read_write.c index 50bff7edc91f..59ceea85c163 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -989,6 +989,88 @@ ssize_t vfs_iter_write(struct file *file, struct iov_i= ter *iter, loff_t *ppos, } EXPORT_SYMBOL(vfs_iter_write); =20 +int kiocb_write_and_wait(struct kiocb *iocb, size_t count) +{ + struct address_space *mapping =3D iocb->ki_filp->f_mapping; + loff_t pos =3D iocb->ki_pos; + loff_t end =3D pos + count - 1; + + if (iocb->ki_flags & IOCB_NOWAIT) { + if (filemap_range_needs_writeback(mapping, pos, end)) + return -EAGAIN; + return 0; + } + + return filemap_write_and_wait_range(mapping, pos, end); +} +EXPORT_SYMBOL_GPL(kiocb_write_and_wait); + +/** + * generic_file_read_iter - generic filesystem read routine + * @iocb: kernel I/O control block + * @iter: destination for the data read + * + * This is the "read_iter()" routine for all filesystems + * that can use the page cache directly. + * + * The IOCB_NOWAIT flag in iocb->ki_flags indicates that -EAGAIN shall + * be returned when no data can be read without waiting for I/O requests + * to complete; it doesn't prevent readahead. + * + * The IOCB_NOIO flag in iocb->ki_flags indicates that no new I/O + * requests shall be made for the read or for readahead. When no data + * can be read, -EAGAIN shall be returned. When readahead would be + * triggered, a partial, possibly empty read shall be returned. 
+ * + * Return: + * * number of bytes copied, even for partial reads + * * negative error code (or 0 if IOCB_NOIO) if nothing was read + */ +ssize_t generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter) +{ + size_t count =3D iov_iter_count(iter); + ssize_t retval =3D 0; + + if (!count) + return 0; /* skip atime */ + + if (iocb->ki_flags & IOCB_DIRECT) { + struct file *file =3D iocb->ki_filp; + struct address_space *mapping =3D file->f_mapping; + struct inode *inode =3D mapping->host; + + retval =3D kiocb_write_and_wait(iocb, count); + if (retval < 0) + return retval; + file_accessed(file); + + retval =3D mapping->a_ops->direct_IO(iocb, iter); + if (retval >=3D 0) { + iocb->ki_pos +=3D retval; + count -=3D retval; + } + if (retval !=3D -EIOCBQUEUED) + iov_iter_revert(iter, count - iov_iter_count(iter)); + + /* + * Btrfs can have a short DIO read if we encounter + * compressed extents, so if there was an error, or if + * we've already read everything we wanted to, or if + * there was a short read because we hit EOF, go ahead + * and return. Otherwise fallthrough to buffered io for + * the rest of the read. Buffered reads will not work for + * DAX files, so don't bother trying. 
+ */ + if (retval < 0 || !count || IS_DAX(inode)) + return retval; + if (iocb->ki_pos >=3D i_size_read(inode)) + return retval; + } + + return filemap_read(iocb, iter, retval); +} +EXPORT_SYMBOL(generic_file_read_iter); + static ssize_t vfs_readv(struct file *file, const struct iovec __user *vec, unsigned long vlen, loff_t *pos, rwf_t flags) { diff --git a/include/linux/fs.h b/include/linux/fs.h index 776cc82932a7..c0151ced8e7a 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3055,7 +3055,8 @@ extern int generic_write_check_limits(struct file *fi= le, loff_t pos, extern int generic_file_rw_checks(struct file *file_in, struct file *file_= out); ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *to, ssize_t already_read); -extern ssize_t generic_file_read_iter(struct kiocb *, struct iov_iter *); +ssize_t generic_file_read_iter(struct kiocb *, struct iov_iter *); +int kiocb_write_and_wait(struct kiocb *iocb, size_t count); extern ssize_t __generic_file_write_iter(struct kiocb *, struct iov_iter *= ); extern ssize_t generic_file_write_iter(struct kiocb *, struct iov_iter *); extern ssize_t generic_file_direct_write(struct kiocb *, struct iov_iter *= ); diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index f86a550ad516..46cefd552a51 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -59,7 +59,6 @@ int filemap_fdatawrite_range(struct address_space *mappin= g, loff_t start, loff_t end); int filemap_check_errors(struct address_space *mapping); void __filemap_set_wb_err(struct address_space *mapping, int err); -int kiocb_write_and_wait(struct kiocb *iocb, size_t count); =20 static inline int filemap_write_and_wait(struct address_space *mapping) { diff --git a/mm/filemap.c b/mm/filemap.c index 079f9c3ac8a2..db7c53cd681b 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2251,22 +2251,6 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_= iter *iter, } EXPORT_SYMBOL_GPL(filemap_read); =20 -int kiocb_write_and_wait(struct 
kiocb *iocb, size_t count) -{ - struct address_space *mapping =3D iocb->ki_filp->f_mapping; - loff_t pos =3D iocb->ki_pos; - loff_t end =3D pos + count - 1; - - if (iocb->ki_flags & IOCB_NOWAIT) { - if (filemap_range_needs_writeback(mapping, pos, end)) - return -EAGAIN; - return 0; - } - - return filemap_write_and_wait_range(mapping, pos, end); -} -EXPORT_SYMBOL_GPL(kiocb_write_and_wait); - int filemap_invalidate_pages(struct address_space *mapping, loff_t pos, loff_t end, bool nowait) { @@ -2302,72 +2286,6 @@ int kiocb_invalidate_pages(struct kiocb *iocb, size_= t count) } EXPORT_SYMBOL_GPL(kiocb_invalidate_pages); =20 -/** - * generic_file_read_iter - generic filesystem read routine - * @iocb: kernel I/O control block - * @iter: destination for the data read - * - * This is the "read_iter()" routine for all filesystems - * that can use the page cache directly. - * - * The IOCB_NOWAIT flag in iocb->ki_flags indicates that -EAGAIN shall - * be returned when no data can be read without waiting for I/O requests - * to complete; it doesn't prevent readahead. - * - * The IOCB_NOIO flag in iocb->ki_flags indicates that no new I/O - * requests shall be made for the read or for readahead. When no data - * can be read, -EAGAIN shall be returned. When readahead would be - * triggered, a partial, possibly empty read shall be returned. 
- * - * Return: - * * number of bytes copied, even for partial reads - * * negative error code (or 0 if IOCB_NOIO) if nothing was read - */ -ssize_t -generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter) -{ - size_t count =3D iov_iter_count(iter); - ssize_t retval =3D 0; - - if (!count) - return 0; /* skip atime */ - - if (iocb->ki_flags & IOCB_DIRECT) { - struct file *file =3D iocb->ki_filp; - struct address_space *mapping =3D file->f_mapping; - struct inode *inode =3D mapping->host; - - retval =3D kiocb_write_and_wait(iocb, count); - if (retval < 0) - return retval; - file_accessed(file); - - retval =3D mapping->a_ops->direct_IO(iocb, iter); - if (retval >=3D 0) { - iocb->ki_pos +=3D retval; - count -=3D retval; - } - if (retval !=3D -EIOCBQUEUED) - iov_iter_revert(iter, count - iov_iter_count(iter)); - - /* - * Btrfs can have a short DIO read if we encounter - * compressed extents, so if there was an error, or if - * we've already read everything we wanted to, or if - * there was a short read because we hit EOF, go ahead - * and return. Otherwise fallthrough to buffered io for - * the rest of the read. Buffered reads will not work for - * DAX files, so don't bother trying. - */ - if (retval < 0 || !count || IS_DAX(inode)) - return retval; - if (iocb->ki_pos >=3D i_size_read(inode)) - return retval; - } - - return filemap_read(iocb, iter, retval); -} -EXPORT_SYMBOL(generic_file_read_iter); =20 /* * Splice subpages from a folio into a pipe. 
--=20 2.39.5 From nobody Sun May 24 22:35:55 2026 Received: from mx0b-00364e01.pphosted.com (mx0b-00364e01.pphosted.com [148.163.139.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4AD76378D7D for ; Wed, 20 May 2026 20:50:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.163.139.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779310250; cv=none; b=Ra1+pXxR9kbrVz5ANfxFs8sZ+/aKx+DofdOLSX2nxNUKoR9VvhnGK76frSBO3rayimBSW/LtVPMsS5bNhvKzcSizh8dK1xuakm6xa8BEEpjwTS2f5YzWck+8WwfT76Suy0iaAq6NmLEtOCqe2msSFp9AW8BOqXmpoYVtKKhT/Ag= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779310250; c=relaxed/simple; bh=yTsT4MTanmPV7BjuSqLOqTfpIoxcx3ADCk+oaTv3+ZQ=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=HQhqQoJ7iJQxMvXGftpMHRSqQuDos8h1pcFsdW/fpEMY7T61zuY7Y6GgiLQJ3sxrkTFwWC0tkiLfkz7jHqD656o5SJG9yM4e6o3sI3iQTBaCO8ugGdfPpGhaZFeNaSGBUWlNnMjWzyXM1uXGh5rRBovO5RFmQxRSy/GfwlFgGcg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=columbia.edu; spf=pass smtp.mailfrom=columbia.edu; dkim=pass (2048-bit key) header.d=columbia.edu header.i=@columbia.edu header.b=eki62Gi6; arc=none smtp.client-ip=148.163.139.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=columbia.edu Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=columbia.edu Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=columbia.edu header.i=@columbia.edu header.b="eki62Gi6" Received: from pps.filterd (m0167076.ppops.net [127.0.0.1]) by mx0b-00364e01.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 64KKO084506329 for ; Wed, 20 May 2026 16:50:45 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=columbia.edu; h= 
cc:content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=pps01; bh=5DYC zqTKbvdRmrDK6yQ7IbYekrJTixKptsXZVS++o38=; b=eki62Gi6Oexp6jbg0mKB +gAM4I9jH4VaSKpxJCUTq5VOpozJJuJEmsBIs1ldLyi+yLxf5Hej4VGO/Rm79ogi +lGM7nZwTZcEhSVlPH2b4j3DW2XQtVlZfxaTMqHe6HX0R9B33UVKI6Ch9WH/aaUQ /Nd3NrjeKxAgNv3LXFUeLq0RsN+M892JUxkAFcLGaekU5fNmOrLgfABzJSJayjI2 8vIGmYeIiKBMNyBP4XlLsOzqAS8j5rDWnVmWRqLyThLZQfOzij6SEXn8H7XtLOvL g8s4FquNdM6Pizj4XVpgHW8jUFG0aEz+cTOq8E2xflA4i5agkq7SX0iQbwF0R+pi Mw== Received: from mail-qk1-f200.google.com (mail-qk1-f200.google.com [209.85.222.200]) by mx0b-00364e01.pphosted.com (PPS) with ESMTPS id 4e9a0bd1ty-1 (version=TLSv1.3 cipher=TLS_AES_128_GCM_SHA256 bits=128 verify=NOT) for ; Wed, 20 May 2026 16:50:44 -0400 (EDT) Received: by mail-qk1-f200.google.com with SMTP id af79cd13be357-90d7b3406b2so1288421385a.3 for ; Wed, 20 May 2026 13:50:44 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779310244; x=1779915044; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-gg:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=5DYCzqTKbvdRmrDK6yQ7IbYekrJTixKptsXZVS++o38=; b=JFLzDNDl90l/rTPcBJRcAFW1YQ/adKE/h6BIB6uAnWRJE3tikYDPtEBGYztoTX9M6s 0638TpYmTrIDX706rgsgMMiDIN7rEkYTYyowharH3OmHBAbXEkk9qa+Uye3UqPANxsos QxmwNECAd2+cIUraThJi5AraHizuGf5UgCyo8+zNp1PPg9KfBICnwrtVzAZBWtElXrw/ Z+TP3NddKTmHZk7Egq69oo2Tf6OjDWQY5kV5VqA/2fr/ZeoKn6OBi7EWI6a6B5so9ON8 5IvZERf2jjy13nsaO0Rbs9xallwFWqUwY3E7N8XsJo/Avh++DDaKgvQpg9janyyNI8vL kcgA== X-Forwarded-Encrypted: i=1; AFNElJ+vlDDwiO+TfUYBmTZPR8w5wP/arKSq5Ng2YlOW3XAjN4EwpNerle6XW7HaT/UnjfHHWuVqLqVsj3d3A0o=@vger.kernel.org X-Gm-Message-State: AOJu0YxZaS4/XZ7N+aoddwk1DrgdC54plVCezHTzGLChS4wXDvD1BcjM +Gr/qlDWaAAX4dqZcqNidPX7CebSOoG+MyHr7R9WRyTawO0qcGnRCSjk+dbKPcmBQbK2OaMRReY bn7kYG1OY8b/Vn5jjQsusb/ECfAP/FDLJJeU9kscZCKubmTAkAENyd3/q4jYixA== X-Gm-Gg: 
Acq92OEaRRK7+RnRFqLMJ3zO1B5DpsqkLvgUqQdyWMGlZuWfQSlXXTHKEzFyMef2JAZ pX7YqnVaJbHwfkZfRkBuoCX2zszn31r6s+NFkMWVCt3dxUco7bLgt5fINf1sXmJdzah8kBlp6Hd U1x9D7xjMFcggj57LbFh4f19Wsl2QMNjtHaNlJovLDO/9A60D+08v+FkLwMW6NSXTHfNIibEGTo RbhUONWIoQJ9R+obNT99ctK0Kqb2rLznlXVJLq0U88q2++nIr+qlN9yKD3uVI0pb/VL43Tre20s T1/IiRiSYtTilA35QJfnrG9ivSY785Q34Iluso0K7e5oumCD6KfqJC6qf3COyLaYlB2rQPA7dQA R34XrAka6ItoVMOwUa9+l0N7R3CXTCBx3Pd3Incf0BnAnp9oigdHHnRVCMhD0AmKtbVUg3b5Rcm MleA== X-Received: by 2002:a05:620a:468f:b0:8cd:b70b:fd00 with SMTP id af79cd13be357-911ce330dd6mr4090125385a.16.1779310244013; Wed, 20 May 2026 13:50:44 -0700 (PDT) X-Received: by 2002:a05:620a:468f:b0:8cd:b70b:fd00 with SMTP id af79cd13be357-911ce330dd6mr4090119085a.16.1779310243267; Wed, 20 May 2026 13:50:43 -0700 (PDT) Received: from [127.0.1.1] (dyn-160-39-33-242.dyn.columbia.edu. [160.39.33.242]) by smtp.gmail.com with ESMTPSA id af79cd13be357-910bcf37274sm2232692085a.37.2026.05.20.13.50.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 20 May 2026 13:50:42 -0700 (PDT) From: Tal Zussman Date: Wed, 20 May 2026 16:49:02 -0400 Subject: [PATCH RFC 11/11] fs: move generic_file_write_iter() family to fs/read_write.c Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260520-filemap-split-v1-11-c36ddc2b6cf2@columbia.edu> References: <20260520-filemap-split-v1-0-c36ddc2b6cf2@columbia.edu> In-Reply-To: <20260520-filemap-split-v1-0-c36ddc2b6cf2@columbia.edu> To: "Matthew Wilcox (Oracle)" , Jan Kara , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , "Liam R. 
Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Alexander Viro , Christian Brauner , Jens Axboe Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, io-uring@vger.kernel.org, Tal Zussman X-Mailer: b4 0.14.3-dev-d7477 X-Developer-Signature: v=1; a=ed25519-sha256; t=1779310229; l=21055; i=tz2294@columbia.edu; s=20250528; h=from:subject:message-id; bh=yTsT4MTanmPV7BjuSqLOqTfpIoxcx3ADCk+oaTv3+ZQ=; b=pnZtz23UD0UmzGK1JF55M8gJyFr39Fbr88g9tveN1RrBZttxjf03za+ECw+z8h1q8W3XU4Gc6 mLEEtbgMzn2CtlWhj/fgZUlRAzRNrQvC3nKL7GTyYq12T29/MeCqrNo X-Developer-Key: i=tz2294@columbia.edu; a=ed25519; pk=BIj5KdACscEOyAC0oIkeZqLB3L94fzBnDccEooxeM5Y= X-Proofpoint-GUID: L6hCOdgiSm_yT_Z9Z3Jsd_mlOs5IrndO X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNTIwMDIwMyBTYWx0ZWRfX3ch5CSN3eQcG X2Y74IYxcEAKkk7TDKmgYc7TqoDMw4Y2rcSkKfZMBRbAXkVrVQPYG6gckwDc14pjqRq/3UwP4nD /y2BHnlElUaQl4KefAsTl0CXhcCC9R19GOS9ToYyXRayrus9UvWPZVaMnZ2thCOZfqxv0HEiACv RutBYavgYpzOeMjNnGEQs6OVwroPRCyO+aFJu9k6jADSVJY8DlwXUvtv6rsoIInHKzeo6yCbj4+ YNtyetyx4JfqB+GYO5ANyEyvmz9X61xRT/YM1UXLDRGA6moDWdskRKCadgOygz5/i91PfbOrff5 LJAqyElWBwP/w/z5Ip75KkltqZYUsk7sCqU3SeKuIH7RQLFV07mqiv9ZsFBjVQHHuVVlYonT6cu VWw/6/zs3r9IVkEkfPBoivAcEpnT7n7oLfFQ2raCEnq89dSMhwyn1wCspe2i2qplOuVyORg6yl8 lvC4TSl4QLEqP6aFvfA== X-Proofpoint-ORIG-GUID: L6hCOdgiSm_yT_Z9Z3Jsd_mlOs5IrndO X-Authority-Analysis: v=2.4 cv=KLJqylFo c=1 sm=1 tr=0 ts=6a0e1ea4 cx=c_pps a=hnmNkyzTK/kJ09Xio7VxxA==:117 a=GaPK54s0Se3oFqK5NkZy0g==:17 a=IkcTkHD0fZMA:10 a=NGcC8JguVDcA:10 a=x7bEGLp0ZPQA:10 a=VkNPw1HP01LnGYTKEx00:22 a=Da8U98TiO7q1upZEImrf:22 a=Qm0qsxP7aFY2tkT6R2MF:22 a=t4iOdaPQC35MTOVVYvEA:9 a=QEXdDO2ut3YA:10 a=PEH46H7Ffwr30OY-TuGO:22 X-Proofpoint-Virus-Version: vendor=nai engine=6900 definitions=11792 signatures=596817 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=10 priorityscore=1501 malwarescore=0 adultscore=0 clxscore=1015 bulkscore=10 phishscore=0 suspectscore=0 lowpriorityscore=10 
spamscore=0 classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0 reason=mlx scancount=1 engine=8.22.0-2605130000 definitions=main-2605200203 Move the VFS-level generic write path out of mm/filemap.c into fs/read_write.c next to the just-relocated read path: - generic_file_write_iter() - __generic_file_write_iter() - generic_file_direct_write() - generic_perform_write() - kiocb_invalidate_pages() - kiocb_invalidate_post_direct_write() - dio_warn_stale_pagecache() The kiocb_invalidate_* prototypes move from <linux/pagemap.h> to <linux/fs.h>, joining kiocb_write_and_wait() and the other generic read/write declarations. Drop extern from the prototypes of all five generic_file_* declarations in <linux/fs.h>. Reflow the generic_file_direct_write() definition to fit on one line. Signed-off-by: Tal Zussman --- fs/read_write.c | 276 ++++++++++++++++++++++++++++++++++++++++++++= +++ include/linux/fs.h | 8 +- include/linux/pagemap.h | 2 - mm/filemap.c | 277 --------------------------------------------= ---- 4 files changed, 281 insertions(+), 282 deletions(-) diff --git a/fs/read_write.c b/fs/read_write.c index 59ceea85c163..cea5f79fdacf 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -1071,6 +1071,282 @@ ssize_t generic_file_read_iter(struct kiocb *iocb, = struct iov_iter *iter) } EXPORT_SYMBOL(generic_file_read_iter); =20 +int kiocb_invalidate_pages(struct kiocb *iocb, size_t count) +{ + struct address_space *mapping =3D iocb->ki_filp->f_mapping; + + return filemap_invalidate_pages(mapping, iocb->ki_pos, + iocb->ki_pos + count - 1, + iocb->ki_flags & IOCB_NOWAIT); +} +EXPORT_SYMBOL_GPL(kiocb_invalidate_pages); + +/* + * Warn about a page cache invalidation failure during a direct I/O write.
+ */
+static void dio_warn_stale_pagecache(struct file *filp)
+{
+	static DEFINE_RATELIMIT_STATE(_rs, 86400 * HZ, DEFAULT_RATELIMIT_BURST);
+	char pathname[128];
+	char *path;
+
+	errseq_set(&filp->f_mapping->wb_err, -EIO);
+	if (__ratelimit(&_rs)) {
+		path = file_path(filp, pathname, sizeof(pathname));
+		if (IS_ERR(path))
+			path = "(unknown)";
+		pr_crit("Page cache invalidation failure on direct I/O. Possible data corruption due to collision with buffered I/O!\n");
+		pr_crit("File: %s PID: %d Comm: %.20s\n", path, current->pid,
+			current->comm);
+	}
+}
+
+void kiocb_invalidate_post_direct_write(struct kiocb *iocb, size_t count)
+{
+	struct address_space *mapping = iocb->ki_filp->f_mapping;
+
+	if (mapping->nrpages &&
+	    invalidate_inode_pages2_range(mapping,
+			iocb->ki_pos >> PAGE_SHIFT,
+			(iocb->ki_pos + count - 1) >> PAGE_SHIFT))
+		dio_warn_stale_pagecache(iocb->ki_filp);
+}
+
+ssize_t generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct address_space *mapping = iocb->ki_filp->f_mapping;
+	size_t write_len = iov_iter_count(from);
+	ssize_t written;
+
+	/*
+	 * If a page can not be invalidated, return 0 to fall back
+	 * to buffered write.
+	 */
+	written = kiocb_invalidate_pages(iocb, write_len);
+	if (written) {
+		if (written == -EBUSY)
+			return 0;
+		return written;
+	}
+
+	written = mapping->a_ops->direct_IO(iocb, from);
+
+	/*
+	 * Finally, try again to invalidate clean pages which might have been
+	 * cached by non-direct readahead, or faulted in by get_user_pages()
+	 * if the source of the write was an mmap'ed region of the file
+	 * we're writing. Either one is a pretty crazy thing to do,
+	 * so we don't support it 100%. If this invalidation
+	 * fails, tough, the write still worked...
+	 *
+	 * Most of the time we do not need this since dio_complete() will do
+	 * the invalidation for us. However there are some file systems that
+	 * do not end up with dio_complete() being called, so let's not break
+	 * them by removing it completely.
+	 *
+	 * Noticeable example is a blkdev_direct_IO().
+	 *
+	 * Skip invalidation for async writes or if mapping has no pages.
+	 */
+	if (written > 0) {
+		struct inode *inode = mapping->host;
+		loff_t pos = iocb->ki_pos;
+
+		kiocb_invalidate_post_direct_write(iocb, written);
+		pos += written;
+		write_len -= written;
+		if (pos > i_size_read(inode) && !S_ISBLK(inode->i_mode)) {
+			i_size_write(inode, pos);
+			mark_inode_dirty(inode);
+		}
+		iocb->ki_pos = pos;
+	}
+	if (written != -EIOCBQUEUED)
+		iov_iter_revert(from, write_len - iov_iter_count(from));
+	return written;
+}
+EXPORT_SYMBOL(generic_file_direct_write);
+
+ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
+{
+	struct file *file = iocb->ki_filp;
+	loff_t pos = iocb->ki_pos;
+	struct address_space *mapping = file->f_mapping;
+	const struct address_space_operations *a_ops = mapping->a_ops;
+	size_t chunk = mapping_max_folio_size(mapping);
+	long status = 0;
+	ssize_t written = 0;
+
+	do {
+		struct folio *folio;
+		size_t offset;		/* Offset into folio */
+		size_t bytes;		/* Bytes to write to folio */
+		size_t copied;		/* Bytes copied from user */
+		void *fsdata = NULL;
+
+		bytes = iov_iter_count(i);
+retry:
+		offset = pos & (chunk - 1);
+		bytes = min(chunk - offset, bytes);
+		balance_dirty_pages_ratelimited(mapping);
+
+		if (fatal_signal_pending(current)) {
+			status = -EINTR;
+			break;
+		}
+
+		status = a_ops->write_begin(iocb, mapping, pos, bytes,
+						&folio, &fsdata);
+		if (unlikely(status < 0))
+			break;
+
+		offset = offset_in_folio(folio, pos);
+		if (bytes > folio_size(folio) - offset)
+			bytes = folio_size(folio) - offset;
+
+		if (mapping_writably_mapped(mapping))
+			flush_dcache_folio(folio);
+
+		/*
+		 * Faults here on mmap()s can recurse into arbitrary
+		 * filesystem code. Lots of locks are held that can
+		 * deadlock. Use an atomic copy to avoid deadlocking
+		 * in page fault handling.
+		 */
+		copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
+		flush_dcache_folio(folio);
+
+		status = a_ops->write_end(iocb, mapping, pos, bytes, copied,
+						folio, fsdata);
+		if (unlikely(status != copied)) {
+			iov_iter_revert(i, copied - max(status, 0L));
+			if (unlikely(status < 0))
+				break;
+		}
+		cond_resched();
+
+		if (unlikely(status == 0)) {
+			/*
+			 * A short copy made ->write_end() reject the
+			 * thing entirely. Might be memory poisoning
+			 * halfway through, might be a race with munmap,
+			 * might be severe memory pressure.
+			 */
+			if (chunk > PAGE_SIZE)
+				chunk /= 2;
+			if (copied) {
+				bytes = copied;
+				goto retry;
+			}
+
+			/*
+			 * 'folio' is now unlocked and faults on it can be
+			 * handled. Ensure forward progress by trying to
+			 * fault it in now.
+			 */
+			if (fault_in_iov_iter_readable(i, bytes) == bytes) {
+				status = -EFAULT;
+				break;
+			}
+		} else {
+			pos += status;
+			written += status;
+		}
+	} while (iov_iter_count(i));
+
+	if (!written)
+		return status;
+	iocb->ki_pos += written;
+	return written;
+}
+EXPORT_SYMBOL(generic_perform_write);
+
+/**
+ * __generic_file_write_iter - write data to a file
+ * @iocb:	IO state structure (file, offset, etc.)
+ * @from:	iov_iter with data to write
+ *
+ * This function does all the work needed for actually writing data to a
+ * file. It does all basic checks, removes SUID from the file, updates
+ * modification times and calls proper subroutines depending on whether we
+ * do direct IO or a standard buffered write.
+ *
+ * It expects i_rwsem to be grabbed unless we work on a block device or similar
+ * object which does not need locking at all.
+ *
+ * This function does *not* take care of syncing data in case of O_SYNC write.
+ * A caller has to handle it. This is mainly due to the fact that we want to
+ * avoid syncing under i_rwsem.
+ *
+ * Return:
+ * * number of bytes written, even for truncated writes
+ * * negative error code if no data has been written at all
+ */
+ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct file *file = iocb->ki_filp;
+	struct address_space *mapping = file->f_mapping;
+	struct inode *inode = mapping->host;
+	ssize_t ret;
+
+	ret = file_remove_privs(file);
+	if (ret)
+		return ret;
+
+	ret = file_update_time(file);
+	if (ret)
+		return ret;
+
+	if (iocb->ki_flags & IOCB_DIRECT) {
+		ret = generic_file_direct_write(iocb, from);
+		/*
+		 * If the write stopped short of completing, fall back to
+		 * buffered writes. Some filesystems do this for writes to
+		 * holes, for example. For DAX files, a buffered write will
+		 * not succeed (even if it did, DAX does not handle dirty
+		 * page-cache pages correctly).
+		 */
+		if (ret < 0 || !iov_iter_count(from) || IS_DAX(inode))
+			return ret;
+		return direct_write_fallback(iocb, from, ret,
+				generic_perform_write(iocb, from));
+	}
+
+	return generic_perform_write(iocb, from);
+}
+EXPORT_SYMBOL(__generic_file_write_iter);
+
+/**
+ * generic_file_write_iter - write data to a file
+ * @iocb:	IO state structure
+ * @from:	iov_iter with data to write
+ *
+ * This is a wrapper around __generic_file_write_iter() to be used by most
+ * filesystems. It takes care of syncing the file in case of O_SYNC file
+ * and acquires i_rwsem as needed.
+ * Return:
+ * * negative error code if no data has been written at all of
+ *   vfs_fsync_range() failed for a synchronous write
+ * * number of bytes written, even for truncated writes
+ */
+ssize_t generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file->f_mapping->host;
+	ssize_t ret;
+
+	inode_lock(inode);
+	ret = generic_write_checks(iocb, from);
+	if (ret > 0)
+		ret = __generic_file_write_iter(iocb, from);
+	inode_unlock(inode);
+
+	if (ret > 0)
+		ret = generic_write_sync(iocb, ret);
+	return ret;
+}
+EXPORT_SYMBOL(generic_file_write_iter);
+
 static ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
 		  unsigned long vlen, loff_t *pos, rwf_t flags)
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c0151ced8e7a..6cfb9e46bc37 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3057,9 +3057,11 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *to,
 		ssize_t already_read);
 ssize_t generic_file_read_iter(struct kiocb *, struct iov_iter *);
 int kiocb_write_and_wait(struct kiocb *iocb, size_t count);
-extern ssize_t __generic_file_write_iter(struct kiocb *, struct iov_iter *);
-extern ssize_t generic_file_write_iter(struct kiocb *, struct iov_iter *);
-extern ssize_t generic_file_direct_write(struct kiocb *, struct iov_iter *);
+int kiocb_invalidate_pages(struct kiocb *iocb, size_t count);
+void kiocb_invalidate_post_direct_write(struct kiocb *iocb, size_t count);
+ssize_t __generic_file_write_iter(struct kiocb *, struct iov_iter *);
+ssize_t generic_file_write_iter(struct kiocb *, struct iov_iter *);
+ssize_t generic_file_direct_write(struct kiocb *, struct iov_iter *);
 ssize_t generic_perform_write(struct kiocb *, struct iov_iter *);
 ssize_t direct_write_fallback(struct kiocb *iocb, struct iov_iter *iter,
 		ssize_t direct_written, ssize_t buffered_written);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 46cefd552a51..b7c2dc8076ab 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -31,8 +31,6 @@ static inline void invalidate_remote_inode(struct inode *inode)
 int invalidate_inode_pages2(struct address_space *mapping);
 int invalidate_inode_pages2_range(struct address_space *mapping,
 		pgoff_t start, pgoff_t end);
-int kiocb_invalidate_pages(struct kiocb *iocb, size_t count);
-void kiocb_invalidate_post_direct_write(struct kiocb *iocb, size_t count);
 int filemap_invalidate_pages(struct address_space *mapping,
 		loff_t pos, loff_t end, bool nowait);
 
diff --git a/mm/filemap.c b/mm/filemap.c
index db7c53cd681b..284c0296a011 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2276,17 +2276,6 @@ int filemap_invalidate_pages(struct address_space *mapping,
 					     end >> PAGE_SHIFT);
 }
 
-int kiocb_invalidate_pages(struct kiocb *iocb, size_t count)
-{
-	struct address_space *mapping = iocb->ki_filp->f_mapping;
-
-	return filemap_invalidate_pages(mapping, iocb->ki_pos,
-					iocb->ki_pos + count - 1,
-					iocb->ki_flags & IOCB_NOWAIT);
-}
-EXPORT_SYMBOL_GPL(kiocb_invalidate_pages);
-
-
 /*
  * Splice subpages from a folio into a pipe.
  */
@@ -3500,272 +3489,6 @@ struct page *read_cache_page_gfp(struct address_space *mapping,
 }
 EXPORT_SYMBOL(read_cache_page_gfp);
 
-/*
- * Warn about a page cache invalidation failure during a direct I/O write.
- */
-static void dio_warn_stale_pagecache(struct file *filp)
-{
-	static DEFINE_RATELIMIT_STATE(_rs, 86400 * HZ, DEFAULT_RATELIMIT_BURST);
-	char pathname[128];
-	char *path;
-
-	errseq_set(&filp->f_mapping->wb_err, -EIO);
-	if (__ratelimit(&_rs)) {
-		path = file_path(filp, pathname, sizeof(pathname));
-		if (IS_ERR(path))
-			path = "(unknown)";
-		pr_crit("Page cache invalidation failure on direct I/O. Possible data corruption due to collision with buffered I/O!\n");
-		pr_crit("File: %s PID: %d Comm: %.20s\n", path, current->pid,
-			current->comm);
-	}
-}
-
-void kiocb_invalidate_post_direct_write(struct kiocb *iocb, size_t count)
-{
-	struct address_space *mapping = iocb->ki_filp->f_mapping;
-
-	if (mapping->nrpages &&
-	    invalidate_inode_pages2_range(mapping,
-			iocb->ki_pos >> PAGE_SHIFT,
-			(iocb->ki_pos + count - 1) >> PAGE_SHIFT))
-		dio_warn_stale_pagecache(iocb->ki_filp);
-}
-
-ssize_t
-generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
-{
-	struct address_space *mapping = iocb->ki_filp->f_mapping;
-	size_t write_len = iov_iter_count(from);
-	ssize_t written;
-
-	/*
-	 * If a page can not be invalidated, return 0 to fall back
-	 * to buffered write.
-	 */
-	written = kiocb_invalidate_pages(iocb, write_len);
-	if (written) {
-		if (written == -EBUSY)
-			return 0;
-		return written;
-	}
-
-	written = mapping->a_ops->direct_IO(iocb, from);
-
-	/*
-	 * Finally, try again to invalidate clean pages which might have been
-	 * cached by non-direct readahead, or faulted in by get_user_pages()
-	 * if the source of the write was an mmap'ed region of the file
-	 * we're writing. Either one is a pretty crazy thing to do,
-	 * so we don't support it 100%. If this invalidation
-	 * fails, tough, the write still worked...
-	 *
-	 * Most of the time we do not need this since dio_complete() will do
-	 * the invalidation for us. However there are some file systems that
-	 * do not end up with dio_complete() being called, so let's not break
-	 * them by removing it completely.
-	 *
-	 * Noticeable example is a blkdev_direct_IO().
-	 *
-	 * Skip invalidation for async writes or if mapping has no pages.
-	 */
-	if (written > 0) {
-		struct inode *inode = mapping->host;
-		loff_t pos = iocb->ki_pos;
-
-		kiocb_invalidate_post_direct_write(iocb, written);
-		pos += written;
-		write_len -= written;
-		if (pos > i_size_read(inode) && !S_ISBLK(inode->i_mode)) {
-			i_size_write(inode, pos);
-			mark_inode_dirty(inode);
-		}
-		iocb->ki_pos = pos;
-	}
-	if (written != -EIOCBQUEUED)
-		iov_iter_revert(from, write_len - iov_iter_count(from));
-	return written;
-}
-EXPORT_SYMBOL(generic_file_direct_write);
-
-ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
-{
-	struct file *file = iocb->ki_filp;
-	loff_t pos = iocb->ki_pos;
-	struct address_space *mapping = file->f_mapping;
-	const struct address_space_operations *a_ops = mapping->a_ops;
-	size_t chunk = mapping_max_folio_size(mapping);
-	long status = 0;
-	ssize_t written = 0;
-
-	do {
-		struct folio *folio;
-		size_t offset;		/* Offset into folio */
-		size_t bytes;		/* Bytes to write to folio */
-		size_t copied;		/* Bytes copied from user */
-		void *fsdata = NULL;
-
-		bytes = iov_iter_count(i);
-retry:
-		offset = pos & (chunk - 1);
-		bytes = min(chunk - offset, bytes);
-		balance_dirty_pages_ratelimited(mapping);
-
-		if (fatal_signal_pending(current)) {
-			status = -EINTR;
-			break;
-		}
-
-		status = a_ops->write_begin(iocb, mapping, pos, bytes,
-						&folio, &fsdata);
-		if (unlikely(status < 0))
-			break;
-
-		offset = offset_in_folio(folio, pos);
-		if (bytes > folio_size(folio) - offset)
-			bytes = folio_size(folio) - offset;
-
-		if (mapping_writably_mapped(mapping))
-			flush_dcache_folio(folio);
-
-		/*
-		 * Faults here on mmap()s can recurse into arbitrary
-		 * filesystem code. Lots of locks are held that can
-		 * deadlock. Use an atomic copy to avoid deadlocking
-		 * in page fault handling.
-		 */
-		copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
-		flush_dcache_folio(folio);
-
-		status = a_ops->write_end(iocb, mapping, pos, bytes, copied,
-						folio, fsdata);
-		if (unlikely(status != copied)) {
-			iov_iter_revert(i, copied - max(status, 0L));
-			if (unlikely(status < 0))
-				break;
-		}
-		cond_resched();
-
-		if (unlikely(status == 0)) {
-			/*
-			 * A short copy made ->write_end() reject the
-			 * thing entirely. Might be memory poisoning
-			 * halfway through, might be a race with munmap,
-			 * might be severe memory pressure.
-			 */
-			if (chunk > PAGE_SIZE)
-				chunk /= 2;
-			if (copied) {
-				bytes = copied;
-				goto retry;
-			}
-
-			/*
-			 * 'folio' is now unlocked and faults on it can be
-			 * handled. Ensure forward progress by trying to
-			 * fault it in now.
-			 */
-			if (fault_in_iov_iter_readable(i, bytes) == bytes) {
-				status = -EFAULT;
-				break;
-			}
-		} else {
-			pos += status;
-			written += status;
-		}
-	} while (iov_iter_count(i));
-
-	if (!written)
-		return status;
-	iocb->ki_pos += written;
-	return written;
-}
-EXPORT_SYMBOL(generic_perform_write);
-
-/**
- * __generic_file_write_iter - write data to a file
- * @iocb:	IO state structure (file, offset, etc.)
- * @from:	iov_iter with data to write
- *
- * This function does all the work needed for actually writing data to a
- * file. It does all basic checks, removes SUID from the file, updates
- * modification times and calls proper subroutines depending on whether we
- * do direct IO or a standard buffered write.
- *
- * It expects i_rwsem to be grabbed unless we work on a block device or similar
- * object which does not need locking at all.
- *
- * This function does *not* take care of syncing data in case of O_SYNC write.
- * A caller has to handle it. This is mainly due to the fact that we want to
- * avoid syncing under i_rwsem.
- *
- * Return:
- * * number of bytes written, even for truncated writes
- * * negative error code if no data has been written at all
- */
-ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
-{
-	struct file *file = iocb->ki_filp;
-	struct address_space *mapping = file->f_mapping;
-	struct inode *inode = mapping->host;
-	ssize_t ret;
-
-	ret = file_remove_privs(file);
-	if (ret)
-		return ret;
-
-	ret = file_update_time(file);
-	if (ret)
-		return ret;
-
-	if (iocb->ki_flags & IOCB_DIRECT) {
-		ret = generic_file_direct_write(iocb, from);
-		/*
-		 * If the write stopped short of completing, fall back to
-		 * buffered writes. Some filesystems do this for writes to
-		 * holes, for example. For DAX files, a buffered write will
-		 * not succeed (even if it did, DAX does not handle dirty
-		 * page-cache pages correctly).
-		 */
-		if (ret < 0 || !iov_iter_count(from) || IS_DAX(inode))
-			return ret;
-		return direct_write_fallback(iocb, from, ret,
-				generic_perform_write(iocb, from));
-	}
-
-	return generic_perform_write(iocb, from);
-}
-EXPORT_SYMBOL(__generic_file_write_iter);
-
-/**
- * generic_file_write_iter - write data to a file
- * @iocb:	IO state structure
- * @from:	iov_iter with data to write
- *
- * This is a wrapper around __generic_file_write_iter() to be used by most
- * filesystems. It takes care of syncing the file in case of O_SYNC file
- * and acquires i_rwsem as needed.
- * Return:
- * * negative error code if no data has been written at all of
- *   vfs_fsync_range() failed for a synchronous write
- * * number of bytes written, even for truncated writes
- */
-ssize_t generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
-{
-	struct file *file = iocb->ki_filp;
-	struct inode *inode = file->f_mapping->host;
-	ssize_t ret;
-
-	inode_lock(inode);
-	ret = generic_write_checks(iocb, from);
-	if (ret > 0)
-		ret = __generic_file_write_iter(iocb, from);
-	inode_unlock(inode);
-
-	if (ret > 0)
-		ret = generic_write_sync(iocb, ret);
-	return ret;
-}
-EXPORT_SYMBOL(generic_file_write_iter);
 
 /**
  * filemap_release_folio() - Release fs-specific metadata on a folio.
-- 
2.39.5