From nobody Tue Apr 7 14:05:06 2026
From: Tal Zussman
Date: Wed, 25 Feb 2026 17:40:56 -0500
Subject: [PATCH RFC v2 1/2] filemap: defer dropbehind invalidation from IRQ context
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260225-blk-dontcache-v2-1-70e7ac4f7108@columbia.edu>
References: <20260225-blk-dontcache-v2-0-70e7ac4f7108@columbia.edu>
In-Reply-To: <20260225-blk-dontcache-v2-0-70e7ac4f7108@columbia.edu>
To: Jens Axboe, "Tigran A. Aivazian", Alexander Viro, Christian Brauner,
 Jan Kara, Namjae Jeon, Sungjong Seo, Yuezhang Mo, Dave Kleikamp,
 Ryusuke Konishi, Viacheslav Dubeyko, Konstantin Komarov, Bob Copeland,
 "Matthew Wilcox (Oracle)", Andrew Morton
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
 jfs-discussion@lists.sourceforge.net, linux-nilfs@vger.kernel.org,
 ntfs3@lists.linux.dev, linux-karma-devel@lists.sourceforge.net,
 linux-mm@kvack.org, Tal Zussman

folio_end_dropbehind() is called from folio_end_writeback(), which can
run in IRQ context through buffer_head completion.

Previously, when folio_end_dropbehind() detected !in_task(), it skipped
the invalidation entirely. This meant that folios marked for dropbehind
via RWF_DONTCACHE would remain in the page cache after writeback when
completed from IRQ context, defeating the purpose of the flag.

Fix this by deferring the dropbehind invalidation to a work item. When
folio_end_dropbehind() is called from IRQ context, the folio is added to
a global folio_batch and the work item is scheduled. The worker drains
the batch, trylocking each folio and calling filemap_end_dropbehind(),
and re-drains if new folios arrived while processing. If the batch is
full, the invalidation is skipped for that folio.

This unblocks enabling RWF_DONTCACHE for block devices and other
buffer_head-based I/O.

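The batch-and-drain scheme described above can be modelled in userspace
roughly as follows. This is only a single-threaded sketch under stated
assumptions, not the kernel code: struct batch, BATCH_CAP, defer_item()
and drain() are hypothetical stand-ins for folio_batch, its capacity,
folio_end_dropbehind_irq() and dropbehind_work_fn(), and the spinlock,
folio references and workqueue are left out.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_CAP 31	/* assumed capacity; stands in for the folio_batch size */

/* Hypothetical stand-in for struct folio_batch: a small fixed-size array. */
struct batch {
	size_t nr;
	int items[BATCH_CAP];
};

static struct batch pending;	/* models the global dropbehind_fbatch */
static size_t processed;	/* items handled by the worker so far */
static size_t skipped;		/* items dropped because the batch was full */

/*
 * Models folio_end_dropbehind_irq(): queue the item if there is room,
 * otherwise skip it. Returns 1 when queued, 0 when skipped.
 */
static int defer_item(int item)
{
	if (pending.nr == BATCH_CAP) {
		skipped++;
		return 0;
	}
	pending.items[pending.nr++] = item;
	return 1;
}

/*
 * Models dropbehind_work_fn(): snapshot the shared batch, reset it,
 * process the private copy, and repeat while new items were queued.
 */
static void drain(void)
{
	do {
		struct batch local = pending;	/* done under the spinlock in the patch */

		pending.nr = 0;
		assert(local.nr <= BATCH_CAP);
		for (size_t i = 0; i < local.nr; i++)
			processed++;	/* the patch trylocks and invalidates here */
	} while (pending.nr);
}
```

Queuing BATCH_CAP items succeeds, one more is skipped, and a single
drain() call empties the batch so that later items can be queued again.
The skip-when-full path is the trade-off the patch accepts in exchange
for a fixed-size, allocation-free queue in IRQ context.
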
Signed-off-by: Tal Zussman
---
 mm/filemap.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 79 insertions(+), 5 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index ebd75684cb0a..6263f35c5d13 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1085,6 +1085,8 @@ static const struct ctl_table filemap_sysctl_table[] = {
 	}
 };
 
+static void __init dropbehind_init(void);
+
 void __init pagecache_init(void)
 {
 	int i;
@@ -1092,6 +1094,7 @@ void __init pagecache_init(void)
 	for (i = 0; i < PAGE_WAIT_TABLE_SIZE; i++)
 		init_waitqueue_head(&folio_wait_table[i]);
 
+	dropbehind_init();
 	page_writeback_init();
 	register_sysctl_init("vm", filemap_sysctl_table);
 }
@@ -1613,23 +1616,94 @@ static void filemap_end_dropbehind(struct folio *folio)
  * If folio was marked as dropbehind, then pages should be dropped when writeback
  * completes. Do that now. If we fail, it's likely because of a big folio -
  * just reset dropbehind for that case and latter completions should invalidate.
+ *
+ * When called from IRQ context (e.g. buffer_head completion), we cannot lock
+ * the folio and invalidate. Defer to a workqueue so that callers like
+ * end_buffer_async_write() that complete in IRQ context still get their folios
+ * pruned.
  */
+static DEFINE_SPINLOCK(dropbehind_lock);
+static struct folio_batch dropbehind_fbatch;
+static struct work_struct dropbehind_work;
+
+static void dropbehind_work_fn(struct work_struct *w)
+{
+	struct folio_batch fbatch;
+
+again:
+	spin_lock_irq(&dropbehind_lock);
+	fbatch = dropbehind_fbatch;
+	folio_batch_reinit(&dropbehind_fbatch);
+	spin_unlock_irq(&dropbehind_lock);
+
+	for (int i = 0; i < folio_batch_count(&fbatch); i++) {
+		struct folio *folio = fbatch.folios[i];
+
+		if (folio_trylock(folio)) {
+			filemap_end_dropbehind(folio);
+			folio_unlock(folio);
+		}
+		folio_put(folio);
+	}
+
+	/* Drain folios that were added while we were processing. */
+	spin_lock_irq(&dropbehind_lock);
+	if (folio_batch_count(&dropbehind_fbatch)) {
+		spin_unlock_irq(&dropbehind_lock);
+		goto again;
+	}
+	spin_unlock_irq(&dropbehind_lock);
+}
+
+static void __init dropbehind_init(void)
+{
+	folio_batch_init(&dropbehind_fbatch);
+	INIT_WORK(&dropbehind_work, dropbehind_work_fn);
+}
+
+static void folio_end_dropbehind_irq(struct folio *folio)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dropbehind_lock, flags);
+
+	/* If there is no space in the folio_batch, skip the invalidation. */
+	if (!folio_batch_space(&dropbehind_fbatch)) {
+		spin_unlock_irqrestore(&dropbehind_lock, flags);
+		return;
+	}
+
+	folio_get(folio);
+	folio_batch_add(&dropbehind_fbatch, folio);
+	spin_unlock_irqrestore(&dropbehind_lock, flags);
+
+	schedule_work(&dropbehind_work);
+}
+
 void folio_end_dropbehind(struct folio *folio)
 {
 	if (!folio_test_dropbehind(folio))
 		return;
 
 	/*
-	 * Hitting !in_task() should not happen off RWF_DONTCACHE writeback,
-	 * but can happen if normal writeback just happens to find dirty folios
-	 * that were created as part of uncached writeback, and that writeback
-	 * would otherwise not need non-IRQ handling. Just skip the
-	 * invalidation in that case.
+	 * Hitting !in_task() can happen for IO completed from IRQ contexts or
+	 * if normal writeback just happens to find dirty folios that were
+	 * created as part of uncached writeback, and that writeback would
+	 * otherwise not need non-IRQ handling.
 	 */
 	if (in_task() && folio_trylock(folio)) {
 		filemap_end_dropbehind(folio);
 		folio_unlock(folio);
+		return;
 	}
+
+	/*
+	 * In IRQ context we cannot lock the folio or call into the
+	 * invalidation path. Defer to a workqueue. This happens for
+	 * buffer_head-based writeback which runs from bio IRQ context.
+	 */
+	if (!in_task())
+		folio_end_dropbehind_irq(folio);
 }
 EXPORT_SYMBOL_GPL(folio_end_dropbehind);
 
-- 
2.39.5

From nobody Tue Apr 7 14:05:06 2026
From: Tal Zussman
Date: Wed, 25 Feb 2026 17:40:57 -0500
Subject: [PATCH RFC v2 2/2] block: enable RWF_DONTCACHE for block devices
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260225-blk-dontcache-v2-2-70e7ac4f7108@columbia.edu>
References: <20260225-blk-dontcache-v2-0-70e7ac4f7108@columbia.edu>
In-Reply-To: <20260225-blk-dontcache-v2-0-70e7ac4f7108@columbia.edu>
To: Jens Axboe, "Tigran A. Aivazian", Alexander Viro, Christian Brauner,
 Jan Kara, Namjae Jeon, Sungjong Seo, Yuezhang Mo, Dave Kleikamp,
 Ryusuke Konishi, Viacheslav Dubeyko, Konstantin Komarov, Bob Copeland,
 "Matthew Wilcox (Oracle)", Andrew Morton
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
 jfs-discussion@lists.sourceforge.net, linux-nilfs@vger.kernel.org,
 ntfs3@lists.linux.dev, linux-karma-devel@lists.sourceforge.net,
 linux-mm@kvack.org, Tal Zussman

Block device buffered reads and writes already pass through
filemap_read() and iomap_file_buffered_write() respectively, both of
which handle IOCB_DONTCACHE. Enable RWF_DONTCACHE for block device files
by setting FOP_DONTCACHE in def_blk_fops.

For CONFIG_BUFFER_HEAD paths, thread the kiocb through
block_write_begin() so that buffer_head-based I/O can use DONTCACHE
behavior as well. Callers without a kiocb context (e.g. nilfs2 recovery)
pass NULL, which preserves the existing behavior.

This support is useful for databases that operate on raw block devices,
among other userspace applications.

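From userspace, the flag is passed per I/O through pwritev2(). A minimal
sketch of how an application might use it, assuming a glibc that
provides pwritev2(): write_dontcache() and write_file_dontcache() are
hypothetical helper names, the fallback #define is only an assumption
for libc headers that do not yet carry RWF_DONTCACHE, and the fallback
to a plain pwrite() on EOPNOTSUPP is one possible policy rather than
anything this series mandates.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE 0x00000080	/* uapi <linux/fs.h> value; assumed if headers lack it */
#endif

/*
 * Buffered write that asks the kernel to drop the written folios from
 * the page cache once writeback completes. Falls back to an ordinary
 * buffered write when the kernel or the file rejects the flag.
 */
static ssize_t write_dontcache(int fd, const void *buf, size_t len, off_t off)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	ssize_t ret;

	ret = pwritev2(fd, &iov, 1, off, RWF_DONTCACHE);
	if (ret < 0 && errno == EOPNOTSUPP)
		ret = pwrite(fd, buf, len, off);	/* flag not supported here */
	return ret;
}

/*
 * Example caller: write one buffer at offset 0 of `path`, which may be
 * a block device node or a regular file. Returns 0 on a full write.
 */
static int write_file_dontcache(const char *path, const void *buf, size_t len)
{
	ssize_t n;
	int fd;

	assert(path && buf);
	fd = open(path, O_WRONLY | O_CREAT, 0600);
	if (fd < 0)
		return -1;
	n = write_dontcache(fd, buf, len, 0);
	close(fd);
	return n == (ssize_t)len ? 0 : -1;
}
```

On a kernel without this patch, the first pwritev2() call on a block
device is expected to fail with EOPNOTSUPP, so the helper degrades to a
normal cached write instead of returning an error to the caller.
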
Reviewed-by: Jan Kara
Signed-off-by: Tal Zussman
---
 block/fops.c                |  4 ++--
 fs/bfs/file.c               |  2 +-
 fs/buffer.c                 | 12 ++++++++----
 fs/exfat/inode.c            |  2 +-
 fs/ext2/inode.c             |  2 +-
 fs/jfs/inode.c              |  2 +-
 fs/minix/inode.c            |  2 +-
 fs/nilfs2/inode.c           |  2 +-
 fs/nilfs2/recovery.c        |  2 +-
 fs/ntfs3/inode.c            |  2 +-
 fs/omfs/file.c              |  2 +-
 fs/udf/inode.c              |  2 +-
 fs/ufs/inode.c              |  2 +-
 include/linux/buffer_head.h |  5 +++--
 14 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index 4d32785b31d9..6bc727f8b252 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -505,7 +505,7 @@ static int blkdev_write_begin(const struct kiocb *iocb,
 			      unsigned len, struct folio **foliop,
 			      void **fsdata)
 {
-	return block_write_begin(mapping, pos, len, foliop, blkdev_get_block);
+	return block_write_begin(iocb, mapping, pos, len, foliop, blkdev_get_block);
 }
 
 static int blkdev_write_end(const struct kiocb *iocb,
@@ -967,7 +967,7 @@ const struct file_operations def_blk_fops = {
 	.splice_write	= iter_file_splice_write,
 	.fallocate	= blkdev_fallocate,
 	.uring_cmd	= blkdev_uring_cmd,
-	.fop_flags	= FOP_BUFFER_RASYNC,
+	.fop_flags	= FOP_BUFFER_RASYNC | FOP_DONTCACHE,
 };
 
 static __init int blkdev_init(void)
diff --git a/fs/bfs/file.c b/fs/bfs/file.c
index d33d6bde992b..f2804e38b8a7 100644
--- a/fs/bfs/file.c
+++ b/fs/bfs/file.c
@@ -177,7 +177,7 @@ static int bfs_write_begin(const struct kiocb *iocb,
 {
 	int ret;
 
-	ret = block_write_begin(mapping, pos, len, foliop, bfs_get_block);
+	ret = block_write_begin(iocb, mapping, pos, len, foliop, bfs_get_block);
 	if (unlikely(ret))
 		bfs_write_failed(mapping, pos + len);
 
diff --git a/fs/buffer.c b/fs/buffer.c
index 838c0c571022..33c3580b85d8 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2241,14 +2241,18 @@ EXPORT_SYMBOL(block_commit_write);
  *
  * The filesystem needs to handle block truncation upon failure.
  */
-int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
-		struct folio **foliop, get_block_t *get_block)
+int block_write_begin(const struct kiocb *iocb, struct address_space *mapping,
+		loff_t pos, unsigned len, struct folio **foliop, get_block_t *get_block)
 {
 	pgoff_t index = pos >> PAGE_SHIFT;
+	fgf_t fgp_flags = FGP_WRITEBEGIN;
 	struct folio *folio;
 	int status;
 
-	folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
+	if (iocb && iocb->ki_flags & IOCB_DONTCACHE)
+		fgp_flags |= FGP_DONTCACHE;
+
+	folio = __filemap_get_folio(mapping, index, fgp_flags,
 			mapping_gfp_mask(mapping));
 	if (IS_ERR(folio))
 		return PTR_ERR(folio);
@@ -2591,7 +2595,7 @@ int cont_write_begin(const struct kiocb *iocb, struct address_space *mapping,
 		(*bytes)++;
 	}
 
-	return block_write_begin(mapping, pos, len, foliop, get_block);
+	return block_write_begin(iocb, mapping, pos, len, foliop, get_block);
 }
 EXPORT_SYMBOL(cont_write_begin);
 
diff --git a/fs/exfat/inode.c b/fs/exfat/inode.c
index f9501c3a3666..39d36e8fdfd6 100644
--- a/fs/exfat/inode.c
+++ b/fs/exfat/inode.c
@@ -456,7 +456,7 @@ static int exfat_write_begin(const struct kiocb *iocb,
 	if (unlikely(exfat_forced_shutdown(mapping->host->i_sb)))
 		return -EIO;
 
-	ret = block_write_begin(mapping, pos, len, foliop, exfat_get_block);
+	ret = block_write_begin(iocb, mapping, pos, len, foliop, exfat_get_block);
 
 	if (ret < 0)
 		exfat_write_failed(mapping, pos+len);
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index dbfe9098a124..11aab03de752 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -930,7 +930,7 @@ ext2_write_begin(const struct kiocb *iocb, struct address_space *mapping,
 {
 	int ret;
 
-	ret = block_write_begin(mapping, pos, len, foliop, ext2_get_block);
+	ret = block_write_begin(iocb, mapping, pos, len, foliop, ext2_get_block);
 	if (ret < 0)
 		ext2_write_failed(mapping, pos + len);
 	return ret;
diff --git a/fs/jfs/inode.c b/fs/jfs/inode.c
index 4709762713ef..ae52db437771 100644
--- a/fs/jfs/inode.c
+++ b/fs/jfs/inode.c
@@ -303,7 +303,7 @@ static int jfs_write_begin(const struct kiocb *iocb,
 {
 	int ret;
 
-	ret = block_write_begin(mapping, pos, len, foliop, jfs_get_block);
+	ret = block_write_begin(iocb, mapping, pos, len, foliop, jfs_get_block);
 	if (unlikely(ret))
 		jfs_write_failed(mapping, pos + len);
 
diff --git a/fs/minix/inode.c b/fs/minix/inode.c
index 51ea9bdc813f..9075c0ba2f20 100644
--- a/fs/minix/inode.c
+++ b/fs/minix/inode.c
@@ -465,7 +465,7 @@ static int minix_write_begin(const struct kiocb *iocb,
 {
 	int ret;
 
-	ret = block_write_begin(mapping, pos, len, foliop, minix_get_block);
+	ret = block_write_begin(iocb, mapping, pos, len, foliop, minix_get_block);
 	if (unlikely(ret))
 		minix_write_failed(mapping, pos + len);
 
diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index 51bde45d5865..d9d57eeecc5d 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -230,7 +230,7 @@ static int nilfs_write_begin(const struct kiocb *iocb,
 	if (unlikely(err))
 		return err;
 
-	err = block_write_begin(mapping, pos, len, foliop, nilfs_get_block);
+	err = block_write_begin(iocb, mapping, pos, len, foliop, nilfs_get_block);
 	if (unlikely(err)) {
 		nilfs_write_failed(mapping, pos + len);
 		nilfs_transaction_abort(inode->i_sb);
diff --git a/fs/nilfs2/recovery.c b/fs/nilfs2/recovery.c
index a9c61d0492cb..2f5fe44bf736 100644
--- a/fs/nilfs2/recovery.c
+++ b/fs/nilfs2/recovery.c
@@ -541,7 +541,7 @@ static int nilfs_recover_dsync_blocks(struct the_nilfs *nilfs,
 		}
 
 		pos = rb->blkoff << inode->i_blkbits;
-		err = block_write_begin(inode->i_mapping, pos, blocksize,
+		err = block_write_begin(NULL, inode->i_mapping, pos, blocksize,
 					&folio, nilfs_get_block);
 		if (unlikely(err)) {
 			loff_t isize = inode->i_size;
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c
index 0a9ac5efeb67..8c788feb319e 100644
--- a/fs/ntfs3/inode.c
+++ b/fs/ntfs3/inode.c
@@ -966,7 +966,7 @@ int ntfs_write_begin(const struct kiocb *iocb, struct address_space *mapping,
 		goto out;
 	}
 
-	err = block_write_begin(mapping, pos, len, foliop,
+	err = block_write_begin(iocb, mapping, pos, len, foliop,
 				ntfs_get_block_write_begin);
 
 out:
diff --git a/fs/omfs/file.c b/fs/omfs/file.c
index 49a1de5a827f..3bade632e36e 100644
--- a/fs/omfs/file.c
+++ b/fs/omfs/file.c
@@ -317,7 +317,7 @@ static int omfs_write_begin(const struct kiocb *iocb,
 {
 	int ret;
 
-	ret = block_write_begin(mapping, pos, len, foliop, omfs_get_block);
+	ret = block_write_begin(iocb, mapping, pos, len, foliop, omfs_get_block);
 	if (unlikely(ret))
 		omfs_write_failed(mapping, pos + len);
 
diff --git a/fs/udf/inode.c b/fs/udf/inode.c
index 7fae8002344a..aec9cdc938be 100644
--- a/fs/udf/inode.c
+++ b/fs/udf/inode.c
@@ -259,7 +259,7 @@ static int udf_write_begin(const struct kiocb *iocb,
 	int ret;
 
 	if (iinfo->i_alloc_type != ICBTAG_FLAG_AD_IN_ICB) {
-		ret = block_write_begin(mapping, pos, len, foliop,
+		ret = block_write_begin(iocb, mapping, pos, len, foliop,
 					udf_get_block);
 		if (unlikely(ret))
 			udf_write_failed(mapping, pos + len);
diff --git a/fs/ufs/inode.c b/fs/ufs/inode.c
index e2b0a35de2a7..dfba985265a8 100644
--- a/fs/ufs/inode.c
+++ b/fs/ufs/inode.c
@@ -481,7 +481,7 @@ static int ufs_write_begin(const struct kiocb *iocb,
 {
 	int ret;
 
-	ret = block_write_begin(mapping, pos, len, foliop, ufs_getfrag_block);
+	ret = block_write_begin(iocb, mapping, pos, len, foliop, ufs_getfrag_block);
 	if (unlikely(ret))
 		ufs_write_failed(mapping, pos + len);
 
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index b16b88bfbc3e..4b07dec5f8eb 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -258,8 +258,9 @@ int __block_write_full_folio(struct inode *inode, struct folio *folio,
 		get_block_t *get_block, struct writeback_control *wbc);
 int block_read_full_folio(struct folio *, get_block_t *);
 bool block_is_partially_uptodate(struct folio *, size_t from, size_t count);
-int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
-		struct folio **foliop, get_block_t *get_block);
+int block_write_begin(const struct kiocb *iocb, struct address_space *mapping,
+		loff_t pos, unsigned len, struct folio **foliop,
+		get_block_t *get_block);
 int __block_write_begin(struct folio *folio, loff_t pos, unsigned len,
 		get_block_t *get_block);
 int block_write_end(loff_t pos, unsigned len, unsigned copied, struct folio *);
-- 
2.39.5