From nobody Thu Jun 11 00:35:51 2026
From: Tal Zussman
Date: Fri, 27 Feb 2026 11:41:07 -0500
Subject: [PATCH RFC v3 1/2] filemap: defer dropbehind invalidation from IRQ context
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260227-blk-dontcache-v3-1-cd309ccd5868@columbia.edu>
References: <20260227-blk-dontcache-v3-0-cd309ccd5868@columbia.edu>
In-Reply-To: <20260227-blk-dontcache-v3-0-cd309ccd5868@columbia.edu>
To: "Matthew Wilcox (Oracle)", Andrew Morton, David Hildenbrand,
 Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka, Mike Rapoport,
 Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner,
 Zi Yan, Jens Axboe, Alexander Viro, Christian Brauner, Jan Kara
Cc: Christoph Hellwig, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, Tal Zussman

folio_end_dropbehind() is called from folio_end_writeback(), which can
run in IRQ context through buffer_head completion.

Previously, when folio_end_dropbehind() detected !in_task(), it skipped
the invalidation entirely. This meant that folios marked for dropbehind
via RWF_DONTCACHE would remain in the page cache after writeback when
completed from IRQ context, defeating the purpose of using it.

Fix this by adding folio_end_dropbehind_irq() which defers the
invalidation to a workqueue. The folio is added to a per-cpu folio_batch
protected by a local_lock, and a work item pinned to that CPU drains the
batch. folio_end_writeback() dispatches between the task and IRQ paths
based on in_task().

A CPU hotplug dead callback drains any remaining folios from the
departing CPU's batch to avoid leaking folio references.

This unblocks enabling RWF_DONTCACHE for block devices and other
buffer_head-based I/O.
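As an illustration only, the batching scheme described above can be
modelled in plain userspace C. This is a sketch under stated assumptions,
not the kernel code: struct item, batch_defer(), batch_drain() and
BATCH_SIZE are hypothetical names, the capacity is arbitrary, and the
local_lock and workqueue are left out so that only the "queue if there is
space, otherwise skip" and "snapshot, reset, process, re-check" steps
remain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capacity; the kernel's folio_batch size may differ. */
#define BATCH_SIZE 31

/* Stand-in for a folio: a reference count and a "was invalidated" flag. */
struct item {
	int refcount;
	int dropped;
};

/* Stand-in for the per-CPU batch (no locking in this model). */
struct batch {
	unsigned int nr;
	struct item *items[BATCH_SIZE];
};

/*
 * Models the IRQ-side path: take a reference and queue the item if the
 * batch has space, otherwise skip the deferred work entirely.
 * Returns 1 if the item was queued, 0 if it was skipped.
 */
static int batch_defer(struct batch *b, struct item *it)
{
	if (b->nr == BATCH_SIZE)
		return 0;
	it->refcount++;
	b->items[b->nr++] = it;
	return 1;
}

/*
 * Models the worker: copy the batch, reset the shared one, process the
 * private copy, then loop again if more items arrived in the meantime.
 * Returns the number of items processed.
 */
static unsigned int batch_drain(struct batch *b)
{
	unsigned int done = 0;

	while (b->nr) {
		struct batch snap = *b;	/* private snapshot */

		b->nr = 0;		/* shared batch is empty again */
		for (unsigned int i = 0; i < snap.nr; i++) {
			assert(snap.items[i]->refcount > 0);
			snap.items[i]->dropped = 1;	/* "invalidate" */
			snap.items[i]->refcount--;	/* drop queue ref */
			done++;
		}
	}
	return done;
}
```

In this model a full batch simply loses the deferred invalidation for the
new item, which mirrors the "skip the invalidation" fallback in the patch;
the item itself is untouched and its reference count is unchanged.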
Signed-off-by: Tal Zussman
---
 include/linux/pagemap.h |   1 +
 mm/filemap.c            | 130 ++++++++++++++++++++++++++++++++++++++++++++----
 mm/page_alloc.c         |   1 +
 3 files changed, 123 insertions(+), 9 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index ec442af3f886..ae0632cfdedd 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1260,6 +1260,7 @@ void end_page_writeback(struct page *page);
 void folio_end_writeback(struct folio *folio);
 void folio_end_writeback_no_dropbehind(struct folio *folio);
 void folio_end_dropbehind(struct folio *folio);
+void dropbehind_drain_cpu(int cpu);
 void folio_wait_stable(struct folio *folio);
 void __folio_mark_dirty(struct folio *folio, struct address_space *, int warn);
 void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb);
diff --git a/mm/filemap.c b/mm/filemap.c
index ebd75684cb0a..b223dca708df 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -49,6 +49,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include "internal.h"
@@ -1085,6 +1086,8 @@ static const struct ctl_table filemap_sysctl_table[] = {
 	}
 };
 
+static void __init dropbehind_init(void);
+
 void __init pagecache_init(void)
 {
 	int i;
@@ -1092,6 +1095,7 @@ void __init pagecache_init(void)
 	for (i = 0; i < PAGE_WAIT_TABLE_SIZE; i++)
 		init_waitqueue_head(&folio_wait_table[i]);
 
+	dropbehind_init();
 	page_writeback_init();
 	register_sysctl_init("vm", filemap_sysctl_table);
 }
@@ -1613,26 +1617,131 @@ static void filemap_end_dropbehind(struct folio *folio)
  * If folio was marked as dropbehind, then pages should be dropped when writeback
  * completes. Do that now. If we fail, it's likely because of a big folio -
  * just reset dropbehind for that case and latter completions should invalidate.
+ *
+ * When called from IRQ context (e.g. buffer_head completion), we cannot lock
+ * the folio and invalidate. Defer to a workqueue so that callers like
+ * end_buffer_async_write() that complete in IRQ context still get their folios
+ * pruned.
+ */
+struct dropbehind_batch {
+	local_lock_t lock_irq;
+	struct folio_batch fbatch;
+	struct work_struct work;
+};
+
+static DEFINE_PER_CPU(struct dropbehind_batch, dropbehind_batch) = {
+	.lock_irq = INIT_LOCAL_LOCK(lock_irq),
+};
+
+static void dropbehind_work_fn(struct work_struct *w)
+{
+	struct dropbehind_batch *db_batch;
+	struct folio_batch fbatch;
+
+again:
+	local_lock_irq(&dropbehind_batch.lock_irq);
+	db_batch = this_cpu_ptr(&dropbehind_batch);
+	fbatch = db_batch->fbatch;
+	folio_batch_reinit(&db_batch->fbatch);
+	local_unlock_irq(&dropbehind_batch.lock_irq);
+
+	for (int i = 0; i < folio_batch_count(&fbatch); i++) {
+		struct folio *folio = fbatch.folios[i];
+
+		if (folio_trylock(folio)) {
+			filemap_end_dropbehind(folio);
+			folio_unlock(folio);
+		}
+		folio_put(folio);
+	}
+
+	/* Drain folios that were added while we were processing. */
+	local_lock_irq(&dropbehind_batch.lock_irq);
+	if (folio_batch_count(&db_batch->fbatch)) {
+		local_unlock_irq(&dropbehind_batch.lock_irq);
+		goto again;
+	}
+	local_unlock_irq(&dropbehind_batch.lock_irq);
+}
+
+/*
+ * Drain a dead CPU's dropbehind batch. The CPU is already dead so no
+ * locking is needed.
+ */
+void dropbehind_drain_cpu(int cpu)
+{
+	struct dropbehind_batch *db_batch = per_cpu_ptr(&dropbehind_batch, cpu);
+	struct folio_batch *fbatch = &db_batch->fbatch;
+
+	for (int i = 0; i < folio_batch_count(fbatch); i++) {
+		struct folio *folio = fbatch->folios[i];
+
+		if (folio_trylock(folio)) {
+			filemap_end_dropbehind(folio);
+			folio_unlock(folio);
+		}
+		folio_put(folio);
+	}
+	folio_batch_reinit(fbatch);
+}
+
+static void __init dropbehind_init(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		struct dropbehind_batch *db_batch = per_cpu_ptr(&dropbehind_batch, cpu);
+
+		folio_batch_init(&db_batch->fbatch);
+		INIT_WORK(&db_batch->work, dropbehind_work_fn);
+	}
+}
+
+/*
+ * Must be called from task context. Use folio_end_dropbehind_irq() for
+ * IRQ context (e.g. buffer_head completion).
  */
 void folio_end_dropbehind(struct folio *folio)
 {
 	if (!folio_test_dropbehind(folio))
 		return;
 
-	/*
-	 * Hitting !in_task() should not happen off RWF_DONTCACHE writeback,
-	 * but can happen if normal writeback just happens to find dirty folios
-	 * that were created as part of uncached writeback, and that writeback
-	 * would otherwise not need non-IRQ handling. Just skip the
-	 * invalidation in that case.
-	 */
-	if (in_task() && folio_trylock(folio)) {
+	if (folio_trylock(folio)) {
 		filemap_end_dropbehind(folio);
 		folio_unlock(folio);
 	}
 }
 EXPORT_SYMBOL_GPL(folio_end_dropbehind);
 
+/*
+ * In IRQ context we cannot lock the folio or call into the invalidation
+ * path. Defer to a workqueue. This happens for buffer_head-based writeback
+ * which runs from bio IRQ context.
+ */
+static void folio_end_dropbehind_irq(struct folio *folio)
+{
+	struct dropbehind_batch *db_batch;
+	unsigned long flags;
+
+	if (!folio_test_dropbehind(folio))
+		return;
+
+	local_lock_irqsave(&dropbehind_batch.lock_irq, flags);
+	db_batch = this_cpu_ptr(&dropbehind_batch);
+
+	/* If there is no space in the folio_batch, skip the invalidation. */
+	if (!folio_batch_space(&db_batch->fbatch)) {
+		local_unlock_irqrestore(&dropbehind_batch.lock_irq, flags);
+		return;
+	}
+
+	folio_get(folio);
+	folio_batch_add(&db_batch->fbatch, folio);
+	local_unlock_irqrestore(&dropbehind_batch.lock_irq, flags);
+
+	schedule_work_on(smp_processor_id(), &db_batch->work);
+}
+
 /**
  * folio_end_writeback_no_dropbehind - End writeback against a folio.
  * @folio: The folio.
@@ -1685,7 +1794,10 @@ void folio_end_writeback(struct folio *folio)
 	 */
 	folio_get(folio);
 	folio_end_writeback_no_dropbehind(folio);
-	folio_end_dropbehind(folio);
+	if (in_task())
+		folio_end_dropbehind(folio);
+	else
+		folio_end_dropbehind_irq(folio);
 	folio_put(folio);
 }
 EXPORT_SYMBOL(folio_end_writeback);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cbf758e27aa2..8208223fd764 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6277,6 +6277,7 @@ static int page_alloc_cpu_dead(unsigned int cpu)
 	struct zone *zone;
 
 	lru_add_drain_cpu(cpu);
+	dropbehind_drain_cpu(cpu);
 	mlock_drain_remote(cpu);
 	drain_pages(cpu);
 
-- 
2.39.5

From nobody Thu Jun 11 00:35:51 2026
From: Tal Zussman
Date: Fri, 27 Feb 2026 11:41:08 -0500
Subject: [PATCH RFC v3 2/2] block: enable RWF_DONTCACHE for block devices
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260227-blk-dontcache-v3-2-cd309ccd5868@columbia.edu>
References: <20260227-blk-dontcache-v3-0-cd309ccd5868@columbia.edu>
In-Reply-To: <20260227-blk-dontcache-v3-0-cd309ccd5868@columbia.edu>
To: "Matthew Wilcox (Oracle)", Andrew Morton, David Hildenbrand,
 Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka, Mike Rapoport,
 Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner,
 Zi Yan, Jens Axboe, Alexander Viro, Christian Brauner, Jan Kara
Cc: Christoph Hellwig, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, Tal Zussman

Block device buffered reads and writes already pass through
filemap_read() and iomap_file_buffered_write() respectively, both of
which handle IOCB_DONTCACHE. Enable RWF_DONTCACHE for block device files
by setting FOP_DONTCACHE in def_blk_fops.

For CONFIG_BUFFER_HEAD paths, add block_write_begin_iocb() which threads
the kiocb through so that buffer_head-based I/O can use DONTCACHE
behavior. The existing block_write_begin() is preserved as a wrapper that
passes a NULL iocb.

This support is useful for databases that operate on raw block devices,
among other userspace applications.
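For context, a userspace caller would request this behavior through
pwritev2(2) with the RWF_DONTCACHE flag. The sketch below is illustrative
and makes a few assumptions: write_dontcache() is a hypothetical helper
name, the fallback #define covers libc headers that predate the flag, and
falling back to a plain pwrite() when the kernel or file does not accept
the flag is a policy chosen for the example, not something this series
prescribes.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for pwritev2() */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>		/* open() flags for callers */
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Older libc headers may not define the flag; value from the kernel UAPI. */
#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE 0x00000080
#endif

/*
 * Hypothetical helper: write buf at offset off, asking the kernel to drop
 * the pages from the page cache once writeback completes. If the flag is
 * rejected, retry as an ordinary buffered write.
 *
 * Returns the number of bytes written, or -1 with errno set.
 * *used_dontcache is set to 1 if the hint was accepted, 0 otherwise.
 */
static ssize_t write_dontcache(int fd, const void *buf, size_t len,
			       off_t off, int *used_dontcache)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	ssize_t n;

	assert(used_dontcache != NULL);

	n = pwritev2(fd, &iov, 1, off, RWF_DONTCACHE);
	if (n >= 0) {
		*used_dontcache = 1;
		return n;
	}

	/* Anything other than "flag or syscall not supported" is a real error. */
	if (errno != EOPNOTSUPP && errno != EINVAL && errno != ENOSYS)
		return -1;

	*used_dontcache = 0;
	return pwrite(fd, buf, len, off);
}
```

A caller would open the target (a regular file, or a block device node
once this patch is applied) with open(path, O_WRONLY) and pass the
descriptor in; used_dontcache reports whether the hint took effect.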
Signed-off-by: Tal Zussman
Reviewed-by: Jan Kara
---
 block/fops.c                |  5 +++--
 fs/buffer.c                 | 19 ++++++++++++++++---
 include/linux/buffer_head.h |  3 +++
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index 4d32785b31d9..d8165f6ba71c 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -505,7 +505,8 @@ static int blkdev_write_begin(const struct kiocb *iocb,
 			      unsigned len, struct folio **foliop,
 			      void **fsdata)
 {
-	return block_write_begin(mapping, pos, len, foliop, blkdev_get_block);
+	return block_write_begin_iocb(iocb, mapping, pos, len, foliop,
+				      blkdev_get_block);
 }
 
 static int blkdev_write_end(const struct kiocb *iocb,
@@ -967,7 +968,7 @@ const struct file_operations def_blk_fops = {
 	.splice_write	= iter_file_splice_write,
 	.fallocate	= blkdev_fallocate,
 	.uring_cmd	= blkdev_uring_cmd,
-	.fop_flags	= FOP_BUFFER_RASYNC,
+	.fop_flags	= FOP_BUFFER_RASYNC | FOP_DONTCACHE,
 };
 
 static __init int blkdev_init(void)
diff --git a/fs/buffer.c b/fs/buffer.c
index 838c0c571022..18f1d128bb19 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2241,14 +2241,19 @@ EXPORT_SYMBOL(block_commit_write);
  *
  * The filesystem needs to handle block truncation upon failure.
  */
-int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
+int block_write_begin_iocb(const struct kiocb *iocb,
+		struct address_space *mapping, loff_t pos, unsigned len,
 		struct folio **foliop, get_block_t *get_block)
 {
 	pgoff_t index = pos >> PAGE_SHIFT;
+	fgf_t fgp_flags = FGP_WRITEBEGIN;
 	struct folio *folio;
 	int status;
 
-	folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
+	if (iocb && iocb->ki_flags & IOCB_DONTCACHE)
+		fgp_flags |= FGP_DONTCACHE;
+
+	folio = __filemap_get_folio(mapping, index, fgp_flags,
 			mapping_gfp_mask(mapping));
 	if (IS_ERR(folio))
 		return PTR_ERR(folio);
@@ -2263,6 +2268,13 @@ int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
 	*foliop = folio;
 	return status;
 }
+
+int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
+		struct folio **foliop, get_block_t *get_block)
+{
+	return block_write_begin_iocb(NULL, mapping, pos, len, foliop,
+				      get_block);
+}
 EXPORT_SYMBOL(block_write_begin);
 
 int block_write_end(loff_t pos, unsigned len, unsigned copied,
@@ -2591,7 +2603,8 @@ int cont_write_begin(const struct kiocb *iocb, struct address_space *mapping,
 		(*bytes)++;
 	}
 
-	return block_write_begin(mapping, pos, len, foliop, get_block);
+	return block_write_begin_iocb(iocb, mapping, pos, len, foliop,
+				      get_block);
 }
 EXPORT_SYMBOL(cont_write_begin);
 
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index b16b88bfbc3e..ddf88ce290f2 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -260,6 +260,9 @@ int block_read_full_folio(struct folio *, get_block_t *);
 bool block_is_partially_uptodate(struct folio *, size_t from, size_t count);
 int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
 		struct folio **foliop, get_block_t *get_block);
+int block_write_begin_iocb(const struct kiocb *iocb,
+		struct address_space *mapping, loff_t pos, unsigned len,
+		struct folio **foliop, get_block_t *get_block);
 int __block_write_begin(struct folio *folio, loff_t pos, unsigned len,
 		get_block_t *get_block);
 int block_write_end(loff_t pos, unsigned len, unsigned copied, struct folio *);
-- 
2.39.5