From: Tal Zussman
Date: Wed, 25 Mar 2026 14:43:02 -0400
Subject: [PATCH RFC v4 3/3] block: enable RWF_DONTCACHE for block devices
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260325-blk-dontcache-v4-3-c4b56db43f64@columbia.edu>
References: <20260325-blk-dontcache-v4-0-c4b56db43f64@columbia.edu>
In-Reply-To: <20260325-blk-dontcache-v4-0-c4b56db43f64@columbia.edu>
To: Jens Axboe, "Matthew Wilcox (Oracle)", Christian Brauner,
 "Darrick J. Wong", Carlos Maiolino, Alexander Viro, Jan Kara
Cc: Christoph Hellwig, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Tal Zussman

Block device buffered reads and writes already pass through
filemap_read() and iomap_file_buffered_write() respectively, both of
which handle IOCB_DONTCACHE. Enable RWF_DONTCACHE for block device files
by setting FOP_DONTCACHE in def_blk_fops.

For CONFIG_BUFFER_HEAD=y paths, add block_write_begin_iocb() which
threads the kiocb through so that buffer_head-based I/O can use
DONTCACHE behavior. The existing block_write_begin() is preserved as a
wrapper that passes a NULL iocb. Set BIO_COMPLETE_IN_TASK in
submit_bh_wbc() when the folio has dropbehind so that buffer_head
writeback completions get deferred to task context.

CONFIG_BUFFER_HEAD=n paths are handled by the previously added iomap
BIO_COMPLETE_IN_TASK support.

This support is useful for databases that operate on raw block devices,
among other userspace applications.

Signed-off-by: Tal Zussman
---
 block/fops.c                |  5 +++--
 fs/buffer.c                 | 22 +++++++++++++++++++---
 include/linux/buffer_head.h |  3 +++
 3 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index 4d32785b31d9..d8165f6ba71c 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -505,7 +505,8 @@ static int blkdev_write_begin(const struct kiocb *iocb,
 			      unsigned len, struct folio **foliop,
 			      void **fsdata)
 {
-	return block_write_begin(mapping, pos, len, foliop, blkdev_get_block);
+	return block_write_begin_iocb(iocb, mapping, pos, len, foliop,
+				      blkdev_get_block);
 }
 
 static int blkdev_write_end(const struct kiocb *iocb,
@@ -967,7 +968,7 @@ const struct file_operations def_blk_fops = {
 	.splice_write	= iter_file_splice_write,
 	.fallocate	= blkdev_fallocate,
 	.uring_cmd	= blkdev_uring_cmd,
-	.fop_flags	= FOP_BUFFER_RASYNC,
+	.fop_flags	= FOP_BUFFER_RASYNC | FOP_DONTCACHE,
 };
 
 static __init int blkdev_init(void)
diff --git a/fs/buffer.c b/fs/buffer.c
index ed724a902657..c60c0ad6cc35 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2239,14 +2239,19 @@ EXPORT_SYMBOL(block_commit_write);
  *
  * The filesystem needs to handle block truncation upon failure.
  */
-int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
+int block_write_begin_iocb(const struct kiocb *iocb,
+		struct address_space *mapping, loff_t pos, unsigned len,
 		struct folio **foliop, get_block_t *get_block)
 {
 	pgoff_t index = pos >> PAGE_SHIFT;
+	fgf_t fgp_flags = FGP_WRITEBEGIN;
 	struct folio *folio;
 	int status;
 
-	folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
+	if (iocb && iocb->ki_flags & IOCB_DONTCACHE)
+		fgp_flags |= FGP_DONTCACHE;
+
+	folio = __filemap_get_folio(mapping, index, fgp_flags,
 			mapping_gfp_mask(mapping));
 	if (IS_ERR(folio))
 		return PTR_ERR(folio);
@@ -2261,6 +2266,13 @@ int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
 	*foliop = folio;
 	return status;
 }
+
+int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
+		struct folio **foliop, get_block_t *get_block)
+{
+	return block_write_begin_iocb(NULL, mapping, pos, len, foliop,
+			get_block);
+}
 EXPORT_SYMBOL(block_write_begin);
 
 int block_write_end(loff_t pos, unsigned len, unsigned copied,
@@ -2589,7 +2601,8 @@ int cont_write_begin(const struct kiocb *iocb, struct address_space *mapping,
 		(*bytes)++;
 	}
 
-	return block_write_begin(mapping, pos, len, foliop, get_block);
+	return block_write_begin_iocb(iocb, mapping, pos, len, foliop,
+			get_block);
 }
 EXPORT_SYMBOL(cont_write_begin);
 
@@ -2801,6 +2814,9 @@ static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
 
 	bio = bio_alloc(bh->b_bdev, 1, opf, GFP_NOIO);
 
+	if (folio_test_dropbehind(bh->b_folio))
+		bio_set_flag(bio, BIO_COMPLETE_IN_TASK);
+
 	fscrypt_set_bio_crypt_ctx_bh(bio, bh, GFP_NOIO);
 
 	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index b16b88bfbc3e..ddf88ce290f2 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -260,6 +260,9 @@ int block_read_full_folio(struct folio *, get_block_t *);
 bool block_is_partially_uptodate(struct folio *, size_t from, size_t count);
 int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
 		struct folio **foliop, get_block_t *get_block);
+int block_write_begin_iocb(const struct kiocb *iocb,
+		struct address_space *mapping, loff_t pos, unsigned len,
+		struct folio **foliop, get_block_t *get_block);
 int __block_write_begin(struct folio *folio, loff_t pos, unsigned len,
 		get_block_t *get_block);
 int block_write_end(loff_t pos, unsigned len, unsigned copied, struct folio *);
-- 
2.39.5
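
For readers who want to exercise this from userspace, here is a rough,
untested-on-this-series sketch of how an application might issue an
RWF_DONTCACHE write through pwritev2(). It is not part of the patch. The
helper names write_dontcache() and write_to_path() are hypothetical, the
RWF_DONTCACHE fallback define assumes the value used in
include/uapi/linux/fs.h, and the fallback path assumes that a kernel or
file without support rejects the flag (typically with EOPNOTSUPP).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* glibc only declares pwritev2() with this */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Assumption: older userspace headers may lack the flag; value as in uapi fs.h. */
#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE 0x00000080
#endif

/*
 * Hypothetical helper: write len bytes at offset off and ask the kernel to
 * drop the written page cache folios once writeback completes. If the flag
 * is rejected, retry as an ordinary buffered write.
 */
static ssize_t write_dontcache(int fd, const void *buf, size_t len, off_t off)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	ssize_t ret;

	assert(buf != NULL);

	ret = pwritev2(fd, &iov, 1, off, RWF_DONTCACHE);
	if (ret < 0 && (errno == EOPNOTSUPP || errno == EINVAL ||
			errno == ENOSYS))
		ret = pwritev2(fd, &iov, 1, off, 0);	/* plain buffered write */
	return ret;
}

/* Hypothetical usage: open an existing block device or file, write at offset 0. */
static ssize_t write_to_path(const char *path, const void *buf, size_t len)
{
	int fd = open(path, O_WRONLY);
	ssize_t ret;

	if (fd < 0)
		return -1;
	ret = write_dontcache(fd, buf, len, 0);
	close(fd);
	return ret;
}
```

With this series applied, passing the path of a block device node to
write_to_path() would be expected to take the RWF_DONTCACHE branch rather
than the fallback. Short writes are returned to the caller as-is, so a
real application would loop until the whole buffer is written.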