From: Andreas Hindborg
To: Jens Axboe, Christoph Hellwig, Keith Busch, Damien Le Moal,
    Bart Van Assche, Hannes Reinecke, "linux-block@vger.kernel.org"
Cc: Andreas Hindborg, Wedson Almeida Filho, Niklas Cassel, Greg KH,
    Matthew Wilcox, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
    Björn Roy Baron, Benno Lossin, Alice Ryhl, Chaitanya Kulkarni,
    Luis Chamberlain, Yexuan Yang <1182282462@bupt.edu.cn>,
    Sergio González Collado, Joel Granados, "Pankaj Raghav (Samsung)",
    Daniel Gomez, open list, "rust-for-linux@vger.kernel.org",
    "lsf-pc@lists.linux-foundation.org", "gost.dev@samsung.com"
Subject: [RFC PATCH 1/5] rust: block: introduce `kernel::block::mq` module
Date: Wed, 13 Mar 2024 12:05:08 +0100
Message-ID: <20240313110515.70088-2-nmi@metaspace.dk>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20240313110515.70088-1-nmi@metaspace.dk>
References: <20240313110515.70088-1-nmi@metaspace.dk>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

From: Andreas Hindborg

Add initial abstractions for working with blk-mq.

This patch is a maintained, refactored subset of code originally published
by Wedson Almeida Filho [1].

[1] https://github.com/wedsonaf/linux/tree/f2cfd2fe0e2ca4e90994f96afe268bbd4382a891/rust/kernel/blk/mq.rs

Cc: Wedson Almeida Filho
Signed-off-by: Andreas Hindborg
---
 block/blk-mq.c                     |   3 +-
 include/linux/blk-mq.h             |   1 +
 rust/bindings/bindings_helper.h    |   2 +
 rust/helpers.c                     |  45 ++++
 rust/kernel/block.rs               |   5 +
 rust/kernel/block/mq.rs            | 131 +++++++++++
 rust/kernel/block/mq/gen_disk.rs   | 174 +++++++++++++++
 rust/kernel/block/mq/operations.rs | 346 +++++++++++++++++++++++++++++
 rust/kernel/block/mq/raw_writer.rs |  60 +++++
 rust/kernel/block/mq/request.rs    | 182 +++++++++++++++
 rust/kernel/block/mq/tag_set.rs    | 117 ++++++++++
 rust/kernel/error.rs               |   5 +
 rust/kernel/lib.rs                 |   1 +
 13 files changed, 1071 insertions(+), 1 deletion(-)
 create mode 100644 rust/kernel/block.rs
 create mode 100644 rust/kernel/block/mq.rs
 create mode 100644 rust/kernel/block/mq/gen_disk.rs
 create mode 100644 rust/kernel/block/mq/operations.rs
 create mode 100644 rust/kernel/block/mq/raw_writer.rs
 create mode 100644 rust/kernel/block/mq/request.rs
 create mode 100644 rust/kernel/block/mq/tag_set.rs

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2dc01551e27c..a531f664bee7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -702,7 +702,7 @@ static void blk_mq_finish_request(struct request *rq)
 	}
 }
 
-static void __blk_mq_free_request(struct request *rq)
+void __blk_mq_free_request(struct request *rq)
 {
 	struct request_queue *q = rq->q;
 	struct blk_mq_ctx *ctx = rq->mq_ctx;
@@ -722,6 +722,7 @@ static void __blk_mq_free_request(struct request *rq)
 	blk_mq_sched_restart(hctx);
 	blk_queue_exit(q);
 }
+EXPORT_SYMBOL_GPL(__blk_mq_free_request);
 
 void blk_mq_free_request(struct request *rq)
 {
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 7a8150a5f051..842bb88e6e78 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -703,6 +703,7 @@ int blk_mq_alloc_sq_tag_set(struct blk_mq_tag_set *set,
 		unsigned int set_flags);
 void blk_mq_free_tag_set(struct blk_mq_tag_set *set);
 
+void __blk_mq_free_request(struct request *rq); void blk_mq_free_request(struct request *rq); int blk_rq_poll(struct request *rq, struct io_comp_batch *iob, unsigned int poll_flags); diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h index f8e54d398c19..df18acb229d9 100644 --- a/rust/bindings/bindings_helper.h +++ b/rust/bindings/bindings_helper.h @@ -7,6 +7,8 @@ */ #include +#include +#include #include #include #include diff --git a/rust/helpers.c b/rust/helpers.c index 66411845536e..017fa90366e6 100644 --- a/rust/helpers.c +++ b/rust/helpers.c @@ -21,6 +21,9 @@ */ #include +#include +#include +#include #include #include #include @@ -242,6 +245,30 @@ void *rust_helper_kmap_local_folio(struct folio *folio, size_t offset) } EXPORT_SYMBOL_GPL(rust_helper_kmap_local_folio); +struct bio_vec rust_helper_req_bvec(struct request *rq) +{ + return req_bvec(rq); +} +EXPORT_SYMBOL_GPL(rust_helper_req_bvec); + +void *rust_helper_blk_mq_rq_to_pdu(struct request *rq) +{ + return blk_mq_rq_to_pdu(rq); +} +EXPORT_SYMBOL_GPL(rust_helper_blk_mq_rq_to_pdu); + +struct request *rust_helper_blk_mq_rq_from_pdu(void* pdu) { + return blk_mq_rq_from_pdu(pdu); +} +EXPORT_SYMBOL_GPL(rust_helper_blk_mq_rq_from_pdu); + +void rust_helper_bio_advance_iter_single(const struct bio *bio, + struct bvec_iter *iter, + unsigned int bytes) { + bio_advance_iter_single(bio, iter, bytes); +} +EXPORT_SYMBOL_GPL(rust_helper_bio_advance_iter_single); + void *rust_helper_kmap(struct page *page) { return kmap(page); @@ -306,6 +333,24 @@ int rust_helper_xa_err(void *entry) } EXPORT_SYMBOL_GPL(rust_helper_xa_err); +bool rust_helper_req_ref_inc_not_zero(struct request *req) +{ + return atomic_inc_not_zero(&req->ref); +} +EXPORT_SYMBOL_GPL(rust_helper_req_ref_inc_not_zero); + +bool rust_helper_req_ref_put_and_test(struct request *req) +{ + return atomic_dec_and_test(&req->ref); +} +EXPORT_SYMBOL_GPL(rust_helper_req_ref_put_and_test); + +void
rust_helper_blk_mq_free_request_internal(struct request *req) +{ + __blk_mq_free_request(req); +} +EXPORT_SYMBOL_GPL(rust_helper_blk_mq_free_request_internal); + /* * `bindgen` binds the C `size_t` type as the Rust `usize` type, so we can * use it in contexts where Rust expects a `usize` like slice (array) indices. diff --git a/rust/kernel/block.rs b/rust/kernel/block.rs new file mode 100644 index 000000000000..4c93317a568a --- /dev/null +++ b/rust/kernel/block.rs @@ -0,0 +1,5 @@ +// SPDX-License-Identifier: GPL-2.0 + +//! Types for working with the block layer + +pub mod mq; diff --git a/rust/kernel/block/mq.rs b/rust/kernel/block/mq.rs new file mode 100644 index 000000000000..08de1cc114ff --- /dev/null +++ b/rust/kernel/block/mq.rs @@ -0,0 +1,131 @@ +// SPDX-License-Identifier: GPL-2.0 + +//! This module provides types for implementing block drivers that interface with the +//! blk-mq subsystem. +//! +//! To implement a block device driver, a Rust module must do the following: +//! +//! - Implement [`Operations`] for a type `T` +//! - Create a [`TagSet`] +//! - Create a [`GenDisk`], passing in the `TagSet` reference +//! - Add the disk to the system by calling [`GenDisk::add`] +//! +//! The types available in this module that have direct C counterparts are: +//! +//! - The `TagSet` type that abstracts the C type `struct blk_mq_tag_set`. +//! - The `GenDisk` type that abstracts the C type `struct gendisk`. +//! - The `Request` type that abstracts the C type `struct request`. +//! +//! Many of the C types that this module abstracts allow a driver to carry +//! private data, either embedded in the struct directly, or as a C `void*`. In +//! these abstractions, this data is typed. The types of the data are defined by +//! associated types in `Operations`, see [`Operations::RequestData`] for an +//! example. +//! +//! The kernel will interface with the block device driver by calling the method +//! implementations of the `Operations` trait. +//! +//!
IO requests are passed to the driver as [`Request`] references. The +//! `Request` type is a wrapper around the C `struct request`. The driver must +//! mark the start of request processing by calling [`Request::start`] and the end of +//! processing by calling one of the [`Request::end`] methods. Failure to do so +//! can lead to IO failures. +//! +//! The `TagSet` is responsible for creating and maintaining a mapping between +//! `Request`s and integer ids as well as carrying a pointer to the vtable +//! generated by `Operations`. This mapping is useful for associating +//! completions from hardware with the correct `Request` instance. The `TagSet` +//! determines the maximum queue depth by setting the number of `Request` +//! instances available to the driver, and it determines the number of queues to +//! instantiate for the driver. If possible, a driver should allocate one queue +//! per core, to keep queue data local to a core. +//! +//! One `TagSet` instance can be shared between multiple `GenDisk` instances. +//! This can be useful when implementing drivers where one piece of hardware +//! with one set of IO resources is represented to the user as multiple disks. +//! +//! One significant difference between block device drivers implemented with +//! these Rust abstractions and drivers implemented in C is that the Rust +//! drivers have to own a reference count on the `Request` type when the IO is +//! in flight. This is to ensure that the C `struct request` instances backing +//! the Rust `Request` instances are live while the Rust driver holds a +//! reference to the `Request`. In addition, the conversion of an integer tag to +//! a `Request` via the `TagSet` would not be sound without this bookkeeping. +//! +//! # ⚠ Note +//! +//! For Rust block device drivers, the point in time where a request +//! is freed and made available for recycling is usually the point in time +//! when the last `ARef` is dropped.
For C drivers, this event usually +//! occurs when `bindings::blk_mq_end_request` is called. +//! +//! # Example +//! +//! ```rust +//! use kernel::{ +//! block::mq::*, +//! new_mutex, +//! prelude::*, +//! sync::{Arc, Mutex}, +//! types::{ARef, ForeignOwnable}, +//! }; +//! +//! struct MyBlkDevice; +//! +//! #[vtable] +//! impl Operations for MyBlkDevice { +//! type RequestData = (); +//! type RequestDataInit = impl PinInit<()>; +//! type QueueData = (); +//! type HwData = (); +//! type TagSetData = (); +//! +//! fn new_request_data( +//! _tagset_data: ::Borrowed<'_>, +//! ) -> Self::RequestDataInit { +//! kernel::init::zeroed() +//! } +//! +//! fn queue_rq(_hw_data: (), _queue_data: (), rq: ARef>, _is_last: bool) -> Result { +//! rq.start(); +//! rq.end_ok(); +//! Ok(()) +//! } +//! +//! fn commit_rqs( +//! _hw_data: ::Borrowed<'_>, +//! _queue_data: ::Borrowed<'_>, +//! ) { +//! } +//! +//! fn complete(rq: &Request) { +//! rq.end_ok(); +//! } +//! +//! fn init_hctx( +//! _tagset_data: ::Borrowed<'_>, +//! _hctx_idx: u32, +//! ) -> Result { +//! Ok(()) +//! } +//! } +//! +//! let tagset: Arc> = Arc::pin_init(TagSet::try_new(1, (), 256, 1))?; +//! let mut disk = GenDisk::try_new(tagset, ())?; +//! disk.set_name(format_args!("myblk"))?; +//! disk.set_capacity_sectors(4096); +//! disk.add()?; +//! +//! # Ok::<(), kernel::error::Error>(()) +//! ``` + +mod gen_disk; +mod operations; +mod raw_writer; +mod request; +mod tag_set; + +pub use gen_disk::GenDisk; +pub use operations::Operations; +pub use request::{Request, RequestDataRef}; +pub use tag_set::TagSet; diff --git a/rust/kernel/block/mq/gen_disk.rs b/rust/kernel/block/mq/gen_disk.rs new file mode 100644 index 000000000000..b7845fc9e39f --- /dev/null +++ b/rust/kernel/block/mq/gen_disk.rs @@ -0,0 +1,174 @@ +// SPDX-License-Identifier: GPL-2.0 + +//! Generic disk abstraction. +//! +//! C header: [`include/linux/blkdev.h`](srctree/include/linux/blkdev.h) +//!
C header: [`include/linux/blk-mq.h`](srctree/include/linux/blk-mq.h) + +use crate::block::mq::{raw_writer::RawWriter, Operations, TagSet}; +use crate::{ + bindings, error::from_err_ptr, error::Result, sync::Arc, types::ForeignOwnable, + types::ScopeGuard, +}; +use core::fmt::{self, Write}; + +/// A generic block device +/// +/// # Invariants +/// +/// - `gendisk` must always point to an initialized and valid `struct gendisk`. +pub struct GenDisk { + _tagset: Arc>, + gendisk: *mut bindings::gendisk, +} + +// SAFETY: `GenDisk` is an owned pointer to a `struct gendisk` and an `Arc` to a +// `TagSet`. It is safe to send this to other threads as long as `T` is `Send`. +unsafe impl Send for GenDisk {} + +impl GenDisk { + /// Try to create a new `GenDisk` + pub fn try_new(tagset: Arc>, queue_data: T::QueueData) -> Result { + let data = queue_data.into_foreign(); + let recover_data = ScopeGuard::new(|| { + // SAFETY: T::QueueData was created by the call to `into_foreign()` above + unsafe { T::QueueData::from_foreign(data) }; + }); + + let lock_class_key = crate::sync::LockClassKey::new(); + + // SAFETY: `tagset.raw_tag_set()` points to a valid and initialized tag set + let gendisk = from_err_ptr(unsafe { + bindings::__blk_mq_alloc_disk( + tagset.raw_tag_set(), + data.cast_mut(), + lock_class_key.as_ptr(), + ) + })?; + + const TABLE: bindings::block_device_operations = bindings::block_device_operations { + submit_bio: None, + open: None, + release: None, + ioctl: None, + compat_ioctl: None, + check_events: None, + unlock_native_capacity: None, + getgeo: None, + set_read_only: None, + swap_slot_free_notify: None, + report_zones: None, + devnode: None, + alternative_gpt_sector: None, + get_unique_id: None, + // TODO: Set to THIS_MODULE.
Waiting for the `const_refs_to_static` feature to be merged + // https://github.com/rust-lang/rust/issues/119618 + owner: core::ptr::null_mut(), + pr_ops: core::ptr::null_mut(), + free_disk: None, + poll_bio: None, + }; + + // SAFETY: gendisk is a valid pointer as we initialized it above + unsafe { (*gendisk).fops = &TABLE }; + + recover_data.dismiss(); + Ok(Self { + _tagset: tagset, + gendisk, + }) + } + + /// Set the name of the device + pub fn set_name(&mut self, args: fmt::Arguments<'_>) -> Result { + let mut raw_writer = RawWriter::from_array( + // SAFETY: By type invariant `self.gendisk` points to a valid and initialized instance + unsafe { &mut (*self.gendisk).disk_name }, + ); + raw_writer.write_fmt(args)?; + raw_writer.write_char('\0')?; + Ok(()) + } + + /// Register the device with the kernel. When this function returns, the + /// device is accessible from VFS. The kernel may issue reads to the device + /// during registration to discover partition information. + pub fn add(&self) -> Result { + crate::error::to_result( + // SAFETY: By type invariant, `self.gendisk` points to a valid and + // initialized instance of `struct gendisk` + unsafe { + bindings::device_add_disk( + core::ptr::null_mut(), + self.gendisk, + core::ptr::null_mut(), + ) + }, + ) + } + + /// Call to tell the block layer the capacity of the device in sectors (512B) + pub fn set_capacity_sectors(&self, sectors: u64) { + // SAFETY: By type invariant, `self.gendisk` points to a valid and + // initialized instance of `struct gendisk` + unsafe { bindings::set_capacity(self.gendisk, sectors) }; + } + + /// Set the logical block size of the device. + /// + /// This is the smallest unit the storage device can address. It is + /// typically 512 bytes.
+ pub fn set_queue_logical_block_size(&self, size: u32) { + // SAFETY: By type invariant, `self.gendisk` points to a valid and + // initialized instance of `struct gendisk` + unsafe { bindings::blk_queue_logical_block_size((*self.gendisk).queue, size) }; + } + + /// Set the physical block size of the device. + /// + /// This is the smallest unit a physical storage device can write + /// atomically. It is usually the same as the logical block size but may be + /// bigger. One example is SATA drives with 4KB sectors that expose a + /// 512-byte logical block size to the operating system. + pub fn set_queue_physical_block_size(&self, size: u32) { + // SAFETY: By type invariant, `self.gendisk` points to a valid and + // initialized instance of `struct gendisk` + unsafe { bindings::blk_queue_physical_block_size((*self.gendisk).queue, size) }; + } + + /// Set the rotational media attribute for the device + pub fn set_rotational(&self, rotational: bool) { + if !rotational { + // SAFETY: By type invariant, `self.gendisk` points to a valid and + // initialized instance of `struct gendisk` + unsafe { + bindings::blk_queue_flag_set(bindings::QUEUE_FLAG_NONROT, (*self.gendisk).queue) + }; + } else { + // SAFETY: By type invariant, `self.gendisk` points to a valid and + // initialized instance of `struct gendisk` + unsafe { + bindings::blk_queue_flag_clear(bindings::QUEUE_FLAG_NONROT, (*self.gendisk).queue) + }; + } + } +} + +impl Drop for GenDisk { + fn drop(&mut self) { + // SAFETY: By type invariant of `Self`, `self.gendisk` points to a valid + // and initialized instance of `struct gendisk`. As such, `queuedata` + // was initialized by the initializer returned by `try_new` with a call + // to `ForeignOwnable::into_foreign`.
+ let queue_data = unsafe { (*(*self.gendisk).queue).queuedata }; + + // SAFETY: By type invariant, `self.gendisk` points to a valid and + // initialized instance of `struct gendisk` + unsafe { bindings::del_gendisk(self.gendisk) }; + + // SAFETY: `queue.queuedata` was created by `GenDisk::try_new()` with a + // call to `ForeignOwnable::into_foreign()` to create `queuedata`. + // `ForeignOwnable::from_foreign()` is only called here. + let _queue_data = unsafe { T::QueueData::from_foreign(queue_data) }; + } +} diff --git a/rust/kernel/block/mq/operations.rs b/rust/kernel/block/mq/operations.rs new file mode 100644 index 000000000000..53c6ad663208 --- /dev/null +++ b/rust/kernel/block/mq/operations.rs @@ -0,0 +1,346 @@ +// SPDX-License-Identifier: GPL-2.0 + +//! This module provides an interface for blk-mq drivers to implement. +//! +//! C header: [`include/linux/blk-mq.h`](srctree/include/linux/blk-mq.h) + +use crate::{ + bindings, + block::mq::Request, + error::{from_result, Result}, + init::PinInit, + types::{ARef, ForeignOwnable}, +}; +use core::{marker::PhantomData, ptr::NonNull}; + +use super::TagSet; + +/// Implement this trait to interface with blk-mq as a block device +#[macros::vtable] +pub trait Operations: Sized { + /// Data associated with a request. This data is located next to the request + /// structure. + /// + /// To be able to handle accessing this data from interrupt context, this + /// data must be `Sync`. + type RequestData: Sized + Sync; + + /// Initializer for `Self::RequestData`. Used to initialize the private data area + /// when the request structure is allocated. + type RequestDataInit: PinInit; + + /// Data associated with the `struct request_queue` that is allocated for + /// the `GenDisk` associated with this `Operations` implementation. + type QueueData: ForeignOwnable; + + /// Data associated with a dispatch queue. This is stored as a pointer in + /// the C `struct blk_mq_hw_ctx` that represents a hardware queue.
+ type HwData: ForeignOwnable; + + /// Data associated with a `TagSet`. This is stored as a pointer in `struct + /// blk_mq_tag_set`. + type TagSetData: ForeignOwnable; + + /// Called by the kernel to get an initializer for a `Pin<&mut RequestData>`. + fn new_request_data( + //rq: ARef>, + tagset_data: ::Borrowed<'_>, + ) -> Self::RequestDataInit; + + /// Called by the kernel to queue a request with the driver. If `is_last` is + /// `false`, the driver is allowed to defer committing the request. + fn queue_rq( + hw_data: ::Borrowed<'_>, + queue_data: ::Borrowed<'_>, + rq: ARef>, + is_last: bool, + ) -> Result; + + /// Called by the kernel to indicate that queued requests should be submitted + fn commit_rqs( + hw_data: ::Borrowed<'_>, + queue_data: ::Borrowed<'_>, + ); + + /// Called by the kernel when the request is completed + fn complete(_rq: &Request); + + /// Called by the kernel to allocate and initialize driver-specific hardware context data + fn init_hctx( + tagset_data: ::Borrowed<'_>, + hctx_idx: u32, + ) -> Result; + + /// Called by the kernel to poll the device for completed requests. Only + /// used for poll queues. + fn poll(_hw_data: ::Borrowed<'_>) -> bool { + crate::build_error(crate::error::VTABLE_DEFAULT_ERROR) + } + + /// Called by the kernel to map submission queues to CPU cores. + fn map_queues(_tag_set: &TagSet) { + crate::build_error(crate::error::VTABLE_DEFAULT_ERROR) + } + + // There is no need for exit_request() because `drop` will be called. +} + +/// A vtable for blk-mq to interact with a block device driver. +/// +/// A `bindings::blk_mq_ops` vtable is constructed from pointers to the `extern +/// "C"` functions of this struct, exposed through the `OperationsVTable::VTABLE`. +/// +/// For general documentation of these methods, see the kernel source +/// documentation related to `struct blk_mq_ops` in +/// [`include/linux/blk-mq.h`].
+/// +/// [`include/linux/blk-mq.h`]: srctree/include/linux/blk-mq.h +pub(crate) struct OperationsVTable(PhantomData); + +impl OperationsVTable { + // # Safety + // + // - The caller of this function must ensure that `hctx` and `bd` are valid + // and initialized. The pointees must outlive this function. + // - `hctx->driver_data` must be a pointer created by a call to + // `Self::init_hctx_callback()` and the pointee must outlive this + // function. + // - This function must not be called with a `hctx` for which + // `Self::exit_hctx_callback()` has been called. + // - `(*bd).rq` must point to a valid `bindings::request` with a positive refcount in the `ref` field. + unsafe extern "C" fn queue_rq_callback( + hctx: *mut bindings::blk_mq_hw_ctx, + bd: *const bindings::blk_mq_queue_data, + ) -> bindings::blk_status_t { + // SAFETY: `bd` is valid as required by the safety requirement for this + // function. + let request_ptr = unsafe { (*bd).rq }; + + // SAFETY: By C API contract, the pointee of `request_ptr` is valid and has a refcount of 1 + #[cfg_attr(not(CONFIG_DEBUG_MISC), allow(unused_variables))] + let updated = unsafe { bindings::req_ref_inc_not_zero(request_ptr) }; + + #[cfg(CONFIG_DEBUG_MISC)] + if !updated { + crate::pr_err!("Request ref was zero at queue time\n"); + } + + let rq = + // SAFETY: We own a refcount that we took above. We pass that to + // `ARef`. + unsafe { ARef::from_raw(NonNull::new_unchecked(request_ptr.cast::>())) }; + + // SAFETY: The safety requirements for this function ensure that `hctx` + // is valid and that `driver_data` was produced by a call to + // `into_foreign` in `Self::init_hctx_callback`. + let hw_data = unsafe { T::HwData::borrow((*hctx).driver_data) }; + + // SAFETY: `hctx` is valid as required by this function.
+ let queue_data = unsafe { (*(*hctx).queue).queuedata }; + + // SAFETY: `queue.queuedata` was created by `GenDisk::try_new()` with a + // call to `ForeignOwnable::into_foreign()` to create `queuedata`. + // `ForeignOwnable::from_foreign()` is only called when the tagset is + // dropped, which happens after we are dropped. + let queue_data = unsafe { T::QueueData::borrow(queue_data) }; + + let ret = T::queue_rq( + hw_data, + queue_data, + rq, + // SAFETY: `bd` is valid as required by the safety requirement for this function. + unsafe { (*bd).last }, + ); + if let Err(e) = ret { + e.to_blk_status() + } else { + bindings::BLK_STS_OK as _ + } + } + + /// # Safety + /// + /// This function may only be called by blk-mq C infrastructure. The caller + /// must ensure that `hctx` is valid. + unsafe extern "C" fn commit_rqs_callback(hctx: *mut bindings::blk_mq_hw_ctx) { + // SAFETY: `driver_data` was installed by us in `init_hctx_callback` as + // the result of a call to `into_foreign`. + let hw_data = unsafe { T::HwData::borrow((*hctx).driver_data) }; + + // SAFETY: `hctx` is valid as required by this function. + let queue_data = unsafe { (*(*hctx).queue).queuedata }; + + // SAFETY: `queue.queuedata` was created by `GenDisk::try_new()` with a + // call to `ForeignOwnable::into_foreign()` to create `queuedata`. + // `ForeignOwnable::from_foreign()` is only called when the tagset is + // dropped, which happens after we are dropped. + let queue_data = unsafe { T::QueueData::borrow(queue_data) }; + T::commit_rqs(hw_data, queue_data) + } + + /// # Safety + /// + /// This function may only be called by blk-mq C infrastructure. `rq` must + /// point to a valid request that has been marked as completed. The pointee + /// of `rq` must be valid for write for the duration of this function.
+ unsafe extern "C" fn complete_callback(rq: *mut bindings::request) { + // SAFETY: By function safety requirement `rq` is valid for write for the + // lifetime of the returned `Request`. + T::complete(unsafe { Request::from_ptr_mut(rq) }); + } + + /// # Safety + /// + /// This function may only be called by blk-mq C infrastructure. `hctx` must + /// be a pointer to a valid and aligned `struct blk_mq_hw_ctx` that was + /// previously initialized by a call to `init_hctx_callback`. + unsafe extern "C" fn poll_callback( + hctx: *mut bindings::blk_mq_hw_ctx, + _iob: *mut bindings::io_comp_batch, + ) -> core::ffi::c_int { + // SAFETY: By function safety requirement, `hctx` was initialized by + // `init_hctx_callback` and thus `driver_data` came from a call to + // `into_foreign`. + let hw_data = unsafe { T::HwData::borrow((*hctx).driver_data) }; + T::poll(hw_data).into() + } + + /// # Safety + /// + /// This function may only be called by blk-mq C infrastructure. + /// `tagset_data` must be initialized by the initializer returned by + /// `TagSet::try_new` as part of tag set initialization. `hctx` must be a + /// pointer to a valid `blk_mq_hw_ctx` where the `driver_data` field was not + /// yet initialized. This function may only be called once before + /// `exit_hctx_callback` is called for the same context. + unsafe extern "C" fn init_hctx_callback( + hctx: *mut bindings::blk_mq_hw_ctx, + tagset_data: *mut core::ffi::c_void, + hctx_idx: core::ffi::c_uint, + ) -> core::ffi::c_int { + from_result(|| { + // SAFETY: By the safety requirements of this function, + // `tagset_data` came from a call to `into_foreign` when the + // `TagSet` was initialized.
+ let tagset_data = unsafe { T::TagSetData::borrow(tagset_data) }; + let data = T::init_hctx(tagset_data, hctx_idx)?; + + // SAFETY: By the safety requirements of this function, `hctx` is + // valid for write + unsafe { (*hctx).driver_data = data.into_foreign() as _ }; + Ok(0) + }) + } + + /// # Safety + /// + /// This function may only be called by blk-mq C infrastructure. `hctx` must + /// be a valid pointer that was previously initialized by a call to + /// `init_hctx_callback`. This function may be called only once after + /// `init_hctx_callback` was called. + unsafe extern "C" fn exit_hctx_callback( + hctx: *mut bindings::blk_mq_hw_ctx, + _hctx_idx: core::ffi::c_uint, + ) { + // SAFETY: By the safety requirements of this function, `hctx` is valid for read. + let ptr = unsafe { (*hctx).driver_data }; + + // SAFETY: By the safety requirements of this function, `ptr` came from + // a call to `into_foreign` in `init_hctx_callback` + unsafe { T::HwData::from_foreign(ptr) }; + } + + /// # Safety + /// + /// This function may only be called by blk-mq C infrastructure. `set` must point to an initialized `TagSet`. + unsafe extern "C" fn init_request_callback( + set: *mut bindings::blk_mq_tag_set, + rq: *mut bindings::request, + _hctx_idx: core::ffi::c_uint, + _numa_node: core::ffi::c_uint, + ) -> core::ffi::c_int { + from_result(|| { + // SAFETY: The tagset invariants guarantee that all requests are allocated with extra memory + // for the request data. + let pdu = unsafe { bindings::blk_mq_rq_to_pdu(rq) }.cast::(); + + // SAFETY: Because `set` is a `TagSet`, `driver_data` comes from + // a call to `into_foreign` by the initializer returned by + // `TagSet::try_new`. + let tagset_data = unsafe { T::TagSetData::borrow((*set).driver_data) }; + + let initializer = T::new_request_data(tagset_data); + + // SAFETY: `pdu` is a valid pointer as established above. We do not + // touch `pdu` if `__pinned_init` returns an error.
We promise not to + // move the pointee of `pdu`. + unsafe { initializer.__pinned_init(pdu)? }; + + Ok(0) + }) + } + + /// # Safety + /// + /// This function may only be called by blk-mq C infrastructure. `rq` must + /// point to a request that was initialized by a call to + /// `Self::init_request_callback`. + unsafe extern "C" fn exit_request_callback( + _set: *mut bindings::blk_mq_tag_set, + rq: *mut bindings::request, + _hctx_idx: core::ffi::c_uint, + ) { + // SAFETY: The tagset invariants guarantee that all requests are allocated with extra memory + // for the request data. + let pdu = unsafe { bindings::blk_mq_rq_to_pdu(rq) }.cast::(); + + // SAFETY: `pdu` is valid for read and write and is properly initialised. + unsafe { core::ptr::drop_in_place(pdu) }; + } + + /// # Safety + /// + /// This function may only be called by blk-mq C infrastructure. `tag_set` + /// must be a pointer to a valid and initialized `TagSet`. The pointee + /// must be valid for use as a reference for at least the duration of this call. + unsafe extern "C" fn map_queues_callback(tag_set: *mut bindings::blk_mq_tag_set) { + // SAFETY: The safety requirements of this function satisfy the + // requirements of `TagSet::from_ptr`.
+        let tag_set = unsafe { TagSet::from_ptr(tag_set) };
+        T::map_queues(tag_set);
+    }
+
+    const VTABLE: bindings::blk_mq_ops = bindings::blk_mq_ops {
+        queue_rq: Some(Self::queue_rq_callback),
+        queue_rqs: None,
+        commit_rqs: Some(Self::commit_rqs_callback),
+        get_budget: None,
+        put_budget: None,
+        set_rq_budget_token: None,
+        get_rq_budget_token: None,
+        timeout: None,
+        poll: if T::HAS_POLL {
+            Some(Self::poll_callback)
+        } else {
+            None
+        },
+        complete: Some(Self::complete_callback),
+        init_hctx: Some(Self::init_hctx_callback),
+        exit_hctx: Some(Self::exit_hctx_callback),
+        init_request: Some(Self::init_request_callback),
+        exit_request: Some(Self::exit_request_callback),
+        cleanup_rq: None,
+        busy: None,
+        map_queues: if T::HAS_MAP_QUEUES {
+            Some(Self::map_queues_callback)
+        } else {
+            None
+        },
+        #[cfg(CONFIG_BLK_DEBUG_FS)]
+        show_rq: None,
+    };
+
+    pub(crate) const fn build() -> &'static bindings::blk_mq_ops {
+        &Self::VTABLE
+    }
+}
diff --git a/rust/kernel/block/mq/raw_writer.rs b/rust/kernel/block/mq/raw_writer.rs
new file mode 100644
index 000000000000..f7857740af29
--- /dev/null
+++ b/rust/kernel/block/mq/raw_writer.rs
@@ -0,0 +1,60 @@
+use core::{
+    fmt::{self, Write},
+    marker::PhantomData,
+};
+
+/// A mutable reference to a byte buffer where a string can be written into
+///
+/// # Invariants
+///
+/// * `ptr` is not aliased and valid for read and write for `len` bytes
+///
+pub(crate) struct RawWriter<'a> {
+    ptr: *mut u8,
+    len: usize,
+    _p: PhantomData<&'a ()>,
+}
+
+impl<'a> RawWriter<'a> {
+    /// Create a new `RawWriter` instance.
+    ///
+    /// # Safety
+    ///
+    /// * `ptr` must be valid for read and write for `len` consecutive `u8` elements
+    /// * `ptr` must not be aliased
+    unsafe fn new(ptr: *mut u8, len: usize) -> RawWriter<'a> {
+        Self {
+            ptr,
+            len,
+            _p: PhantomData,
+        }
+    }
+
+    pub(crate) fn from_array(a: &'a mut [core::ffi::c_char; N]) -> RawWriter<'a> {
+        // SAFETY: the buffer of `a` is valid for read and write for at least `N` bytes
+        unsafe { Self::new(a.as_mut_ptr().cast::(), N) }
+    }
+}
+
+impl Write for RawWriter<'_> {
+    fn write_str(&mut self, s: &str) -> fmt::Result {
+        let bytes = s.as_bytes();
+        let len = bytes.len();
+        if len > self.len {
+            return Err(fmt::Error);
+        }
+
+        // SAFETY:
+        // * `bytes` is valid for reads of `bytes.len()` size because we hold a shared reference to `s`
+        // * By type invariant `self.ptr` is valid for writes for at least `self.len` bytes
+        // * The regions are not overlapping as `ptr` is not aliased
+        unsafe { core::ptr::copy_nonoverlapping(bytes.as_ptr(), self.ptr, len) };
+
+        // SAFETY: By type invariant of `Self`, `ptr` is in bounds of an
+        // allocation. Also by type invariant, the pointer resulting from this
+        // addition is also in bounds.
+        self.ptr = unsafe { self.ptr.add(len) };
+        self.len -= len;
+        Ok(())
+    }
+}
diff --git a/rust/kernel/block/mq/request.rs b/rust/kernel/block/mq/request.rs
new file mode 100644
index 000000000000..b4dacac5e091
--- /dev/null
+++ b/rust/kernel/block/mq/request.rs
@@ -0,0 +1,182 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! This module provides a wrapper for the C `struct request` type.
+//!
+//! C header: [`include/linux/blk-mq.h`](srctree/include/linux/blk-mq.h)
+
+use crate::{
+    bindings,
+    block::mq::Operations,
+    error::{Error, Result},
+    types::{ARef, AlwaysRefCounted, Opaque},
+};
+use core::{ffi::c_void, marker::PhantomData, ops::Deref};
+
+/// A wrapper around a blk-mq `struct request`. This represents an IO request.
+///
+/// # Invariants
+///
+/// * `self.0` is a valid `struct request` created by the C portion of the kernel
+/// * `self` is reference counted. A call to `req_ref_inc_not_zero` keeps the
+///   instance alive at least until a matching call to `req_ref_put_and_test`
+///
+#[repr(transparent)]
+pub struct Request(Opaque, PhantomData);
+
+impl Request {
+    /// Create a `&mut Request` from a `bindings::request` pointer
+    ///
+    /// # Safety
+    ///
+    /// * `ptr` must be aligned and point to a valid `bindings::request` instance
+    /// * Caller must ensure that the pointee of `ptr` is live and owned
+    ///   exclusively by caller for at least `'a`
+    ///
+    pub(crate) unsafe fn from_ptr_mut<'a>(ptr: *mut bindings::request) -> &'a mut Self {
+        // SAFETY:
+        // * The cast is valid as `Self` is transparent.
+        // * By safety requirements of this function, the reference will be
+        //   valid for 'a.
+        unsafe { &mut *(ptr.cast::()) }
+    }
+
+    /// Get the command identifier for the request
+    pub fn command(&self) -> u32 {
+        // SAFETY: By C API contract and type invariant, `cmd_flags` is valid for read
+        unsafe { (*self.0.get()).cmd_flags & ((1 << bindings::REQ_OP_BITS) - 1) }
+    }
+
+    /// Call this to indicate to the kernel that the request has been issued by the driver
+    pub fn start(&self) {
+        // SAFETY: By type invariant, `self.0` is a valid `struct request`. By
+        // existence of `&mut self` we have exclusive access.
+        unsafe { bindings::blk_mq_start_request(self.0.get()) };
+    }
+
+    /// Call this to indicate to the kernel that the request has been completed without errors
+    pub fn end_ok(&self) {
+        // SAFETY: By type invariant, `self.0` is a valid `struct request`. By
+        // existence of `&mut self` we have exclusive access.
+        unsafe { bindings::blk_mq_end_request(self.0.get(), bindings::BLK_STS_OK as _) };
+    }
+
+    /// Call this to indicate to the kernel that the request completed with an error
+    pub fn end_err(&self, err: Error) {
+        // SAFETY: By type invariant, `self.0` is a valid `struct request`. By
+        // existence of `&mut self` we have exclusive access.
+        unsafe { bindings::blk_mq_end_request(self.0.get(), err.to_blk_status()) };
+    }
+
+    /// Call this to indicate that the request completed with the status indicated by `status`
+    pub fn end(&self, status: Result) {
+        if let Err(e) = status {
+            self.end_err(e);
+        } else {
+            self.end_ok();
+        }
+    }
+
+    /// Call this to schedule deferred completion of the request
+    pub fn complete(&self) {
+        // SAFETY: By type invariant, `self.0` is a valid `struct request`
+        if !unsafe { bindings::blk_mq_complete_request_remote(self.0.get()) } {
+            T::complete(self);
+        }
+    }
+
+    /// Get the target sector for the request
+    #[inline(always)]
+    pub fn sector(&self) -> usize {
+        // SAFETY: By type invariant of `Self`, `self.0` is valid and live.
+        unsafe { (*self.0.get()).__sector as usize }
+    }
+
+    /// Returns an owned reference to the per-request data associated with this
+    /// request
+    pub fn owned_data_ref(request: ARef) -> RequestDataRef {
+        RequestDataRef::new(request)
+    }
+
+    /// Returns a reference to the per-request data associated with this request
+    pub fn data_ref(&self) -> &T::RequestData {
+        let request_ptr = self.0.get().cast::();
+
+        // SAFETY: `request_ptr` is a valid `struct request` because `ARef` is
+        // `repr(transparent)`
+        let p: *mut c_void = unsafe { bindings::blk_mq_rq_to_pdu(request_ptr) };
+
+        let p = p.cast::();
+
+        // SAFETY: By C API contract, `p` is initialized by a call to
+        // `OperationsVTable::init_request_callback()`. By existence of `&self`
+        // it must be valid for use as a shared reference.
+        unsafe { &*p }
+    }
+}
+
+// SAFETY: It is impossible to obtain an owned or mutable `Request`, so we can
+// mark it `Send`.
+unsafe impl Send for Request {}
+
+// SAFETY: `Request` references can be shared across threads.
+unsafe impl Sync for Request {}
+
+/// An owned reference to a `Request`
+#[repr(transparent)]
+pub struct RequestDataRef {
+    request: ARef>,
+}
+
+impl RequestDataRef
+where
+    T: Operations,
+{
+    /// Create a new instance.
+    fn new(request: ARef>) -> Self {
+        Self { request }
+    }
+
+    /// Get a reference to the underlying request
+    pub fn request(&self) -> &Request {
+        &self.request
+    }
+}
+
+impl Deref for RequestDataRef
+where
+    T: Operations,
+{
+    type Target = T::RequestData;
+
+    fn deref(&self) -> &Self::Target {
+        self.request.data_ref()
+    }
+}
+
+// SAFETY: All instances of `Request` are reference counted. This
+// implementation of `AlwaysRefCounted` ensures that increments to the ref count
+// keep the object alive in memory at least until a matching reference count
+// decrement is executed.
+unsafe impl AlwaysRefCounted for Request {
+    fn inc_ref(&self) {
+        // SAFETY: By type invariant `self.0` is a valid `struct request`
+        #[cfg_attr(not(CONFIG_DEBUG_MISC), allow(unused_variables))]
+        let updated = unsafe { bindings::req_ref_inc_not_zero(self.0.get()) };
+        #[cfg(CONFIG_DEBUG_MISC)]
+        if !updated {
+            crate::pr_err!("Request refcount zero on clone");
+        }
+    }
+
+    unsafe fn dec_ref(obj: core::ptr::NonNull) {
+        // SAFETY: By type invariant `self.0` is a valid `struct request`
+        let zero = unsafe { bindings::req_ref_put_and_test(obj.as_ref().0.get()) };
+        if zero {
+            // SAFETY: By type invariant of `self` we have the last reference to
+            // `obj` and it is safe to free it.
+            unsafe {
+                bindings::blk_mq_free_request_internal(obj.as_ptr().cast::())
+            };
+        }
+    }
+}
diff --git a/rust/kernel/block/mq/tag_set.rs b/rust/kernel/block/mq/tag_set.rs
new file mode 100644
index 000000000000..7f463b7e288b
--- /dev/null
+++ b/rust/kernel/block/mq/tag_set.rs
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! This module provides the `TagSet` struct to wrap the C `struct blk_mq_tag_set`.
+//!
+//! C header: [`include/linux/blk-mq.h`](srctree/include/linux/blk-mq.h)
+
+use core::pin::Pin;
+
+use crate::{
+    bindings,
+    block::mq::{operations::OperationsVTable, Operations},
+    error::{self, Error, Result},
+    prelude::PinInit,
+    try_pin_init,
+    types::{ForeignOwnable, Opaque},
+};
+use core::{convert::TryInto, marker::PhantomData};
+use macros::{pin_data, pinned_drop};
+
+/// A wrapper for the C `struct blk_mq_tag_set`.
+///
+/// `struct blk_mq_tag_set` contains a `struct list_head` and so must be pinned.
+#[pin_data(PinnedDrop)]
+#[repr(transparent)]
+pub struct TagSet {
+    #[pin]
+    inner: Opaque,
+    _p: PhantomData,
+}
+
+impl TagSet {
+    /// Try to create a new tag set
+    pub fn try_new(
+        nr_hw_queues: u32,
+        tagset_data: T::TagSetData,
+        num_tags: u32,
+        num_maps: u32,
+    ) -> impl PinInit {
+        try_pin_init!( TagSet {
+            inner <- Opaque::try_ffi_init(move |place: *mut bindings::blk_mq_tag_set| -> Result<()> {
+
+                // SAFETY: try_ffi_init promises that `place` is writable, and
+                // zeroes is a valid bit pattern for this structure.
+                unsafe { core::ptr::write_bytes(place, 0, 1) };
+
+                /// For a raw pointer to a struct, write a struct field without
+                /// creating a reference to the field
+                macro_rules! write_ptr_field {
+                    ($target:ident, $field:ident, $value:expr) => {
+                        ::core::ptr::write(::core::ptr::addr_of_mut!((*$target).$field), $value)
+                    };
+                }
+
+                // SAFETY: try_ffi_init promises that `place` is writable
+                unsafe {
+                    write_ptr_field!(place, ops, OperationsVTable::::build());
+                    write_ptr_field!(place, nr_hw_queues, nr_hw_queues);
+                    write_ptr_field!(place, timeout, 0); // 0 means default which is 30 * HZ in C
+                    write_ptr_field!(place, numa_node, bindings::NUMA_NO_NODE);
+                    write_ptr_field!(place, queue_depth, num_tags);
+                    write_ptr_field!(place, cmd_size, core::mem::size_of::().try_into()?);
+                    write_ptr_field!(place, flags, bindings::BLK_MQ_F_SHOULD_MERGE);
+                    write_ptr_field!(place, driver_data, tagset_data.into_foreign() as _);
+                    write_ptr_field!(place, nr_maps, num_maps);
+                }
+
+                // SAFETY: Relevant fields of `place` are initialised above
+                let ret = unsafe { bindings::blk_mq_alloc_tag_set(place) };
+                if ret < 0 {
+                    // SAFETY: We created `driver_data` above with `into_foreign`
+                    unsafe { T::TagSetData::from_foreign((*place).driver_data) };
+                    return Err(Error::from_errno(ret));
+                }
+
+                Ok(())
+            }),
+            _p: PhantomData,
+        })
+    }
+
+    /// Return the pointer to the wrapped `struct blk_mq_tag_set`
+    pub(crate) fn raw_tag_set(&self) -> *mut bindings::blk_mq_tag_set {
+        self.inner.get()
+    }
+
+    /// Create a `TagSet` from a raw pointer.
+    ///
+    /// # Safety
+    ///
+    /// `ptr` must be a pointer to a valid and initialized `TagSet`. There
+    /// may be no other mutable references to the tag set. The pointee must be
+    /// live and valid at least for the duration of the returned lifetime `'a`.
+    pub(crate) unsafe fn from_ptr<'a>(ptr: *mut bindings::blk_mq_tag_set) -> &'a Self {
+        // SAFETY: By the safety requirements of this function, `ptr` is valid
+        // for use as a reference for the duration of `'a`.
+        unsafe { &*(ptr.cast::()) }
+    }
+}
+
+#[pinned_drop]
+impl PinnedDrop for TagSet {
+    fn drop(self: Pin<&mut Self>) {
+        // SAFETY: We are not moving self below
+        let this = unsafe { Pin::into_inner_unchecked(self) };
+
+        // SAFETY: `this.inner.get()` points to a valid `blk_mq_tag_set` and
+        // thus is safe to dereference.
+        let tagset_data = unsafe { (*this.inner.get()).driver_data };
+
+        // SAFETY: `inner` is valid and has been properly initialised during construction.
+        unsafe { bindings::blk_mq_free_tag_set(this.inner.get()) };
+
+        // SAFETY: `tagset_data` was created by a call to
+        // `ForeignOwnable::into_foreign` in `TagSet::try_new()`
+        unsafe { T::TagSetData::from_foreign(tagset_data) };
+    }
+}
diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index 4f0c1edd63b7..c947fd631416 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -130,6 +130,11 @@ pub fn to_errno(self) -> core::ffi::c_int {
         self.0
     }
 
+    pub(crate) fn to_blk_status(self) -> bindings::blk_status_t {
+        // SAFETY: `self.0` is a valid error due to its invariant.
+        unsafe { bindings::errno_to_blk_status(self.0) }
+    }
+
     /// Returns the error encoded as a pointer.
     #[allow(dead_code)]
     pub(crate) fn to_ptr(self) -> *mut T {
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 638a68af341a..9f02a8b352e0 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -34,6 +34,7 @@
 #[cfg(not(test))]
 #[cfg(not(testlib))]
 mod allocator;
+pub mod block;
 mod build_assert;
 mod cache_aligned;
 pub mod error;
-- 
2.44.0

From nobody Mon Feb 9 00:42:03 2026
From: Andreas Hindborg
To: Jens Axboe, Christoph Hellwig, Keith Busch, Damien Le Moal,
 Bart Van Assche, Hannes Reinecke, "linux-block@vger.kernel.org"
Cc: Andreas Hindborg, Niklas Cassel, Greg KH, Matthew Wilcox, Miguel Ojeda,
 Alex Gaynor, Wedson Almeida Filho, Boqun Feng, Gary Guo, Björn Roy Baron,
 Benno Lossin, Alice Ryhl, Chaitanya Kulkarni, Luis Chamberlain,
 Yexuan Yang <1182282462@bupt.edu.cn>, Sergio González Collado,
 Joel Granados, "Pankaj Raghav (Samsung)", Daniel Gomez, open list,
 "rust-for-linux@vger.kernel.org", "lsf-pc@lists.linux-foundation.org",
 "gost.dev@samsung.com"
Subject: [RFC PATCH 2/5] rust: block: introduce `kernel::block::bio` module
Date: Wed, 13 Mar 2024 12:05:09 +0100
Message-ID: <20240313110515.70088-3-nmi@metaspace.dk>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20240313110515.70088-1-nmi@metaspace.dk>
References: <20240313110515.70088-1-nmi@metaspace.dk>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Andreas Hindborg

Add abstractions for working with `struct bio`.

Signed-off-by: Andreas Hindborg
---
 rust/kernel/block.rs            |   1 +
 rust/kernel/block/bio.rs        | 112 +++++++++++++
 rust/kernel/block/bio/vec.rs    | 279 ++++++++++++++++++++++++++++++++
 rust/kernel/block/mq/request.rs |  22 +++
 4 files changed, 414 insertions(+)
 create mode 100644 rust/kernel/block/bio.rs
 create mode 100644 rust/kernel/block/bio/vec.rs

diff --git a/rust/kernel/block.rs b/rust/kernel/block.rs
index 4c93317a568a..1797859551fd 100644
--- a/rust/kernel/block.rs
+++ b/rust/kernel/block.rs
@@ -2,4 +2,5 @@
 
 //! Types for working with the block layer
 
+pub mod bio;
 pub mod mq;
diff --git a/rust/kernel/block/bio.rs b/rust/kernel/block/bio.rs
new file mode 100644
index 000000000000..0d4336cbe9c1
--- /dev/null
+++ b/rust/kernel/block/bio.rs
@@ -0,0 +1,112 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Types for working with the bio layer.
+//!
+//! C header: [`include/linux/blk_types.h`](../../include/linux/blk_types.h)
+
+use core::fmt;
+use core::ptr::NonNull;
+
+mod vec;
+
+pub use vec::BioSegmentIterator;
+pub use vec::Segment;
+
+use crate::types::Opaque;
+
+/// A block device IO descriptor (`struct bio`)
+///
+/// # Invariants
+///
+/// Instances of this type are always reference counted. A call to
+/// `bindings::bio_get()` ensures that the instance is valid for read at least
+/// until a matching call to `bindings::bio_put()`.
+#[repr(transparent)]
+pub struct Bio(Opaque);
+
+impl Bio {
+    /// Returns an iterator over segments in this `Bio`. Does not consider
+    /// segments of other bios in this bio chain.
+    #[inline(always)]
+    pub fn segment_iter(&self) -> BioSegmentIterator<'_> {
+        BioSegmentIterator::new(self)
+    }
+
+    /// Get slice referencing the `bio_vec` array of this bio
+    #[inline(always)]
+    fn io_vec(&self) -> &[bindings::bio_vec] {
+        let this = self.0.get();
+
+        // SAFETY: By the type invariant of `Bio` and existence of `&self`,
+        // `this` is valid for read.
+        let io_vec = unsafe { (*this).bi_io_vec };
+
+        // SAFETY: By the type invariant of `Bio` and existence of `&self`,
+        // `this` is valid for read.
+        let length = unsafe { (*this).bi_vcnt };
+
+        // SAFETY: By C API contract, `io_vec` points to `length` consecutive
+        // and properly initialized `bio_vec` values. The array is properly
+        // aligned because it is #[repr(C)]. By C API contract and safety
+        // requirement of `from_raw()`, the elements of the `io_vec` array are
+        // not modified for the duration of the lifetime of `&self`
+        unsafe { core::slice::from_raw_parts(io_vec, length as usize) }
+    }
+
+    /// Return a copy of the `bvec_iter` for this `Bio`. This iterator always
+    /// indexes to a valid `bio_vec` entry.
+    #[inline(always)]
+    fn raw_iter(&self) -> bindings::bvec_iter {
+        // SAFETY: By the type invariant of `Bio` and existence of `&self`,
+        // `self` is valid for read.
+        unsafe { (*self.0.get()).bi_iter }
+    }
+
+    /// Get the next `Bio` in the chain
+    #[inline(always)]
+    fn next(&self) -> Option<&Self> {
+        // SAFETY: By the type invariant of `Bio` and existence of `&self`,
+        // `self` is valid for read.
+        let next = unsafe { (*self.0.get()).bi_next };
+        // SAFETY: By C API contract `bi_next` has nonzero reference count if it
+        // is not null, for at least the duration of the lifetime of &self.
+        unsafe { Self::from_raw(next) }
+    }
+
+    /// Create an instance of `Bio` from a raw pointer.
+    ///
+    /// # Safety
+    ///
+    /// If `ptr` is not null, caller must ensure positive refcount for the
+    /// pointee and immutability for the duration of the returned lifetime.
+    #[inline(always)]
+    pub(crate) unsafe fn from_raw<'a>(ptr: *mut bindings::bio) -> Option<&'a Self> {
+        Some(
+            // SAFETY: by the safety requirement of this function, `ptr` is
+            // valid for read for the duration of the returned lifetime
+            unsafe { &*NonNull::new(ptr)?.as_ptr().cast::() },
+        )
+    }
+}
+
+impl core::fmt::Display for Bio {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        write!(f, "Bio({:?})", self.0.get())
+    }
+}
+
+/// An iterator over `Bio`
+pub struct BioIterator<'a> {
+    pub(crate) bio: Option<&'a Bio>,
+}
+
+impl<'a> core::iter::Iterator for BioIterator<'a> {
+    type Item = &'a Bio;
+
+    #[inline(always)]
+    fn next(&mut self) -> Option<&'a Bio> {
+        let current = self.bio.take()?;
+        self.bio = current.next();
+        Some(current)
+    }
+}
diff --git a/rust/kernel/block/bio/vec.rs b/rust/kernel/block/bio/vec.rs
new file mode 100644
index 000000000000..b61380807f38
--- /dev/null
+++ b/rust/kernel/block/bio/vec.rs
@@ -0,0 +1,279 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Types for working with `struct bio_vec` IO vectors
+//!
+//! C header: [`include/linux/bvec.h`](../../include/linux/bvec.h)
+
+use super::Bio;
+use crate::error::Result;
+use crate::folio::UniqueFolio;
+use crate::page::Page;
+use core::fmt;
+use core::mem::ManuallyDrop;
+
+/// A wrapper around a `struct bio_vec` - a contiguous range of physical memory addresses
+///
+/// # Invariants
+///
+/// `bio_vec` must always be initialized and valid for read and write
+pub struct Segment<'a> {
+    bio_vec: bindings::bio_vec,
+    _marker: core::marker::PhantomData<&'a ()>,
+}
+
+impl Segment<'_> {
+    /// Get the length of the segment in bytes
+    #[inline(always)]
+    pub fn len(&self) -> usize {
+        self.bio_vec.bv_len as usize
+    }
+
+    /// Returns true if the length of the segment is 0
+    #[inline(always)]
+    pub fn is_empty(&self) -> bool {
+        self.len() == 0
+    }
+
+    /// Get the offset field of the `bio_vec`
+    #[inline(always)]
+    pub fn offset(&self) -> usize {
+        self.bio_vec.bv_offset as usize
+    }
+
+    /// Copy data of this segment into `dst_folio`.
+    ///
+    /// Note: Disregards `self.offset()`
+    #[inline(always)]
+    pub fn copy_to_folio(&self, dst_folio: &mut UniqueFolio) -> Result {
+        // SAFETY: self.bio_vec is valid and thus bv_page must be a valid
+        // pointer to a `struct page`. We do not own the page, but we prevent
+        // drop by wrapping the `Page` in `ManuallyDrop`.
+        let src_page = ManuallyDrop::new(unsafe { Page::from_raw(self.bio_vec.bv_page) });
+
+        src_page.with_slice_into_page(|src| {
+            dst_folio.with_slice_into_page_mut(0, |dst| {
+                dst.copy_from_slice(src);
+                Ok(())
+            })
+        })
+    }
+
+    /// Copy data to the page of this segment from `src_folio`.
+    ///
+    /// Note: Disregards `self.offset()`
+    pub fn copy_from_folio(&mut self, src_folio: &UniqueFolio) -> Result {
+        // SAFETY: self.bio_vec is valid and thus bv_page must be a valid
+        // pointer to a `struct page`. We do not own the page, but we prevent
+        // drop by wrapping the `Page` in `ManuallyDrop`.
+        let mut dst_page = ManuallyDrop::new(unsafe { Page::from_raw(self.bio_vec.bv_page) });
+
+        dst_page.with_slice_into_page_mut(|dst| {
+            src_folio.with_slice_into_page(0, |src| {
+                dst.copy_from_slice(src);
+                Ok(())
+            })
+        })
+    }
+}
+
+impl core::fmt::Display for Segment<'_> {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        write!(
+            f,
+            "Segment {:?} len: {}",
+            self.bio_vec.bv_page, self.bio_vec.bv_len
+        )
+    }
+}
+
+/// An iterator over `Segment`
+///
+/// # Invariants
+///
+/// If `iter.bi_size` > 0, `iter` must always index a valid `bio_vec` in `bio.io_vec()`.
+pub struct BioSegmentIterator<'a> {
+    bio: &'a Bio,
+    iter: bindings::bvec_iter,
+}
+
+impl<'a> BioSegmentIterator<'a> {
+    /// Create a new segment iterator for iterating the segments of `bio`. The
+    /// iterator starts at the beginning of `bio`.
+    #[inline(always)]
+    pub(crate) fn new(bio: &'a Bio) -> BioSegmentIterator<'_> {
+        // SAFETY: `bio.raw_iter()` returns an index that indexes into a valid
+        // `bio_vec` in `bio.io_vec()`.
+        Self {
+            bio,
+            iter: bio.raw_iter(),
+        }
+    }
+
+    // The accessors in this implementation block are modelled after C side
+    // macros and static functions `bvec_iter_*` and `mp_bvec_iter_*` from
+    // bvec.h.
+
+    /// Construct a `bio_vec` from the current iterator state.
+    ///
+    /// This will return a `bio_vec` of size <= PAGE_SIZE
+    ///
+    /// # Safety
+    ///
+    /// Caller must ensure that `self.iter.bi_size` > 0 before calling this
+    /// method.
+    unsafe fn io_vec(&self) -> bindings::bio_vec {
+        // SAFETY: By safety requirement of this function `self.iter.bi_size` is
+        // greater than 0.
+        unsafe {
+            bindings::bio_vec {
+                bv_page: self.page(),
+                bv_len: self.len(),
+                bv_offset: self.offset(),
+            }
+        }
+    }
+
+    /// Get the currently indexed `bio_vec` entry.
+    ///
+    /// # Safety
+    ///
+    /// Caller must ensure that `self.iter.bi_size` > 0 before calling this
+    /// method.
+    #[inline(always)]
+    unsafe fn bvec(&self) -> &bindings::bio_vec {
+        // SAFETY: By the safety requirement of this function and the type
+        // invariant of `Self`, `self.iter.bi_idx` indexes into a valid
+        // `bio_vec`
+        unsafe { self.bio.io_vec().get_unchecked(self.iter.bi_idx as usize) }
+    }
+
+    /// Get the currently indexed page, indexing into pages of order > 0.
+    ///
+    /// # Safety
+    ///
+    /// Caller must ensure that `self.iter.bi_size` > 0 before calling this
+    /// method.
+    #[inline(always)]
+    unsafe fn page(&self) -> *mut bindings::page {
+        // SAFETY: By C API contract, the following offset cannot exceed pages
+        // allocated to this bio.
+        unsafe { self.mp_page().add(self.mp_page_idx()) }
+    }
+
+    /// Get the remaining bytes in the current page. Never more than PAGE_SIZE.
+    ///
+    /// # Safety
+    ///
+    /// Caller must ensure that `self.iter.bi_size` > 0 before calling this
+    /// method.
+    #[inline(always)]
+    unsafe fn len(&self) -> u32 {
+        // SAFETY: By safety requirement of this function `self.iter.bi_size` is
+        // greater than 0.
+        unsafe { self.mp_len().min((bindings::PAGE_SIZE as u32) - self.offset()) }
+    }
+
+    /// Get the offset from the last page boundary in the currently indexed
+    /// `bio_vec` entry. Never more than PAGE_SIZE.
+    ///
+    /// # Safety
+    ///
+    /// Caller must ensure that `self.iter.bi_size` > 0 before calling this
+    /// method.
+    #[inline(always)]
+    unsafe fn offset(&self) -> u32 {
+        // SAFETY: By safety requirement of this function `self.iter.bi_size` is
+        // greater than 0.
+        unsafe { self.mp_offset() % (bindings::PAGE_SIZE as u32) }
+    }
+
+    /// Return the first page of the currently indexed `bio_vec` entry. This
+    /// might be a multi-page entry, meaning that page might have order > 0.
+    ///
+    /// # Safety
+    ///
+    /// Caller must ensure that `self.iter.bi_size` > 0 before calling this
+    /// method.
+    #[inline(always)]
+    unsafe fn mp_page(&self) -> *mut bindings::page {
+        // SAFETY: By safety requirement of this function `self.iter.bi_size` is
+        // greater than 0.
+        unsafe { self.bvec().bv_page }
+    }
+
+    /// Get the offset in whole pages into the currently indexed `bio_vec`. This
+    /// can be more than 0 if the page has order > 0.
+    ///
+    /// # Safety
+    ///
+    /// Caller must ensure that `self.iter.bi_size` > 0 before calling this
+    /// method.
+    #[inline(always)]
+    unsafe fn mp_page_idx(&self) -> usize {
+        // SAFETY: By safety requirement of this function `self.iter.bi_size` is
+        // greater than 0.
+        (unsafe { self.mp_offset() } / (bindings::PAGE_SIZE as u32)) as usize
+    }
+
+    /// Get the offset in the currently indexed `bio_vec` multi-page entry. This
+    /// can be more than `PAGE_SIZE` if the page has order > 0.
+    ///
+    /// # Safety
+    ///
+    /// Caller must ensure that `self.iter.bi_size` > 0 before calling this
+    /// method.
+    #[inline(always)]
+    unsafe fn mp_offset(&self) -> u32 {
+        // SAFETY: By safety requirement of this function `self.iter.bi_size` is
+        // greater than 0.
+        unsafe { self.bvec().bv_offset + self.iter.bi_bvec_done }
+    }
+
+    /// Get the number of remaining bytes for the currently indexed `bio_vec`
+    /// entry. Can be more than PAGE_SIZE for `bio_vec` entries with pages of
+    /// order > 0.
+    ///
+    /// # Safety
+    ///
+    /// Caller must ensure that `self.iter.bi_size` > 0 before calling this
+    /// method.
+    #[inline(always)]
+    unsafe fn mp_len(&self) -> u32 {
+        // SAFETY: By safety requirement of this function `self.iter.bi_size` is
+        // greater than 0.
+        self.iter
+            .bi_size
+            .min(unsafe { self.bvec().bv_len } - self.iter.bi_bvec_done)
+    }
+}
+
+impl<'a> core::iter::Iterator for BioSegmentIterator<'a> {
+    type Item = Segment<'a>;
+
+    #[inline(always)]
+    fn next(&mut self) -> Option {
+        if self.iter.bi_size == 0 {
+            return None;
+        }
+
+        // SAFETY: We checked that `self.iter.bi_size` > 0 above.
+        let bio_vec_ret = unsafe { self.io_vec() };
+
+        // SAFETY: By existence of reference `&bio`, `bio.0` contains a valid
+        // `struct bio`. By type invariant of `BioSegmentIterator` `self.iter`
+        // indexes into a valid `bio_vec` entry. By C API contract, `bv_len`
+        // does not exceed the size of the bio.
+        unsafe {
+            bindings::bio_advance_iter_single(
+                self.bio.0.get(),
+                &mut self.iter as *mut bindings::bvec_iter,
+                bio_vec_ret.bv_len,
+            )
+        };
+
+        Some(Segment {
+            bio_vec: bio_vec_ret,
+            _marker: core::marker::PhantomData,
+        })
+    }
+}
diff --git a/rust/kernel/block/mq/request.rs b/rust/kernel/block/mq/request.rs
index b4dacac5e091..cccffde45981 100644
--- a/rust/kernel/block/mq/request.rs
+++ b/rust/kernel/block/mq/request.rs
@@ -12,6 +12,9 @@
 };
 use core::{ffi::c_void, marker::PhantomData, ops::Deref};
 
+use crate::block::bio::Bio;
+use crate::block::bio::BioIterator;
+
 /// A wrapper around a blk-mq `struct request`. This represents an IO request.
 ///
 /// # Invariants
@@ -84,6 +87,25 @@ pub fn complete(&self) {
         }
     }
 
+    /// Get a wrapper for the first Bio in this request
+    #[inline(always)]
+    pub fn bio(&self) -> Option<&Bio> {
+        // SAFETY: By type invariant of `Self`, `self.0` is valid and the deref
+        // is safe.
+        let ptr = unsafe { (*self.0.get()).bio };
+        // SAFETY: By C API contract, if `bio` is not null it will have a
+        // positive refcount at least for the duration of the lifetime of
+        // `&self`.
+        unsafe { Bio::from_raw(ptr) }
+    }
+
+    /// Get an iterator over all bio structures in this request
+    #[inline(always)]
+    pub fn bio_iter(&self) -> BioIterator<'_> {
+        BioIterator { bio: self.bio() }
+    }
+
+    // TODO: Check if inline is still required for cross language LTO inlining into module
     /// Get the target sector for the request
     #[inline(always)]
     pub fn sector(&self) -> usize {
-- 
2.44.0

From nobody Mon Feb 9 00:42:03 2026
From: Andreas Hindborg
To: Jens Axboe, Christoph Hellwig, Keith Busch, Damien Le Moal,
 Bart Van Assche, Hannes Reinecke, "linux-block@vger.kernel.org"
Cc: Andreas Hindborg, Niklas Cassel, Greg KH, Matthew Wilcox,
 Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho, Boqun Feng, Gary Guo,
 Björn Roy Baron, Benno Lossin, Alice Ryhl, Chaitanya Kulkarni,
 Luis Chamberlain, Yexuan Yang <1182282462@bupt.edu.cn>,
 Sergio González Collado, Joel Granados, "Pankaj Raghav (Samsung)",
 Daniel Gomez, open list, "rust-for-linux@vger.kernel.org",
 "lsf-pc@lists.linux-foundation.org", "gost.dev@samsung.com"
Subject: [RFC PATCH 3/5] rust: block: allow `hrtimer::Timer` in `RequestData`
Date: Wed, 13 Mar 2024 12:05:10 +0100
Message-ID: <20240313110515.70088-4-nmi@metaspace.dk>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20240313110515.70088-1-nmi@metaspace.dk>
References: <20240313110515.70088-1-nmi@metaspace.dk>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Andreas Hindborg

Signed-off-by: Andreas Hindborg
---
 rust/kernel/block/mq/request.rs | 67 ++++++++++++++++++++++++++++++++-
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/rust/kernel/block/mq/request.rs b/rust/kernel/block/mq/request.rs
index cccffde45981..8b7f08f894be 100644
--- a/rust/kernel/block/mq/request.rs
+++ b/rust/kernel/block/mq/request.rs
@@ -4,13 +4,16 @@
 //!
 //! C header: [`include/linux/blk-mq.h`](srctree/include/linux/blk-mq.h)
 
+use kernel::hrtimer::RawTimer;
+
 use crate::{
     bindings,
     block::mq::Operations,
     error::{Error, Result},
+    hrtimer::{HasTimer, TimerCallback},
     types::{ARef, AlwaysRefCounted, Opaque},
 };
-use core::{ffi::c_void, marker::PhantomData, ops::Deref};
+use core::{ffi::c_void, marker::PhantomData, ops::Deref, ptr::NonNull};
 
 use crate::block::bio::Bio;
 use crate::block::bio::BioIterator;
@@ -175,6 +178,68 @@ fn deref(&self) -> &Self::Target {
     }
 }
 
+impl<T> RawTimer for RequestDataRef<T>
+where
+    T: Operations,
+    T::RequestData: HasTimer<T::RequestData>,
+    T::RequestData: Sync,
+{
+    fn schedule(self, expires: u64) {
+        let self_ptr = self.deref() as *const T::RequestData;
+        core::mem::forget(self);
+
+        // SAFETY: `self_ptr` is a valid pointer to a `T::RequestData`
+        let timer_ptr = unsafe { T::RequestData::raw_get_timer(self_ptr) };
+
+        // `Timer` is `repr(transparent)`
+        let c_timer_ptr = timer_ptr.cast::<bindings::hrtimer>();
+
+        // Schedule the timer - if it is already scheduled it is removed and
+        // inserted
+
+        // SAFETY: c_timer_ptr points to a valid hrtimer instance that was
+        // initialized by `hrtimer_init`
+        unsafe {
+            bindings::hrtimer_start_range_ns(
+                c_timer_ptr as *mut _,
+                expires as i64,
+                0,
+                bindings::hrtimer_mode_HRTIMER_MODE_REL,
+            );
+        }
+    }
+}
+
+impl<T> kernel::hrtimer::RawTimerCallback for RequestDataRef<T>
+where
+    T: Operations,
+    T: Sync,
+    T::RequestData: HasTimer<T::RequestData>,
+    T::RequestData: TimerCallback<Receiver = Self>,
+{
+    unsafe extern "C" fn run(ptr: *mut bindings::hrtimer) -> bindings::hrtimer_restart {
+        // `Timer` is `repr(transparent)`
+        let timer_ptr = ptr.cast::<kernel::hrtimer::Timer<T::RequestData>>();
+
+        // SAFETY: By C API contract `ptr` is the pointer we passed when
+        // enqueuing the timer, so it is a `Timer<T::RequestData>` embedded in a `T::RequestData`
+        let receiver_ptr = unsafe { T::RequestData::timer_container_of(timer_ptr) };
+
+        // SAFETY: The pointer was returned by `T::timer_container_of` so it
+        // points to a valid `T::RequestData`
+        let request_ptr = unsafe { bindings::blk_mq_rq_from_pdu(receiver_ptr.cast::<c_void>()) };
+
+        // SAFETY: We own a refcount that we leaked during `RawTimer::schedule()`
+        let dref = RequestDataRef::new(unsafe {
+            ARef::from_raw(NonNull::new_unchecked(request_ptr.cast::<Request<T>>()))
+        });
+
+        T::RequestData::run(dref);
+
+        bindings::hrtimer_restart_HRTIMER_NORESTART
+    }
+}
+
 // SAFETY: All instances of `Request` are reference counted. This
 // implementation of `AlwaysRefCounted` ensure that increments to the ref count
 // keeps the object alive in memory at least until a matching reference count
-- 
2.44.0

From nobody Mon Feb 9 00:42:03 2026
From: Andreas Hindborg
To: Jens Axboe, Christoph Hellwig, Keith Busch, Damien Le Moal,
 Bart Van Assche, Hannes Reinecke, "linux-block@vger.kernel.org"
Cc: Andreas Hindborg, Niklas Cassel, Greg KH, Matthew Wilcox,
 Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho, Boqun Feng, Gary Guo,
 Björn Roy Baron, Benno Lossin, Alice Ryhl, Chaitanya Kulkarni,
 Luis Chamberlain, Yexuan Yang <1182282462@bupt.edu.cn>,
 Sergio González Collado, Joel Granados, "Pankaj Raghav (Samsung)",
 Daniel Gomez, open list, "rust-for-linux@vger.kernel.org",
 "lsf-pc@lists.linux-foundation.org", "gost.dev@samsung.com"
Subject: [RFC PATCH 4/5] rust: block: add rnull, Rust null_blk implementation
Date: Wed, 13 Mar 2024 12:05:11 +0100
Message-ID: <20240313110515.70088-5-nmi@metaspace.dk>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20240313110515.70088-1-nmi@metaspace.dk>
References: <20240313110515.70088-1-nmi@metaspace.dk>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Andreas Hindborg

Signed-off-by: Andreas Hindborg
---
 drivers/block/Kconfig  |   4 +
 drivers/block/Makefile |   3 +
 drivers/block/rnull.rs | 323 +++++++++++++++++++++++++++++++++++++++++
 rust/helpers.c         |   1 +
 scripts/Makefile.build |   2 +-
 5 files changed, 332 insertions(+), 1 deletion(-)
 create mode 100644 drivers/block/rnull.rs

diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index 5b9d4aaebb81..fb877d4f8ddf 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -354,6 +354,10 @@ config VIRTIO_BLK
 	  This is the virtual block driver for virtio. It can be used with
 	  QEMU based VMMs (like KVM or Xen). Say Y or M.
 
+config BLK_DEV_RUST_NULL
+	tristate "Rust null block driver"
+	depends on RUST
+
 config BLK_DEV_RBD
 	tristate "Rados block device (RBD)"
 	depends on INET && BLOCK
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index 101612cba303..1105a2d4fdcb 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -9,6 +9,9 @@
 # needed for trace events
 ccflags-y += -I$(src)
 
+obj-$(CONFIG_BLK_DEV_RUST_NULL) += rnull_mod.o
+rnull_mod-y := rnull.o
+
 obj-$(CONFIG_MAC_FLOPPY) += swim3.o
 obj-$(CONFIG_BLK_DEV_SWIM) += swim_mod.o
 obj-$(CONFIG_BLK_DEV_FD) += floppy.o
diff --git a/drivers/block/rnull.rs b/drivers/block/rnull.rs
new file mode 100644
index 000000000000..05fef30e910c
--- /dev/null
+++ b/drivers/block/rnull.rs
@@ -0,0 +1,323 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! This is a Rust implementation of the C null block driver.
+//!
+//! Supported features:
+//!
+//! - optional memory backing
+//! - blk-mq interface
+//! - direct completion
+//! - softirq completion
+//! - timer completion
+//!
+//! The driver is configured at module load time by parameters
+//! `param_memory_backed`, `param_capacity_mib`, `param_irq_mode` and
+//! `param_completion_time_nsec`.
+
+use core::ops::Deref;
+
+use kernel::{
+    bindings,
+    block::{
+        bio::Segment,
+        mq::{self, GenDisk, Operations, RequestDataRef, TagSet},
+    },
+    error::Result,
+    folio::*,
+    hrtimer::{RawTimer, TimerCallback},
+    new_mutex, pr_info,
+    prelude::*,
+    sync::{Arc, Mutex},
+    types::{ARef, ForeignOwnable},
+    xarray::XArray,
+};
+
+use kernel::new_spinlock;
+use kernel::CacheAligned;
+use kernel::sync::SpinLock;
+
+module! {
+    type: NullBlkModule,
+    name: "rnull_mod",
+    author: "Andreas Hindborg",
+    license: "GPL v2",
+    params: {
+        param_memory_backed: bool {
+            default: true,
+            permissions: 0,
+            description: "Use memory backing",
+        },
+        // Problems with pin_init when `irq_mode`
+        param_irq_mode: u8 {
+            default: 0,
+            permissions: 0,
+            description: "IRQ Mode (0: None, 1: Soft, 2: Timer)",
+        },
+        param_capacity_mib: u64 {
+            default: 4096,
+            permissions: 0,
+            description: "Device capacity in MiB",
+        },
+        param_completion_time_nsec: u64 {
+            default: 1_000_000,
+            permissions: 0,
+            description: "Completion time in nanoseconds for timer mode",
+        },
+        param_block_size: u16 {
+            default: 4096,
+            permissions: 0,
+            description: "Block size in bytes",
+        },
+    },
+}
+
+#[derive(Debug)]
+enum IRQMode {
+    None,
+    Soft,
+    Timer,
+}
+
+impl TryFrom<u8> for IRQMode {
+    type Error = kernel::error::Error;
+
+    fn try_from(value: u8) -> Result<Self> {
+        match value {
+            0 => Ok(Self::None),
+            1 => Ok(Self::Soft),
+            2 => Ok(Self::Timer),
+            _ => Err(kernel::error::code::EINVAL),
+        }
+    }
+}
+
+struct NullBlkModule {
+    _disk: Pin<Box<Mutex<GenDisk<NullBlkDevice>>>>,
+}
+
+fn add_disk(tagset: Arc<TagSet<NullBlkDevice>>) -> Result<GenDisk<NullBlkDevice>> {
+    let block_size = *param_block_size.read();
+    if block_size % 512 != 0 || !(512..=4096).contains(&block_size) {
+        return Err(kernel::error::code::EINVAL);
+    }
+
+    let irq_mode = (*param_irq_mode.read()).try_into()?;
+
+    let queue_data = Box::pin_init(pin_init!(
+        QueueData {
+            tree <- TreeContainer::new(),
+            completion_time_nsec: *param_completion_time_nsec.read(),
+            irq_mode,
+            memory_backed: *param_memory_backed.read(),
+            block_size,
+        }
+    ))?;
+
+    let block_size = queue_data.block_size;
+
+    let mut disk = GenDisk::try_new(tagset, queue_data)?;
+    disk.set_name(format_args!("rnullb{}", 0))?;
+    disk.set_capacity_sectors(*param_capacity_mib.read() << 11);
+    disk.set_queue_logical_block_size(block_size.into());
+    disk.set_queue_physical_block_size(block_size.into());
+    disk.set_rotational(false);
+    Ok(disk)
+}
+
+impl kernel::Module for NullBlkModule {
+    fn init(_module: &'static ThisModule) -> Result<Self> {
+        pr_info!("Rust null_blk loaded\n");
+        let tagset = Arc::pin_init(TagSet::try_new(1, (), 256, 1))?;
+        let disk = Box::pin_init(new_mutex!(add_disk(tagset)?, "nullb:disk"))?;
+
+        disk.lock().add()?;
+
+        Ok(Self { _disk: disk })
+    }
+}
+
+impl Drop for NullBlkModule {
+    fn drop(&mut self) {
+        pr_info!("Dropping rnullb\n");
+    }
+}
+
+struct NullBlkDevice;
+
+type Tree = XArray<Box<UniqueFolio>>;
+type TreeRef<'a> = &'a Tree;
+
+#[pin_data]
+struct TreeContainer {
+    // `XArray` is safe to use without a lock, as it applies internal locking.
+    // However, there are two reasons to use an external lock: a) cache line
+    // contention and b) we don't want to take the lock for each page we
+    // process.
+    //
+    // A: The `XArray` lock (xa_lock) is located on the same cache line as the
+    // xarray data pointer (xa_head). The effect of this arrangement is that
+    // under heavy contention, we often get a cache miss when we try to follow
+    // the data pointer after acquiring the lock. We would rather have consumers
+    // spinning on another lock, so we do not get a miss on xa_head. This issue
+    // can potentially be fixed by padding the C `struct xarray`.
+    //
+    // B: The current `XArray` Rust API requires that we take the `xa_lock` for
+    // each `XArray` operation. This is very inefficient when the lock is
+    // contended and we have many operations to perform. Eventually we should
+    // update the `XArray` API to allow multiple tree operations under a single
+    // lock acquisition. For now, serialize tree access with an external lock.
+    #[pin]
+    tree: CacheAligned<Tree>,
+    #[pin]
+    lock: CacheAligned<SpinLock<()>>,
+}
+
+impl TreeContainer {
+    fn new() -> impl PinInit<Self> {
+        pin_init!(TreeContainer {
+            tree <- CacheAligned::new_initializer(XArray::new(0)),
+            lock <- CacheAligned::new_initializer(new_spinlock!((), "rnullb:mem")),
+        })
+    }
+}
+
+#[pin_data]
+struct QueueData {
+    #[pin]
+    tree: TreeContainer,
+    completion_time_nsec: u64,
+    irq_mode: IRQMode,
+    memory_backed: bool,
+    block_size: u16,
+}
+
+impl NullBlkDevice {
+    #[inline(always)]
+    fn write(tree: TreeRef<'_>, sector: usize, segment: &Segment<'_>) -> Result {
+        let idx = sector >> bindings::PAGE_SECTORS_SHIFT;
+
+        let mut folio = if let Some(page) = tree.get_locked(idx) {
+            page
+        } else {
+            tree.set(idx, Box::try_new(Folio::try_new(0)?)?)?;
+            tree.get_locked(idx).unwrap()
+        };
+
+        segment.copy_to_folio(&mut folio)?;
+
+        Ok(())
+    }
+
+    #[inline(always)]
+    fn read(tree: TreeRef<'_>, sector: usize, segment: &mut Segment<'_>) -> Result {
+        let idx = sector >> bindings::PAGE_SECTORS_SHIFT;
+
+        if let Some(folio) = tree.get_locked(idx) {
+            segment.copy_from_folio(folio.deref())?;
+        }
+
+        Ok(())
+    }
+
+    #[inline(never)]
+    fn transfer(
+        command: bindings::req_op,
+        tree: TreeRef<'_>,
+        sector: usize,
+        segment: &mut Segment<'_>,
+    ) -> Result {
+        match command {
+            bindings::req_op_REQ_OP_WRITE => Self::write(tree, sector, segment)?,
+            bindings::req_op_REQ_OP_READ => Self::read(tree, sector, segment)?,
+            _ => (),
+        }
+        Ok(())
+    }
+}
+
+#[pin_data]
+struct Pdu {
+    #[pin]
+    timer: kernel::hrtimer::Timer<Self>,
+}
+
+impl TimerCallback for Pdu {
+    type Receiver = RequestDataRef<NullBlkDevice>;
+
+    fn run(this: Self::Receiver) {
+        this.request().end_ok();
+    }
+}
+
+kernel::impl_has_timer! {
+    impl HasTimer<Self> for Pdu { self.timer }
+}
+
+#[vtable]
+impl Operations for NullBlkDevice {
+    type RequestData = Pdu;
+    type RequestDataInit = impl PinInit<Pdu>;
+    type QueueData = Pin<Box<QueueData>>;
+    type HwData = ();
+    type TagSetData = ();
+
+    fn new_request_data(
+        _tagset_data: <Self::TagSetData as ForeignOwnable>::Borrowed<'_>,
+    ) -> Self::RequestDataInit {
+        pin_init!( Pdu {
+            timer <- kernel::hrtimer::Timer::new(),
+        })
+    }
+
+    #[inline(always)]
+    fn queue_rq(
+        _hw_data: (),
+        queue_data: &QueueData,
+        rq: ARef<mq::Request<Self>>,
+        _is_last: bool,
+    ) -> Result {
+        rq.start();
+        if queue_data.memory_backed {
+            let guard = queue_data.tree.lock.lock();
+            let tree = queue_data.tree.tree.deref();
+
+            let mut sector = rq.sector();
+            for bio in rq.bio_iter() {
+                for mut segment in bio.segment_iter() {
+                    Self::transfer(rq.command(), tree, sector, &mut segment)?;
+                    sector += segment.len() >> bindings::SECTOR_SHIFT;
+                }
+            }
+
+            drop(guard);
+        }
+
+
+        match queue_data.irq_mode {
+            IRQMode::None => rq.end_ok(),
+            IRQMode::Soft => rq.complete(),
+            IRQMode::Timer => {
+                mq::Request::owned_data_ref(rq).schedule(queue_data.completion_time_nsec)
+            }
+        }
+
+        Ok(())
+    }
+
+    fn commit_rqs(
+        _hw_data: <Self::HwData as ForeignOwnable>::Borrowed<'_>,
+        _queue_data: <Self::QueueData as ForeignOwnable>::Borrowed<'_>,
+    ) {
+    }
+
+    fn complete(rq: &mq::Request<Self>) {
+        rq.end_ok();
+    }
+
+    fn init_hctx(
+        _tagset_data: <Self::TagSetData as ForeignOwnable>::Borrowed<'_>,
+        _hctx_idx: u32,
+    ) -> Result {
+        Ok(())
+    }
+}
diff --git a/rust/helpers.c b/rust/helpers.c
index 017fa90366e6..9c8976629e90 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -200,6 +200,7 @@ struct page *rust_helper_folio_page(struct folio *folio, size_t n)
 {
 	return folio_page(folio, n);
 }
+EXPORT_SYMBOL_GPL(rust_helper_folio_page);
 
 loff_t rust_helper_folio_pos(struct folio *folio)
 {
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index dae447a1ad30..f64be2310010 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -262,7 +262,7 @@ $(obj)/%.lst: $(src)/%.c FORCE
 # Compile Rust sources (.rs)
 # ---------------------------------------------------------------------------
 
-rust_allowed_features := new_uninit,offset_of
+rust_allowed_features := new_uninit,offset_of,allocator_api,impl_trait_in_assoc_type
 
 # `--out-dir` is required to avoid temporaries being created by `rustc` in the
 # current working directory, which may be not accessible in the out-of-tree
-- 
2.44.0

From nobody Mon Feb 9 00:42:03 2026
From: Andreas Hindborg
To: Jens Axboe, Christoph Hellwig, Keith Busch, Damien Le Moal,
 Bart Van Assche, Hannes Reinecke, "linux-block@vger.kernel.org"
Cc: Andreas Hindborg, Niklas Cassel, Greg KH, Matthew Wilcox,
 Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho, Boqun Feng, Gary Guo,
 Björn Roy Baron, Benno Lossin, Alice Ryhl, Chaitanya Kulkarni,
 Luis Chamberlain, Yexuan Yang <1182282462@bupt.edu.cn>,
 Sergio González Collado, Joel Granados, "Pankaj Raghav (Samsung)",
 Daniel Gomez, open list, "rust-for-linux@vger.kernel.org",
 "lsf-pc@lists.linux-foundation.org", "gost.dev@samsung.com"
Subject: [RFC PATCH 5/5] MAINTAINERS: add entry for Rust block device driver API
Date: Wed, 13 Mar 2024 12:05:12 +0100
Message-ID: <20240313110515.70088-6-nmi@metaspace.dk>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20240313110515.70088-1-nmi@metaspace.dk>
References: <20240313110515.70088-1-nmi@metaspace.dk>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Andreas Hindborg

Signed-off-by: Andreas Hindborg
---
 MAINTAINERS | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1aabf1c15bb3..031198967782 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3623,6 +3623,20 @@ F:	include/linux/blk*
 F:	kernel/trace/blktrace.c
 F:	lib/sbitmap.c
 
+BLOCK LAYER DEVICE DRIVER API [RUST]
+M:	Andreas Hindborg
+R:	Boqun Feng
+L:	linux-block@vger.kernel.org
+L:	rust-for-linux@vger.kernel.org
+S:	Supported
+W:	https://rust-for-linux.com
+B:	https://github.com/Rust-for-Linux/linux/issues
+C:	https://rust-for-linux.zulipchat.com/#narrow/stream/Block
+T:	git https://github.com/Rust-for-Linux/linux.git rust-block-next
+F:	drivers/block/rnull.rs
+F:	rust/kernel/block.rs
+F:	rust/kernel/block/
+
 BLOCK2MTD DRIVER
 M:	Joern Engel
 L:	linux-mtd@lists.infradead.org
-- 
2.44.0
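
Reading aid for patch 4/5 (not part of the series): the module parameter
handling in rnull.rs can be sketched in plain userspace Rust. This is only a
minimal sketch under stated assumptions. `IrqMode`, `validate_block_size` and
`capacity_sectors` are hypothetical names introduced here, and a string stands
in for the kernel's EINVAL. Only the numeric rules come from the patch: IRQ
modes 0 to 2, a block size that is a multiple of 512 within 512..=4096, and a
capacity in MiB shifted left by 11 to get 512-byte sectors.

```rust
use std::convert::TryFrom;

/// Completion modes accepted by `param_irq_mode` in the patch.
#[derive(Debug, PartialEq)]
enum IrqMode {
    None,
    Soft,
    Timer,
}

impl TryFrom<u8> for IrqMode {
    // Assumption: the patch returns the kernel's EINVAL; a string stands in here.
    type Error = &'static str;

    fn try_from(value: u8) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(Self::None),
            1 => Ok(Self::Soft),
            2 => Ok(Self::Timer),
            _ => Err("EINVAL"),
        }
    }
}

/// Hypothetical helper applying the same block size check as `add_disk()`:
/// a multiple of 512 that lies within 512..=4096.
fn validate_block_size(block_size: u16) -> Result<u16, &'static str> {
    if block_size % 512 != 0 || !(512..=4096).contains(&block_size) {
        return Err("EINVAL");
    }
    Ok(block_size)
}

/// Hypothetical helper: 1 MiB is 2^20 bytes, i.e. 2^11 sectors of 512 bytes,
/// which is why the patch shifts the MiB value left by 11.
fn capacity_sectors(capacity_mib: u64) -> u64 {
    capacity_mib << 11
}
```

With the defaults from the patch (4096 MiB, block size 4096, IRQ mode 0),
`capacity_sectors(4096)` yields 8388608 sectors and both checks pass.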