The pdu field in io_uring_cmd may contain stale data when a request
object is recycled from the slab cache. Accessing uninitialized or
garbage memory can lead to undefined behavior in users of the pdu.
Ensure the pdu buffer is cleared during io_uring_cmd_prep() so that
each command starts from a well-defined state. This avoids exposing
uninitialized memory and prevents potential misinterpretation of data
from previous requests.
No functional change is intended other than guaranteeing that pdu is
always zero-initialized before use.
Signed-off-by: Sidong Yang <sidong.yang@furiosa.ai>
---
io_uring/uring_cmd.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 053bac89b6c0..2492525d4e43 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -203,6 +203,7 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if (!ac)
return -ENOMEM;
ioucmd->sqe = sqe;
+ memset(&ioucmd->pdu, 0, sizeof(ioucmd->pdu));
return 0;
}
--
2.43.0
On Fri, Aug 22, 2025 at 5:56 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > The pdu field in io_uring_cmd may contain stale data when a request > object is recycled from the slab cache. Accessing uninitialized or > garbage memory can lead to undefined behavior in users of the pdu. > > Ensure the pdu buffer is cleared during io_uring_cmd_prep() so that > each command starts from a well-defined state. This avoids exposing > uninitialized memory and prevents potential misinterpretation of data > from previous requests. > > No functional change is intended other than guaranteeing that pdu is > always zero-initialized before use. > > Signed-off-by: Sidong Yang <sidong.yang@furiosa.ai> > --- > io_uring/uring_cmd.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c > index 053bac89b6c0..2492525d4e43 100644 > --- a/io_uring/uring_cmd.c > +++ b/io_uring/uring_cmd.c > @@ -203,6 +203,7 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) > if (!ac) > return -ENOMEM; > ioucmd->sqe = sqe; > + memset(&ioucmd->pdu, 0, sizeof(ioucmd->pdu)); Adding this overhead to every existing uring_cmd() implementation is unfortunate. Could we instead track the initialized/uninitialized state by using different types on the Rust side? The io_uring_cmd could start as an IoUringCmd, where the PDU field is MaybeUninit, write_pdu<T>() could return a new IoUringCmdPdu<T> that guarantees the PDU has been initialized. Best, Caleb > return 0; > } > > -- > 2.43.0 >
On Mon, Sep 01, 2025 at 05:34:28PM -0700, Caleb Sander Mateos wrote: > On Fri, Aug 22, 2025 at 5:56 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > The pdu field in io_uring_cmd may contain stale data when a request > > object is recycled from the slab cache. Accessing uninitialized or > > garbage memory can lead to undefined behavior in users of the pdu. > > > > Ensure the pdu buffer is cleared during io_uring_cmd_prep() so that > > each command starts from a well-defined state. This avoids exposing > > uninitialized memory and prevents potential misinterpretation of data > > from previous requests. > > > > No functional change is intended other than guaranteeing that pdu is > > always zero-initialized before use. > > > > Signed-off-by: Sidong Yang <sidong.yang@furiosa.ai> > > --- > > io_uring/uring_cmd.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c > > index 053bac89b6c0..2492525d4e43 100644 > > --- a/io_uring/uring_cmd.c > > +++ b/io_uring/uring_cmd.c > > @@ -203,6 +203,7 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) > > if (!ac) > > return -ENOMEM; > > ioucmd->sqe = sqe; > > + memset(&ioucmd->pdu, 0, sizeof(ioucmd->pdu)); > > Adding this overhead to every existing uring_cmd() implementation is > unfortunate. Could we instead track the initialized/uninitialized > state by using different types on the Rust side? The io_uring_cmd > could start as an IoUringCmd, where the PDU field is MaybeUninit, > write_pdu<T>() could return a new IoUringCmdPdu<T> that guarantees the > PDU has been initialized. I've found a flag IORING_URING_CMD_REISSUE that we could initialize the pdu. In uring_cmd callback, we can fill zero when it's not reissued. But I don't know that we could call T::default() in miscdevice. If we make IoUringCmdPdu<T>, MiscDevice also should be MiscDevice<T>. How about assign a byte in pdu for checking initialized? In uring_cmd(), We could set a byte flag that it's not initialized. And we could return error that it's not initialized in read_pdu(). Thanks, Sidong > > Best, > Caleb > > > return 0; > > } > > > > -- > > 2.43.0 > >
On Tue, Sep 2, 2025 at 3:23 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > On Mon, Sep 01, 2025 at 05:34:28PM -0700, Caleb Sander Mateos wrote: > > On Fri, Aug 22, 2025 at 5:56 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > The pdu field in io_uring_cmd may contain stale data when a request > > > object is recycled from the slab cache. Accessing uninitialized or > > > garbage memory can lead to undefined behavior in users of the pdu. > > > > > > Ensure the pdu buffer is cleared during io_uring_cmd_prep() so that > > > each command starts from a well-defined state. This avoids exposing > > > uninitialized memory and prevents potential misinterpretation of data > > > from previous requests. > > > > > > No functional change is intended other than guaranteeing that pdu is > > > always zero-initialized before use. > > > > > > Signed-off-by: Sidong Yang <sidong.yang@furiosa.ai> > > > --- > > > io_uring/uring_cmd.c | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c > > > index 053bac89b6c0..2492525d4e43 100644 > > > --- a/io_uring/uring_cmd.c > > > +++ b/io_uring/uring_cmd.c > > > @@ -203,6 +203,7 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) > > > if (!ac) > > > return -ENOMEM; > > > ioucmd->sqe = sqe; > > > + memset(&ioucmd->pdu, 0, sizeof(ioucmd->pdu)); > > > > Adding this overhead to every existing uring_cmd() implementation is > > unfortunate. Could we instead track the initialized/uninitialized > > state by using different types on the Rust side? The io_uring_cmd > > could start as an IoUringCmd, where the PDU field is MaybeUninit, > > write_pdu<T>() could return a new IoUringCmdPdu<T> that guarantees the > > PDU has been initialized. > > I've found a flag IORING_URING_CMD_REISSUE that we could initialize > the pdu. In uring_cmd callback, we can fill zero when it's not reissued. > But I don't know that we could call T::default() in miscdevice. If we > make IoUringCmdPdu<T>, MiscDevice also should be MiscDevice<T>. > > How about assign a byte in pdu for checking initialized? In uring_cmd(), > We could set a byte flag that it's not initialized. And we could return > error that it's not initialized in read_pdu(). Could we do the zero-initialization (or T::default()) in MiscdeviceVTable::uring_cmd() if the IORING_URING_CMD_REISSUE flag isn't set (i.e. on the initial issue)? That way, we avoid any performance penalty for the existing C uring_cmd() implementations. I'm not quite sure what you mean by "assign a byte in pdu for checking initialized". Best, Caleb
On Tue, Sep 02, 2025 at 08:31:00AM -0700, Caleb Sander Mateos wrote: > On Tue, Sep 2, 2025 at 3:23 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > On Mon, Sep 01, 2025 at 05:34:28PM -0700, Caleb Sander Mateos wrote: > > > On Fri, Aug 22, 2025 at 5:56 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > The pdu field in io_uring_cmd may contain stale data when a request > > > > object is recycled from the slab cache. Accessing uninitialized or > > > > garbage memory can lead to undefined behavior in users of the pdu. > > > > > > > > Ensure the pdu buffer is cleared during io_uring_cmd_prep() so that > > > > each command starts from a well-defined state. This avoids exposing > > > > uninitialized memory and prevents potential misinterpretation of data > > > > from previous requests. > > > > > > > > No functional change is intended other than guaranteeing that pdu is > > > > always zero-initialized before use. > > > > > > > > Signed-off-by: Sidong Yang <sidong.yang@furiosa.ai> > > > > --- > > > > io_uring/uring_cmd.c | 1 + > > > > 1 file changed, 1 insertion(+) > > > > > > > > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c > > > > index 053bac89b6c0..2492525d4e43 100644 > > > > --- a/io_uring/uring_cmd.c > > > > +++ b/io_uring/uring_cmd.c > > > > @@ -203,6 +203,7 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) > > > > if (!ac) > > > > return -ENOMEM; > > > > ioucmd->sqe = sqe; > > > > + memset(&ioucmd->pdu, 0, sizeof(ioucmd->pdu)); > > > > > > Adding this overhead to every existing uring_cmd() implementation is > > > unfortunate. Could we instead track the initialized/uninitialized > > > state by using different types on the Rust side? The io_uring_cmd > > > could start as an IoUringCmd, where the PDU field is MaybeUninit, > > > write_pdu<T>() could return a new IoUringCmdPdu<T> that guarantees the > > > PDU has been initialized. > > > > I've found a flag IORING_URING_CMD_REISSUE that we could initialize > > the pdu. In uring_cmd callback, we can fill zero when it's not reissued. > > But I don't know that we could call T::default() in miscdevice. If we > > make IoUringCmdPdu<T>, MiscDevice also should be MiscDevice<T>. > > > > How about assign a byte in pdu for checking initialized? In uring_cmd(), > > We could set a byte flag that it's not initialized. And we could return > > error that it's not initialized in read_pdu(). > > Could we do the zero-initialization (or T::default()) in > MiscdeviceVTable::uring_cmd() if the IORING_URING_CMD_REISSUE flag > isn't set (i.e. on the initial issue)? That way, we avoid any > performance penalty for the existing C uring_cmd() implementations. > I'm not quite sure what you mean by "assign a byte in pdu for checking > initialized". Sure, we could fill zero when it's the first time uring_cmd called with checking the flag. I would remove this commit for next version. I also suggests that we would provide the method that read_pdu() and write_pdu(). In read_pdu() I want to check write_pdu() is called before. So along the 20 bytes for pdu, maybe we could use a bytes for the flag that pdu is initialized? But maybe I would introduce a new struct that has Pin<&mut IoUringCmd> and issue_flags. How about some additional field for pdu is initialized like below? struct IoUringCmdArgs { ioucmd: Pin<&mut IoUringCmd>, issue_flags: u32, pdu_initialized: bool, } > > Best, > Caleb
On Sat, Sep 6, 2025 at 7:28 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > On Tue, Sep 02, 2025 at 08:31:00AM -0700, Caleb Sander Mateos wrote: > > On Tue, Sep 2, 2025 at 3:23 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > On Mon, Sep 01, 2025 at 05:34:28PM -0700, Caleb Sander Mateos wrote: > > > > On Fri, Aug 22, 2025 at 5:56 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > The pdu field in io_uring_cmd may contain stale data when a request > > > > > object is recycled from the slab cache. Accessing uninitialized or > > > > > garbage memory can lead to undefined behavior in users of the pdu. > > > > > > > > > > Ensure the pdu buffer is cleared during io_uring_cmd_prep() so that > > > > > each command starts from a well-defined state. This avoids exposing > > > > > uninitialized memory and prevents potential misinterpretation of data > > > > > from previous requests. > > > > > > > > > > No functional change is intended other than guaranteeing that pdu is > > > > > always zero-initialized before use. > > > > > > > > > > Signed-off-by: Sidong Yang <sidong.yang@furiosa.ai> > > > > > --- > > > > > io_uring/uring_cmd.c | 1 + > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c > > > > > index 053bac89b6c0..2492525d4e43 100644 > > > > > --- a/io_uring/uring_cmd.c > > > > > +++ b/io_uring/uring_cmd.c > > > > > @@ -203,6 +203,7 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) > > > > > if (!ac) > > > > > return -ENOMEM; > > > > > ioucmd->sqe = sqe; > > > > > + memset(&ioucmd->pdu, 0, sizeof(ioucmd->pdu)); > > > > > > > > Adding this overhead to every existing uring_cmd() implementation is > > > > unfortunate. Could we instead track the initialized/uninitialized > > > > state by using different types on the Rust side? The io_uring_cmd > > > > could start as an IoUringCmd, where the PDU field is MaybeUninit, > > > > write_pdu<T>() could return a new IoUringCmdPdu<T> that guarantees the > > > > PDU has been initialized. > > > > > > I've found a flag IORING_URING_CMD_REISSUE that we could initialize > > > the pdu. In uring_cmd callback, we can fill zero when it's not reissued. > > > But I don't know that we could call T::default() in miscdevice. If we > > > make IoUringCmdPdu<T>, MiscDevice also should be MiscDevice<T>. > > > > > > How about assign a byte in pdu for checking initialized? In uring_cmd(), > > > We could set a byte flag that it's not initialized. And we could return > > > error that it's not initialized in read_pdu(). > > > > Could we do the zero-initialization (or T::default()) in > > MiscdeviceVTable::uring_cmd() if the IORING_URING_CMD_REISSUE flag > > isn't set (i.e. on the initial issue)? That way, we avoid any > > performance penalty for the existing C uring_cmd() implementations. > > I'm not quite sure what you mean by "assign a byte in pdu for checking > > initialized". > > Sure, we could fill zero when it's the first time uring_cmd called with > checking the flag. I would remove this commit for next version. I also > suggests that we would provide the method that read_pdu() and write_pdu(). > In read_pdu() I want to check write_pdu() is called before. So along the > 20 bytes for pdu, maybe we could use a bytes for the flag that pdu is > initialized? Not sure what you mean about "20 bytes for pdu". It seems like it would be preferable to enforce that write_pdu() has been called before read_pdu() using the Rust type system instead of a runtime check. I was thinking a signature like fn write_pdu(cmd: IoUringCmd, value: T) -> IoUringCmdPdu<T>. Do you feel there's a reason that wouldn't work and a runtime check would be necessary? > > But maybe I would introduce a new struct that has Pin<&mut IoUringCmd> and > issue_flags. How about some additional field for pdu is initialized like below? > > struct IoUringCmdArgs { > ioucmd: Pin<&mut IoUringCmd>, > issue_flags: u32, > pdu_initialized: bool, > } One other thing I realized is that issue_flags should come from the *current* context rather than the context the uring_cmd() callback was called in. For example, if io_uring_cmd_done() is called from task work context, issue_flags should match the issue_flags passed to the io_uring_cmd_tw_t callback, not the issue_flags originally passed to the uring_cmd() callback. So it probably makes more sense to decouple issue_flags from the (owned) IoUringCmd. I think you could pass it by reference (&IssueFlags) or with a phantom reference lifetime (IssueFlags<'_>) to the Rust uring_cmd() and task work callbacks to ensure it can't be used after those callbacks have returned. Best, Caleb
On Mon, Sep 08, 2025 at 12:45:58PM -0700, Caleb Sander Mateos wrote: > On Sat, Sep 6, 2025 at 7:28 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > On Tue, Sep 02, 2025 at 08:31:00AM -0700, Caleb Sander Mateos wrote: > > > On Tue, Sep 2, 2025 at 3:23 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > On Mon, Sep 01, 2025 at 05:34:28PM -0700, Caleb Sander Mateos wrote: > > > > > On Fri, Aug 22, 2025 at 5:56 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > > > The pdu field in io_uring_cmd may contain stale data when a request > > > > > > object is recycled from the slab cache. Accessing uninitialized or > > > > > > garbage memory can lead to undefined behavior in users of the pdu. > > > > > > > > > > > > Ensure the pdu buffer is cleared during io_uring_cmd_prep() so that > > > > > > each command starts from a well-defined state. This avoids exposing > > > > > > uninitialized memory and prevents potential misinterpretation of data > > > > > > from previous requests. > > > > > > > > > > > > No functional change is intended other than guaranteeing that pdu is > > > > > > always zero-initialized before use. > > > > > > > > > > > > Signed-off-by: Sidong Yang <sidong.yang@furiosa.ai> > > > > > > --- > > > > > > io_uring/uring_cmd.c | 1 + > > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > > > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c > > > > > > index 053bac89b6c0..2492525d4e43 100644 > > > > > > --- a/io_uring/uring_cmd.c > > > > > > +++ b/io_uring/uring_cmd.c > > > > > > @@ -203,6 +203,7 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) > > > > > > if (!ac) > > > > > > return -ENOMEM; > > > > > > ioucmd->sqe = sqe; > > > > > > + memset(&ioucmd->pdu, 0, sizeof(ioucmd->pdu)); > > > > > > > > > > Adding this overhead to every existing uring_cmd() implementation is > > > > > unfortunate. Could we instead track the initialized/uninitialized > > > > > state by using different types on the Rust side? The io_uring_cmd > > > > > could start as an IoUringCmd, where the PDU field is MaybeUninit, > > > > > write_pdu<T>() could return a new IoUringCmdPdu<T> that guarantees the > > > > > PDU has been initialized. > > > > > > > > I've found a flag IORING_URING_CMD_REISSUE that we could initialize > > > > the pdu. In uring_cmd callback, we can fill zero when it's not reissued. > > > > But I don't know that we could call T::default() in miscdevice. If we > > > > make IoUringCmdPdu<T>, MiscDevice also should be MiscDevice<T>. > > > > > > > > How about assign a byte in pdu for checking initialized? In uring_cmd(), > > > > We could set a byte flag that it's not initialized. And we could return > > > > error that it's not initialized in read_pdu(). > > > > > > Could we do the zero-initialization (or T::default()) in > > > MiscdeviceVTable::uring_cmd() if the IORING_URING_CMD_REISSUE flag > > > isn't set (i.e. on the initial issue)? That way, we avoid any > > > performance penalty for the existing C uring_cmd() implementations. > > > I'm not quite sure what you mean by "assign a byte in pdu for checking > > > initialized". > > > > Sure, we could fill zero when it's the first time uring_cmd called with > > checking the flag. I would remove this commit for next version. I also > > suggests that we would provide the method that read_pdu() and write_pdu(). > > In read_pdu() I want to check write_pdu() is called before. So along the > > 20 bytes for pdu, maybe we could use a bytes for the flag that pdu is > > initialized? > > Not sure what you mean about "20 bytes for pdu". > It seems like it would be preferable to enforce that write_pdu() has > been called before read_pdu() using the Rust type system instead of a > runtime check. I was thinking a signature like fn write_pdu(cmd: > IoUringCmd, value: T) -> IoUringCmdPdu<T>. Do you feel there's a > reason that wouldn't work and a runtime check would be necessary? I didn't think about make write_pdu() to return IoUringCmdPdu<T> before. I think it's good way to pdu is safe without adding a new generic param for MiscDevice. write_pdu() would return IoUringCmdPdu<T> and it could call IoUringCmdPdu<T>::pdu(&mut self) -> &mut T safely maybe. > > > > > But maybe I would introduce a new struct that has Pin<&mut IoUringCmd> and > > issue_flags. How about some additional field for pdu is initialized like below? > > > > struct IoUringCmdArgs { > > ioucmd: Pin<&mut IoUringCmd>, > > issue_flags: u32, > > pdu_initialized: bool, > > } > > One other thing I realized is that issue_flags should come from the > *current* context rather than the context the uring_cmd() callback was > called in. For example, if io_uring_cmd_done() is called from task > work context, issue_flags should match the issue_flags passed to the > io_uring_cmd_tw_t callback, not the issue_flags originally passed to > the uring_cmd() callback. So it probably makes more sense to decouple > issue_flags from the (owned) IoUringCmd. I think you could pass it by > reference (&IssueFlags) or with a phantom reference lifetime > (IssueFlags<'_>) to the Rust uring_cmd() and task work callbacks to > ensure it can't be used after those callbacks have returned. I have had no idea about task work context. I agree with you that it would be better to separate issue_flags from IoUringCmd. So, IoUringCmdArgs would have a only field Pin<&mut IoUringCmd>? > > Best, > Caleb
On Tue, Sep 9, 2025 at 7:43 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > On Mon, Sep 08, 2025 at 12:45:58PM -0700, Caleb Sander Mateos wrote: > > On Sat, Sep 6, 2025 at 7:28 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > On Tue, Sep 02, 2025 at 08:31:00AM -0700, Caleb Sander Mateos wrote: > > > > On Tue, Sep 2, 2025 at 3:23 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > On Mon, Sep 01, 2025 at 05:34:28PM -0700, Caleb Sander Mateos wrote: > > > > > > On Fri, Aug 22, 2025 at 5:56 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > > > > > The pdu field in io_uring_cmd may contain stale data when a request > > > > > > > object is recycled from the slab cache. Accessing uninitialized or > > > > > > > garbage memory can lead to undefined behavior in users of the pdu. > > > > > > > > > > > > > > Ensure the pdu buffer is cleared during io_uring_cmd_prep() so that > > > > > > > each command starts from a well-defined state. This avoids exposing > > > > > > > uninitialized memory and prevents potential misinterpretation of data > > > > > > > from previous requests. > > > > > > > > > > > > > > No functional change is intended other than guaranteeing that pdu is > > > > > > > always zero-initialized before use. > > > > > > > > > > > > > > Signed-off-by: Sidong Yang <sidong.yang@furiosa.ai> > > > > > > > --- > > > > > > > io_uring/uring_cmd.c | 1 + > > > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > > > > > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c > > > > > > > index 053bac89b6c0..2492525d4e43 100644 > > > > > > > --- a/io_uring/uring_cmd.c > > > > > > > +++ b/io_uring/uring_cmd.c > > > > > > > @@ -203,6 +203,7 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) > > > > > > > if (!ac) > > > > > > > return -ENOMEM; > > > > > > > ioucmd->sqe = sqe; > > > > > > > + memset(&ioucmd->pdu, 0, sizeof(ioucmd->pdu)); > > > > > > > > > > > > Adding this overhead to every existing uring_cmd() implementation is > > > > > > unfortunate. Could we instead track the initialized/uninitialized > > > > > > state by using different types on the Rust side? The io_uring_cmd > > > > > > could start as an IoUringCmd, where the PDU field is MaybeUninit, > > > > > > write_pdu<T>() could return a new IoUringCmdPdu<T> that guarantees the > > > > > > PDU has been initialized. > > > > > > > > > > I've found a flag IORING_URING_CMD_REISSUE that we could initialize > > > > > the pdu. In uring_cmd callback, we can fill zero when it's not reissued. > > > > > But I don't know that we could call T::default() in miscdevice. If we > > > > > make IoUringCmdPdu<T>, MiscDevice also should be MiscDevice<T>. > > > > > > > > > > How about assign a byte in pdu for checking initialized? In uring_cmd(), > > > > > We could set a byte flag that it's not initialized. And we could return > > > > > error that it's not initialized in read_pdu(). > > > > > > > > Could we do the zero-initialization (or T::default()) in > > > > MiscdeviceVTable::uring_cmd() if the IORING_URING_CMD_REISSUE flag > > > > isn't set (i.e. on the initial issue)? That way, we avoid any > > > > performance penalty for the existing C uring_cmd() implementations. > > > > I'm not quite sure what you mean by "assign a byte in pdu for checking > > > > initialized". > > > > > > Sure, we could fill zero when it's the first time uring_cmd called with > > > checking the flag. I would remove this commit for next version. I also > > > suggests that we would provide the method that read_pdu() and write_pdu(). > > > In read_pdu() I want to check write_pdu() is called before. So along the > > > 20 bytes for pdu, maybe we could use a bytes for the flag that pdu is > > > initialized? > > > > Not sure what you mean about "20 bytes for pdu". > > It seems like it would be preferable to enforce that write_pdu() has > > been called before read_pdu() using the Rust type system instead of a > > runtime check. I was thinking a signature like fn write_pdu(cmd: > > IoUringCmd, value: T) -> IoUringCmdPdu<T>. Do you feel there's a > > reason that wouldn't work and a runtime check would be necessary? > > I didn't think about make write_pdu() to return IoUringCmdPdu<T> before. > I think it's good way to pdu is safe without adding a new generic param for > MiscDevice. write_pdu() would return IoUringCmdPdu<T> and it could call > IoUringCmdPdu<T>::pdu(&mut self) -> &mut T safely maybe. Yes, that's what I was thinking. > > > > > > > > > But maybe I would introduce a new struct that has Pin<&mut IoUringCmd> and > > > issue_flags. How about some additional field for pdu is initialized like below? > > > > > > struct IoUringCmdArgs { > > > ioucmd: Pin<&mut IoUringCmd>, > > > issue_flags: u32, > > > pdu_initialized: bool, > > > } > > > > One other thing I realized is that issue_flags should come from the > > *current* context rather than the context the uring_cmd() callback was > > called in. For example, if io_uring_cmd_done() is called from task > > work context, issue_flags should match the issue_flags passed to the > > io_uring_cmd_tw_t callback, not the issue_flags originally passed to > > the uring_cmd() callback. So it probably makes more sense to decouple > > issue_flags from the (owned) IoUringCmd. I think you could pass it by > > reference (&IssueFlags) or with a phantom reference lifetime > > (IssueFlags<'_>) to the Rust uring_cmd() and task work callbacks to > > ensure it can't be used after those callbacks have returned. > > I have had no idea about task work context. I agree with you that > it would be better to separate issue_flags from IoUringCmd. So, > IoUringCmdArgs would have a only field Pin<&mut IoUringCmd>? "Task work" is a mechanism io_uring uses to queue work to run on the thread that submitted an io_uring operation. It's basically a per-thread atomic queue of callbacks that the thread will process whenever it returns from the kernel to userspace (after a syscall or an interrupt). This is the context where asynchronous uring_cmd completions are generally processed (see io_uring_cmd_complete_in_task() and io_uring_cmd_do_in_task_lazy()). I can't speak to the history of why io_uring uses task work, but my guess would be that it provides a safe context to acquire the io_ring_ctx uring_lock mutex (e.g. nvme_uring_cmd_end_io() can be called from an interrupt handler, so it's not allowed to take a mutex). Processing all the task work at once also provides natural opportunities for batching. Yes, we probably don't need to bundle anything else with the IoUringCmd after all. As I mentioned earlier, I don't think Pin<&mut IoUringCmd> will work for uring_cmds that complete asynchronously, as they will need to outlive the uring_cmd() call. So uring_cmd() needs to transfer ownership of the struct io_uring_cmd. Best, Caleb
On Tue, Sep 09, 2025 at 09:32:37AM -0700, Caleb Sander Mateos wrote: > On Tue, Sep 9, 2025 at 7:43 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > On Mon, Sep 08, 2025 at 12:45:58PM -0700, Caleb Sander Mateos wrote: > > > On Sat, Sep 6, 2025 at 7:28 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > On Tue, Sep 02, 2025 at 08:31:00AM -0700, Caleb Sander Mateos wrote: > > > > > On Tue, Sep 2, 2025 at 3:23 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > > > On Mon, Sep 01, 2025 at 05:34:28PM -0700, Caleb Sander Mateos wrote: > > > > > > > On Fri, Aug 22, 2025 at 5:56 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > > > > > > > The pdu field in io_uring_cmd may contain stale data when a request > > > > > > > > object is recycled from the slab cache. Accessing uninitialized or > > > > > > > > garbage memory can lead to undefined behavior in users of the pdu. > > > > > > > > > > > > > > > > Ensure the pdu buffer is cleared during io_uring_cmd_prep() so that > > > > > > > > each command starts from a well-defined state. This avoids exposing > > > > > > > > uninitialized memory and prevents potential misinterpretation of data > > > > > > > > from previous requests. > > > > > > > > > > > > > > > > No functional change is intended other than guaranteeing that pdu is > > > > > > > > always zero-initialized before use. > > > > > > > > > > > > > > > > Signed-off-by: Sidong Yang <sidong.yang@furiosa.ai> > > > > > > > > --- > > > > > > > > io_uring/uring_cmd.c | 1 + > > > > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > > > > > > > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c > > > > > > > > index 053bac89b6c0..2492525d4e43 100644 > > > > > > > > --- a/io_uring/uring_cmd.c > > > > > > > > +++ b/io_uring/uring_cmd.c > > > > > > > > @@ -203,6 +203,7 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) > > > > > > > > if (!ac) > > > > > > > > return -ENOMEM; > > > > > > > > ioucmd->sqe = sqe; > > > > > > > > + memset(&ioucmd->pdu, 0, sizeof(ioucmd->pdu)); > > > > > > > > > > > > > > Adding this overhead to every existing uring_cmd() implementation is > > > > > > > unfortunate. Could we instead track the initialized/uninitialized > > > > > > > state by using different types on the Rust side? The io_uring_cmd > > > > > > > could start as an IoUringCmd, where the PDU field is MaybeUninit, > > > > > > > write_pdu<T>() could return a new IoUringCmdPdu<T> that guarantees the > > > > > > > PDU has been initialized. > > > > > > > > > > > > I've found a flag IORING_URING_CMD_REISSUE that we could initialize > > > > > > the pdu. In uring_cmd callback, we can fill zero when it's not reissued. > > > > > > But I don't know that we could call T::default() in miscdevice. If we > > > > > > make IoUringCmdPdu<T>, MiscDevice also should be MiscDevice<T>. > > > > > > > > > > > > How about assign a byte in pdu for checking initialized? In uring_cmd(), > > > > > > We could set a byte flag that it's not initialized. And we could return > > > > > > error that it's not initialized in read_pdu(). > > > > > > > > > > Could we do the zero-initialization (or T::default()) in > > > > > MiscdeviceVTable::uring_cmd() if the IORING_URING_CMD_REISSUE flag > > > > > isn't set (i.e. on the initial issue)? That way, we avoid any > > > > > performance penalty for the existing C uring_cmd() implementations. > > > > > I'm not quite sure what you mean by "assign a byte in pdu for checking > > > > > initialized". > > > > > > > > Sure, we could fill zero when it's the first time uring_cmd called with > > > > checking the flag. I would remove this commit for next version. I also > > > > suggests that we would provide the method that read_pdu() and write_pdu(). > > > > In read_pdu() I want to check write_pdu() is called before. So along the > > > > 20 bytes for pdu, maybe we could use a bytes for the flag that pdu is > > > > initialized? > > > > > > Not sure what you mean about "20 bytes for pdu". > > > It seems like it would be preferable to enforce that write_pdu() has > > > been called before read_pdu() using the Rust type system instead of a > > > runtime check. I was thinking a signature like fn write_pdu(cmd: > > > IoUringCmd, value: T) -> IoUringCmdPdu<T>. Do you feel there's a > > > reason that wouldn't work and a runtime check would be necessary? > > > > I didn't think about make write_pdu() to return IoUringCmdPdu<T> before. > > I think it's good way to pdu is safe without adding a new generic param for > > MiscDevice. write_pdu() would return IoUringCmdPdu<T> and it could call > > IoUringCmdPdu<T>::pdu(&mut self) -> &mut T safely maybe. > > Yes, that's what I was thinking. Good, I'll change api in this way. Thanks! > > > > > > > > > > > > > > But maybe I would introduce a new struct that has Pin<&mut IoUringCmd> and > > > > issue_flags. How about some additional field for pdu is initialized like below? > > > > > > > > struct IoUringCmdArgs { > > > > ioucmd: Pin<&mut IoUringCmd>, > > > > issue_flags: u32, > > > > pdu_initialized: bool, > > > > } > > > > > > One other thing I realized is that issue_flags should come from the > > > *current* context rather than the context the uring_cmd() callback was > > > called in. For example, if io_uring_cmd_done() is called from task > > > work context, issue_flags should match the issue_flags passed to the > > > io_uring_cmd_tw_t callback, not the issue_flags originally passed to > > > the uring_cmd() callback. So it probably makes more sense to decouple > > > issue_flags from the (owned) IoUringCmd. I think you could pass it by > > > reference (&IssueFlags) or with a phantom reference lifetime > > > (IssueFlags<'_>) to the Rust uring_cmd() and task work callbacks to > > > ensure it can't be used after those callbacks have returned. > > > > I have had no idea about task work context. I agree with you that > > it would be better to separate issue_flags from IoUringCmd. So, > > IoUringCmdArgs would have a only field Pin<&mut IoUringCmd>? > > "Task work" is a mechanism io_uring uses to queue work to run on the > thread that submitted an io_uring operation. It's basically a > per-thread atomic queue of callbacks that the thread will process > whenever it returns from the kernel to userspace (after a syscall or > an interrupt). This is the context where asynchronous uring_cmd > completions are generally processed (see > io_uring_cmd_complete_in_task() and io_uring_cmd_do_in_task_lazy()). I > can't speak to the history of why io_uring uses task work, but my > guess would be that it provides a safe context to acquire the > io_ring_ctx uring_lock mutex (e.g. nvme_uring_cmd_end_io() can be > called from an interrupt handler, so it's not allowed to take a > mutex). Processing all the task work at once also provides natural > opportunities for batching. Thanks, I've checked io_uring_cmd_complete_in_task() that it receives callback that has issue_flags different with io_uring_cmd(). I'll try to add a api that wrapping io_uring_cmd_complete_in_task() for next version. > > Yes, we probably don't need to bundle anything else with the > IoUringCmd after all. As I mentioned earlier, I don't think Pin<&mut > IoUringCmd> will work for uring_cmds that complete asynchronously, as > they will need to outlive the uring_cmd() call. So uring_cmd() needs > to transfer ownership of the struct io_uring_cmd. I can't think that how to take ownership of struct io_uring_cmd. The struct allocated with io_alloc_req() and should be freed with io_free_req(). If taking ownership means having pointer of struct io_uring_cmd, I think it's no difference with current version. Also could it be called with mem::forget() if it has ownership? Thanks, Sidong > Best, > Caleb
On Fri, Sep 12, 2025 at 9:42 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > On Tue, Sep 09, 2025 at 09:32:37AM -0700, Caleb Sander Mateos wrote: > > On Tue, Sep 9, 2025 at 7:43 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > On Mon, Sep 08, 2025 at 12:45:58PM -0700, Caleb Sander Mateos wrote: > > > > On Sat, Sep 6, 2025 at 7:28 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > On Tue, Sep 02, 2025 at 08:31:00AM -0700, Caleb Sander Mateos wrote: > > > > > > On Tue, Sep 2, 2025 at 3:23 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > > > > > On Mon, Sep 01, 2025 at 05:34:28PM -0700, Caleb Sander Mateos wrote: > > > > > > > > On Fri, Aug 22, 2025 at 5:56 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > > > > > > > > > The pdu field in io_uring_cmd may contain stale data when a request > > > > > > > > > object is recycled from the slab cache. Accessing uninitialized or > > > > > > > > > garbage memory can lead to undefined behavior in users of the pdu. > > > > > > > > > > > > > > > > > > Ensure the pdu buffer is cleared during io_uring_cmd_prep() so that > > > > > > > > > each command starts from a well-defined state. This avoids exposing > > > > > > > > > uninitialized memory and prevents potential misinterpretation of data > > > > > > > > > from previous requests. > > > > > > > > > > > > > > > > > > No functional change is intended other than guaranteeing that pdu is > > > > > > > > > always zero-initialized before use. > > > > > > > > > > > > > > > > > > Signed-off-by: Sidong Yang <sidong.yang@furiosa.ai> > > > > > > > > > --- > > > > > > > > > io_uring/uring_cmd.c | 1 + > > > > > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > > > > > > > > > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c > > > > > > > > > index 053bac89b6c0..2492525d4e43 100644 > > > > > > > > > --- a/io_uring/uring_cmd.c > > > > > > > > > +++ b/io_uring/uring_cmd.c > > > > > > > > > @@ -203,6 +203,7 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) > > > > > > > > > if (!ac) > > > > > > > > > return -ENOMEM; > > > > > > > > > ioucmd->sqe = sqe; > > > > > > > > > + memset(&ioucmd->pdu, 0, sizeof(ioucmd->pdu)); > > > > > > > > > > > > > > > > Adding this overhead to every existing uring_cmd() implementation is > > > > > > > > unfortunate. Could we instead track the initialized/uninitialized > > > > > > > > state by using different types on the Rust side? The io_uring_cmd > > > > > > > > could start as an IoUringCmd, where the PDU field is MaybeUninit, > > > > > > > > write_pdu<T>() could return a new IoUringCmdPdu<T> that guarantees the > > > > > > > > PDU has been initialized. > > > > > > > > > > > > > > I've found a flag IORING_URING_CMD_REISSUE that we could initialize > > > > > > > the pdu. In uring_cmd callback, we can fill zero when it's not reissued. > > > > > > > But I don't know that we could call T::default() in miscdevice. If we > > > > > > > make IoUringCmdPdu<T>, MiscDevice also should be MiscDevice<T>. > > > > > > > > > > > > > > How about assign a byte in pdu for checking initialized? In uring_cmd(), > > > > > > > We could set a byte flag that it's not initialized. And we could return > > > > > > > error that it's not initialized in read_pdu(). > > > > > > > > > > > > Could we do the zero-initialization (or T::default()) in > > > > > > MiscdeviceVTable::uring_cmd() if the IORING_URING_CMD_REISSUE flag > > > > > > isn't set (i.e. on the initial issue)? That way, we avoid any > > > > > > performance penalty for the existing C uring_cmd() implementations. > > > > > > I'm not quite sure what you mean by "assign a byte in pdu for checking > > > > > > initialized". > > > > > > > > > > Sure, we could fill zero when it's the first time uring_cmd called with > > > > > checking the flag. I would remove this commit for next version. I also > > > > > suggests that we would provide the method that read_pdu() and write_pdu(). > > > > > In read_pdu() I want to check write_pdu() is called before. So along the > > > > > 20 bytes for pdu, maybe we could use a bytes for the flag that pdu is > > > > > initialized? > > > > > > > > Not sure what you mean about "20 bytes for pdu". > > > > It seems like it would be preferable to enforce that write_pdu() has > > > > been called before read_pdu() using the Rust type system instead of a > > > > runtime check. I was thinking a signature like fn write_pdu(cmd: > > > > IoUringCmd, value: T) -> IoUringCmdPdu<T>. Do you feel there's a > > > > reason that wouldn't work and a runtime check would be necessary? > > > > > > I didn't think about make write_pdu() to return IoUringCmdPdu<T> before. > > > I think it's good way to pdu is safe without adding a new generic param for > > > MiscDevice. write_pdu() would return IoUringCmdPdu<T> and it could call > > > IoUringCmdPdu<T>::pdu(&mut self) -> &mut T safely maybe. > > > > Yes, that's what I was thinking. > > Good, I'll change api in this way. Thanks! > > > > > > > > > > > > > > > > > > > > But maybe I would introduce a new struct that has Pin<&mut IoUringCmd> and > > > > > issue_flags. How about some additional field for pdu is initialized like below? > > > > > > > > > > struct IoUringCmdArgs { > > > > > ioucmd: Pin<&mut IoUringCmd>, > > > > > issue_flags: u32, > > > > > pdu_initialized: bool, > > > > > } > > > > > > > > One other thing I realized is that issue_flags should come from the > > > > *current* context rather than the context the uring_cmd() callback was > > > > called in. For example, if io_uring_cmd_done() is called from task > > > > work context, issue_flags should match the issue_flags passed to the > > > > io_uring_cmd_tw_t callback, not the issue_flags originally passed to > > > > the uring_cmd() callback. So it probably makes more sense to decouple > > > > issue_flags from the (owned) IoUringCmd. I think you could pass it by > > > > reference (&IssueFlags) or with a phantom reference lifetime > > > > (IssueFlags<'_>) to the Rust uring_cmd() and task work callbacks to > > > > ensure it can't be used after those callbacks have returned. > > > > > > I have had no idea about task work context. I agree with you that > > > it would be better to separate issue_flags from IoUringCmd. So, > > > IoUringCmdArgs would have a only field Pin<&mut IoUringCmd>? > > > > "Task work" is a mechanism io_uring uses to queue work to run on the > > thread that submitted an io_uring operation. It's basically a > > per-thread atomic queue of callbacks that the thread will process > > whenever it returns from the kernel to userspace (after a syscall or > > an interrupt). This is the context where asynchronous uring_cmd > > completions are generally processed (see > > io_uring_cmd_complete_in_task() and io_uring_cmd_do_in_task_lazy()). I > > can't speak to the history of why io_uring uses task work, but my > > guess would be that it provides a safe context to acquire the > > io_ring_ctx uring_lock mutex (e.g. nvme_uring_cmd_end_io() can be > > called from an interrupt handler, so it's not allowed to take a > > mutex). Processing all the task work at once also provides natural > > opportunities for batching. > > Thanks, I've checked io_uring_cmd_complete_in_task() that it receives > callback that has issue_flags different with io_uring_cmd(). I'll try to add > a api that wrapping io_uring_cmd_complete_in_task() for next version. > > > > > Yes, we probably don't need to bundle anything else with the > > IoUringCmd after all. As I mentioned earlier, I don't think Pin<&mut > > IoUringCmd> will work for uring_cmds that complete asynchronously, as > > they will need to outlive the uring_cmd() call. So uring_cmd() needs > > to transfer ownership of the struct io_uring_cmd. > > I can't think that how to take ownership of struct io_uring_cmd. The > struct allocated with io_alloc_req() and should be freed with io_free_req(). > If taking ownership means having pointer of struct io_uring_cmd, I think > it's no difference with current version. Also could it be called with > mem::forget() if it has ownership? I don't mean ownership of the io_uring_cmd allocation; that's the responsibility of the io_uring layer. But once the io_uring_cmd is handed to the uring_cmd() implementation, it belongs to that layer until it completes the command back to io_uring. Maybe a better way to describe it would be as ownership of the "executing io_uring_cmd". The problem with Pin<&mut IoUringCmd> is that it is a borrowed reference to the io_uring_cmd, so it can't outlive the uring_cmd() callback. Yes, it's possible to leak the io_uring_cmd by never calling io_uring_cmd_done() to return it to the io_uring layer. I would imagine something like this: #[derive(Clone, Copy)] struct IssueFlags<'a>(c_uint, PhantomData<&'a ()>); // Indicates ownership of the io_uring_cmd between uring_cmd() and io_uring_cmd_done() struct IoUringCmd(NonNull<bindings::io_uring_cmd>); impl IoUringCmd { // ... fn done(self, ret: i32, res2: u64, issue_flags: IssueFlags<'_>) { let cmd = self.0.as_ptr(); let issue_flags = issue_flags.0; unsafe { bindings::io_uring_cmd_done(cmd, ret, res2, issue_flags) } } } // Can choose whether to complete the command synchronously or asynchronously. // If take_async() is called, IoUringCmd::done() needs to be called to complete the command. // If take_async() isn't called, the command is completed synchronously // with the return value from MiscDevice::uring_cmd(). struct UringCmdInput<'a>(&mut Option<NonNull<bindings::io_uring_cmd>>); impl UringCmdInput<'_> { fn take_async(self) -> IoUringCmd { IoUringCmd(self.0.take().unwrap()) } } trait MiscDevice { // ... fn uring_cmd( _device: <Self::Ptr as ForeignOwnable>::Borrowed<'_>, _cmd: UringCmdInput<'_>, _ issue_flags: IssueFlags <'_>, ) -> Result<i32> { build_error!(VTABLE_DEFAULT_ERROR) } } impl<T: MiscDevice> MiscdeviceVTable<T> { // ... unsafe extern "C" fn uring_cmd( ioucmd: *mut bindings::io_uring_cmd, issue_flags: c_uint, ) -> c_int { let raw_file = unsafe { (*ioucmd).file }; let private = unsafe { (*raw_file).private_data }.cast(); let device = unsafe { <T::Ptr as ForeignOwnable>::borrow(private) }; let mut ioucmd = Some(NonNull::new(ioucmd).unwrap()); let issue_flags = IssueFlags(issue_flags, PhantomData); let ret = T::uring_cmd(device, UringCmdInput(&mut ioucmd), issue_flags); // -EIOCBQUEUED indicates ownership of io_uring_cmd has been taken if iou_cmd.is_none() { return -bindings::EIOCBQUEUED; } let ret = from_result(|| ret); if ret == -bindings::EIOCBQUEUED { -bindings::EINVAL } else { ret } } } Best, Caleb
On Fri, Sep 12, 2025 at 10:56:31AM -0700, Caleb Sander Mateos wrote: > On Fri, Sep 12, 2025 at 9:42 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > On Tue, Sep 09, 2025 at 09:32:37AM -0700, Caleb Sander Mateos wrote: > > > On Tue, Sep 9, 2025 at 7:43 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > On Mon, Sep 08, 2025 at 12:45:58PM -0700, Caleb Sander Mateos wrote: > > > > > On Sat, Sep 6, 2025 at 7:28 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > > > On Tue, Sep 02, 2025 at 08:31:00AM -0700, Caleb Sander Mateos wrote: > > > > > > > On Tue, Sep 2, 2025 at 3:23 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > > > > > > > On Mon, Sep 01, 2025 at 05:34:28PM -0700, Caleb Sander Mateos wrote: > > > > > > > > > On Fri, Aug 22, 2025 at 5:56 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > > > > > > > > > > > The pdu field in io_uring_cmd may contain stale data when a request > > > > > > > > > > object is recycled from the slab cache. Accessing uninitialized or > > > > > > > > > > garbage memory can lead to undefined behavior in users of the pdu. > > > > > > > > > > > > > > > > > > > > Ensure the pdu buffer is cleared during io_uring_cmd_prep() so that > > > > > > > > > > each command starts from a well-defined state. This avoids exposing > > > > > > > > > > uninitialized memory and prevents potential misinterpretation of data > > > > > > > > > > from previous requests. > > > > > > > > > > > > > > > > > > > > No functional change is intended other than guaranteeing that pdu is > > > > > > > > > > always zero-initialized before use. > > > > > > > > > > > > > > > > > > > > Signed-off-by: Sidong Yang <sidong.yang@furiosa.ai> > > > > > > > > > > --- > > > > > > > > > > io_uring/uring_cmd.c | 1 + > > > > > > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > > > > > > > > > > > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c > > > > > > > > > > index 053bac89b6c0..2492525d4e43 100644 > > > > > > > > > > --- a/io_uring/uring_cmd.c > > > > > > > > > > +++ b/io_uring/uring_cmd.c > > > > > > > > > > @@ -203,6 +203,7 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) > > > > > > > > > > if (!ac) > > > > > > > > > > return -ENOMEM; > > > > > > > > > > ioucmd->sqe = sqe; > > > > > > > > > > + memset(&ioucmd->pdu, 0, sizeof(ioucmd->pdu)); > > > > > > > > > > > > > > > > > > Adding this overhead to every existing uring_cmd() implementation is > > > > > > > > > unfortunate. Could we instead track the initialized/uninitialized > > > > > > > > > state by using different types on the Rust side? The io_uring_cmd > > > > > > > > > could start as an IoUringCmd, where the PDU field is MaybeUninit, > > > > > > > > > write_pdu<T>() could return a new IoUringCmdPdu<T> that guarantees the > > > > > > > > > PDU has been initialized. > > > > > > > > > > > > > > > > I've found a flag IORING_URING_CMD_REISSUE that we could initialize > > > > > > > > the pdu. In uring_cmd callback, we can fill zero when it's not reissued. > > > > > > > > But I don't know that we could call T::default() in miscdevice. If we > > > > > > > > make IoUringCmdPdu<T>, MiscDevice also should be MiscDevice<T>. > > > > > > > > > > > > > > > > How about assign a byte in pdu for checking initialized? In uring_cmd(), > > > > > > > > We could set a byte flag that it's not initialized. And we could return > > > > > > > > error that it's not initialized in read_pdu(). > > > > > > > > > > > > > > Could we do the zero-initialization (or T::default()) in > > > > > > > MiscdeviceVTable::uring_cmd() if the IORING_URING_CMD_REISSUE flag > > > > > > > isn't set (i.e. on the initial issue)? That way, we avoid any > > > > > > > performance penalty for the existing C uring_cmd() implementations. > > > > > > > I'm not quite sure what you mean by "assign a byte in pdu for checking > > > > > > > initialized". > > > > > > > > > > > > Sure, we could fill zero when it's the first time uring_cmd called with > > > > > > checking the flag. I would remove this commit for next version. I also > > > > > > suggests that we would provide the method that read_pdu() and write_pdu(). > > > > > > In read_pdu() I want to check write_pdu() is called before. So along the > > > > > > 20 bytes for pdu, maybe we could use a bytes for the flag that pdu is > > > > > > initialized? > > > > > > > > > > Not sure what you mean about "20 bytes for pdu". > > > > > It seems like it would be preferable to enforce that write_pdu() has > > > > > been called before read_pdu() using the Rust type system instead of a > > > > > runtime check. I was thinking a signature like fn write_pdu(cmd: > > > > > IoUringCmd, value: T) -> IoUringCmdPdu<T>. Do you feel there's a > > > > > reason that wouldn't work and a runtime check would be necessary? > > > > > > > > I didn't think about make write_pdu() to return IoUringCmdPdu<T> before. > > > > I think it's good way to pdu is safe without adding a new generic param for > > > > MiscDevice. write_pdu() would return IoUringCmdPdu<T> and it could call > > > > IoUringCmdPdu<T>::pdu(&mut self) -> &mut T safely maybe. > > > > > > Yes, that's what I was thinking. > > > > Good, I'll change api in this way. Thanks! > > > > > > > > > > > > > > > > > > > > > > > > > > But maybe I would introduce a new struct that has Pin<&mut IoUringCmd> and > > > > > > issue_flags. How about some additional field for pdu is initialized like below? > > > > > > > > > > > > struct IoUringCmdArgs { > > > > > > ioucmd: Pin<&mut IoUringCmd>, > > > > > > issue_flags: u32, > > > > > > pdu_initialized: bool, > > > > > > } > > > > > > > > > > One other thing I realized is that issue_flags should come from the > > > > > *current* context rather than the context the uring_cmd() callback was > > > > > called in. For example, if io_uring_cmd_done() is called from task > > > > > work context, issue_flags should match the issue_flags passed to the > > > > > io_uring_cmd_tw_t callback, not the issue_flags originally passed to > > > > > the uring_cmd() callback. So it probably makes more sense to decouple > > > > > issue_flags from the (owned) IoUringCmd. I think you could pass it by > > > > > reference (&IssueFlags) or with a phantom reference lifetime > > > > > (IssueFlags<'_>) to the Rust uring_cmd() and task work callbacks to > > > > > ensure it can't be used after those callbacks have returned. > > > > > > > > I have had no idea about task work context. I agree with you that > > > > it would be better to separate issue_flags from IoUringCmd. So, > > > > IoUringCmdArgs would have a only field Pin<&mut IoUringCmd>? > > > > > > "Task work" is a mechanism io_uring uses to queue work to run on the > > > thread that submitted an io_uring operation. It's basically a > > > per-thread atomic queue of callbacks that the thread will process > > > whenever it returns from the kernel to userspace (after a syscall or > > > an interrupt). This is the context where asynchronous uring_cmd > > > completions are generally processed (see > > > io_uring_cmd_complete_in_task() and io_uring_cmd_do_in_task_lazy()). I > > > can't speak to the history of why io_uring uses task work, but my > > > guess would be that it provides a safe context to acquire the > > > io_ring_ctx uring_lock mutex (e.g. nvme_uring_cmd_end_io() can be > > > called from an interrupt handler, so it's not allowed to take a > > > mutex). Processing all the task work at once also provides natural > > > opportunities for batching. > > > > Thanks, I've checked io_uring_cmd_complete_in_task() that it receives > > callback that has issue_flags different with io_uring_cmd(). I'll try to add > > a api that wrapping io_uring_cmd_complete_in_task() for next version. > > > > > > > > Yes, we probably don't need to bundle anything else with the > > > IoUringCmd after all. As I mentioned earlier, I don't think Pin<&mut > > > IoUringCmd> will work for uring_cmds that complete asynchronously, as > > > they will need to outlive the uring_cmd() call. So uring_cmd() needs > > > to transfer ownership of the struct io_uring_cmd. > > > > I can't think that how to take ownership of struct io_uring_cmd. The > > struct allocated with io_alloc_req() and should be freed with io_free_req(). > > If taking ownership means having pointer of struct io_uring_cmd, I think > > it's no difference with current version. Also could it be called with > > mem::forget() if it has ownership? > > I don't mean ownership of the io_uring_cmd allocation; that's the > responsibility of the io_uring layer. But once the io_uring_cmd is > handed to the uring_cmd() implementation, it belongs to that layer > until it completes the command back to io_uring. Maybe a better way to > describe it would be as ownership of the "executing io_uring_cmd". The > problem with Pin<&mut IoUringCmd> is that it is a borrowed reference > to the io_uring_cmd, so it can't outlive the uring_cmd() callback. > Yes, it's possible to leak the io_uring_cmd by never calling > io_uring_cmd_done() to return it to the io_uring layer. Thanks, I understood that IoUringCmd could be outlive uring_cmd callback. But it's sad that it could be leaked without any unsafe code. > > I would imagine something like this: > > #[derive(Clone, Copy)] > struct IssueFlags<'a>(c_uint, PhantomData<&'a ()>); > > // Indicates ownership of the io_uring_cmd between uring_cmd() and > io_uring_cmd_done() > struct IoUringCmd(NonNull<bindings::io_uring_cmd>); > > impl IoUringCmd { > // ... > > fn done(self, ret: i32, res2: u64, issue_flags: IssueFlags<'_>) { > let cmd = self.0.as_ptr(); > let issue_flags = issue_flags.0; > unsafe { > bindings::io_uring_cmd_done(cmd, ret, res2, issue_flags) > } > } > } > > // Can choose whether to complete the command synchronously or asynchronously. > // If take_async() is called, IoUringCmd::done() needs to be called to > complete the command. > // If take_async() isn't called, the command is completed synchronously > // with the return value from MiscDevice::uring_cmd(). > struct UringCmdInput<'a>(&mut Option<NonNull<bindings::io_uring_cmd>>); Thanks for a detailed example! But rather than this, We could introduce new return type that has a callback that user could take IoUringCmd instead of returning -EIOCBQUEUED. But I prefer that we provide just one type IoUringCmd without UringCmdInput. Although UringCmdInput, user could call done() in uring_cmd callback and it makes confusion that whether task_async() called and returning -EIOCBQUEUED is mismatched that returns -EINVAL. We don't need to make it complex. Thanks, Sidong > > impl UringCmdInput<'_> { > fn take_async(self) -> IoUringCmd { > IoUringCmd(self.0.take().unwrap()) > } > } > > trait MiscDevice { > // ... > > fn uring_cmd( > _device: <Self::Ptr as ForeignOwnable>::Borrowed<'_>, > _cmd: UringCmdInput<'_>, > _ issue_flags: IssueFlags <'_>, > ) -> Result<i32> { > build_error!(VTABLE_DEFAULT_ERROR) > } > } > > impl<T: MiscDevice> MiscdeviceVTable<T> { > // ... > > unsafe extern "C" fn uring_cmd( > ioucmd: *mut bindings::io_uring_cmd, > issue_flags: c_uint, > ) -> c_int { > let raw_file = unsafe { (*ioucmd).file }; > let private = unsafe { (*raw_file).private_data }.cast(); > let device = unsafe { <T::Ptr as > ForeignOwnable>::borrow(private) }; > let mut ioucmd = Some(NonNull::new(ioucmd).unwrap()); > let issue_flags = IssueFlags(issue_flags, PhantomData); > let ret = T::uring_cmd(device, UringCmdInput(&mut > ioucmd), issue_flags); > // -EIOCBQUEUED indicates ownership of io_uring_cmd > has been taken > if iou_cmd.is_none() { > return -bindings::EIOCBQUEUED; > } > > let ret = from_result(|| ret); > if ret == -bindings::EIOCBQUEUED { > -bindings::EINVAL > } else { > ret > } > } > } > > Best, > Caleb
On Sat, Sep 13, 2025 at 5:42 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > On Fri, Sep 12, 2025 at 10:56:31AM -0700, Caleb Sander Mateos wrote: > > On Fri, Sep 12, 2025 at 9:42 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > On Tue, Sep 09, 2025 at 09:32:37AM -0700, Caleb Sander Mateos wrote: > > > > On Tue, Sep 9, 2025 at 7:43 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > On Mon, Sep 08, 2025 at 12:45:58PM -0700, Caleb Sander Mateos wrote: > > > > > > On Sat, Sep 6, 2025 at 7:28 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > > > > > On Tue, Sep 02, 2025 at 08:31:00AM -0700, Caleb Sander Mateos wrote: > > > > > > > > On Tue, Sep 2, 2025 at 3:23 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > > > > > > > > > On Mon, Sep 01, 2025 at 05:34:28PM -0700, Caleb Sander Mateos wrote: > > > > > > > > > > On Fri, Aug 22, 2025 at 5:56 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > > > > > > > > > > > > > The pdu field in io_uring_cmd may contain stale data when a request > > > > > > > > > > > object is recycled from the slab cache. Accessing uninitialized or > > > > > > > > > > > garbage memory can lead to undefined behavior in users of the pdu. > > > > > > > > > > > > > > > > > > > > > > Ensure the pdu buffer is cleared during io_uring_cmd_prep() so that > > > > > > > > > > > each command starts from a well-defined state. This avoids exposing > > > > > > > > > > > uninitialized memory and prevents potential misinterpretation of data > > > > > > > > > > > from previous requests. > > > > > > > > > > > > > > > > > > > > > > No functional change is intended other than guaranteeing that pdu is > > > > > > > > > > > always zero-initialized before use. > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Sidong Yang <sidong.yang@furiosa.ai> > > > > > > > > > > > --- > > > > > > > > > > > io_uring/uring_cmd.c | 1 + > > > > > > > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > > > > > > > > > > > > > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c > > > > > > > > > > > index 053bac89b6c0..2492525d4e43 100644 > > > > > > > > > > > --- a/io_uring/uring_cmd.c > > > > > > > > > > > +++ b/io_uring/uring_cmd.c > > > > > > > > > > > @@ -203,6 +203,7 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) > > > > > > > > > > > if (!ac) > > > > > > > > > > > return -ENOMEM; > > > > > > > > > > > ioucmd->sqe = sqe; > > > > > > > > > > > + memset(&ioucmd->pdu, 0, sizeof(ioucmd->pdu)); > > > > > > > > > > > > > > > > > > > > Adding this overhead to every existing uring_cmd() implementation is > > > > > > > > > > unfortunate. Could we instead track the initialized/uninitialized > > > > > > > > > > state by using different types on the Rust side? The io_uring_cmd > > > > > > > > > > could start as an IoUringCmd, where the PDU field is MaybeUninit, > > > > > > > > > > write_pdu<T>() could return a new IoUringCmdPdu<T> that guarantees the > > > > > > > > > > PDU has been initialized. > > > > > > > > > > > > > > > > > > I've found a flag IORING_URING_CMD_REISSUE that we could initialize > > > > > > > > > the pdu. In uring_cmd callback, we can fill zero when it's not reissued. > > > > > > > > > But I don't know that we could call T::default() in miscdevice. If we > > > > > > > > > make IoUringCmdPdu<T>, MiscDevice also should be MiscDevice<T>. > > > > > > > > > > > > > > > > > > How about assign a byte in pdu for checking initialized? In uring_cmd(), > > > > > > > > > We could set a byte flag that it's not initialized. And we could return > > > > > > > > > error that it's not initialized in read_pdu(). > > > > > > > > > > > > > > > > Could we do the zero-initialization (or T::default()) in > > > > > > > > MiscdeviceVTable::uring_cmd() if the IORING_URING_CMD_REISSUE flag > > > > > > > > isn't set (i.e. on the initial issue)? That way, we avoid any > > > > > > > > performance penalty for the existing C uring_cmd() implementations. > > > > > > > > I'm not quite sure what you mean by "assign a byte in pdu for checking > > > > > > > > initialized". > > > > > > > > > > > > > > Sure, we could fill zero when it's the first time uring_cmd called with > > > > > > > checking the flag. I would remove this commit for next version. I also > > > > > > > suggests that we would provide the method that read_pdu() and write_pdu(). > > > > > > > In read_pdu() I want to check write_pdu() is called before. So along the > > > > > > > 20 bytes for pdu, maybe we could use a bytes for the flag that pdu is > > > > > > > initialized? > > > > > > > > > > > > Not sure what you mean about "20 bytes for pdu". > > > > > > It seems like it would be preferable to enforce that write_pdu() has > > > > > > been called before read_pdu() using the Rust type system instead of a > > > > > > runtime check. I was thinking a signature like fn write_pdu(cmd: > > > > > > IoUringCmd, value: T) -> IoUringCmdPdu<T>. Do you feel there's a > > > > > > reason that wouldn't work and a runtime check would be necessary? > > > > > > > > > > I didn't think about make write_pdu() to return IoUringCmdPdu<T> before. > > > > > I think it's good way to pdu is safe without adding a new generic param for > > > > > MiscDevice. write_pdu() would return IoUringCmdPdu<T> and it could call > > > > > IoUringCmdPdu<T>::pdu(&mut self) -> &mut T safely maybe. > > > > > > > > Yes, that's what I was thinking. > > > > > > Good, I'll change api in this way. Thanks! > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > But maybe I would introduce a new struct that has Pin<&mut IoUringCmd> and > > > > > > > issue_flags. How about some additional field for pdu is initialized like below? > > > > > > > > > > > > > > struct IoUringCmdArgs { > > > > > > > ioucmd: Pin<&mut IoUringCmd>, > > > > > > > issue_flags: u32, > > > > > > > pdu_initialized: bool, > > > > > > > } > > > > > > > > > > > > One other thing I realized is that issue_flags should come from the > > > > > > *current* context rather than the context the uring_cmd() callback was > > > > > > called in. For example, if io_uring_cmd_done() is called from task > > > > > > work context, issue_flags should match the issue_flags passed to the > > > > > > io_uring_cmd_tw_t callback, not the issue_flags originally passed to > > > > > > the uring_cmd() callback. So it probably makes more sense to decouple > > > > > > issue_flags from the (owned) IoUringCmd. I think you could pass it by > > > > > > reference (&IssueFlags) or with a phantom reference lifetime > > > > > > (IssueFlags<'_>) to the Rust uring_cmd() and task work callbacks to > > > > > > ensure it can't be used after those callbacks have returned. > > > > > > > > > > I have had no idea about task work context. I agree with you that > > > > > it would be better to separate issue_flags from IoUringCmd. So, > > > > > IoUringCmdArgs would have a only field Pin<&mut IoUringCmd>? > > > > > > > > "Task work" is a mechanism io_uring uses to queue work to run on the > > > > thread that submitted an io_uring operation. It's basically a > > > > per-thread atomic queue of callbacks that the thread will process > > > > whenever it returns from the kernel to userspace (after a syscall or > > > > an interrupt). This is the context where asynchronous uring_cmd > > > > completions are generally processed (see > > > > io_uring_cmd_complete_in_task() and io_uring_cmd_do_in_task_lazy()). I > > > > can't speak to the history of why io_uring uses task work, but my > > > > guess would be that it provides a safe context to acquire the > > > > io_ring_ctx uring_lock mutex (e.g. nvme_uring_cmd_end_io() can be > > > > called from an interrupt handler, so it's not allowed to take a > > > > mutex). Processing all the task work at once also provides natural > > > > opportunities for batching. > > > > > > Thanks, I've checked io_uring_cmd_complete_in_task() that it receives > > > callback that has issue_flags different with io_uring_cmd(). I'll try to add > > > a api that wrapping io_uring_cmd_complete_in_task() for next version. > > > > > > > > > > > Yes, we probably don't need to bundle anything else with the > > > > IoUringCmd after all. As I mentioned earlier, I don't think Pin<&mut > > > > IoUringCmd> will work for uring_cmds that complete asynchronously, as > > > > they will need to outlive the uring_cmd() call. So uring_cmd() needs > > > > to transfer ownership of the struct io_uring_cmd. > > > > > > I can't think that how to take ownership of struct io_uring_cmd. The > > > struct allocated with io_alloc_req() and should be freed with io_free_req(). > > > If taking ownership means having pointer of struct io_uring_cmd, I think > > > it's no difference with current version. Also could it be called with > > > mem::forget() if it has ownership? > > > > I don't mean ownership of the io_uring_cmd allocation; that's the > > responsibility of the io_uring layer. But once the io_uring_cmd is > > handed to the uring_cmd() implementation, it belongs to that layer > > until it completes the command back to io_uring. Maybe a better way to > > describe it would be as ownership of the "executing io_uring_cmd". The > > problem with Pin<&mut IoUringCmd> is that it is a borrowed reference > > to the io_uring_cmd, so it can't outlive the uring_cmd() callback. > > Yes, it's possible to leak the io_uring_cmd by never calling > > io_uring_cmd_done() to return it to the io_uring layer. > > Thanks, I understood that IoUringCmd could be outlive uring_cmd callback. > But it's sad that it could be leaked without any unsafe code. Safety in Rust doesn't require destructors to run, which means any resource can be safely leaked (https://faultlore.com/blah/everyone-poops/ has some historical background on why Rust decided leaks had to be considered safe). Leaking an io_uring_cmd is analogous to leaking a Box, both are perfectly possible in safe Rust. > > > > > I would imagine something like this: > > > > #[derive(Clone, Copy)] > > struct IssueFlags<'a>(c_uint, PhantomData<&'a ()>); > > > > // Indicates ownership of the io_uring_cmd between uring_cmd() and > > io_uring_cmd_done() > > struct IoUringCmd(NonNull<bindings::io_uring_cmd>); > > > > impl IoUringCmd { > > // ... > > > > fn done(self, ret: i32, res2: u64, issue_flags: IssueFlags<'_>) { > > let cmd = self.0.as_ptr(); > > let issue_flags = issue_flags.0; > > unsafe { > > bindings::io_uring_cmd_done(cmd, ret, res2, issue_flags) > > } > > } > > } > > > > // Can choose whether to complete the command synchronously or asynchronously. > > // If take_async() is called, IoUringCmd::done() needs to be called to > > complete the command. > > // If take_async() isn't called, the command is completed synchronously > > // with the return value from MiscDevice::uring_cmd(). > > struct UringCmdInput<'a>(&mut Option<NonNull<bindings::io_uring_cmd>>); > > Thanks for a detailed example! > > But rather than this, We could introduce new return type that has a callback that > user could take IoUringCmd instead of returning -EIOCBQUEUED. I'm not following what you're suggesting, maybe a code sample would help? > > But I prefer that we provide just one type IoUringCmd without UringCmdInput. > Although UringCmdInput, user could call done() in uring_cmd callback and > it makes confusion that whether task_async() called and returning -EIOCBQUEUED > is mismatched that returns -EINVAL. We don't need to make it complex. Sure, if you only want to support asynchronous io_uring_cmd completions, than you can just pass IoUringCmd to MiscDevice::uring_cmd() and require it to call IoUringCmd::done() to complete the command. There's a small performance overhead to that over just returning the result from the uring_cmd() callback for synchronous completions (and it's more verbose), but I think that would be fine for an initial implementation. Best, Caleb
On Mon, Sep 15, 2025 at 09:54:50AM -0700, Caleb Sander Mateos wrote: > On Sat, Sep 13, 2025 at 5:42 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > On Fri, Sep 12, 2025 at 10:56:31AM -0700, Caleb Sander Mateos wrote: > > > On Fri, Sep 12, 2025 at 9:42 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > On Tue, Sep 09, 2025 at 09:32:37AM -0700, Caleb Sander Mateos wrote: > > > > > On Tue, Sep 9, 2025 at 7:43 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > > > On Mon, Sep 08, 2025 at 12:45:58PM -0700, Caleb Sander Mateos wrote: > > > > > > > On Sat, Sep 6, 2025 at 7:28 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > > > > > > > On Tue, Sep 02, 2025 at 08:31:00AM -0700, Caleb Sander Mateos wrote: > > > > > > > > > On Tue, Sep 2, 2025 at 3:23 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > > > > > > > > > > > On Mon, Sep 01, 2025 at 05:34:28PM -0700, Caleb Sander Mateos wrote: > > > > > > > > > > > On Fri, Aug 22, 2025 at 5:56 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > > > > > > > > > > > > > > > The pdu field in io_uring_cmd may contain stale data when a request > > > > > > > > > > > > object is recycled from the slab cache. Accessing uninitialized or > > > > > > > > > > > > garbage memory can lead to undefined behavior in users of the pdu. > > > > > > > > > > > > > > > > > > > > > > > > Ensure the pdu buffer is cleared during io_uring_cmd_prep() so that > > > > > > > > > > > > each command starts from a well-defined state. This avoids exposing > > > > > > > > > > > > uninitialized memory and prevents potential misinterpretation of data > > > > > > > > > > > > from previous requests. > > > > > > > > > > > > > > > > > > > > > > > > No functional change is intended other than guaranteeing that pdu is > > > > > > > > > > > > always zero-initialized before use. > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Sidong Yang <sidong.yang@furiosa.ai> > > > > > > > > > > > > --- > > > > > > > > > > > > io_uring/uring_cmd.c | 1 + > > > > > > > > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > > > > > > > > > > > > > > > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c > > > > > > > > > > > > index 053bac89b6c0..2492525d4e43 100644 > > > > > > > > > > > > --- a/io_uring/uring_cmd.c > > > > > > > > > > > > +++ b/io_uring/uring_cmd.c > > > > > > > > > > > > @@ -203,6 +203,7 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) > > > > > > > > > > > > if (!ac) > > > > > > > > > > > > return -ENOMEM; > > > > > > > > > > > > ioucmd->sqe = sqe; > > > > > > > > > > > > + memset(&ioucmd->pdu, 0, sizeof(ioucmd->pdu)); > > > > > > > > > > > > > > > > > > > > > > Adding this overhead to every existing uring_cmd() implementation is > > > > > > > > > > > unfortunate. Could we instead track the initialized/uninitialized > > > > > > > > > > > state by using different types on the Rust side? The io_uring_cmd > > > > > > > > > > > could start as an IoUringCmd, where the PDU field is MaybeUninit, > > > > > > > > > > > write_pdu<T>() could return a new IoUringCmdPdu<T> that guarantees the > > > > > > > > > > > PDU has been initialized. > > > > > > > > > > > > > > > > > > > > I've found a flag IORING_URING_CMD_REISSUE that we could initialize > > > > > > > > > > the pdu. In uring_cmd callback, we can fill zero when it's not reissued. > > > > > > > > > > But I don't know that we could call T::default() in miscdevice. If we > > > > > > > > > > make IoUringCmdPdu<T>, MiscDevice also should be MiscDevice<T>. > > > > > > > > > > > > > > > > > > > > How about assign a byte in pdu for checking initialized? In uring_cmd(), > > > > > > > > > > We could set a byte flag that it's not initialized. And we could return > > > > > > > > > > error that it's not initialized in read_pdu(). > > > > > > > > > > > > > > > > > > Could we do the zero-initialization (or T::default()) in > > > > > > > > > MiscdeviceVTable::uring_cmd() if the IORING_URING_CMD_REISSUE flag > > > > > > > > > isn't set (i.e. on the initial issue)? That way, we avoid any > > > > > > > > > performance penalty for the existing C uring_cmd() implementations. > > > > > > > > > I'm not quite sure what you mean by "assign a byte in pdu for checking > > > > > > > > > initialized". > > > > > > > > > > > > > > > > Sure, we could fill zero when it's the first time uring_cmd called with > > > > > > > > checking the flag. I would remove this commit for next version. I also > > > > > > > > suggests that we would provide the method that read_pdu() and write_pdu(). > > > > > > > > In read_pdu() I want to check write_pdu() is called before. So along the > > > > > > > > 20 bytes for pdu, maybe we could use a bytes for the flag that pdu is > > > > > > > > initialized? > > > > > > > > > > > > > > Not sure what you mean about "20 bytes for pdu". > > > > > > > It seems like it would be preferable to enforce that write_pdu() has > > > > > > > been called before read_pdu() using the Rust type system instead of a > > > > > > > runtime check. I was thinking a signature like fn write_pdu(cmd: > > > > > > > IoUringCmd, value: T) -> IoUringCmdPdu<T>. Do you feel there's a > > > > > > > reason that wouldn't work and a runtime check would be necessary? > > > > > > > > > > > > I didn't think about make write_pdu() to return IoUringCmdPdu<T> before. > > > > > > I think it's good way to pdu is safe without adding a new generic param for > > > > > > MiscDevice. write_pdu() would return IoUringCmdPdu<T> and it could call > > > > > > IoUringCmdPdu<T>::pdu(&mut self) -> &mut T safely maybe. > > > > > > > > > > Yes, that's what I was thinking. > > > > > > > > Good, I'll change api in this way. Thanks! > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > But maybe I would introduce a new struct that has Pin<&mut IoUringCmd> and > > > > > > > > issue_flags. How about some additional field for pdu is initialized like below? > > > > > > > > > > > > > > > > struct IoUringCmdArgs { > > > > > > > > ioucmd: Pin<&mut IoUringCmd>, > > > > > > > > issue_flags: u32, > > > > > > > > pdu_initialized: bool, > > > > > > > > } > > > > > > > > > > > > > > One other thing I realized is that issue_flags should come from the > > > > > > > *current* context rather than the context the uring_cmd() callback was > > > > > > > called in. For example, if io_uring_cmd_done() is called from task > > > > > > > work context, issue_flags should match the issue_flags passed to the > > > > > > > io_uring_cmd_tw_t callback, not the issue_flags originally passed to > > > > > > > the uring_cmd() callback. So it probably makes more sense to decouple > > > > > > > issue_flags from the (owned) IoUringCmd. I think you could pass it by > > > > > > > reference (&IssueFlags) or with a phantom reference lifetime > > > > > > > (IssueFlags<'_>) to the Rust uring_cmd() and task work callbacks to > > > > > > > ensure it can't be used after those callbacks have returned. > > > > > > > > > > > > I have had no idea about task work context. I agree with you that > > > > > > it would be better to separate issue_flags from IoUringCmd. So, > > > > > > IoUringCmdArgs would have a only field Pin<&mut IoUringCmd>? > > > > > > > > > > "Task work" is a mechanism io_uring uses to queue work to run on the > > > > > thread that submitted an io_uring operation. It's basically a > > > > > per-thread atomic queue of callbacks that the thread will process > > > > > whenever it returns from the kernel to userspace (after a syscall or > > > > > an interrupt). This is the context where asynchronous uring_cmd > > > > > completions are generally processed (see > > > > > io_uring_cmd_complete_in_task() and io_uring_cmd_do_in_task_lazy()). I > > > > > can't speak to the history of why io_uring uses task work, but my > > > > > guess would be that it provides a safe context to acquire the > > > > > io_ring_ctx uring_lock mutex (e.g. nvme_uring_cmd_end_io() can be > > > > > called from an interrupt handler, so it's not allowed to take a > > > > > mutex). Processing all the task work at once also provides natural > > > > > opportunities for batching. > > > > > > > > Thanks, I've checked io_uring_cmd_complete_in_task() that it receives > > > > callback that has issue_flags different with io_uring_cmd(). I'll try to add > > > > a api that wrapping io_uring_cmd_complete_in_task() for next version. > > > > > > > > > > > > > > Yes, we probably don't need to bundle anything else with the > > > > > IoUringCmd after all. As I mentioned earlier, I don't think Pin<&mut > > > > > IoUringCmd> will work for uring_cmds that complete asynchronously, as > > > > > they will need to outlive the uring_cmd() call. So uring_cmd() needs > > > > > to transfer ownership of the struct io_uring_cmd. > > > > > > > > I can't think that how to take ownership of struct io_uring_cmd. The > > > > struct allocated with io_alloc_req() and should be freed with io_free_req(). > > > > If taking ownership means having pointer of struct io_uring_cmd, I think > > > > it's no difference with current version. Also could it be called with > > > > mem::forget() if it has ownership? > > > > > > I don't mean ownership of the io_uring_cmd allocation; that's the > > > responsibility of the io_uring layer. But once the io_uring_cmd is > > > handed to the uring_cmd() implementation, it belongs to that layer > > > until it completes the command back to io_uring. Maybe a better way to > > > describe it would be as ownership of the "executing io_uring_cmd". The > > > problem with Pin<&mut IoUringCmd> is that it is a borrowed reference > > > to the io_uring_cmd, so it can't outlive the uring_cmd() callback. > > > Yes, it's possible to leak the io_uring_cmd by never calling > > > io_uring_cmd_done() to return it to the io_uring layer. > > > > Thanks, I understood that IoUringCmd could be outlive uring_cmd callback. > > But it's sad that it could be leaked without any unsafe code. > > Safety in Rust doesn't require destructors to run, which means any > resource can be safely leaked > (https://faultlore.com/blah/everyone-poops/ has some historical > background on why Rust decided leaks had to be considered safe). > Leaking an io_uring_cmd is analogous to leaking a Box, both are > perfectly possible in safe Rust. Thanks for the reference. If driver just drops `IoUringCmd` without taking, the request wouldn't be completed until io-uring instance deinitialized. I understood that we cannot handle this. > > > > > > > > > I would imagine something like this: > > > > > > #[derive(Clone, Copy)] > > > struct IssueFlags<'a>(c_uint, PhantomData<&'a ()>); > > > > > > // Indicates ownership of the io_uring_cmd between uring_cmd() and > > > io_uring_cmd_done() > > > struct IoUringCmd(NonNull<bindings::io_uring_cmd>); > > > > > > impl IoUringCmd { > > > // ... > > > > > > fn done(self, ret: i32, res2: u64, issue_flags: IssueFlags<'_>) { > > > let cmd = self.0.as_ptr(); > > > let issue_flags = issue_flags.0; > > > unsafe { > > > bindings::io_uring_cmd_done(cmd, ret, res2, issue_flags) > > > } > > > } > > > } > > > > > > // Can choose whether to complete the command synchronously or asynchronously. > > > // If take_async() is called, IoUringCmd::done() needs to be called to > > > complete the command. > > > // If take_async() isn't called, the command is completed synchronously > > > // with the return value from MiscDevice::uring_cmd(). > > > struct UringCmdInput<'a>(&mut Option<NonNull<bindings::io_uring_cmd>>); > > > > Thanks for a detailed example! > > > > But rather than this, We could introduce new return type that has a callback that > > user could take IoUringCmd instead of returning -EIOCBQUEUED. > > I'm not following what you're suggesting, maybe a code sample would help? maybe like below... #[vtable] pub trait MiscDevice: Sized { /// ... /// If user returns `EIOCBQUEUED`, this function would be called for /// users who want to take `IoUringCmdAsync`. It could call `done()` for complete /// request. fn uring_cmd_async_prep( device: <Self::Ptr as ForeignOwnable>::Borrowed<'_>, io_uring_cmd_async: IoUringCmdAsync); /// ... } impl<T: MiscDevice> MiscdeviceVTable<T> { // ... unsafe extern "C" fn uring_cmd( ioucmd: *mut bindings::io_uring_cmd, issue_flags: ffi::c_uint, ) -> c_int { // ... let result = T::uring_cmd(device, ioucmd, issue_flags); if let Err(EIOCBQUEUED) = result { T::uring_cmd_async_prep(device, ioucmd.to_async()); } from_result(|| result) } } > > > > > But I prefer that we provide just one type IoUringCmd without UringCmdInput. > > Although UringCmdInput, user could call done() in uring_cmd callback and > > it makes confusion that whether task_async() called and returning -EIOCBQUEUED > > is mismatched that returns -EINVAL. We don't need to make it complex. > > Sure, if you only want to support asynchronous io_uring_cmd > completions, than you can just pass IoUringCmd to > MiscDevice::uring_cmd() and require it to call IoUringCmd::done() to > complete the command. There's a small performance overhead to that > over just returning the result from the uring_cmd() callback for > synchronous completions (and it's more verbose), but I think that > would be fine for an initial implementation. I appreciate for your understanding. I think it would be good to have just one simple struct `IoUringCmd`. I'll make next version patch soon. Thanks, Sidong > > Best, > Caleb
On Wed, Sep 17, 2025 at 7:56 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > On Mon, Sep 15, 2025 at 09:54:50AM -0700, Caleb Sander Mateos wrote: > > On Sat, Sep 13, 2025 at 5:42 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > On Fri, Sep 12, 2025 at 10:56:31AM -0700, Caleb Sander Mateos wrote: > > > > On Fri, Sep 12, 2025 at 9:42 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > On Tue, Sep 09, 2025 at 09:32:37AM -0700, Caleb Sander Mateos wrote: > > > > > > On Tue, Sep 9, 2025 at 7:43 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > > > > > On Mon, Sep 08, 2025 at 12:45:58PM -0700, Caleb Sander Mateos wrote: > > > > > > > > On Sat, Sep 6, 2025 at 7:28 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > > > > > > > > > On Tue, Sep 02, 2025 at 08:31:00AM -0700, Caleb Sander Mateos wrote: > > > > > > > > > > On Tue, Sep 2, 2025 at 3:23 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > > > > > > > > > > > > > On Mon, Sep 01, 2025 at 05:34:28PM -0700, Caleb Sander Mateos wrote: > > > > > > > > > > > > On Fri, Aug 22, 2025 at 5:56 AM Sidong Yang <sidong.yang@furiosa.ai> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > The pdu field in io_uring_cmd may contain stale data when a request > > > > > > > > > > > > > object is recycled from the slab cache. Accessing uninitialized or > > > > > > > > > > > > > garbage memory can lead to undefined behavior in users of the pdu. > > > > > > > > > > > > > > > > > > > > > > > > > > Ensure the pdu buffer is cleared during io_uring_cmd_prep() so that > > > > > > > > > > > > > each command starts from a well-defined state. This avoids exposing > > > > > > > > > > > > > uninitialized memory and prevents potential misinterpretation of data > > > > > > > > > > > > > from previous requests. > > > > > > > > > > > > > > > > > > > > > > > > > > No functional change is intended other than guaranteeing that pdu is > > > > > > > > > > > > > always zero-initialized before use. > > > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Sidong Yang <sidong.yang@furiosa.ai> > > > > > > > > > > > > > --- > > > > > > > > > > > > > io_uring/uring_cmd.c | 1 + > > > > > > > > > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > > > > > > > > > > > > > > > > > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c > > > > > > > > > > > > > index 053bac89b6c0..2492525d4e43 100644 > > > > > > > > > > > > > --- a/io_uring/uring_cmd.c > > > > > > > > > > > > > +++ b/io_uring/uring_cmd.c > > > > > > > > > > > > > @@ -203,6 +203,7 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) > > > > > > > > > > > > > if (!ac) > > > > > > > > > > > > > return -ENOMEM; > > > > > > > > > > > > > ioucmd->sqe = sqe; > > > > > > > > > > > > > + memset(&ioucmd->pdu, 0, sizeof(ioucmd->pdu)); > > > > > > > > > > > > > > > > > > > > > > > > Adding this overhead to every existing uring_cmd() implementation is > > > > > > > > > > > > unfortunate. Could we instead track the initialized/uninitialized > > > > > > > > > > > > state by using different types on the Rust side? The io_uring_cmd > > > > > > > > > > > > could start as an IoUringCmd, where the PDU field is MaybeUninit, > > > > > > > > > > > > write_pdu<T>() could return a new IoUringCmdPdu<T> that guarantees the > > > > > > > > > > > > PDU has been initialized. > > > > > > > > > > > > > > > > > > > > > > I've found a flag IORING_URING_CMD_REISSUE that we could initialize > > > > > > > > > > > the pdu. In uring_cmd callback, we can fill zero when it's not reissued. > > > > > > > > > > > But I don't know that we could call T::default() in miscdevice. If we > > > > > > > > > > > make IoUringCmdPdu<T>, MiscDevice also should be MiscDevice<T>. > > > > > > > > > > > > > > > > > > > > > > How about assign a byte in pdu for checking initialized? In uring_cmd(), > > > > > > > > > > > We could set a byte flag that it's not initialized. And we could return > > > > > > > > > > > error that it's not initialized in read_pdu(). > > > > > > > > > > > > > > > > > > > > Could we do the zero-initialization (or T::default()) in > > > > > > > > > > MiscdeviceVTable::uring_cmd() if the IORING_URING_CMD_REISSUE flag > > > > > > > > > > isn't set (i.e. on the initial issue)? That way, we avoid any > > > > > > > > > > performance penalty for the existing C uring_cmd() implementations. > > > > > > > > > > I'm not quite sure what you mean by "assign a byte in pdu for checking > > > > > > > > > > initialized". > > > > > > > > > > > > > > > > > > Sure, we could fill zero when it's the first time uring_cmd called with > > > > > > > > > checking the flag. I would remove this commit for next version. I also > > > > > > > > > suggests that we would provide the method that read_pdu() and write_pdu(). > > > > > > > > > In read_pdu() I want to check write_pdu() is called before. So along the > > > > > > > > > 20 bytes for pdu, maybe we could use a bytes for the flag that pdu is > > > > > > > > > initialized? > > > > > > > > > > > > > > > > Not sure what you mean about "20 bytes for pdu". > > > > > > > > It seems like it would be preferable to enforce that write_pdu() has > > > > > > > > been called before read_pdu() using the Rust type system instead of a > > > > > > > > runtime check. I was thinking a signature like fn write_pdu(cmd: > > > > > > > > IoUringCmd, value: T) -> IoUringCmdPdu<T>. Do you feel there's a > > > > > > > > reason that wouldn't work and a runtime check would be necessary? > > > > > > > > > > > > > > I didn't think about make write_pdu() to return IoUringCmdPdu<T> before. > > > > > > > I think it's good way to pdu is safe without adding a new generic param for > > > > > > > MiscDevice. write_pdu() would return IoUringCmdPdu<T> and it could call > > > > > > > IoUringCmdPdu<T>::pdu(&mut self) -> &mut T safely maybe. > > > > > > > > > > > > Yes, that's what I was thinking. > > > > > > > > > > Good, I'll change api in this way. Thanks! > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > But maybe I would introduce a new struct that has Pin<&mut IoUringCmd> and > > > > > > > > > issue_flags. How about some additional field for pdu is initialized like below? > > > > > > > > > > > > > > > > > > struct IoUringCmdArgs { > > > > > > > > > ioucmd: Pin<&mut IoUringCmd>, > > > > > > > > > issue_flags: u32, > > > > > > > > > pdu_initialized: bool, > > > > > > > > > } > > > > > > > > > > > > > > > > One other thing I realized is that issue_flags should come from the > > > > > > > > *current* context rather than the context the uring_cmd() callback was > > > > > > > > called in. For example, if io_uring_cmd_done() is called from task > > > > > > > > work context, issue_flags should match the issue_flags passed to the > > > > > > > > io_uring_cmd_tw_t callback, not the issue_flags originally passed to > > > > > > > > the uring_cmd() callback. So it probably makes more sense to decouple > > > > > > > > issue_flags from the (owned) IoUringCmd. I think you could pass it by > > > > > > > > reference (&IssueFlags) or with a phantom reference lifetime > > > > > > > > (IssueFlags<'_>) to the Rust uring_cmd() and task work callbacks to > > > > > > > > ensure it can't be used after those callbacks have returned. > > > > > > > > > > > > > > I have had no idea about task work context. I agree with you that > > > > > > > it would be better to separate issue_flags from IoUringCmd. So, > > > > > > > IoUringCmdArgs would have a only field Pin<&mut IoUringCmd>? > > > > > > > > > > > > "Task work" is a mechanism io_uring uses to queue work to run on the > > > > > > thread that submitted an io_uring operation. It's basically a > > > > > > per-thread atomic queue of callbacks that the thread will process > > > > > > whenever it returns from the kernel to userspace (after a syscall or > > > > > > an interrupt). This is the context where asynchronous uring_cmd > > > > > > completions are generally processed (see > > > > > > io_uring_cmd_complete_in_task() and io_uring_cmd_do_in_task_lazy()). I > > > > > > can't speak to the history of why io_uring uses task work, but my > > > > > > guess would be that it provides a safe context to acquire the > > > > > > io_ring_ctx uring_lock mutex (e.g. nvme_uring_cmd_end_io() can be > > > > > > called from an interrupt handler, so it's not allowed to take a > > > > > > mutex). Processing all the task work at once also provides natural > > > > > > opportunities for batching. > > > > > > > > > > Thanks, I've checked io_uring_cmd_complete_in_task() that it receives > > > > > callback that has issue_flags different with io_uring_cmd(). I'll try to add > > > > > a api that wrapping io_uring_cmd_complete_in_task() for next version. > > > > > > > > > > > > > > > > > Yes, we probably don't need to bundle anything else with the > > > > > > IoUringCmd after all. As I mentioned earlier, I don't think Pin<&mut > > > > > > IoUringCmd> will work for uring_cmds that complete asynchronously, as > > > > > > they will need to outlive the uring_cmd() call. So uring_cmd() needs > > > > > > to transfer ownership of the struct io_uring_cmd. > > > > > > > > > > I can't think that how to take ownership of struct io_uring_cmd. The > > > > > struct allocated with io_alloc_req() and should be freed with io_free_req(). > > > > > If taking ownership means having pointer of struct io_uring_cmd, I think > > > > > it's no difference with current version. Also could it be called with > > > > > mem::forget() if it has ownership? > > > > > > > > I don't mean ownership of the io_uring_cmd allocation; that's the > > > > responsibility of the io_uring layer. But once the io_uring_cmd is > > > > handed to the uring_cmd() implementation, it belongs to that layer > > > > until it completes the command back to io_uring. Maybe a better way to > > > > describe it would be as ownership of the "executing io_uring_cmd". The > > > > problem with Pin<&mut IoUringCmd> is that it is a borrowed reference > > > > to the io_uring_cmd, so it can't outlive the uring_cmd() callback. > > > > Yes, it's possible to leak the io_uring_cmd by never calling > > > > io_uring_cmd_done() to return it to the io_uring layer. > > > > > > Thanks, I understood that IoUringCmd could be outlive uring_cmd callback. > > > But it's sad that it could be leaked without any unsafe code. > > > > Safety in Rust doesn't require destructors to run, which means any > > resource can be safely leaked > > (https://faultlore.com/blah/everyone-poops/ has some historical > > background on why Rust decided leaks had to be considered safe). > > Leaking an io_uring_cmd is analogous to leaking a Box, both are > > perfectly possible in safe Rust. > > Thanks for the reference. If driver just drops `IoUringCmd` without taking, > the request wouldn't be completed until io-uring instance deinitialized. > I understood that we cannot handle this. > > > > > > > > > > > > > > I would imagine something like this: > > > > > > > > #[derive(Clone, Copy)] > > > > struct IssueFlags<'a>(c_uint, PhantomData<&'a ()>); > > > > > > > > // Indicates ownership of the io_uring_cmd between uring_cmd() and > > > > io_uring_cmd_done() > > > > struct IoUringCmd(NonNull<bindings::io_uring_cmd>); > > > > > > > > impl IoUringCmd { > > > > // ... > > > > > > > > fn done(self, ret: i32, res2: u64, issue_flags: IssueFlags<'_>) { > > > > let cmd = self.0.as_ptr(); > > > > let issue_flags = issue_flags.0; > > > > unsafe { > > > > bindings::io_uring_cmd_done(cmd, ret, res2, issue_flags) > > > > } > > > > } > > > > } > > > > > > > > // Can choose whether to complete the command synchronously or asynchronously. > > > > // If take_async() is called, IoUringCmd::done() needs to be called to > > > > complete the command. > > > > // If take_async() isn't called, the command is completed synchronously > > > > // with the return value from MiscDevice::uring_cmd(). > > > > struct UringCmdInput<'a>(&mut Option<NonNull<bindings::io_uring_cmd>>); > > > > > > Thanks for a detailed example! > > > > > > But rather than this, We could introduce new return type that has a callback that > > > user could take IoUringCmd instead of returning -EIOCBQUEUED. > > > > I'm not following what you're suggesting, maybe a code sample would help? > > maybe like below... > > #[vtable] > pub trait MiscDevice: Sized { > /// ... > > /// If user returns `EIOCBQUEUED`, this function would be called for > /// users who want to take `IoUringCmdAsync`. It could call `done()` for complete > /// request. > fn uring_cmd_async_prep( > device: <Self::Ptr as ForeignOwnable>::Borrowed<'_>, > io_uring_cmd_async: IoUringCmdAsync); > /// ... > } > > impl<T: MiscDevice> MiscdeviceVTable<T> { > // ... > unsafe extern "C" fn uring_cmd( > ioucmd: *mut bindings::io_uring_cmd, > issue_flags: ffi::c_uint, > ) -> c_int { > // ... > let result = T::uring_cmd(device, ioucmd, issue_flags); > > if let Err(EIOCBQUEUED) = result { > T::uring_cmd_async_prep(device, ioucmd.to_async()); > } I see. Yes, this could work, but I worry separating the implementation into 2 calls makes it difficult to pass state between them. If T::uring_cmd() can determine whether the command will complete synchronously or asynchronously just by inspecting the io_uring_cmd/device, it may be a convenient interface. But if T::uring_cmd() performs a bunch of work before deciding the command should go async, it may be a pain to pass those computed values to T::uring_cmd_async_prep(). > > from_result(|| result) > } > } > > > > > > > > But I prefer that we provide just one type IoUringCmd without UringCmdInput. > > > Although UringCmdInput, user could call done() in uring_cmd callback and > > > it makes confusion that whether task_async() called and returning -EIOCBQUEUED > > > is mismatched that returns -EINVAL. We don't need to make it complex. > > > > Sure, if you only want to support asynchronous io_uring_cmd > > completions, than you can just pass IoUringCmd to > > MiscDevice::uring_cmd() and require it to call IoUringCmd::done() to > > complete the command. There's a small performance overhead to that > > over just returning the result from the uring_cmd() callback for > > synchronous completions (and it's more verbose), but I think that > > would be fine for an initial implementation. > > I appreciate for your understanding. I think it would be good to have just one > simple struct `IoUringCmd`. I'll make next version patch soon. Yeah, I think it will probably be easiest for an initial implementation to always transfer ownership of the io_uring_cmd to MiscDevice::uring_cmd() and requiring it to call IoUringCmd::done(). The synchronous case could be optimized in the future, but that's not the primary io_uring_cmd use case. Best, Caleb
© 2016 - 2025 Red Hat, Inc.