[v2] rust: Add IO polling

[PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by FUJITA Tomonori 1 year, 4 months ago

Add read_poll_timeout function which polls periodically until a
condition is met or a timeout is reached.

C's read_poll_timeout (include/linux/iopoll.h) is a complicated macro
and a simple wrapper for Rust doesn't work. So this implements the
same functionality in Rust.

The C version uses usleep_range() while the Rust version uses
fsleep(), which uses the best sleep method so it works with spans that
usleep_range() doesn't work nicely with.

might_sleep() is called via a wrapper so the __FILE__ and __LINE__
debug info with CONFIG_DEBUG_ATOMIC_SLEEP enabled isn't what we
expect; the wrapper instead of the caller.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
---
 rust/helpers/helpers.c |  1 +
 rust/helpers/kernel.c  | 13 ++++++++
 rust/kernel/error.rs   |  1 +
 rust/kernel/io.rs      |  5 +++
 rust/kernel/io/poll.rs | 70 ++++++++++++++++++++++++++++++++++++++++++
 rust/kernel/lib.rs     |  1 +
 6 files changed, 91 insertions(+)
 create mode 100644 rust/helpers/kernel.c
 create mode 100644 rust/kernel/io.rs
 create mode 100644 rust/kernel/io/poll.rs

diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
index c274546bcf78..f9569ff1717e 100644
--- a/rust/helpers/helpers.c
+++ b/rust/helpers/helpers.c
@@ -12,6 +12,7 @@
 #include "build_assert.c"
 #include "build_bug.c"
 #include "err.c"
+#include "kernel.c"
 #include "kunit.c"
 #include "mutex.c"
 #include "page.c"
diff --git a/rust/helpers/kernel.c b/rust/helpers/kernel.c
new file mode 100644
index 000000000000..5b9614974c76
--- /dev/null
+++ b/rust/helpers/kernel.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/kernel.h>
+
+void rust_helper_cpu_relax(void)
+{
+	cpu_relax();
+}
+
+void rust_helper_might_sleep(void)
+{
+	might_sleep();
+}
diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index 6f1587a2524e..d571b9587ed6 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -58,6 +58,7 @@ macro_rules! declare_err {
     declare_err!(EPIPE, "Broken pipe.");
     declare_err!(EDOM, "Math argument out of domain of func.");
     declare_err!(ERANGE, "Math result not representable.");
+    declare_err!(ETIMEDOUT, "Connection timed out.");
     declare_err!(ERESTARTSYS, "Restart the system call.");
     declare_err!(ERESTARTNOINTR, "System call was interrupted by a signal and will be restarted.");
     declare_err!(ERESTARTNOHAND, "Restart if no handler.");
diff --git a/rust/kernel/io.rs b/rust/kernel/io.rs
new file mode 100644
index 000000000000..033f3c4e4adf
--- /dev/null
+++ b/rust/kernel/io.rs
@@ -0,0 +1,5 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Input and Output.
+
+pub mod poll;
diff --git a/rust/kernel/io/poll.rs b/rust/kernel/io/poll.rs
new file mode 100644
index 000000000000..d248a16a7158
--- /dev/null
+++ b/rust/kernel/io/poll.rs
@@ -0,0 +1,70 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! IO polling.
+//!
+//! C header: [`include/linux/iopoll.h`](srctree/include/linux/iopoll.h).
+
+use crate::{
+    bindings,
+    error::{code::*, Result},
+    time::{fsleep, Delta, Ktime},
+};
+
+/// Polls periodically until a condition is met or a timeout is reached.
+///
+/// `op` is called repeatedly until `cond` returns `true` or the timeout is
+///  reached. The return value of `op` is passed to `cond`.
+///
+/// `sleep_delta` is the duration to sleep between calls to `op`.
+/// If `sleep_delta` is less than one microsecond, the function will busy-wait.
+///
+/// `timeout_delta` is the maximum time to wait for `cond` to return `true`.
+///
+/// If `sleep_before_read` is `true`, the function will sleep for `sleep_delta`
+/// first.
+///
+/// This function can only be used in a nonatomic context.
+pub fn read_poll_timeout<Op, Cond, T: Copy>(
+    mut op: Op,
+    cond: Cond,
+    sleep_delta: Delta,
+    timeout_delta: Delta,
+    sleep_before_read: bool,
+) -> Result<T>
+where
+    Op: FnMut() -> Result<T>,
+    Cond: Fn(T) -> bool,
+{
+    let timeout = Ktime::ktime_get() + timeout_delta;
+    let sleep = sleep_delta.as_micros() != 0;
+
+    if sleep {
+        // SAFETY: FFI call.
+        unsafe { bindings::might_sleep() }
+    }
+
+    if sleep_before_read && sleep {
+        fsleep(sleep_delta);
+    }
+
+    let val = loop {
+        let val = op()?;
+        if cond(val) {
+            break val;
+        }
+        if !timeout_delta.is_zero() && Ktime::ktime_get() > timeout {
+            break op()?;
+        }
+        if sleep {
+            fsleep(sleep_delta);
+        }
+        // SAFETY: FFI call.
+        unsafe { bindings::cpu_relax() }
+    };
+
+    if cond(val) {
+        Ok(val)
+    } else {
+        Err(ETIMEDOUT)
+    }
+}
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 22a3bfa5a9e9..7b6888723fc4 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -35,6 +35,7 @@
 #[cfg(CONFIG_RUST_FW_LOADER_ABSTRACTIONS)]
 pub mod firmware;
 pub mod init;
+pub mod io;
 pub mod ioctl;
 #[cfg(CONFIG_KUNIT)]
 pub mod kunit;
-- 
2.34.1

Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by Andrew Lunn 1 year, 4 months ago

> might_sleep() is called via a wrapper so the __FILE__ and __LINE__
> debug info with CONFIG_DEBUG_ATOMIC_SLEEP enabled isn't what we
> expect; the wrapper instead of the caller.

So not very useful. All we know is that somewhere in Rust something is
sleeping in atomic context. Is it possible to do better? Does __FILE__
and __LINE__ exist in Rust?

> +    if sleep {
> +        // SAFETY: FFI call.
> +        unsafe { bindings::might_sleep() }
> +    }

What is actually unsafe about might_sleep()? It is a void foo(void)
function, so takes no parameters, returns no results. It cannot affect
anything which Rust is managing.

> +        // SAFETY: FFI call.
> +        unsafe { bindings::cpu_relax() }

Same here.

	Andrew

Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by Boqun Feng 1 year, 4 months ago

On Sat, Oct 05, 2024 at 08:32:01PM +0200, Andrew Lunn wrote:
> > might_sleep() is called via a wrapper so the __FILE__ and __LINE__
> > debug info with CONFIG_DEBUG_ATOMIC_SLEEP enabled isn't what we
> > expect; the wrapper instead of the caller.
> 
> So not very useful. All we know is that somewhere in Rust something is
> sleeping in atomic context. Is it possible to do better? Does __FILE__
> and __LINE__ exist in Rust?
> 

Sure, you can use: 

	https://doc.rust-lang.org/core/macro.line.html

> > +    if sleep {
> > +        // SAFETY: FFI call.
> > +        unsafe { bindings::might_sleep() }
> > +    }
> 
> What is actually unsafe about might_sleep()? It is a void foo(void)

Every extern "C" function is by default unsafe, because C doesn't have
the concept of safe/unsafe. If you want to avoid unsafe, you could
introduce a Rust's might_sleep() which calls into
`bindings::might_sleep()`:

	pub fn might_sleep() {
	    // SAFETY: ??
	    unsafe { bindings::might_sleep() }
	}

however, if you call a might_sleep() in a preemption disabled context
when CONFIG_DEBUG_ATOMIC_SLEEP=n and PREEMPT=VOLUNTERY, it could means
an unexpected RCU quiescent state, which results an early RCU grace
period, and that may mean a use-after-free. So it's not that safe as you
may expected.

Regards,
Boqun

> function, so takes no parameters, returns no results. It cannot affect
> anything which Rust is managing.
> 
> > +        // SAFETY: FFI call.
> > +        unsafe { bindings::cpu_relax() }
> 
> Same here.
> 
> 	Andrew
>

Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by FUJITA Tomonori 1 year, 3 months ago

On Sat, 5 Oct 2024 15:22:23 -0700
Boqun Feng <boqun.feng@gmail.com> wrote:

> On Sat, Oct 05, 2024 at 08:32:01PM +0200, Andrew Lunn wrote:
>> > might_sleep() is called via a wrapper so the __FILE__ and __LINE__
>> > debug info with CONFIG_DEBUG_ATOMIC_SLEEP enabled isn't what we
>> > expect; the wrapper instead of the caller.
>> 
>> So not very useful. All we know is that somewhere in Rust something is
>> sleeping in atomic context. Is it possible to do better? Does __FILE__
>> and __LINE__ exist in Rust?
>> 
> 
> Sure, you can use: 
> 
> 	https://doc.rust-lang.org/core/macro.line.html

To get the proper file name and line number with those macros, we need
to use __might_sleep() instead of might_sleep(). We could also call to
might_resched() there to emulate might_sleep() but fsleep() works
without it so I call only __might_sleep() in the next version.

Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by Andrew Lunn 1 year, 4 months ago

On Sat, Oct 05, 2024 at 03:22:23PM -0700, Boqun Feng wrote:
> On Sat, Oct 05, 2024 at 08:32:01PM +0200, Andrew Lunn wrote:
> > > might_sleep() is called via a wrapper so the __FILE__ and __LINE__
> > > debug info with CONFIG_DEBUG_ATOMIC_SLEEP enabled isn't what we
> > > expect; the wrapper instead of the caller.
> > 
> > So not very useful. All we know is that somewhere in Rust something is
> > sleeping in atomic context. Is it possible to do better? Does __FILE__
> > and __LINE__ exist in Rust?
> > 
> 
> Sure, you can use: 
> 
> 	https://doc.rust-lang.org/core/macro.line.html

So i guess might_sleep() needs turning into some sort of macro, calling
__might_sleep(__FILE__, __LINE__); might_resched();

> > > +    if sleep {
> > > +        // SAFETY: FFI call.
> > > +        unsafe { bindings::might_sleep() }
> > > +    }
> > 
> > What is actually unsafe about might_sleep()? It is a void foo(void)
> 
> Every extern "C" function is by default unsafe, because C doesn't have
> the concept of safe/unsafe. If you want to avoid unsafe, you could
> introduce a Rust's might_sleep() which calls into
> `bindings::might_sleep()`:
> 
> 	pub fn might_sleep() {
> 	    // SAFETY: ??
> 	    unsafe { bindings::might_sleep() }
> 	}
> 
> however, if you call a might_sleep() in a preemption disabled context
> when CONFIG_DEBUG_ATOMIC_SLEEP=n and PREEMPT=VOLUNTERY, it could means
> an unexpected RCU quiescent state, which results an early RCU grace
> period, and that may mean a use-after-free. So it's not that safe as you
> may expected.

If you call might_sleep() in a preemption disabled context you code is
already unsafe, since that is the whole point of it, to find bugs
where you use a sleeping function in atomic context. Depending on why
you are in atomic context, it might appear to work, until it does not
actually work, and bad things happen. So it is not might_sleep() which
is unsafe, it is the Rust code calling it.

	Andrew

Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by Boqun Feng 1 year, 4 months ago

On Sun, Oct 06, 2024 at 04:45:21PM +0200, Andrew Lunn wrote:
[...]
> > > > +    if sleep {
> > > > +        // SAFETY: FFI call.
> > > > +        unsafe { bindings::might_sleep() }
> > > > +    }
> > > 
> > > What is actually unsafe about might_sleep()? It is a void foo(void)
> > 
> > Every extern "C" function is by default unsafe, because C doesn't have
> > the concept of safe/unsafe. If you want to avoid unsafe, you could
> > introduce a Rust's might_sleep() which calls into
> > `bindings::might_sleep()`:
> > 
> > 	pub fn might_sleep() {
> > 	    // SAFETY: ??
> > 	    unsafe { bindings::might_sleep() }
> > 	}
> > 
> > however, if you call a might_sleep() in a preemption disabled context
> > when CONFIG_DEBUG_ATOMIC_SLEEP=n and PREEMPT=VOLUNTERY, it could means
> > an unexpected RCU quiescent state, which results an early RCU grace
> > period, and that may mean a use-after-free. So it's not that safe as you
> > may expected.
> 
> If you call might_sleep() in a preemption disabled context you code is
> already unsafe, since that is the whole point of it, to find bugs

Well, in Rust, the rule is: any type-checked (compiled successfully)
code that only calls safe Rust functions cannot be unsafe. So the fact
that calling might_sleep() in a preemption disabled context is unsafe
means that something has to be unsafe.

This eventually can turn into a "blaming game" in the design space: we
can either design the preemption disable function as unsafe or the
might_sleep() function as unsafe. But one of them has to be unsafe
function, otherwise we are breaking the safe code guarantee.

However, this is actually a special case: currently we want to use klint
[1] to detect all context mis-matches at compile time. So the above rule
extends for kernel: any type-checked *and klint-checked* code that only
calls safe Rust functions cannot be unsafe. I.e. we add additional
compile time checking for unsafe code. So if might_sleep() has the
proper klint annotation, and we actually enable klint for kernel code,
then we can make it safe (along with preemption disable functions being
safe).

> where you use a sleeping function in atomic context. Depending on why
> you are in atomic context, it might appear to work, until it does not
> actually work, and bad things happen. So it is not might_sleep() which
> is unsafe, it is the Rust code calling it.

The whole point of unsafe functions is that calling it may result into
unsafe code, so that's why all extern "C" functions are unsafe, so are
might_sleep() (without klint in the picture).

[1]: https://lwn.net/Articles/951550/

Regards,
Boqun

> 
> 	Andrew
> 
> 
> 
>

Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by Andrew Lunn 1 year, 4 months ago

On Mon, Oct 07, 2024 at 05:28:28AM -0700, Boqun Feng wrote:
> On Sun, Oct 06, 2024 at 04:45:21PM +0200, Andrew Lunn wrote:
> [...]
> > > > > +    if sleep {
> > > > > +        // SAFETY: FFI call.
> > > > > +        unsafe { bindings::might_sleep() }
> > > > > +    }
> > > > 
> > > > What is actually unsafe about might_sleep()? It is a void foo(void)
> > > 
> > > Every extern "C" function is by default unsafe, because C doesn't have
> > > the concept of safe/unsafe. If you want to avoid unsafe, you could
> > > introduce a Rust's might_sleep() which calls into
> > > `bindings::might_sleep()`:
> > > 
> > > 	pub fn might_sleep() {
> > > 	    // SAFETY: ??
> > > 	    unsafe { bindings::might_sleep() }
> > > 	}
> > > 
> > > however, if you call a might_sleep() in a preemption disabled context
> > > when CONFIG_DEBUG_ATOMIC_SLEEP=n and PREEMPT=VOLUNTERY, it could means
> > > an unexpected RCU quiescent state, which results an early RCU grace
> > > period, and that may mean a use-after-free. So it's not that safe as you
> > > may expected.
> > 
> > If you call might_sleep() in a preemption disabled context you code is
> > already unsafe, since that is the whole point of it, to find bugs
> 
> Well, in Rust, the rule is: any type-checked (compiled successfully)
> code that only calls safe Rust functions cannot be unsafe. So the fact
> that calling might_sleep() in a preemption disabled context is unsafe
> means that something has to be unsafe.
> 
> This eventually can turn into a "blaming game" in the design space: we
> can either design the preemption disable function as unsafe or the
> might_sleep() function as unsafe. But one of them has to be unsafe
> function, otherwise we are breaking the safe code guarantee.

Just keep in mind, it could of been C which put you into atomic
context before calling into Rust. An interrupt handler would be a good
example, and i'm sure there are others.

> However, this is actually a special case: currently we want to use klint
> [1] to detect all context mis-matches at compile time. So the above rule
> extends for kernel: any type-checked *and klint-checked* code that only
> calls safe Rust functions cannot be unsafe. I.e. we add additional
> compile time checking for unsafe code. So if might_sleep() has the
> proper klint annotation, and we actually enable klint for kernel code,
> then we can make it safe (along with preemption disable functions being
> safe).
> 
> > where you use a sleeping function in atomic context. Depending on why
> > you are in atomic context, it might appear to work, until it does not
> > actually work, and bad things happen. So it is not might_sleep() which
> > is unsafe, it is the Rust code calling it.
> 
> The whole point of unsafe functions is that calling it may result into
> unsafe code, so that's why all extern "C" functions are unsafe, so are
> might_sleep() (without klint in the picture).

There is a psychological part to this. might_sleep() is a good debug
tool, which costs very little in normal builds, but finds logic bugs
when enabled in debug builds. What we don't want is Rust developers
not scattering it though their code because it adds unsafe code, and
the aim is not to have any unsafe code.

	Andrew

Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by Alice Ryhl 1 year, 4 months ago

On Mon, Oct 7, 2024 at 3:48 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Mon, Oct 07, 2024 at 05:28:28AM -0700, Boqun Feng wrote:
> > On Sun, Oct 06, 2024 at 04:45:21PM +0200, Andrew Lunn wrote:
> > However, this is actually a special case: currently we want to use klint
> > [1] to detect all context mis-matches at compile time. So the above rule
> > extends for kernel: any type-checked *and klint-checked* code that only
> > calls safe Rust functions cannot be unsafe. I.e. we add additional
> > compile time checking for unsafe code. So if might_sleep() has the
> > proper klint annotation, and we actually enable klint for kernel code,
> > then we can make it safe (along with preemption disable functions being
> > safe).
> >
> > > where you use a sleeping function in atomic context. Depending on why
> > > you are in atomic context, it might appear to work, until it does not
> > > actually work, and bad things happen. So it is not might_sleep() which
> > > is unsafe, it is the Rust code calling it.
> >
> > The whole point of unsafe functions is that calling it may result into
> > unsafe code, so that's why all extern "C" functions are unsafe, so are
> > might_sleep() (without klint in the picture).
>
> There is a psychological part to this. might_sleep() is a good debug
> tool, which costs very little in normal builds, but finds logic bugs
> when enabled in debug builds. What we don't want is Rust developers
> not scattering it though their code because it adds unsafe code, and
> the aim is not to have any unsafe code.

We can add a safe wrapper for it:

pub fn might_sleep() {
    // SAFETY: Always safe to call.
    unsafe { bindings::might_sleep() };
}

Alice

Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by Boqun Feng 1 year, 4 months ago

On Mon, Oct 07, 2024 at 04:08:48PM +0200, Alice Ryhl wrote:
> On Mon, Oct 7, 2024 at 3:48 PM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > On Mon, Oct 07, 2024 at 05:28:28AM -0700, Boqun Feng wrote:
> > > On Sun, Oct 06, 2024 at 04:45:21PM +0200, Andrew Lunn wrote:
> > > However, this is actually a special case: currently we want to use klint
> > > [1] to detect all context mis-matches at compile time. So the above rule
> > > extends for kernel: any type-checked *and klint-checked* code that only
> > > calls safe Rust functions cannot be unsafe. I.e. we add additional
> > > compile time checking for unsafe code. So if might_sleep() has the
> > > proper klint annotation, and we actually enable klint for kernel code,
> > > then we can make it safe (along with preemption disable functions being
> > > safe).
> > >
> > > > where you use a sleeping function in atomic context. Depending on why
> > > > you are in atomic context, it might appear to work, until it does not
> > > > actually work, and bad things happen. So it is not might_sleep() which
> > > > is unsafe, it is the Rust code calling it.
> > >
> > > The whole point of unsafe functions is that calling it may result into
> > > unsafe code, so that's why all extern "C" functions are unsafe, so are
> > > might_sleep() (without klint in the picture).
> >
> > There is a psychological part to this. might_sleep() is a good debug
> > tool, which costs very little in normal builds, but finds logic bugs
> > when enabled in debug builds. What we don't want is Rust developers
> > not scattering it though their code because it adds unsafe code, and
> > the aim is not to have any unsafe code.
> 
> We can add a safe wrapper for it:
> 
> pub fn might_sleep() {
>     // SAFETY: Always safe to call.
>     unsafe { bindings::might_sleep() };

It's not always safe to call, because might_sleep() has a
might_resched() and in preempt=voluntary kernel, that's a
cond_resched(), which may eventually call __schedule() and report a
quiescent state of RCU. This could means an unexpected early grace
period, and that means a potential use-afer-free.

Regards,
Boqun

> }
> 
> Alice

Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by Andrew Lunn 1 year, 4 months ago

> > pub fn might_sleep() {
> >     // SAFETY: Always safe to call.
> >     unsafe { bindings::might_sleep() };
> 
> It's not always safe to call, because might_sleep() has a
> might_resched() and in preempt=voluntary kernel, that's a
> cond_resched(), which may eventually call __schedule() and report a
> quiescent state of RCU. This could means an unexpected early grace
> period, and that means a potential use-afer-free.

How does C handle this?

I'm not an RCU person...

But if you have called might_sleep() you are about to do something
which could sleep. If it does sleep, the scheduler is going to be
called, the grace period has ended, and RCU is going to do its
thing. If that results in a use-after-free, your code is
broken. might_sleep makes no difference here, the code is still
broken, it just happens to light the fuse for the explosion a bit
earlier.

Or, i'm missing something, not being an RCU person.

	Andrew

Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by Boqun Feng 1 year, 4 months ago

On Mon, Oct 07, 2024 at 07:13:40PM +0200, Andrew Lunn wrote:
> > > pub fn might_sleep() {
> > >     // SAFETY: Always safe to call.
> > >     unsafe { bindings::might_sleep() };
> > 
> > It's not always safe to call, because might_sleep() has a
> > might_resched() and in preempt=voluntary kernel, that's a
> > cond_resched(), which may eventually call __schedule() and report a
> > quiescent state of RCU. This could means an unexpected early grace
> > period, and that means a potential use-afer-free.
> 
> How does C handle this?
> 
> I'm not an RCU person...
> 
> But if you have called might_sleep() you are about to do something
> which could sleep. If it does sleep, the scheduler is going to be
> called, the grace period has ended, and RCU is going to do its
> thing. If that results in a use-after-free, your code is
> broken. might_sleep makes no difference here, the code is still
> broken, it just happens to light the fuse for the explosion a bit
> earlier.
> 

Because of the might_resched() in might_sleep(), it will report the
quiescent state of the current CPU, and RCU will pass a grace period if
all CPUs have passed a quiescent state. So for example if someone writes
the following:

    <reader>			<updater>
    rcu_read_lock();
    p = rcu_dereference(gp);
    might_sleep():
      might_resched():
				todo = gp;
				rcu_assign_pointer(gp, NULL);
				synchronize_rcu();

        rcu_all_qs(); // report a quiescent state inside RCU read-side
	              // critical section, which may make a grace period
		      // pass even there is an active RCU reader

				kfree(todo);

    a = READ_ONCE(p->a); // UAF
    rcu_read_unlock();

We probably call the reader side code a "wrong annotation", however,
it's still unsafe code because of the UAF. Also you seems to assume that
might_sleep() is always attached to a sleepable function, which is not
an invalid assumption, but we couldn't use it for reasoning the
safe/unsafe property of Rust functions unless we can encode this in the
type system. For Rust code, without klint rule, might_sleep() needs to
be unsafe. So we have two options for might_sleep().

* Since we rely on klint for atomic context detection, we can mark the
  trivial wrapper (as what Alice presented in the other email) as safe,
  but we need to begin to add klint annotation for that function, unless
  Gary finds a smart way to auto-annotate functions.

* Instead of might_sleep(), we provide the wrapper of __might_sleep(),
  since it doesn't have might_resched() in it, it should be safe. And
  all we care about here is the debugging rather than voluntary context
  switch. (Besides I think preempt=volunatry is eventually going to be
  gone because of PREEMPT_AUTO [1], if that happens I think the
  might_resched() might be dropped entirely).

Does this make sense?

[1]: https://lore.kernel.org/lkml/20240528003521.979836-1-ankur.a.arora@oracle.com/

Regards,
Boqun

> Or, i'm missing something, not being an RCU person.
> 
> 	Andrew

Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by Andrew Lunn 1 year, 4 months ago

> Because of the might_resched() in might_sleep(), it will report the
> quiescent state of the current CPU, and RCU will pass a grace period if
> all CPUs have passed a quiescent state. So for example if someone writes
> the following:
> 
>     <reader>			<updater>
>     rcu_read_lock();
>     p = rcu_dereference(gp);
>     might_sleep():
>       might_resched():
> 				todo = gp;
> 				rcu_assign_pointer(gp, NULL);
> 				synchronize_rcu();
> 
>         rcu_all_qs(); // report a quiescent state inside RCU read-side
> 	              // critical section, which may make a grace period
> 		      // pass even there is an active RCU reader
> 
> 				kfree(todo);
> 

You are obviously missing something here. The call that actually sleeps

      mutex_lock(&lock)

>     a = READ_ONCE(p->a); // UAF
>     rcu_read_unlock();

A might_sleep() should be paired with something which does actually
sleep, under some condition.  At least, that is how it is used in C.
The iopoll being re-implemented here is an example of that.

So take the might_sleep out above, just leaving the mutex_lock. If the
mutex is uncontested, the code does not sleep and everything is O.K?
If it needs to wait for the mutex, it triggers a UAF.

The might_sleep() will also trigger a stack trace, if its is enabled,
because you are not allowed to sleep inside rcu_read_lock(), it is an
example of atomic context.

As far as i see, might_sleep() will cause UAF where there is going to
be a UAF anyway. If you are using it correctly, it does not cause UAF.

> We probably call the reader side code a "wrong annotation", however,
> it's still unsafe code because of the UAF. Also you seems to assume that
> might_sleep() is always attached to a sleepable function, which is not
> an invalid assumption, but we couldn't use it for reasoning the
> safe/unsafe property of Rust functions unless we can encode this in the
> type system.

How are any of the sleeping call encoded in the type system? I assume
any use of a mutex lock, sleep, wait for completion, etc are not all
marked as unsafe? There is some sort of wrapper around them? Why not
just extend that wrapper to might_sleep().

> For Rust code, without klint rule, might_sleep() needs to
> be unsafe. So we have two options for might_sleep().
> 
> * Since we rely on klint for atomic context detection, we can mark the
>   trivial wrapper (as what Alice presented in the other email) as safe,
>   but we need to begin to add klint annotation for that function, unless
>   Gary finds a smart way to auto-annotate functions.

Are there klint annotations for all sleeping functions?

> * Instead of might_sleep(), we provide the wrapper of __might_sleep(),
>   since it doesn't have might_resched() in it, it should be safe. And
>   all we care about here is the debugging rather than voluntary context
>   switch. (Besides I think preempt=volunatry is eventually going to be
>   gone because of PREEMPT_AUTO [1], if that happens I think the
>   might_resched() might be dropped entirely).

__might_sleep() might be safe, but your code is still broken and going
to UAF at some point. Don't you want that UAF to happen more reliably
and faster so you can find the issue? That would be the advantage of
might_sleep() over __might_sleep().

	Andrew

Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by Miguel Ojeda 1 year, 4 months ago

On Tue, Oct 8, 2024 at 2:13 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> As far as i see, might_sleep() will cause UAF where there is going to
> be a UAF anyway. If you are using it correctly, it does not cause UAF.

This already implies that it is an unsafe function (in general, i.e.
modulo klint, or a way to force the user to have to write `unsafe`
somewhere else, or what I call ASHes -- "acknowledged soundness
holes").

If we consider as safe functions that, if used correctly, do not cause
UB, then all functions would be safe.

Cheers,
Miguel

Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by Andrew Lunn 1 year, 4 months ago

On Tue, Oct 08, 2024 at 03:14:05PM +0200, Miguel Ojeda wrote:
> On Tue, Oct 8, 2024 at 2:13 PM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > As far as i see, might_sleep() will cause UAF where there is going to
> > be a UAF anyway. If you are using it correctly, it does not cause UAF.
> 
> This already implies that it is an unsafe function (in general, i.e.
> modulo klint, or a way to force the user to have to write `unsafe`
> somewhere else, or what I call ASHes -- "acknowledged soundness
> holes").
> 
> If we consider as safe functions that, if used correctly, do not cause
> UB, then all functions would be safe.

From what i hear, klint is still WIP. So we have to accept there will
be bad code out there, which will UAF. We want to find such bad code,
and the easiest way to find it at the moment is to make it UAF as fast
as possible. might_sleep() does that, __might_sleep() does not, and
using neither is the slowest way.

	Andrew

Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by Boqun Feng 1 year, 4 months ago

On Tue, Oct 08, 2024 at 07:16:42PM +0200, Andrew Lunn wrote:
> On Tue, Oct 08, 2024 at 03:14:05PM +0200, Miguel Ojeda wrote:
> > On Tue, Oct 8, 2024 at 2:13 PM Andrew Lunn <andrew@lunn.ch> wrote:
> > >
> > > As far as i see, might_sleep() will cause UAF where there is going to
> > > be a UAF anyway. If you are using it correctly, it does not cause UAF.
> > 
> > This already implies that it is an unsafe function (in general, i.e.
> > modulo klint, or a way to force the user to have to write `unsafe`
> > somewhere else, or what I call ASHes -- "acknowledged soundness
> > holes").
> > 
> > If we consider as safe functions that, if used correctly, do not cause
> > UB, then all functions would be safe.
> 
> From what i hear, klint is still WIP. So we have to accept there will
> be bad code out there, which will UAF. We want to find such bad code,

If you don't believe in klint, then we need to mark might_sleep() as
unsafe, as I already explain a million times, might_sleep() is unsafe
without the klint compile time check. You have to accept that an unsafe
function should really be marked as unsafe. And yes, in this way, all
sleep functions would be marked as unsafe as well (or we could mark all
preemption disable function as unsafe), but still an unsafe function is
unsafe.

Again, as Miguel mentioned, we can only mark might_sleep() because sleep
in atomic context is an ASH, not because it's really safe.

> and the easiest way to find it at the moment is to make it UAF as
> fast as possible. might_sleep() does that, __might_sleep() does not,
> and using neither is the slowest way.
> 

might_sleep() is useful because it checks preemption count and task
state, which is provided by __might_sleep() as well. I don't think
causing UAF helps we detect atomic context violation faster than what
__might_sleep() already have. Again, could you provide an example that
help me understand your reasoning here?

Regards,
Boqun

> 	Andrew

Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by Andrew Lunn 1 year, 4 months ago

On Tue, Oct 08, 2024 at 02:53:56PM -0700, Boqun Feng wrote:
> On Tue, Oct 08, 2024 at 07:16:42PM +0200, Andrew Lunn wrote:
> > On Tue, Oct 08, 2024 at 03:14:05PM +0200, Miguel Ojeda wrote:
> > > On Tue, Oct 8, 2024 at 2:13 PM Andrew Lunn <andrew@lunn.ch> wrote:
> > > >
> > > > As far as i see, might_sleep() will cause UAF where there is going to
> > > > be a UAF anyway. If you are using it correctly, it does not cause UAF.
> > > 
> > > This already implies that it is an unsafe function (in general, i.e.
> > > modulo klint, or a way to force the user to have to write `unsafe`
> > > somewhere else, or what I call ASHes -- "acknowledged soundness
> > > holes").
> > > 
> > > If we consider as safe functions that, if used correctly, do not cause
> > > UB, then all functions would be safe.
> > 
> > From what i hear, klint is still WIP. So we have to accept there will
> > be bad code out there, which will UAF. We want to find such bad code,
> 
> If you don't believe in klint

I did not say that. It is WIP, and without it i assume nothing is
detecting at compile time that the code is broken. Hence we need to
find the problem at runtime, which is what might_sleep() is all about.

> might_sleep() is useful because it checks preemption count and task
> state, which is provided by __might_sleep() as well. I don't think
> causing UAF helps we detect atomic context violation faster than what
> __might_sleep() already have. Again, could you provide an example that
> help me understand your reasoning here?

> > while (1) {
> >     <reader>                        <updater>
> >     rcu_read_lock();
> >     p = rcu_dereference(gp);
> >     mutex_lock(&lock)
> >     a = READ_ONCE(p->a);
> >     mutex_unlock(&lock)
> >     rcu_read_unlock();
> > }

The mutex lock is likely to be uncontested, so you don't sleep, and so
don't trigger the UAF. The code is clearly broken, but you survive.
Until the lock is contested, you do sleep, RCU falls apart, resulting
in a UAF.

Now if you used might_sleep(), every time you go around that loop you
do some of the same processing as actually sleeping, so are much more
likely to trigger the UAF.

might_sleep() as you pointed out, is also active when
CONFIG_DEBUG_ATOMIC_SLEEP is false. Thus it is also going to trigger
the broken code to UAF faster. And i expect a lot of testing is done
without CONFIG_DEBUG_ATOMIC_SLEEP and CONFIG_PROVE_LOCKING.

Once klint is completed, and detects all these problems at compile
time, we can then discard all this might_sleep stuff. But until then,
the faster code explodes, the more likely it is going to be quickly
and cheaply fixed.

	Andrew

Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by Boqun Feng 1 year, 4 months ago

On Wed, Oct 09, 2024 at 12:26:00AM +0200, Andrew Lunn wrote:
> On Tue, Oct 08, 2024 at 02:53:56PM -0700, Boqun Feng wrote:
> > On Tue, Oct 08, 2024 at 07:16:42PM +0200, Andrew Lunn wrote:
> > > On Tue, Oct 08, 2024 at 03:14:05PM +0200, Miguel Ojeda wrote:
> > > > On Tue, Oct 8, 2024 at 2:13 PM Andrew Lunn <andrew@lunn.ch> wrote:
> > > > >
> > > > > As far as i see, might_sleep() will cause UAF where there is going to
> > > > > be a UAF anyway. If you are using it correctly, it does not cause UAF.
> > > > 
> > > > This already implies that it is an unsafe function (in general, i.e.
> > > > modulo klint, or a way to force the user to have to write `unsafe`
> > > > somewhere else, or what I call ASHes -- "acknowledged soundness
> > > > holes").
> > > > 
> > > > If we consider as safe functions that, if used correctly, do not cause
> > > > UB, then all functions would be safe.
> > > 
> > > From what i hear, klint is still WIP. So we have to accept there will
> > > be bad code out there, which will UAF. We want to find such bad code,
> > 
> > If you don't believe in klint
> 
> I did not say that. It is WIP, and without it i assume nothing is
> detecting at compile time that the code is broken. Hence we need to
> find the problem at runtime, which is what might_sleep() is all about.
> 
> > might_sleep() is useful because it checks preemption count and task
> > state, which is provided by __might_sleep() as well. I don't think
> > causing UAF helps we detect atomic context violation faster than what
> > __might_sleep() already have. Again, could you provide an example that
> > help me understand your reasoning here?
> 
> > > while (1) {
> > >     <reader>                        <updater>
> > >     rcu_read_lock();
> > >     p = rcu_dereference(gp);
> > >     mutex_lock(&lock)
> > >     a = READ_ONCE(p->a);
> > >     mutex_unlock(&lock)
> > >     rcu_read_unlock();
> > > }
> 
> The mutex lock is likely to be uncontested, so you don't sleep, and so
> don't trigger the UAF. The code is clearly broken, but you survive.
> Until the lock is contested, you do sleep, RCU falls apart, resulting
> in a UAF.
> 
> Now if you used might_sleep(), every time you go around that loop you
> do some of the same processing as actually sleeping, so are much more
> likely to trigger the UAF.
> 
> might_sleep() as you pointed out, is also active when
> CONFIG_DEBUG_ATOMIC_SLEEP is false. Thus it is also going to trigger
> the broken code to UAF faster. And i expect a lot of testing is done
> without CONFIG_DEBUG_ATOMIC_SLEEP and CONFIG_PROVE_LOCKING.
> 

Hmm.. but that means we need to quickly detect UAF and track down to the
source, right? In a build without CONFIG_DEBUG_ATOMIC_SLEEP and
CONFIG_PROVE_LOCKING, may I assume memory sanitizer is also not
available? Then how do we detect UAF relatively quickly? Or memory
sanitizer is in fact relatively cheap, so it can still be enabled,
what's the configuration of netdev CI/testing?

Regards,
Boqun

> Once klint is completed, and detects all these problems at compile
> time, we can then discard all this might_sleep stuff. But until then,
> the faster code explodes, the more likely it is going to be quickly
> and cheaply fixed.
> 
> 	Andrew

Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by Boqun Feng 1 year, 4 months ago

On Tue, Oct 08, 2024 at 02:53:56PM -0700, Boqun Feng wrote:
> On Tue, Oct 08, 2024 at 07:16:42PM +0200, Andrew Lunn wrote:
> > On Tue, Oct 08, 2024 at 03:14:05PM +0200, Miguel Ojeda wrote:
> > > On Tue, Oct 8, 2024 at 2:13 PM Andrew Lunn <andrew@lunn.ch> wrote:
> > > >
> > > > As far as i see, might_sleep() will cause UAF where there is going to
> > > > be a UAF anyway. If you are using it correctly, it does not cause UAF.
> > > 
> > > This already implies that it is an unsafe function (in general, i.e.
> > > modulo klint, or a way to force the user to have to write `unsafe`
> > > somewhere else, or what I call ASHes -- "acknowledged soundness
> > > holes").
> > > 
> > > If we consider as safe functions that, if used correctly, do not cause
> > > UB, then all functions would be safe.
> > 
> > From what i hear, klint is still WIP. So we have to accept there will
> > be bad code out there, which will UAF. We want to find such bad code,
> 
> If you don't believe in klint, then we need to mark might_sleep() as
> unsafe, as I already explain a million times, might_sleep() is unsafe
> without the klint compile time check. You have to accept that an unsafe
> function should really be marked as unsafe. And yes, in this way, all
> sleep functions would be marked as unsafe as well (or we could mark all
> preemption disable function as unsafe), but still an unsafe function is
> unsafe.
> 
> Again, as Miguel mentioned, we can only mark might_sleep() because sleep
> in atomic context is an ASH, not because it's really safe.
> 
> > and the easiest way to find it at the moment is to make it UAF as
> > fast as possible. might_sleep() does that, __might_sleep() does not,
> > and using neither is the slowest way.
> > 
> 
> might_sleep() is useful because it checks preemption count and task
> state, which is provided by __might_sleep() as well. I don't think
> causing UAF helps we detect atomic context violation faster than what
> __might_sleep() already have. Again, could you provide an example that
> help me understand your reasoning here?
> 

Another advantage of __might_sleep() is that it's already an exported
symbol, so we don't need to introduce a rust helper.

Regards,
Boqun

> Regards,
> Boqun
> 
> > 	Andrew

Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by Boqun Feng 1 year, 4 months ago

On Tue, Oct 08, 2024 at 02:12:51PM +0200, Andrew Lunn wrote:
> > Because of the might_resched() in might_sleep(), it will report the
> > quiescent state of the current CPU, and RCU will pass a grace period if
> > all CPUs have passed a quiescent state. So for example if someone writes
> > the following:
> > 
> >     <reader>			<updater>
> >     rcu_read_lock();
> >     p = rcu_dereference(gp);
> >     might_sleep():
> >       might_resched():
> > 				todo = gp;
> > 				rcu_assign_pointer(gp, NULL);
> > 				synchronize_rcu();
> > 
> >         rcu_all_qs(); // report a quiescent state inside RCU read-side
> > 	              // critical section, which may make a grace period
> > 		      // pass even there is an active RCU reader
> > 
> > 				kfree(todo);
> > 
> 
> You are obviously missing something here. The call that actually sleeps
> 
>       mutex_lock(&lock)
> 
> >     a = READ_ONCE(p->a); // UAF
> >     rcu_read_unlock();
> 
> A might_sleep() should be paired with something which does actually
> sleep, under some condition.  At least, that is how it is used in C.

How do you guarantee the "should" part? How can a compiler detect a
might_sleep() that doesn't have a paired "something which does actually
sleep"? I feel like we are just talking through each other, what I was
trying to say is might_sleep() is unsafe because the rule of Rust safe
code (if we don't consider klint) and I'm using an example here to
explain why. And when we are talking about the safe/unsafe attribute of
a function, we cannot use the reasoning "this function should be always
used with another function".

> The iopoll being re-implemented here is an example of that.
> 
> So take the might_sleep out above, just leaving the mutex_lock. If the
> mutex is uncontested, the code does not sleep and everything is O.K?
> If it needs to wait for the mutex, it triggers a UAF.
> 
> The might_sleep() will also trigger a stack trace, if its is enabled,
> because you are not allowed to sleep inside rcu_read_lock(), it is an
> example of atomic context.

These functionalities you mentioned above are also provided by
__might_sleep(), no?

> 
> As far as i see, might_sleep() will cause UAF where there is going to
> be a UAF anyway. If you are using it correctly, it does not cause UAF.
> 

Again, I agree with your assumption that might_sleep() will always be
paired with a sleep function, but we cannot mark might_sleep() as safe
because of that. We can, however, mark might_sleep() as safe because
klint is supposed to cover the detection of atomic context violations.
But we have a better option: __might_sleep().

> > We probably call the reader side code a "wrong annotation", however,
> > it's still unsafe code because of the UAF. Also you seems to assume that
> > might_sleep() is always attached to a sleepable function, which is not
> > an invalid assumption, but we couldn't use it for reasoning the
> > safe/unsafe property of Rust functions unless we can encode this in the
> > type system.
> 
> How are any of the sleeping call encoded in the type system? I assume

There's no easy way, something might work is introducing effect system
[1] into Rust, but that's very complicated and may take years. When
there's no easy way to encode something in the type system, it's usually
the time that unsafe comes to happen, an unsafe function can have a
requirement that cannot be easily detected by compilers, and via unsafe
block and safety comments, programmers provide the reasons why these
requirements are fulfilled.

> any use of a mutex lock, sleep, wait for completion, etc are not all
> marked as unsafe? There is some sort of wrapper around them? Why not

They are marked as safe because of the klint extension of safe Rust rule
I mentioned.

> just extend that wrapper to might_sleep().
> 
> > For Rust code, without klint rule, might_sleep() needs to
> > be unsafe. So we have two options for might_sleep().
> > 
> > * Since we rely on klint for atomic context detection, we can mark the
> >   trivial wrapper (as what Alice presented in the other email) as safe,
> >   but we need to begin to add klint annotation for that function, unless
> >   Gary finds a smart way to auto-annotate functions.
> 
> Are there klint annotations for all sleeping functions?
> 

Not yet, klint is still WIP. But we generally agree that atomic context
violations should be detected by klint (instead of making sleep
functions unsafe or using type system to encode sleep functions).

> > * Instead of might_sleep(), we provide the wrapper of __might_sleep(),
> >   since it doesn't have might_resched() in it, it should be safe. And
> >   all we care about here is the debugging rather than voluntary context
> >   switch. (Besides I think preempt=volunatry is eventually going to be
> >   gone because of PREEMPT_AUTO [1], if that happens I think the
> >   might_resched() might be dropped entirely).
> 
> __might_sleep() might be safe, but your code is still broken and going
> to UAF at some point. Don't you want that UAF to happen more reliably
> and faster so you can find the issue? That would be the advantage of
> might_sleep() over __might_sleep().
> 

Could you give me an example that might_sleep() can detect a bug while
__might_sleep() cannot? IIUC, __might_sleep() is the core of atomic
context detection in might_sleep(), so when CONFIG_DEBUG_ATOMIC_SLEEP=y,
__might_sleep() should detect all bugs that might_sleep() would detect.
Or you are talking about detecting even when
CONFIG_DEBUG_ATOMIC_SLEEP=n?

[1]: https://en.wikipedia.org/wiki/Effect_system

Regards,
Boqun

> 	Andrew

Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by Alice Ryhl 1 year, 4 months ago

On Mon, Oct 7, 2024 at 4:14 PM Boqun Feng <boqun.feng@gmail.com> wrote:
>
> On Mon, Oct 07, 2024 at 04:08:48PM +0200, Alice Ryhl wrote:
> > On Mon, Oct 7, 2024 at 3:48 PM Andrew Lunn <andrew@lunn.ch> wrote:
> > >
> > > On Mon, Oct 07, 2024 at 05:28:28AM -0700, Boqun Feng wrote:
> > > > On Sun, Oct 06, 2024 at 04:45:21PM +0200, Andrew Lunn wrote:
> > > > However, this is actually a special case: currently we want to use klint
> > > > [1] to detect all context mis-matches at compile time. So the above rule
> > > > extends for kernel: any type-checked *and klint-checked* code that only
> > > > calls safe Rust functions cannot be unsafe. I.e. we add additional
> > > > compile time checking for unsafe code. So if might_sleep() has the
> > > > proper klint annotation, and we actually enable klint for kernel code,
> > > > then we can make it safe (along with preemption disable functions being
> > > > safe).
> > > >
> > > > > where you use a sleeping function in atomic context. Depending on why
> > > > > you are in atomic context, it might appear to work, until it does not
> > > > > actually work, and bad things happen. So it is not might_sleep() which
> > > > > is unsafe, it is the Rust code calling it.
> > > >
> > > > The whole point of unsafe functions is that calling it may result into
> > > > unsafe code, so that's why all extern "C" functions are unsafe, so are
> > > > might_sleep() (without klint in the picture).
> > >
> > > There is a psychological part to this. might_sleep() is a good debug
> > > tool, which costs very little in normal builds, but finds logic bugs
> > > when enabled in debug builds. What we don't want is Rust developers
> > > not scattering it though their code because it adds unsafe code, and
> > > the aim is not to have any unsafe code.
> >
> > We can add a safe wrapper for it:
> >
> > pub fn might_sleep() {
> >     // SAFETY: Always safe to call.
> >     unsafe { bindings::might_sleep() };
>
> It's not always safe to call, because might_sleep() has a
> might_resched() and in preempt=voluntary kernel, that's a
> cond_resched(), which may eventually call __schedule() and report a
> quiescent state of RCU. This could means an unexpected early grace
> period, and that means a potential use-afer-free.

Atomicity violations are intended to be caught by klint. If you want
to change that decision, you'll have to add unsafe to all functions
that sleep including Mutex::lock, CondVar::wait, and many others.

Alice

Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by Boqun Feng 1 year, 4 months ago

On Mon, Oct 07, 2024 at 04:16:46PM +0200, Alice Ryhl wrote:
> On Mon, Oct 7, 2024 at 4:14 PM Boqun Feng <boqun.feng@gmail.com> wrote:
> >
> > On Mon, Oct 07, 2024 at 04:08:48PM +0200, Alice Ryhl wrote:
> > > On Mon, Oct 7, 2024 at 3:48 PM Andrew Lunn <andrew@lunn.ch> wrote:
> > > >
> > > > On Mon, Oct 07, 2024 at 05:28:28AM -0700, Boqun Feng wrote:
> > > > > On Sun, Oct 06, 2024 at 04:45:21PM +0200, Andrew Lunn wrote:
> > > > > However, this is actually a special case: currently we want to use klint
> > > > > [1] to detect all context mis-matches at compile time. So the above rule
> > > > > extends for kernel: any type-checked *and klint-checked* code that only
> > > > > calls safe Rust functions cannot be unsafe. I.e. we add additional
> > > > > compile time checking for unsafe code. So if might_sleep() has the
> > > > > proper klint annotation, and we actually enable klint for kernel code,
> > > > > then we can make it safe (along with preemption disable functions being
> > > > > safe).
> > > > >
> > > > > > where you use a sleeping function in atomic context. Depending on why
> > > > > > you are in atomic context, it might appear to work, until it does not
> > > > > > actually work, and bad things happen. So it is not might_sleep() which
> > > > > > is unsafe, it is the Rust code calling it.
> > > > >
> > > > > The whole point of unsafe functions is that calling it may result into
> > > > > unsafe code, so that's why all extern "C" functions are unsafe, so are
> > > > > might_sleep() (without klint in the picture).
> > > >
> > > > There is a psychological part to this. might_sleep() is a good debug
> > > > tool, which costs very little in normal builds, but finds logic bugs
> > > > when enabled in debug builds. What we don't want is Rust developers
> > > > not scattering it though their code because it adds unsafe code, and
> > > > the aim is not to have any unsafe code.
> > >
> > > We can add a safe wrapper for it:
> > >
> > > pub fn might_sleep() {
> > >     // SAFETY: Always safe to call.
> > >     unsafe { bindings::might_sleep() };
> >
> > It's not always safe to call, because might_sleep() has a
> > might_resched() and in preempt=voluntary kernel, that's a
> > cond_resched(), which may eventually call __schedule() and report a
> > quiescent state of RCU. This could means an unexpected early grace
> > period, and that means a potential use-afer-free.
> 
> Atomicity violations are intended to be caught by klint. If you want

Yes, I already mentioned this to Andrew previously.

> to change that decision, you'll have to add unsafe to all functions
> that sleep including Mutex::lock, CondVar::wait, and many others.

No, I'm not trying to change that decision, just to make it clear that
we can mark might_sleep() as safe because of the decision, not because
it's really safe even without klint...

Regards,
Boqun

> 
> Alice

Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by Boqun Feng 1 year, 4 months ago

On Mon, Oct 07, 2024 at 07:19:56AM -0700, Boqun Feng wrote:
> On Mon, Oct 07, 2024 at 04:16:46PM +0200, Alice Ryhl wrote:
> > On Mon, Oct 7, 2024 at 4:14 PM Boqun Feng <boqun.feng@gmail.com> wrote:
> > >
> > > On Mon, Oct 07, 2024 at 04:08:48PM +0200, Alice Ryhl wrote:
> > > > On Mon, Oct 7, 2024 at 3:48 PM Andrew Lunn <andrew@lunn.ch> wrote:
> > > > >
> > > > > On Mon, Oct 07, 2024 at 05:28:28AM -0700, Boqun Feng wrote:
> > > > > > On Sun, Oct 06, 2024 at 04:45:21PM +0200, Andrew Lunn wrote:
> > > > > > However, this is actually a special case: currently we want to use klint
> > > > > > [1] to detect all context mis-matches at compile time. So the above rule
> > > > > > extends for kernel: any type-checked *and klint-checked* code that only
> > > > > > calls safe Rust functions cannot be unsafe. I.e. we add additional
> > > > > > compile time checking for unsafe code. So if might_sleep() has the
> > > > > > proper klint annotation, and we actually enable klint for kernel code,
> > > > > > then we can make it safe (along with preemption disable functions being
> > > > > > safe).
> > > > > >
> > > > > > > where you use a sleeping function in atomic context. Depending on why
> > > > > > > you are in atomic context, it might appear to work, until it does not
> > > > > > > actually work, and bad things happen. So it is not might_sleep() which
> > > > > > > is unsafe, it is the Rust code calling it.
> > > > > >
> > > > > > The whole point of unsafe functions is that calling it may result into
> > > > > > unsafe code, so that's why all extern "C" functions are unsafe, so are
> > > > > > might_sleep() (without klint in the picture).
> > > > >
> > > > > There is a psychological part to this. might_sleep() is a good debug
> > > > > tool, which costs very little in normal builds, but finds logic bugs
> > > > > when enabled in debug builds. What we don't want is Rust developers
> > > > > not scattering it though their code because it adds unsafe code, and
> > > > > the aim is not to have any unsafe code.
> > > >
> > > > We can add a safe wrapper for it:
> > > >
> > > > pub fn might_sleep() {
> > > >     // SAFETY: Always safe to call.
> > > >     unsafe { bindings::might_sleep() };
> > >
> > > It's not always safe to call, because might_sleep() has a
> > > might_resched() and in preempt=voluntary kernel, that's a
> > > cond_resched(), which may eventually call __schedule() and report a
> > > quiescent state of RCU. This could means an unexpected early grace
> > > period, and that means a potential use-afer-free.
> > 
> > Atomicity violations are intended to be caught by klint. If you want
> 
> Yes, I already mentioned this to Andrew previously.
> 
> > to change that decision, you'll have to add unsafe to all functions
> > that sleep including Mutex::lock, CondVar::wait, and many others.
> 
> No, I'm not trying to change that decision, just to make it clear that
> we can mark might_sleep() as safe because of the decision, not because
> it's really safe even without klint...
> 

Anyway, I think Tomo needs to call __might_sleep() instead of
might_sleep(), and __might_sleep() seems a pure debug function (not
involved with schedule at all). So the wrapper of __might_sleep() can be
perfectly safe.

Regards,
Boqun

> Regards,
> Boqun
> 
> > 
> > Alice

Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by Boqun Feng 1 year, 4 months ago

On Mon, Oct 07, 2024 at 03:48:09PM +0200, Andrew Lunn wrote:
> On Mon, Oct 07, 2024 at 05:28:28AM -0700, Boqun Feng wrote:
> > On Sun, Oct 06, 2024 at 04:45:21PM +0200, Andrew Lunn wrote:
> > [...]
> > > > > > +    if sleep {
> > > > > > +        // SAFETY: FFI call.
> > > > > > +        unsafe { bindings::might_sleep() }
> > > > > > +    }
> > > > > 
> > > > > What is actually unsafe about might_sleep()? It is a void foo(void)
> > > > 
> > > > Every extern "C" function is by default unsafe, because C doesn't have
> > > > the concept of safe/unsafe. If you want to avoid unsafe, you could
> > > > introduce a Rust's might_sleep() which calls into
> > > > `bindings::might_sleep()`:
> > > > 
> > > > 	pub fn might_sleep() {
> > > > 	    // SAFETY: ??
> > > > 	    unsafe { bindings::might_sleep() }
> > > > 	}
> > > > 
> > > > however, if you call a might_sleep() in a preemption disabled context
> > > > when CONFIG_DEBUG_ATOMIC_SLEEP=n and PREEMPT=VOLUNTERY, it could means
> > > > an unexpected RCU quiescent state, which results an early RCU grace
> > > > period, and that may mean a use-after-free. So it's not that safe as you
> > > > may expected.
> > > 
> > > If you call might_sleep() in a preemption disabled context you code is
> > > already unsafe, since that is the whole point of it, to find bugs
> > 
> > Well, in Rust, the rule is: any type-checked (compiled successfully)
> > code that only calls safe Rust functions cannot be unsafe. So the fact
> > that calling might_sleep() in a preemption disabled context is unsafe
> > means that something has to be unsafe.
> > 
> > This eventually can turn into a "blaming game" in the design space: we
> > can either design the preemption disable function as unsafe or the
> > might_sleep() function as unsafe. But one of them has to be unsafe
> > function, otherwise we are breaking the safe code guarantee.
> 
> Just keep in mind, it could of been C which put you into atomic
> context before calling into Rust. An interrupt handler would be a good
> example, and i'm sure there are others.
> 

That's why the klint approach is preferred right now. Without klint, and
if we don't want to mark might_sleep() as unsafe, we probably need to
mark the registration of an interrupt handler unsafe, and the safety
requirement would be "making sure the handler doesn't call schedule()".

> > However, this is actually a special case: currently we want to use klint
> > [1] to detect all context mis-matches at compile time. So the above rule
> > extends for kernel: any type-checked *and klint-checked* code that only
> > calls safe Rust functions cannot be unsafe. I.e. we add additional
> > compile time checking for unsafe code. So if might_sleep() has the
> > proper klint annotation, and we actually enable klint for kernel code,
> > then we can make it safe (along with preemption disable functions being
> > safe).
> > 
> > > where you use a sleeping function in atomic context. Depending on why
> > > you are in atomic context, it might appear to work, until it does not
> > > actually work, and bad things happen. So it is not might_sleep() which
> > > is unsafe, it is the Rust code calling it.
> > 
> > The whole point of unsafe functions is that calling it may result into
> > unsafe code, so that's why all extern "C" functions are unsafe, so are
> > might_sleep() (without klint in the picture).
> 
> There is a psychological part to this. might_sleep() is a good debug
> tool, which costs very little in normal builds, but finds logic bugs
> when enabled in debug builds. What we don't want is Rust developers
> not scattering it though their code because it adds unsafe code, and
> the aim is not to have any unsafe code.
> 

Sure, but my point is these need to be put together into a proper
design. For example, spin_lock() is currently exposed into Rust as a
safe API lock(), so the following code is unsafe:

	let g = lock1.lock(); // lock1 is a spinlock
	might_sleep();
	drop(g);

without the klint rule, if we want to mark might_sleep() as safe, then
we need to mark lock() as unsafe, otherwise, it's an unsafe code block
constructed by pure safe functions. However, compared to might_sleep(),
I think we would like keep lock() as safe since it is used more widely.

Regards,
Boqun

> 	Andrew

Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

Posted by FUJITA Tomonori 1 year, 4 months ago

On Sun, 6 Oct 2024 16:45:21 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> On Sat, Oct 05, 2024 at 03:22:23PM -0700, Boqun Feng wrote:
>> On Sat, Oct 05, 2024 at 08:32:01PM +0200, Andrew Lunn wrote:
>> > > might_sleep() is called via a wrapper so the __FILE__ and __LINE__
>> > > debug info with CONFIG_DEBUG_ATOMIC_SLEEP enabled isn't what we
>> > > expect; the wrapper instead of the caller.
>> > 
>> > So not very useful. All we know is that somewhere in Rust something is
>> > sleeping in atomic context. Is it possible to do better? Does __FILE__
>> > and __LINE__ exist in Rust?
>> > 
>> 
>> Sure, you can use: 
>> 
>> 	https://doc.rust-lang.org/core/macro.line.html
> 
> So i guess might_sleep() needs turning into some sort of macro, calling
> __might_sleep(__FILE__, __LINE__); might_resched();

Yeah, I think we could do such.

Or we could drop the might_sleep call here? We might be able to expect
the improvement C support in Rust in the future.