[PATCH] iio: trigger: Avoid data race

Gyeyoung Baek posted 1 patch 10 months, 3 weeks ago
There is a newer version of this series
drivers/iio/industrialio-trigger.c | 38 +++++++++++++-----------------
1 file changed, 17 insertions(+), 21 deletions(-)
[PATCH] iio: trigger: Avoid data race
Posted by Gyeyoung Baek 10 months, 3 weeks ago
A data race could occur between `atomic_read()` and `atomic_set()`
Use `atomic_cmpxchg_relaxed()` to group them atomically.

Previously the main logic was executed when `use_count` is 0.
Now it returns early when `use_count` is not 0.

Signed-off-by: Gyeyoung Baek <gye976@gmail.com>
---
 drivers/iio/industrialio-trigger.c | 38 +++++++++++++-----------------
 1 file changed, 17 insertions(+), 21 deletions(-)

diff --git a/drivers/iio/industrialio-trigger.c b/drivers/iio/industrialio-trigger.c
index 54416a384232..33a565037e0d 100644
--- a/drivers/iio/industrialio-trigger.c
+++ b/drivers/iio/industrialio-trigger.c
@@ -194,17 +194,15 @@ static void iio_trigger_notify_done_atomic(struct iio_trigger *trig)
  */
 void iio_trigger_poll(struct iio_trigger *trig)
 {
-	int i;
-
-	if (!atomic_read(&trig->use_count)) {
-		atomic_set(&trig->use_count, CONFIG_IIO_CONSUMERS_PER_TRIGGER);
-
-		for (i = 0; i < CONFIG_IIO_CONSUMERS_PER_TRIGGER; i++) {
-			if (trig->subirqs[i].enabled)
-				generic_handle_irq(trig->subirq_base + i);
-			else
-				iio_trigger_notify_done_atomic(trig);
-		}
+	if (atomic_cmpxchg_relaxed(&trig->use_count, 0,
+				   CONFIG_IIO_CONSUMERS_PER_TRIGGER))
+		return;
+
+	for (int i = 0; i < CONFIG_IIO_CONSUMERS_PER_TRIGGER; i++) {
+		if (trig->subirqs[i].enabled)
+			generic_handle_irq(trig->subirq_base + i);
+		else
+			iio_trigger_notify_done_atomic(trig);
 	}
 }
 EXPORT_SYMBOL(iio_trigger_poll);
@@ -225,17 +223,15 @@ EXPORT_SYMBOL(iio_trigger_generic_data_rdy_poll);
  */
 void iio_trigger_poll_nested(struct iio_trigger *trig)
 {
-	int i;
-
-	if (!atomic_read(&trig->use_count)) {
-		atomic_set(&trig->use_count, CONFIG_IIO_CONSUMERS_PER_TRIGGER);
+	if (atomic_cmpxchg_relaxed(&trig->use_count, 0,
+				   CONFIG_IIO_CONSUMERS_PER_TRIGGER))
+		return;
 
-		for (i = 0; i < CONFIG_IIO_CONSUMERS_PER_TRIGGER; i++) {
-			if (trig->subirqs[i].enabled)
-				handle_nested_irq(trig->subirq_base + i);
-			else
-				iio_trigger_notify_done(trig);
-		}
+	for (int i = 0; i < CONFIG_IIO_CONSUMERS_PER_TRIGGER; i++) {
+		if (trig->subirqs[i].enabled)
+			handle_nested_irq(trig->subirq_base + i);
+		else
+			iio_trigger_notify_done(trig);
 	}
 }
 EXPORT_SYMBOL(iio_trigger_poll_nested);
-- 
2.43.0
Re: [PATCH] iio: trigger: Avoid data race
Posted by Andy Shevchenko 10 months, 3 weeks ago
On Tue, May 27, 2025 at 10:05 PM Gyeyoung Baek <gye976@gmail.com> wrote:
>
> A data race could occur between `atomic_read()` and `atomic_set()`
> Use `atomic_cmpxchg_relaxed()` to group them atomically.
>
> Previously the main logic was executed when `use_count` is 0.
> Now it returns early when `use_count` is not 0.

> -       int i;

I don't see the point in changing this line.

...

> -       int i;

Ditto.

...

At bare minimum they are not relevant to the patch change and haven't
been described in the commit messages.


-- 
With Best Regards,
Andy Shevchenko
Re: [PATCH] iio: trigger: Avoid data race
Posted by Gyeyoung Baek 10 months, 3 weeks ago
On Wed, May 28, 2025 at 5:25 AM Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
>
> On Tue, May 27, 2025 at 10:05 PM Gyeyoung Baek <gye976@gmail.com> wrote:
> >
> > A data race could occur between `atomic_read()` and `atomic_set()`
> > Use `atomic_cmpxchg_relaxed()` to group them atomically.
> >
> > Previously the main logic was executed when `use_count` is 0.
> > Now it returns early when `use_count` is not 0.
>
> > -       int i;
>
> I don't see the point in changing this line.
> ...
>
> > -       int i;
>
> Ditto.
>
> ...
>
> At bare minimum they are not relevant to the patch change and haven't
> been described in the commit messages.

Hi Andy, thanks for your review.
I initially skipped this part as I thought it was minor.
But on a second look, it seems better to separate the declaration from
the logic.

What do you think about the data race logic? Would it make sense?

>
> --
> With Best Regards,
> Andy Shevchenko
Re: [PATCH] iio: trigger: Avoid data race
Posted by Andy Shevchenko 10 months, 3 weeks ago
On Tue, May 27, 2025 at 11:10 PM Gyeyoung Baek <gye976@gmail.com> wrote:
> On Wed, May 28, 2025 at 5:25 AM Andy Shevchenko
> <andy.shevchenko@gmail.com> wrote:
> > On Tue, May 27, 2025 at 10:05 PM Gyeyoung Baek <gye976@gmail.com> wrote:

...

> > At bare minimum they are not relevant to the patch change and haven't
> > been described in the commit messages.
>
> Hi Andy, thanks for your review.
> I initially skipped this part as I thought it was minor.
> But on a second look, it seems better to separate the declaration from
> the logic.
>
> What do you think about the data race logic? Would it make sense?

The point is valid, the atomic_read() + atomic_set() is 101 thingy,
whoever did that doesn't really have a clue what atomic(ity) is.

-- 
With Best Regards,
Andy Shevchenko
Re: [PATCH] iio: trigger: Avoid data race
Posted by Gyeyoung Baek 10 months, 3 weeks ago
On Wed, May 28, 2025 at 6:19 AM Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
>
> On Tue, May 27, 2025 at 11:10 PM Gyeyoung Baek <gye976@gmail.com> wrote:
> > On Wed, May 28, 2025 at 5:25 AM Andy Shevchenko
> > <andy.shevchenko@gmail.com> wrote:
> > > On Tue, May 27, 2025 at 10:05 PM Gyeyoung Baek <gye976@gmail.com> wrote:
>
> ...
>
> > > At bare minimum they are not relevant to the patch change and haven't
> > > been described in the commit messages.
> >
> > Hi Andy, thanks for your review.
> > I initially skipped this part as I thought it was minor.
> > But on a second look, it seems better to separate the declaration from
> > the logic.
> >
> > What do you think about the data race logic? Would it make sense?
>
> The point is valid, the atomic_read() + atomic_set() is 101 thingy,
> whoever did that doesn't really have a clue what atomic(ity) is.

Thanks for your explanation.
Then I’ll send a v2 patch with only the `int i` change, following the
review feedback.

--
Best regards,
Gyeyoung
Re: [PATCH] iio: trigger: Avoid data race
Posted by Jonathan Cameron 10 months, 3 weeks ago
On Wed, 28 May 2025 16:17:06 +0900
Gyeyoung Baek <gye976@gmail.com> wrote:

> On Wed, May 28, 2025 at 6:19 AM Andy Shevchenko
> <andy.shevchenko@gmail.com> wrote:
> >
> > On Tue, May 27, 2025 at 11:10 PM Gyeyoung Baek <gye976@gmail.com> wrote:  
> > > On Wed, May 28, 2025 at 5:25 AM Andy Shevchenko
> > > <andy.shevchenko@gmail.com> wrote:  
> > > > On Tue, May 27, 2025 at 10:05 PM Gyeyoung Baek <gye976@gmail.com> wrote:  
> >
> > ...
> >  
> > > > At bare minimum they are not relevant to the patch change and haven't
> > > > been described in the commit messages.  
> > >
> > > Hi Andy, thanks for your review.
> > > I initially skipped this part as I thought it was minor.
> > > But on a second look, it seems better to separate the declaration from
> > > the logic.
> > >
> > > What do you think about the data race logic? Would it make sense?  
> >
> > The point is valid, the atomic_read() + atomic_set() is 101 thingy,
> > whoever did that doesn't really have a clue what atomic(ity) is.  

:) 

I'm trying to recall what this protection is actually for so this might
be a little vague as descriptions go...

The key here is what can happen in that 'race' and hence why I'm still fairly 
sure it isn't a real race.  Firstly this is called in an irq handler
so we can only have one call of this particular function at a time
for a given trigger.  So no race against itself.

The atomic part is about decrements that can happen elsewhere, but there
can never be 'more' decrements than the value we set the counter to in this
function.  That is it never goes negative.

Those decrements ultimately happen in calls that can't happen until after
the set (via various convoluted paths ultimately getting to
iio_trigger_notify_done()).  In many cases the trigger is masked until it
is reenabled on the counter == 0 (elsewhere) - but not always...

IIRC correctly aim is to not double trigger in cases where we can't mask
the trigger (a particularly rubbish trigger) - so if any of the downstream
devices still hasn't called iio_trigger_notify_done() then we quietly
drop this particular irq on the floor. We don't mind dropping a few
too many, just dropping too few a then we end up loosing count of who
has to be 'done' with the trigger.

Hence the counter won't change between atomic_get and the atomic_set
as it's always 0 which means no one else is left to decrement it.

Atomics don't always need to be atomic all the time, they just are in
some states.

So, is this something that has caused observed problems, or based
on code inspection? My remembering of what was going on here might well
be flawed.

There are some 'fun' corners for what happens after that set though
where a handler can run fast enough in race conditions we end up
hitting 0 in iio_trigger_notify_done_atomic() and have to schedule
restarting of the trigger because that might involve a bus write over
a sleeping bus.  That one was a painful bug report some years ago...

Jonathan

> 
> Thanks for your explanation.
> Then I’ll send a v2 patch with only the `int i` change, following the
> review feedback.
> 
> --
> Best regards,
> Gyeyoung
> 
> 
Re: [PATCH] iio: trigger: Avoid data race
Posted by Gyeyoung Baek 10 months, 3 weeks ago
On Thu, May 29, 2025 at 2:02 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Wed, 28 May 2025 16:17:06 +0900
> Gyeyoung Baek <gye976@gmail.com> wrote:
>
> > On Wed, May 28, 2025 at 6:19 AM Andy Shevchenko
> > <andy.shevchenko@gmail.com> wrote:
> > >
> > > On Tue, May 27, 2025 at 11:10 PM Gyeyoung Baek <gye976@gmail.com> wrote:
> > > > On Wed, May 28, 2025 at 5:25 AM Andy Shevchenko
> > > > <andy.shevchenko@gmail.com> wrote:
> > > > > On Tue, May 27, 2025 at 10:05 PM Gyeyoung Baek <gye976@gmail.com> wrote:
> > >
> > > ...
> > >
> > > > > At bare minimum they are not relevant to the patch change and haven't
> > > > > been described in the commit messages.
> > > >
> > > > Hi Andy, thanks for your review.
> > > > I initially skipped this part as I thought it was minor.
> > > > But on a second look, it seems better to separate the declaration from
> > > > the logic.
> > > >
> > > > What do you think about the data race logic? Would it make sense?
> > >
> > > The point is valid, the atomic_read() + atomic_set() is 101 thingy,
> > > whoever did that doesn't really have a clue what atomic(ity) is.
>
> :)
>
> I'm trying to recall what this protection is actually for so this might
> be a little vague as descriptions go...
>
> The key here is what can happen in that 'race' and hence why I'm still fairly
> sure it isn't a real race.  Firstly this is called in an irq handler
> so we can only have one call of this particular function at a time
> for a given trigger.  So no race against itself.
>
> The atomic part is about decrements that can happen elsewhere, but there
> can never be 'more' decrements than the value we set the counter to in this
> function.  That is it never goes negative.
>
> Those decrements ultimately happen in calls that can't happen until after
> the set (via various convoluted paths ultimately getting to
> iio_trigger_notify_done()).  In many cases the trigger is masked until it
> is reenabled on the counter == 0 (elsewhere) - but not always...
>
> IIRC correctly aim is to not double trigger in cases where we can't mask
> the trigger (a particularly rubbish trigger) - so if any of the downstream
> devices still hasn't called iio_trigger_notify_done() then we quietly
> drop this particular irq on the floor. We don't mind dropping a few
> too many, just dropping too few a then we end up loosing count of who
> has to be 'done' with the trigger.
>
> Hence the counter won't change between atomic_get and the atomic_set
> as it's always 0 which means no one else is left to decrement it.
>
> Atomics don't always need to be atomic all the time, they just are in
> some states.
>
> So, is this something that has caused observed problems, or based
> on code inspection? My remembering of what was going on here might well
> be flawed.

based on code inspection.
initially, I simply assumed a data race because `atomic_read()` and
`atomic_set()` were used.
I didn’t question it further, sorry for that...
However, after reading Jonathan's review, I take a look at the previous commits.
It now seems that there is no data race.

(I wonder why there isn't a wrapper API which does something like
`atomic->counter = val;` for situations like this,
    where only consumers need atomic API but not producers?)

Since synchronization isn't needed here, I think `cmpxchg()` may not
be appropriate.
I considered a few possible ways to improve clarity:

1. Add comments only.
2. Add comments and access directly like `use_count->counter =
CONFIG_IIO_CONSUMERS_PER_TRIGGER;`.
    (Wouldn't it make sense to have an official API for such direct
access in `linux/atomic/~~.h`?)
3. Add a separate bool flag to represent trigger's on/off state.
`iio_trigger_notify_done()` still uses an atomic API, and it sets a
boolean flag when count is 0.
Then `poll()` and `poll_nested()` would simply check the flag without
using atomic API.

I think `3.` seems the best.
I would appreciate your reviews on this.

> There are some 'fun' corners for what happens after that set though
> where a handler can run fast enough in race conditions we end up
> hitting 0 in iio_trigger_notify_done_atomic() and have to schedule
> restarting of the trigger because that might involve a bus write over
> a sleeping bus.  That one was a painful bug report some years ago...
>
> Jonathan
>
> >
> > Thanks for your explanation.
> > Then I’ll send a v2 patch with only the `int i` change, following the
> > review feedback.
> >
> > --
> > Best regards,
> > Gyeyoung
> >
> >
>
Re: [PATCH] iio: trigger: Avoid data race
Posted by Andy Shevchenko 10 months, 3 weeks ago
On Wed, May 28, 2025 at 7:02 PM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
> On Wed, 28 May 2025 16:17:06 +0900
> Gyeyoung Baek <gye976@gmail.com> wrote:
> > On Wed, May 28, 2025 at 6:19 AM Andy Shevchenko
> > <andy.shevchenko@gmail.com> wrote:
> > > On Tue, May 27, 2025 at 11:10 PM Gyeyoung Baek <gye976@gmail.com> wrote:
> > > > On Wed, May 28, 2025 at 5:25 AM Andy Shevchenko
> > > > <andy.shevchenko@gmail.com> wrote:
> > > > > On Tue, May 27, 2025 at 10:05 PM Gyeyoung Baek <gye976@gmail.com> wrote:

...

> > > > > At bare minimum they are not relevant to the patch change and haven't
> > > > > been described in the commit messages.
> > > >
> > > > I initially skipped this part as I thought it was minor.
> > > > But on a second look, it seems better to separate the declaration from
> > > > the logic.
> > > >
> > > > What do you think about the data race logic? Would it make sense?
> > >
> > > The point is valid, the atomic_read() + atomic_set() is 101 thingy,
> > > whoever did that doesn't really have a clue what atomic(ity) is.
>
> :)
>
> I'm trying to recall what this protection is actually for so this might
> be a little vague as descriptions go...
>
> The key here is what can happen in that 'race' and hence why I'm still fairly
> sure it isn't a real race.  Firstly this is called in an irq handler
> so we can only have one call of this particular function at a time
> for a given trigger.  So no race against itself.
>
> The atomic part is about decrements that can happen elsewhere, but there
> can never be 'more' decrements than the value we set the counter to in this
> function.  That is it never goes negative.
>
> Those decrements ultimately happen in calls that can't happen until after
> the set (via various convoluted paths ultimately getting to
> iio_trigger_notify_done()).  In many cases the trigger is masked until it
> is reenabled on the counter == 0 (elsewhere) - but not always...
>
> IIRC correctly aim is to not double trigger in cases where we can't mask
> the trigger (a particularly rubbish trigger) - so if any of the downstream
> devices still hasn't called iio_trigger_notify_done() then we quietly
> drop this particular irq on the floor. We don't mind dropping a few
> too many, just dropping too few a then we end up loosing count of who
> has to be 'done' with the trigger.
>
> Hence the counter won't change between atomic_get and the atomic_set
> as it's always 0 which means no one else is left to decrement it.
>
> Atomics don't always need to be atomic all the time, they just are in
> some states.

Yes, but this is confusing and/or buggy code to begin with (and
independently on the background of its usage). Technically this patch
(perhaps with a better commit message) is valid. Main two points here:
1) avoiding potential (even theoretical) race; 2) do not spread a
really wrong pattern to avoid people learning from it (in case
somebody takes this code as an example for a new driver which might
reside outside of IIO and hence might not be caught by the reviewers
of _that_ subsystem).

Alternatively we need to get rid of atomic operations, but with the
same effect as 2) above.

> So, is this something that has caused observed problems, or based
> on code inspection? My remembering of what was going on here might well
> be flawed.
>
> There are some 'fun' corners for what happens after that set though
> where a handler can run fast enough in race conditions we end up
> hitting 0 in iio_trigger_notify_done_atomic() and have to schedule
> restarting of the trigger because that might involve a bus write over
> a sleeping bus.  That one was a painful bug report some years ago...

-- 
With Best Regards,
Andy Shevchenko
Re: [PATCH] iio: trigger: Avoid data race
Posted by Jonathan Cameron 10 months, 3 weeks ago
On Thu, 29 May 2025 07:42:46 +0200
Andy Shevchenko <andy.shevchenko@gmail.com> wrote:

> On Wed, May 28, 2025 at 7:02 PM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> > On Wed, 28 May 2025 16:17:06 +0900
> > Gyeyoung Baek <gye976@gmail.com> wrote:  
> > > On Wed, May 28, 2025 at 6:19 AM Andy Shevchenko
> > > <andy.shevchenko@gmail.com> wrote:  
> > > > On Tue, May 27, 2025 at 11:10 PM Gyeyoung Baek <gye976@gmail.com> wrote:  
> > > > > On Wed, May 28, 2025 at 5:25 AM Andy Shevchenko
> > > > > <andy.shevchenko@gmail.com> wrote:  
> > > > > > On Tue, May 27, 2025 at 10:05 PM Gyeyoung Baek <gye976@gmail.com> wrote:  
> 
> ...
> 
> > > > > > At bare minimum they are not relevant to the patch change and haven't
> > > > > > been described in the commit messages.  
> > > > >
> > > > > I initially skipped this part as I thought it was minor.
> > > > > But on a second look, it seems better to separate the declaration from
> > > > > the logic.
> > > > >
> > > > > What do you think about the data race logic? Would it make sense?  
> > > >
> > > > The point is valid, the atomic_read() + atomic_set() is 101 thingy,
> > > > whoever did that doesn't really have a clue what atomic(ity) is.  
> >
> > :)
> >
> > I'm trying to recall what this protection is actually for so this might
> > be a little vague as descriptions go...
> >
> > The key here is what can happen in that 'race' and hence why I'm still fairly
> > sure it isn't a real race.  Firstly this is called in an irq handler
> > so we can only have one call of this particular function at a time
> > for a given trigger.  So no race against itself.
> >
> > The atomic part is about decrements that can happen elsewhere, but there
> > can never be 'more' decrements than the value we set the counter to in this
> > function.  That is it never goes negative.
> >
> > Those decrements ultimately happen in calls that can't happen until after
> > the set (via various convoluted paths ultimately getting to
> > iio_trigger_notify_done()).  In many cases the trigger is masked until it
> > is reenabled on the counter == 0 (elsewhere) - but not always...
> >
> > IIRC correctly aim is to not double trigger in cases where we can't mask
> > the trigger (a particularly rubbish trigger) - so if any of the downstream
> > devices still hasn't called iio_trigger_notify_done() then we quietly
> > drop this particular irq on the floor. We don't mind dropping a few
> > too many, just dropping too few a then we end up loosing count of who
> > has to be 'done' with the trigger.
> >
> > Hence the counter won't change between atomic_get and the atomic_set
> > as it's always 0 which means no one else is left to decrement it.
> >
> > Atomics don't always need to be atomic all the time, they just are in
> > some states.  
> 
> Yes, but this is confusing and/or buggy code to begin with (and
> independently on the background of its usage). Technically this patch
> (perhaps with a better commit message) is valid. Main two points here:
> 1) avoiding potential (even theoretical) race; 

Not a race, even a theoretical one, that I can establish.


> 2) do not spread a
> really wrong pattern to avoid people learning from it (in case
> somebody takes this code as an example for a new driver which might
> reside outside of IIO and hence might not be caught by the reviewers
> of _that_ subsystem).

Hmm. I'd conjecture (I'll pester some uarch folk later to confirm)
that an implementation of cmpxchg might be considerably more costly
than a get followed by a set. Might be cheaper too depending on
implementation.  The compiler should be able to fuse them anyway
if that makes sense for a given uarch (guess I'll pester compiler
folk as well.)

> 
> Alternatively we need to get rid of atomic operations, but with the
> same effect as 2) above.

IIRC to access atomics, atomic the atomic get / set have to be used and
in the decrement path we must ensure only one winner of the race to 0
(because there is an operation to perform on that which must happen only
once).

I'm not against some changes here to make sure the code isn't cut
and paste for other uses (it's pretty specific so I doubt it, but you never
know). That might either be switching to cmpxchg or just adding some comments
on the logic.

Without a clear path to bugs this isn't a fix, so fixes tag isn't appropriate.
 
> 
> > So, is this something that has caused observed problems, or based
> > on code inspection? My remembering of what was going on here might well
> > be flawed.
> >
> > There are some 'fun' corners for what happens after that set though
> > where a handler can run fast enough in race conditions we end up
> > hitting 0 in iio_trigger_notify_done_atomic() and have to schedule
> > restarting of the trigger because that might involve a bus write over
> > a sleeping bus.  That one was a painful bug report some years ago...  
>