[PATCH v3 0/8] timers/migration: Fix three possible races and some improvements

Anna-Maria Behnsen posted 8 patches 1 year, 5 months ago
There is a newer version of this series
Posted by Anna-Maria Behnsen 1 year, 5 months ago
Borislav reported a warning in the timer migration deactivate path

  https://lore.kernel.org/r/20240612090347.GBZmlkc5PwlVpOG6vT@fat_crate.local

Sadly it doesn't reproduce directly. But with a change of timing (by
adding a trace printk before the warning), it is possible to trigger the
warning reliably, at least in my test setup. The problem here is a racy
check against the group->parent pointer. This check is also used in other
places in the code, and fixing this racy usage is addressed by the first
patch.
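
To illustrate the general kind of race, here is a minimal user-space
sketch. It is not the code from patch 1: struct group, struct walk_step,
snapshot_parent() and connect_parent() are made-up names, and C11 atomics
stand in for the kernel primitives. It only shows the idea of reading a
concurrently updated parent pointer once and basing all later decisions
on that single snapshot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a hierarchy node; not the kernel structure. */
struct group {
	_Atomic(struct group *) parent;	/* may be set concurrently by a setup path */
	int level;
};

/* Result of one walk step, derived from a single read of ->parent. */
struct walk_step {
	struct group *parent;	/* snapshot of the parent pointer */
	bool is_top;		/* true if no parent was visible at snapshot time */
};

/* Read ->parent exactly once; callers must not re-read g->parent afterwards. */
static struct walk_step snapshot_parent(struct group *g)
{
	struct walk_step step;

	step.parent = atomic_load_explicit(&g->parent, memory_order_acquire);
	step.is_top = (step.parent == NULL);
	return step;
}

/* Setup side: publish a new parent for an existing group. */
static void connect_parent(struct group *child, struct group *parent)
{
	assert(child != NULL);
	atomic_store_explicit(&child->parent, parent, memory_order_release);
}
```

A walker would take one snapshot per step and use step.parent and
step.is_top from it. A parent connected after the snapshot does not
change what that step has already decided.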

There were two other races reported by Frederic in the setup path:

  https://lore.kernel.org/r/ZnWOswTMML6ShzYO@localhost.localdomain

  https://lore.kernel.org/r/ZnoIlO22habOyQRe@lothringen

Both races are addressed by the change in patch 2.

Some updates/cleanups are provided by patches 3-8. ("timers/migration:
Improve tracing" and "timers/migration: Spare write when nothing changed"
are the same as in v2.)

Patches are available here:

  https://git.kernel.org/pub/scm/linux/kernel/git/anna-maria/linux-devel.git timers/misc

---
Changes in v3:
- Address the newly reported possible race (childmask and parent pointer)
  together with the existing race (both reported by Frederic).
- New cleanup: Two patches to access childmask and parent pointer only in
  one place
- New cleanup: Rename childmask to parentmask as during discussions there
  was some kind of confusion because of the naming
- New cleanup: Fix typo
- Fix prefix in all patches (s$timer_migration$timers/migration$)
- Link to v2: https://lore.kernel.org/r/20240624-tmigr-fixes-v2-0-3eb4c0604790@linutronix.de

Changes in v2:
- Address another possible race in setup code (reported by Frederic) and
  therefore recycle one improvement patch
- Change order and move the already existing improvement patch to the end
  of the queue
- Existing patches didn't change
- Link to v1: https://lore.kernel.org/r/20240621-tmigr-fixes-v1-0-8c8a2d8e8d77@linutronix.de

Thanks,

        Anna-Maria

---
Anna-Maria Behnsen (8):
      timers/migration: Do not rely always on group->parent
      timers/migration: Move hierarchy setup into cpuhotplug prepare callback
      timers/migration: Improve tracing
      timers/migration: Use a single struct for hierarchy walk data
      timers/migration: Read childmask and parent pointer in a single place
      timers/migration: Rename childmask by parentmask to make naming more obvious
      timers/migration: Spare write when nothing changed
      timers/migration: Fix grammar in comment

 include/linux/cpuhotplug.h             |   1 +
 include/trace/events/timer_migration.h |   4 +-
 kernel/time/timer_migration.c          | 366 ++++++++++++++++-----------------
 kernel/time/timer_migration.h          |  27 ++-
 4 files changed, 197 insertions(+), 201 deletions(-)
Re: [PATCH v3 0/8] timers/migration: Fix three possible races and some improvements
Posted by Anna-Maria Behnsen 1 year, 5 months ago
Hi,

(cc Oliver Sang)

Anna-Maria Behnsen <anna-maria@linutronix.de> writes:

> Borislav reported a warning in the timer migration deactivate path
>
>   https://lore.kernel.org/r/20240612090347.GBZmlkc5PwlVpOG6vT@fat_crate.local
>
> Sadly it doesn't reproduce directly. But with a change of timing (by
> adding a trace printk before the warning), it is possible to trigger the
> warning reliably, at least in my test setup. The problem here is a racy
> check against the group->parent pointer. This check is also used in other
> places in the code, and fixing this racy usage is addressed by the first
> patch.
>
> There were two other races reported by Frederic in the setup path:
>
>   https://lore.kernel.org/r/ZnWOswTMML6ShzYO@localhost.localdomain
>
>   https://lore.kernel.org/r/ZnoIlO22habOyQRe@lothringen
>
> Both races are addressed by the change in patch 2.
>
> Some updates/cleanups are provided by patches 3-8. ("timers/migration:
> Improve tracing" and "timers/migration: Spare write when nothing changed"
> are the same as in v2.)
>
> Patches are available here:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/anna-maria/linux-devel.git timers/misc
>

Thomas, please remove this queue from tip/timers/urgent when possible.
Some things are broken and need to be fixed. Otherwise we get a
Fixes-Fixes-Patch. See the report of the kernel test robot:

  https://lore.kernel.org/r/202407101636.d9d4e8be-oliver.sang@intel.com

The two main problems are:
 - the wrong CPU hotplug state is used for prepare in cpuhp_setup_state()
 - this_cpu_ptr() is used instead of per_cpu_ptr()
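
A rough user-space sketch of why the second point matters (illustration
only, not the kernel code): prepare callbacks run on a control CPU, not on
the CPU that is coming up, so a "this CPU" accessor touches the wrong
per-CPU data. The array, running_cpu, the sim_*() helpers and both
callbacks are made-up stand-ins for per-CPU variables, this_cpu_ptr(),
per_cpu_ptr() and a hotplug prepare callback.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical per-CPU state; stands in for a per-CPU variable. */
struct cpu_state {
	int prepared;
};

static struct cpu_state cpu_states[NR_CPUS];
static unsigned int running_cpu;	/* CPU that executes the callback */

/* Stand-in for this_cpu_ptr(): data of the CPU executing the code. */
static struct cpu_state *sim_this_cpu_ptr(void)
{
	return &cpu_states[running_cpu];
}

/* Stand-in for per_cpu_ptr(): data of an explicitly named CPU. */
static struct cpu_state *sim_per_cpu_ptr(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return &cpu_states[cpu];
}

/* Buggy variant: ignores @cpu and initializes the control CPU's data. */
static int prepare_cpu_buggy(unsigned int cpu)
{
	(void)cpu;
	sim_this_cpu_ptr()->prepared = 1;
	return 0;
}

/* Fixed variant: initializes the data of the CPU that is coming up. */
static int prepare_cpu_fixed(unsigned int cpu)
{
	sim_per_cpu_ptr(cpu)->prepared = 1;
	return 0;
}
```

With running_cpu set to 0 and the callback invoked for CPU 2, the buggy
variant marks CPU 0 as prepared and leaves CPU 2 untouched, while the
fixed variant marks CPU 2.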

I am working on preparing v4.

Thanks,

	Anna-Maria