[v1] Give more control of sync_state()

[PATCH v1 1/2] driver core: Add fw_devlink.sync_state command line param

Posted by Saravana Kannan 2 years, 11 months ago

When all devices that could probe have finished probing, this parameter
controls what to do with devices that haven't yet received their
sync_state() calls.

fw_devlink.sync_state=strict is the default and the driver core will
continue waiting on consumers to probe successfully in the future. This
is the default behavior since calling sync_state() when all the
consumers haven't probed could make some systems unusable/unstable.

fw_devlink.sync_state=timeout will cause the driver core to give up
waiting on consumers and call sync_state() on any devices that haven't
yet received their sync_state() calls. This option is provided for
systems that won't become unusable/unstable as they might be able to
save power (depends on state of hardware before kernel starts) if all
devices get their sync_state().

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 .../admin-guide/kernel-parameters.txt         | 12 ++++
 drivers/base/base.h                           |  1 +
 drivers/base/core.c                           | 58 +++++++++++++++++++
 drivers/base/dd.c                             |  6 ++
 4 files changed, 77 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6cfa6e3996cf..f0bf2f40af64 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1594,6 +1594,18 @@
 			dependencies. This only applies for fw_devlink=on|rpm.
 			Format: <bool>
 
+	fw_devlink.sync_state =
+			[KNL] When all devices that could probe have finished
+			probing, this parameter controls what to do with
+			devices that haven't yet received their sync_state()
+			calls.
+			Format: { strict | timeout }
+			strict -- Default. Continue waiting on consumers to
+				probe successfully.
+			timeout -- Give up waiting on consumers and call
+				sync_state() on any devices that haven't yet
+				received their sync_state() calls.
+
 	gamecon.map[2|3]=
 			[HW,JOY] Multisystem joystick and NES/SNES/PSX pad
 			support via parallel port (up to 5 devices per port)
diff --git a/drivers/base/base.h b/drivers/base/base.h
index 726a12a244c0..6fcd71803d35 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -209,6 +209,7 @@ extern void device_links_no_driver(struct device *dev);
 extern bool device_links_busy(struct device *dev);
 extern void device_links_unbind_consumers(struct device *dev);
 extern void fw_devlink_drivers_done(void);
+extern void fw_devlink_probing_done(void);
 
 /* device pm support */
 void device_pm_move_to_tail(struct device *dev);
diff --git a/drivers/base/core.c b/drivers/base/core.c
index f9297c68214a..929ec218f180 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1727,6 +1727,26 @@ static int __init fw_devlink_strict_setup(char *arg)
 }
 early_param("fw_devlink.strict", fw_devlink_strict_setup);
 
+#define FW_DEVLINK_SYNC_STATE_STRICT	0
+#define FW_DEVLINK_SYNC_STATE_TIMEOUT	1
+
+static int fw_devlink_sync_state;
+static int __init fw_devlink_sync_state_setup(char *arg)
+{
+	if (!arg)
+		return -EINVAL;
+
+	if (strcmp(arg, "strict") == 0) {
+		fw_devlink_sync_state = FW_DEVLINK_SYNC_STATE_STRICT;
+		return 0;
+	} else if (strcmp(arg, "timeout") == 0) {
+		fw_devlink_sync_state = FW_DEVLINK_SYNC_STATE_TIMEOUT;
+		return 0;
+	}
+	return -EINVAL;
+}
+early_param("fw_devlink.sync_state", fw_devlink_sync_state_setup);
+
 static inline u32 fw_devlink_get_flags(u8 fwlink_flags)
 {
 	if (fwlink_flags & FWLINK_FLAG_CYCLE)
@@ -1797,6 +1817,44 @@ void fw_devlink_drivers_done(void)
 	device_links_write_unlock();
 }
 
+static int fw_devlink_dev_sync_state(struct device *dev, void *data)
+{
+	struct device_link *link = to_devlink(dev);
+	struct device *sup = link->supplier;
+
+	if (!(link->flags & DL_FLAG_MANAGED) ||
+	    link->status == DL_STATE_ACTIVE || sup->state_synced ||
+	    !dev_has_sync_state(sup))
+		return 0;
+
+	if (fw_devlink_sync_state == FW_DEVLINK_SYNC_STATE_STRICT) {
+		dev_warn(sup, "sync_state() pending due to %s\n",
+			 dev_name(link->consumer));
+		return 0;
+	}
+
+	if (!list_empty(&sup->links.defer_sync))
+		return 0;
+
+	dev_warn(sup, "Timed out. Calling sync_state()\n");
+	sup->state_synced = true;
+	get_device(sup);
+	list_add_tail(&sup->links.defer_sync, data);
+
+	return 0;
+}
+
+void fw_devlink_probing_done(void)
+{
+	LIST_HEAD(sync_list);
+
+	device_links_write_lock();
+	class_for_each_device(&devlink_class, NULL, &sync_list,
+			      fw_devlink_dev_sync_state);
+	device_links_write_unlock();
+	device_links_flush_sync_list(&sync_list, NULL);
+}
+
 /**
  * wait_for_init_devices_probe - Try to probe any device needed for init
  *
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 8def2ba08a82..84f07e0050dd 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -315,6 +315,8 @@ static void deferred_probe_timeout_work_func(struct work_struct *work)
 	list_for_each_entry(p, &deferred_probe_pending_list, deferred_probe)
 		dev_info(p->device, "deferred probe pending\n");
 	mutex_unlock(&deferred_probe_mutex);
+
+	fw_devlink_probing_done();
 }
 static DECLARE_DELAYED_WORK(deferred_probe_timeout_work, deferred_probe_timeout_work_func);
 
@@ -364,6 +366,10 @@ static int deferred_probe_initcall(void)
 		schedule_delayed_work(&deferred_probe_timeout_work,
 			driver_deferred_probe_timeout * HZ);
 	}
+
+	if (!IS_ENABLED(CONFIG_MODULES))
+		fw_devlink_probing_done();
+
 	return 0;
 }
 late_initcall(deferred_probe_initcall);
-- 
2.39.2.637.g21b0678d19-goog

Re: [PATCH v1 1/2] driver core: Add fw_devlink.sync_state command line param

Posted by Doug Anderson 2 years, 11 months ago

Hi,

On Thu, Feb 23, 2023 at 11:05 PM Saravana Kannan <saravanak@google.com> wrote:
>
> When all devices that could probe have finished probing, this parameter
> controls what to do with devices that haven't yet received their
> sync_state() calls.
>
> fw_devlink.sync_state=strict is the default and the driver core will
> continue waiting on consumers to probe successfully in the future.

This description is misleading / borderline wrong. You say that when
"sync_state=strict" that you'll wait on consumers to probe
successfully in the future. As talked about below, I think that when
the pre-existing "deferred_probe_timeout" (which you're tying into)
expires, it's unlikely that devices will probe successfully in the
future. Sure, it's possible, but in general once the
"deferred_probe_timeout" expires then the system is done waiting for
new devices to show up. While it's still _possible_ to add new
devices, you need to take care to deal with the fact that some
important devices might have already given up and also that you're
adding these new devices in strict dependency order...

IMO better would be to say something like when sync_state=strict that
you'll just leave resources in a high power state if not all devices
have shown up and the system thinks probing is done.

> This
> is the default behavior since calling sync_state() when all the
> consumers haven't probed could make some systems unusable/unstable.
>
> fw_devlink.sync_state=timeout will cause the driver core to give up
> waiting on consumers and call sync_state() on any devices that haven't
> yet received their sync_state() calls. This option is provided for
> systems that won't become unusable/unstable as they might be able to
> save power (depends on state of hardware before kernel starts) if all
> devices get their sync_state().

While I don't object to this being a kernel command line flag, the
default should also be a Kconfig option. The kernel command line is
not a great place for general configuration. As we jam too much stuff
in the kernel command line it gets unwieldy quickly. IMO:

* Kconfig: the right place for stuff for config options that a person
building the kernel might want to tweak.

* Kernel command line: the right place for a user of a pre-built
kernel to tweak; also (sometimes) the right place for the bootloader
to pass info to the kernel; also a good place for debug options that a
kernel engineer might want to tweak w/out rebuilding the kernel.

In this case it makes sense for the person building the kernel to
choose a default that makes sense for the hardware that their kernel
is targetting. It can also make sense for a user of a pre-built kernel
to tweak this if their hardware isn't working correctly. Thus it makes
sense for Kconfig to choose the default and the kernel command line to
override.

> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  .../admin-guide/kernel-parameters.txt         | 12 ++++
>  drivers/base/base.h                           |  1 +
>  drivers/base/core.c                           | 58 +++++++++++++++++++
>  drivers/base/dd.c                             |  6 ++
>  4 files changed, 77 insertions(+)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 6cfa6e3996cf..f0bf2f40af64 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1594,6 +1594,18 @@
>                         dependencies. This only applies for fw_devlink=on|rpm.
>                         Format: <bool>
>
> +       fw_devlink.sync_state =

Is there a reason this is nested under "fw_devlink"? The sysfs
attribute "sync_state" that you modify in patch #2 doesn't reference
"fw_devlink" at all.

> +                       [KNL] When all devices that could probe have finished
> +                       probing, this parameter controls what to do with
> +                       devices that haven't yet received their sync_state()
> +                       calls.
> +                       Format: { strict | timeout }
> +                       strict -- Default. Continue waiting on consumers to
> +                               probe successfully.
> +                       timeout -- Give up waiting on consumers and call
> +                               sync_state() on any devices that haven't yet
> +                               received their sync_state() calls.

Some description needs to be included about how long the timeout is.
Specifically, tie it into the "deferred_probe_timeout" feature since
that's what you're using.

> +
>         gamecon.map[2|3]=
>                         [HW,JOY] Multisystem joystick and NES/SNES/PSX pad
>                         support via parallel port (up to 5 devices per port)
> diff --git a/drivers/base/base.h b/drivers/base/base.h
> index 726a12a244c0..6fcd71803d35 100644
> --- a/drivers/base/base.h
> +++ b/drivers/base/base.h
> @@ -209,6 +209,7 @@ extern void device_links_no_driver(struct device *dev);
>  extern bool device_links_busy(struct device *dev);
>  extern void device_links_unbind_consumers(struct device *dev);
>  extern void fw_devlink_drivers_done(void);
> +extern void fw_devlink_probing_done(void);
>
>  /* device pm support */
>  void device_pm_move_to_tail(struct device *dev);
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index f9297c68214a..929ec218f180 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1727,6 +1727,26 @@ static int __init fw_devlink_strict_setup(char *arg)
>  }
>  early_param("fw_devlink.strict", fw_devlink_strict_setup);
>
> +#define FW_DEVLINK_SYNC_STATE_STRICT   0
> +#define FW_DEVLINK_SYNC_STATE_TIMEOUT  1

I don't care tons, but I feel like this should be an enum, or a bool.

> +
> +static int fw_devlink_sync_state;
> +static int __init fw_devlink_sync_state_setup(char *arg)
> +{
> +       if (!arg)
> +               return -EINVAL;
> +
> +       if (strcmp(arg, "strict") == 0) {
> +               fw_devlink_sync_state = FW_DEVLINK_SYNC_STATE_STRICT;
> +               return 0;
> +       } else if (strcmp(arg, "timeout") == 0) {
> +               fw_devlink_sync_state = FW_DEVLINK_SYNC_STATE_TIMEOUT;
> +               return 0;
> +       }
> +       return -EINVAL;
> +}
> +early_param("fw_devlink.sync_state", fw_devlink_sync_state_setup);
> +
>  static inline u32 fw_devlink_get_flags(u8 fwlink_flags)
>  {
>         if (fwlink_flags & FWLINK_FLAG_CYCLE)
> @@ -1797,6 +1817,44 @@ void fw_devlink_drivers_done(void)
>         device_links_write_unlock();
>  }
>
> +static int fw_devlink_dev_sync_state(struct device *dev, void *data)
> +{
> +       struct device_link *link = to_devlink(dev);
> +       struct device *sup = link->supplier;
> +
> +       if (!(link->flags & DL_FLAG_MANAGED) ||
> +           link->status == DL_STATE_ACTIVE || sup->state_synced ||
> +           !dev_has_sync_state(sup))
> +               return 0;
> +
> +       if (fw_devlink_sync_state == FW_DEVLINK_SYNC_STATE_STRICT) {
> +               dev_warn(sup, "sync_state() pending due to %s\n",
> +                        dev_name(link->consumer));

This warning message is (IMO) an important feature of your patch. IMO
it deserves a mention in the commit message and even if (for some
reason) we decide we don't like the concept of forcing sync_state
after a timeout then we should still find a way to get this warning
message printed out. Maybe promote it to its own patch?

Specifically, I think this warning message gets printed out after
we've given up waiting for devices to show up. At this point
-EPROBE_DEFER becomes an error that we won't retry. That means that we
expect that sync state will _never_ be called in the future and that
resources will be left enabled / in a higher power state than needed.

I would perhaps also make it sound a little scarier since, IMO, this
is a problem that really shouldn't be "shipped" if this is an embedded
kernel. Maybe something like:

  sync_state pending (%s); resources left in high power state

> +               return 0;
> +       }
> +
> +       if (!list_empty(&sup->links.defer_sync))
> +               return 0;
> +
> +       dev_warn(sup, "Timed out. Calling sync_state()\n");

nit: since you aren't directly calling it after this print (you're
adding it to the queue), maybe change to "Forcing sync_state()".

Re: [PATCH v1 1/2] driver core: Add fw_devlink.sync_state command line param

Posted by Saravana Kannan 2 years, 11 months ago

On Tue, Feb 28, 2023 at 2:33 PM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Thu, Feb 23, 2023 at 11:05 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > When all devices that could probe have finished probing, this parameter
> > controls what to do with devices that haven't yet received their
> > sync_state() calls.
> >
> > fw_devlink.sync_state=strict is the default and the driver core will
> > continue waiting on consumers to probe successfully in the future.
>
> This description is misleading / borderline wrong.

I definitely disagree on this being borderline wrong -- it's
definitely correct. I'm open to suggestions for clarification. I've
tweaked the commit text for clarity for v2.

> You say that when
> "sync_state=strict" that you'll wait on consumers to probe
> successfully in the future.

This is very true. I'll add "before sync_state() is called" to the end
of the sentence if that'll clarify things.

> As talked about below, I think that when
> the pre-existing "deferred_probe_timeout" (which you're tying into)
> expires, it's unlikely that devices will probe successfully in the
> future. Sure, it's possible, but in general once the
> "deferred_probe_timeout" expires then the system is done waiting for
> new devices to show up. While it's still _possible_ to add new
> devices, you need to take care to deal with the fact that some
> important devices might have already given up and also that you're
> adding these new devices in strict dependency order...

But after all this, it's still possible for consumers to probe and
that's what the "strict" option is doing and that's what I'm saying in
the commit text.

> IMO better would be to say something like when sync_state=strict that
> you'll just leave resources in a high power state

But this statement is not true either. Just because a device driver
has a sync_state() doesn't mean the device was left in a powered on
state by the bootloader.

> if not all devices
> have shown up and the system thinks probing is done.

And this isn't true either. It's not "all devices have shown up". It's
"all consumers have shown up".

> > This
> > is the default behavior since calling sync_state() when all the
> > consumers haven't probed could make some systems unusable/unstable.
> >
> > fw_devlink.sync_state=timeout will cause the driver core to give up
> > waiting on consumers and call sync_state() on any devices that haven't
> > yet received their sync_state() calls. This option is provided for
> > systems that won't become unusable/unstable as they might be able to
> > save power (depends on state of hardware before kernel starts) if all
> > devices get their sync_state().

I've tried to be equally "could" about both options -- because that's
the reality. So you could have a system where you won't have any power
saving or have any stability issues independent of strict/timeout.

> While I don't object to this being a kernel command line flag, the
> default should also be a Kconfig option. The kernel command line is
> not a great place for general configuration. As we jam too much stuff
> in the kernel command line it gets unwieldy quickly. IMO:
>
> * Kconfig: the right place for stuff for config options that a person
> building the kernel might want to tweak.
>
> * Kernel command line: the right place for a user of a pre-built
> kernel to tweak; also (sometimes) the right place for the bootloader
> to pass info to the kernel; also a good place for debug options that a
> kernel engineer might want to tweak w/out rebuilding the kernel.
>
> In this case it makes sense for the person building the kernel to
> choose a default that makes sense for the hardware that their kernel
> is targetting. It can also make sense for a user of a pre-built kernel
> to tweak this if their hardware isn't working correctly. Thus it makes
> sense for Kconfig to choose the default and the kernel command line to
> override.

I don't mind adding a Kconfig to select the default behavior, but
maybe as a separate patch in the future so if there's any debate about
that, you'll at least get this option.

> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  .../admin-guide/kernel-parameters.txt         | 12 ++++
> >  drivers/base/base.h                           |  1 +
> >  drivers/base/core.c                           | 58 +++++++++++++++++++
> >  drivers/base/dd.c                             |  6 ++
> >  4 files changed, 77 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index 6cfa6e3996cf..f0bf2f40af64 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -1594,6 +1594,18 @@
> >                         dependencies. This only applies for fw_devlink=on|rpm.
> >                         Format: <bool>
> >
> > +       fw_devlink.sync_state =
>
> Is there a reason this is nested under "fw_devlink"? The sysfs
> attribute "sync_state" that you modify in patch #2 doesn't reference
> "fw_devlink" at all.

Because one of the main reasons for writing fw_devlink is for
sync_state() to be meaningful. Without fw_devlink, sync_state()
happens right after probe() or late_initcall() whichever is later. So,
it's effectively pointless for any module without fw_devlink.
sync_state() documentation also says that it's based on device links.
So they are both very closely tied to each other.

> > +                       [KNL] When all devices that could probe have finished
> > +                       probing, this parameter controls what to do with
> > +                       devices that haven't yet received their sync_state()
> > +                       calls.
> > +                       Format: { strict | timeout }
> > +                       strict -- Default. Continue waiting on consumers to
> > +                               probe successfully.
> > +                       timeout -- Give up waiting on consumers and call
> > +                               sync_state() on any devices that haven't yet
> > +                               received their sync_state() calls.
>
> Some description needs to be included about how long the timeout is.
> Specifically, tie it into the "deferred_probe_timeout" feature since
> that's what you're using.

Thanks! I meant to do this, but forgot.

> > +
> >         gamecon.map[2|3]=
> >                         [HW,JOY] Multisystem joystick and NES/SNES/PSX pad
> >                         support via parallel port (up to 5 devices per port)
> > diff --git a/drivers/base/base.h b/drivers/base/base.h
> > index 726a12a244c0..6fcd71803d35 100644
> > --- a/drivers/base/base.h
> > +++ b/drivers/base/base.h
> > @@ -209,6 +209,7 @@ extern void device_links_no_driver(struct device *dev);
> >  extern bool device_links_busy(struct device *dev);
> >  extern void device_links_unbind_consumers(struct device *dev);
> >  extern void fw_devlink_drivers_done(void);
> > +extern void fw_devlink_probing_done(void);
> >
> >  /* device pm support */
> >  void device_pm_move_to_tail(struct device *dev);
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index f9297c68214a..929ec218f180 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -1727,6 +1727,26 @@ static int __init fw_devlink_strict_setup(char *arg)
> >  }
> >  early_param("fw_devlink.strict", fw_devlink_strict_setup);
> >
> > +#define FW_DEVLINK_SYNC_STATE_STRICT   0
> > +#define FW_DEVLINK_SYNC_STATE_TIMEOUT  1
>
> I don't care tons, but I feel like this should be an enum, or a bool.
>
>
> > +
> > +static int fw_devlink_sync_state;
> > +static int __init fw_devlink_sync_state_setup(char *arg)
> > +{
> > +       if (!arg)
> > +               return -EINVAL;
> > +
> > +       if (strcmp(arg, "strict") == 0) {
> > +               fw_devlink_sync_state = FW_DEVLINK_SYNC_STATE_STRICT;
> > +               return 0;
> > +       } else if (strcmp(arg, "timeout") == 0) {
> > +               fw_devlink_sync_state = FW_DEVLINK_SYNC_STATE_TIMEOUT;
> > +               return 0;
> > +       }
> > +       return -EINVAL;
> > +}
> > +early_param("fw_devlink.sync_state", fw_devlink_sync_state_setup);
> > +
> >  static inline u32 fw_devlink_get_flags(u8 fwlink_flags)
> >  {
> >         if (fwlink_flags & FWLINK_FLAG_CYCLE)
> > @@ -1797,6 +1817,44 @@ void fw_devlink_drivers_done(void)
> >         device_links_write_unlock();
> >  }
> >
> > +static int fw_devlink_dev_sync_state(struct device *dev, void *data)
> > +{
> > +       struct device_link *link = to_devlink(dev);
> > +       struct device *sup = link->supplier;
> > +
> > +       if (!(link->flags & DL_FLAG_MANAGED) ||
> > +           link->status == DL_STATE_ACTIVE || sup->state_synced ||
> > +           !dev_has_sync_state(sup))
> > +               return 0;
> > +
> > +       if (fw_devlink_sync_state == FW_DEVLINK_SYNC_STATE_STRICT) {
> > +               dev_warn(sup, "sync_state() pending due to %s\n",
> > +                        dev_name(link->consumer));
>
> This warning message is (IMO) an important feature of your patch. IMO
> it deserves a mention in the commit message

Sure, I can mention it in the commit text.

> and even if (for some
> reason) we decide we don't like the concept of forcing sync_state
> after a timeout then we should still find a way to get this warning
> message printed out. Maybe promote it to its own patch?

ehhh...

> Specifically, I think this warning message gets printed out after
> we've given up waiting for devices to show up. At this point
> -EPROBE_DEFER becomes an error that we won't retry.

This is not true. We will always retry on an -EPROBE_DEFER, even after timeout.

> That means that we
> expect that sync state will _never_ be called in the future

Not true. I can be called after the timeout.

> and that
> resources will be left enabled / in a higher power state than needed.

Higher power state is also not always true (as described earlier).

> I would perhaps also make it sound a little scarier since,

I definitely don't want to make it sound scarier and get everyone to
enable the timeout by default without actually knowing if it has a
power impact on their system.

> IMO, this
> is a problem that really shouldn't be "shipped" if this is an embedded
> kernel. Maybe something like:

This is how it's shipped on all Android devices in the past 2 years.
So it's not a global problem like you make it to be.

>
>   sync_state pending (%s); resources left in high power state

I don't want to be alarmist. But I also agree that it's worth
highlighting. And I think the current message is the best compromise.
I even made it a dev_warn() instead of dev_info() or dev_dbg() just to
keep you happy :)

>
> > +               return 0;
> > +       }
> > +
> > +       if (!list_empty(&sup->links.defer_sync))
> > +               return 0;
> > +
> > +       dev_warn(sup, "Timed out. Calling sync_state()\n");
>
> nit: since you aren't directly calling it after this print (you're
> adding it to the queue), maybe change to "Forcing sync_state()".

Thanks. Will do.


-Saravana

Re: [PATCH v1 1/2] driver core: Add fw_devlink.sync_state command line param

Posted by Doug Anderson 2 years, 11 months ago

Hi,

On Fri, Mar 3, 2023 at 4:53 PM Saravana Kannan <saravanak@google.com> wrote:
>
> > IMO better would be to say something like when sync_state=strict that
> > you'll just leave resources in a high power state
>
> But this statement is not true either. Just because a device driver
> has a sync_state() doesn't mean the device was left in a powered on
> state by the bootloader.

Though I guess it's theoretically possible that a device using
sync_state will leave resources in a _lower_ power state until
sync_state is reached, I'm skeptical if that actually happens. Can you
point to any examples? The sync state docs
"sysfs-devices-state_synced" actually document that the common case is
when the bootloader left a resource enabled and we won't disable the
resource until sync_state is reached. That's almost certainly a higher
power state.

I would also point to one of the users of sync_state: the interconnect
framework. Take a look at commit b1d681d8d324 ("interconnect: Add sync
state support"). You can see that in icc_node_add() if we can't read
the bandwidth at bootup we end up at the max (INT_MAX). That's exactly
the case we actually hit for Qualcomm. It's not that we just avoid
touching the resources until sync state is reached--we actually max it
out.

In general, something feels a bit awkward here in defining this as
"however the bootloader left it". That concept makes sense for things
where we need to manage a handoff from the bootloader for the kernel,
but it's not the answer for all things. The bootloader's job is to
boot the system and get out of the way, not to init all resources. It
only inits resources that it cares about. That means if the bootloader
displays a splash screen then it might init resources for the display.
if it doesn't display a splash screen it might not. The kernel needs
to handle either case.

In general, the problems being solved with sync_state seem to require
resources to be left on and in high power until sync state is reached.
Today, you define that as "the state the bootloader left it in".
...but if the bootloader didn't leave it in a high power state then
you'd need to change this definition.

If you truly want to couch the verbiage, I guess I'd be OK with saying
"when sync_state=strict that you'll _LIKELY_ leave resources in a high
power state if sync_state is never reached"

> > While I don't object to this being a kernel command line flag, the
> > default should also be a Kconfig option. The kernel command line is
> > not a great place for general configuration. As we jam too much stuff
> > in the kernel command line it gets unwieldy quickly. IMO:
> >
> > * Kconfig: the right place for stuff for config options that a person
> > building the kernel might want to tweak.
> >
> > * Kernel command line: the right place for a user of a pre-built
> > kernel to tweak; also (sometimes) the right place for the bootloader
> > to pass info to the kernel; also a good place for debug options that a
> > kernel engineer might want to tweak w/out rebuilding the kernel.
> >
> > In this case it makes sense for the person building the kernel to
> > choose a default that makes sense for the hardware that their kernel
> > is targetting. It can also make sense for a user of a pre-built kernel
> > to tweak this if their hardware isn't working correctly. Thus it makes
> > sense for Kconfig to choose the default and the kernel command line to
> > override.
>
> I don't mind adding a Kconfig to select the default behavior, but
> maybe as a separate patch in the future so if there's any debate about
> that, you'll at least get this option.

I don't mind it being a separate patch, but it should be part of the
initial series.

> > Specifically, I think this warning message gets printed out after
> > we've given up waiting for devices to show up. At this point
> > -EPROBE_DEFER becomes an error that we won't retry.
>
> This is not true. We will always retry on an -EPROBE_DEFER, even after timeout.

OK, so I think this is the main point of contention here, so let's get
to the bottom of it first and then we can address anything else.

I guess I'm trying to figure out what "deferred_probe_timeout" is
supposed to be about. From reading
driver_deferred_probe_check_state(), I see that the idea is that once
the timeout expires then we'll start returning -ETIMEDOUT when we used
to return -EPROBE_DEFER. I guess I mispoke then. You're correct that
-EPROBE_DEFER will still be retried. That being said, things that used
to be retired (because they returned -EPROBE_DEFER) will now become
permanent/non-retired errors (because they return -ETIMEDOUT).

My point is that if we ever actually hit that case (where we return
-ETIMEDOUT instead of -EPROBE_DEFER) we really enter a state where
it's not going to be great to load any more drivers. Once a driver
failed to probe (because it got back an -ETIMEDOUT instead of
-EPROBE_DEFER) then the user needs to manually unbind/rebind the
device to retry. That's not a good state.

So the above is the crux of my argument that once
"deferred_probe_timeout" fires that the system really isn't in good
shape to load more drivers.

So looking more carefully, I think I can understand where you're
coming from. Specifically I note that very few subsystems have "opted
in" to the deferred_probe_timeout on ToT. I can also see that recently
you made the effort to delete driver_deferred_probe_check_state(),
though those were reverted. That means that, as it stands, devices
will _probably_ not end up with the problem I describe above (unless
they depend on a subsystem that has opted-in). ...and, if your plans
come to fruition, then eventually we'll never hit it.

Where does that leave us? I guess I will step back on my assertion
that when the timeout fires that drivers can't load anymore. Certainly
the state that ToT Linux is in is confusing. "deferred_probe_timeout"
is still documented (in kernel-parameters.txt) to cause us to "give
up" waiting for dependencies. ...and it still causes a few subsystems
to give up. ...but I guess it mostly works.

> > I would perhaps also make it sound a little scarier since,
>
> I definitely don't want to make it sound scarier and get everyone to
> enable the timeout by default without actually knowing if it has a
> power impact on their system.
>
> > IMO, this
> > is a problem that really shouldn't be "shipped" if this is an embedded
> > kernel. Maybe something like:
>
> This is how it's shipped on all Android devices in the past 2 years.
> So it's not a global problem like you make it to be.

You're saying devices _shipped_ but booted up where devices never
reached sync_state? ...and that's not a power consumption problem???
I'm not saying that the sync_state concept couldn't ship, I'm saying
that if this printout shows up in boot logs that it's highly likely
there's a problem that needs to be fixed and that's causing extra
power consumption. That's why I want the printout to sound scarier.

-Doug

Re: [PATCH v1 1/2] driver core: Add fw_devlink.sync_state command line param

Posted by Matthias Kaehlcke 2 years, 11 months ago

On Wed, Mar 08, 2023 at 07:39:03AM -0800, Doug Anderson wrote:
> Hi,
> 
> On Fri, Mar 3, 2023 at 4:53 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > > IMO better would be to say something like when sync_state=strict that
> > > you'll just leave resources in a high power state
> >
> > But this statement is not true either. Just because a device driver
> > has a sync_state() doesn't mean the device was left in a powered on
> > state by the bootloader.
> 
> Though I guess it's theoretically possible that a device using
> sync_state will leave resources in a _lower_ power state until
> sync_state is reached, I'm skeptical if that actually happens. Can you
> point to any examples? The sync state docs
> "sysfs-devices-state_synced" actually document that the common case is
> when the bootloader left a resource enabled and we won't disable the
> resource until sync_state is reached. That's almost certainly a higher
> power state.
> 
> I would also point to one of the users of sync_state: the interconnect
> framework. Take a look at commit b1d681d8d324 ("interconnect: Add sync
> state support"). You can see that in icc_node_add() if we can't read
> the bandwidth at bootup we end up at the max (INT_MAX). That's exactly
> the case we actually hit for Qualcomm. It's not that we just avoid
> touching the resources until sync state is reached--we actually max it
> out.

Another example is commit 3a39049f88e4 ("soc: qcom: rpmhpd: Use highest
corner until sync_state"), which does the same for rpmhpds.

> In general, something feels a bit awkward here in defining this as
> "however the bootloader left it". That concept makes sense for things
> where we need to manage a handoff from the bootloader for the kernel,
> but it's not the answer for all things. The bootloader's job is to
> boot the system and get out of the way, not to init all resources. It
> only inits resources that it cares about. That means if the bootloader
> displays a splash screen then it might init resources for the display.
> if it doesn't display a splash screen it might not. The kernel needs
> to handle either case.
> 
> In general, the problems being solved with sync_state seem to require
> resources to be left on and in high power until sync state is reached.
> Today, you define that as "the state the bootloader left it in".
> ...but if the bootloader didn't leave it in a high power state then
> you'd need to change this definition.
> 
> If you truly want to couch the verbiage, I guess I'd be OK with saying
> "when sync_state=strict that you'll _LIKELY_ leave resources in a high
> power state if sync_state is never reached"
> 
> 
> > > While I don't object to this being a kernel command line flag, the
> > > default should also be a Kconfig option. The kernel command line is
> > > not a great place for general configuration. As we jam too much stuff
> > > in the kernel command line it gets unwieldy quickly. IMO:
> > >
> > > * Kconfig: the right place for stuff for config options that a person
> > > building the kernel might want to tweak.
> > >
> > > * Kernel command line: the right place for a user of a pre-built
> > > kernel to tweak; also (sometimes) the right place for the bootloader
> > > to pass info to the kernel; also a good place for debug options that a
> > > kernel engineer might want to tweak w/out rebuilding the kernel.
> > >
> > > In this case it makes sense for the person building the kernel to
> > > choose a default that makes sense for the hardware that their kernel
> > > is targetting. It can also make sense for a user of a pre-built kernel
> > > to tweak this if their hardware isn't working correctly. Thus it makes
> > > sense for Kconfig to choose the default and the kernel command line to
> > > override.
> >
> > I don't mind adding a Kconfig to select the default behavior, but
> > maybe as a separate patch in the future so if there's any debate about
> > that, you'll at least get this option.
> 
> I don't mind it being a separate patch, but it should be part of the
> initial series.

+1

> > > Specifically, I think this warning message gets printed out after
> > > we've given up waiting for devices to show up. At this point
> > > -EPROBE_DEFER becomes an error that we won't retry.
> >
> > This is not true. We will always retry on an -EPROBE_DEFER, even after timeout.
> 
> OK, so I think this is the main point of contention here, so let's get
> to the bottom of it first and then we can address anything else.
> 
> I guess I'm trying to figure out what "deferred_probe_timeout" is
> supposed to be about. From reading
> driver_deferred_probe_check_state(), I see that the idea is that once
> the timeout expires then we'll start returning -ETIMEDOUT when we used
> to return -EPROBE_DEFER. I guess I mispoke then. You're correct that
> -EPROBE_DEFER will still be retried. That being said, things that used
> to be retired (because they returned -EPROBE_DEFER) will now become
> permanent/non-retired errors (because they return -ETIMEDOUT).
> 
> My point is that if we ever actually hit that case (where we return
> -ETIMEDOUT instead of -EPROBE_DEFER) we really enter a state where
> it's not going to be great to load any more drivers. Once a driver
> failed to probe (because it got back an -ETIMEDOUT instead of
> -EPROBE_DEFER) then the user needs to manually unbind/rebind the
> device to retry. That's not a good state.
> 
> So the above is the crux of my argument that once
> "deferred_probe_timeout" fires that the system really isn't in good
> shape to load more drivers.
> 
> So looking more carefully, I think I can understand where you're
> coming from. Specifically I note that very few subsystems have "opted
> in" to the deferred_probe_timeout on ToT. I can also see that recently
> you made the effort to delete driver_deferred_probe_check_state(),
> though those were reverted. That means that, as it stands, devices
> will _probably_ not end up with the problem I describe above (unless
> they depend on a subsystem that has opted-in). ...and, if your plans
> come to fruition, then eventually we'll never hit it.
> 
> Where does that leave us? I guess I will step back on my assertion
> that when the timeout fires that drivers can't load anymore. Certainly
> the state that ToT Linux is in is confusing. "deferred_probe_timeout"
> is still documented (in kernel-parameters.txt) to cause us to "give
> up" waiting for dependencies. ...and it still causes a few subsystems
> to give up. ...but I guess it mostly works.
> 
> 
> > > I would perhaps also make it sound a little scarier since,
> >
> > I definitely don't want to make it sound scarier and get everyone to
> > enable the timeout by default without actually knowing if it has a
> > power impact on their system.
> >
> > > IMO, this
> > > is a problem that really shouldn't be "shipped" if this is an embedded
> > > kernel. Maybe something like:
> >
> > This is how it's shipped on all Android devices in the past 2 years.
> > So it's not a global problem like you make it to be.
> 
> You're saying devices _shipped_ but booted up where devices never
> reached sync_state? ...and that's not a power consumption problem???
> I'm not saying that the sync_state concept couldn't ship, I'm saying
> that if this printout shows up in boot logs that it's highly likely
> there's a problem that needs to be fixed and that's causing extra
> power consumption. That's why I want the printout to sound scarier.

+1

[PATCH v1 1/2] driver core: Add fw_devlink.sync_state command line param
[PATCH v1 2/2] driver core: Make state_synced device attribute writeable