[PATCH] firmware: arm_scmi: Give SMC transport precedence over mailbox

Florian Fainelli posted 1 patch 1 month, 3 weeks ago
There is a newer version of this series
drivers/firmware/arm_scmi/transports/Makefile | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
[PATCH] firmware: arm_scmi: Give SMC transport precedence over mailbox
Posted by Florian Fainelli 1 month, 3 weeks ago
Broadcom STB platforms have for historical reasons included both
"arm,scmi-smc" and "arm,scmi" in their SCMI Device Tree node compatible
string.

After the commit cited in the Fixes tag and with a kernel
configuration that enables both the SCMI and the Mailbox transports, we
would probe the mailbox transport, but fail to complete since we would
not have a mailbox driver available.

By keeping the SMC transport objects linked first, we can let the
platform driver, match the compatible string and probe successfully with
no adverse effects on platforms using the mailbox transport.

Fixes: b53515fa177c ("firmware: arm_scmi: Make MBOX transport a standalone driver")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Change-Id: I8e348e3e0deabdc5c1d596929d7f9134793f346e
---
 drivers/firmware/arm_scmi/transports/Makefile | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/arm_scmi/transports/Makefile b/drivers/firmware/arm_scmi/transports/Makefile
index 362a406f08e6..3ba3d3bee151 100644
--- a/drivers/firmware/arm_scmi/transports/Makefile
+++ b/drivers/firmware/arm_scmi/transports/Makefile
@@ -1,8 +1,10 @@
 # SPDX-License-Identifier: GPL-2.0-only
-scmi_transport_mailbox-objs := mailbox.o
-obj-$(CONFIG_ARM_SCMI_TRANSPORT_MAILBOX) += scmi_transport_mailbox.o
+# Keep before scmi_transport_mailbox.o to allow precedence
+# while matching the compatible.
 scmi_transport_smc-objs := smc.o
 obj-$(CONFIG_ARM_SCMI_TRANSPORT_SMC) += scmi_transport_smc.o
+scmi_transport_mailbox-objs := mailbox.o
+obj-$(CONFIG_ARM_SCMI_TRANSPORT_MAILBOX) += scmi_transport_mailbox.o
 scmi_transport_optee-objs := optee.o
 obj-$(CONFIG_ARM_SCMI_TRANSPORT_OPTEE) += scmi_transport_optee.o
 scmi_transport_virtio-objs := virtio.o
-- 
2.34.1
Re: [PATCH] firmware: arm_scmi: Give SMC transport precedence over mailbox
Posted by Sudeep Holla 1 month, 3 weeks ago
On Sat, Oct 05, 2024 at 09:33:17PM -0700, Florian Fainelli wrote:
> Broadcom STB platforms have for historical reasons included both
> "arm,scmi-smc" and "arm,scmi" in their SCMI Device Tree node compatible
> string.
>

I assume in the same order.

> After the commit cited in the Fixes tag and with a kernel
> configuration that enables both the SCMI and the Mailbox transports, we

^^^^^ s/SCMI/SMC ?

> would probe the mailbox transport, but fail to complete since we would
> not have a mailbox driver available.
>

I always assumed the node compatible match happens from the more specific
compatible(on the left) to the more generic ones(on the right) from the
compatible property list. Looks like that was a wrong assumption then ?

> By keeping the SMC transport objects linked first, we can let the
> platform driver, match the compatible string and probe successfully with
> no adverse effects on platforms using the mailbox transport.
>

I don't have strong objection to the patch itself, happy to get it merged.
Just curious if my understanding of the issue is correct. I think Cristian
has more detailed query, so just responding to that will suffice.

> Fixes: b53515fa177c ("firmware: arm_scmi: Make MBOX transport a standalone driver")
> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
> Change-Id: I8e348e3e0deabdc5c1d596929d7f9134793f346e

Spurious from internal gerrit repo ?

--
Regards,
Sudeep
Re: [PATCH] firmware: arm_scmi: Give SMC transport precedence over mailbox
Posted by Florian Fainelli 1 month, 3 weeks ago
On 10/7/24 06:13, Sudeep Holla wrote:
> On Sat, Oct 05, 2024 at 09:33:17PM -0700, Florian Fainelli wrote:
>> Broadcom STB platforms have for historical reasons included both
>> "arm,scmi-smc" and "arm,scmi" in their SCMI Device Tree node compatible
>> string.
>>
> 
> I assume in the same order.

That is correct, in that exact order indeed.

> 
>> After the commit cited in the Fixes tag and with a kernel
>> configuration that enables both the SCMI and the Mailbox transports, we
> 
> ^^^^^ s/SCMI/SMC ?

Yes, this should read "SMC" here.

> 
>> would probe the mailbox transport, but fail to complete since we would
>> not have a mailbox driver available.
>>
> 
> I always assumed the node compatible match happens from the more specific
> compatible(on the left) to the more generic ones(on the right) from the
> compatible property list. Looks like that was a wrong assumption then ?

This is the correct assumption, and this worked very well, and we were 
utilizing that as long as all of the transports where "sub" entities 
within the common and single arm_scmi platform device.

When breaking up the transports into individual platform drivers, now 
each one is responsible for matching, and if they are all built-into the 
kernel, they are matching in the order in which they have been linked 
into the kernel.

> 
>> By keeping the SMC transport objects linked first, we can let the
>> platform driver, match the compatible string and probe successfully with
>> no adverse effects on platforms using the mailbox transport.
>>
> 
> I don't have strong objection to the patch itself, happy to get it merged.
> Just curious if my understanding of the issue is correct. I think Cristian
> has more detailed query, so just responding to that will suffice.

Sounds good, thanks.

> 
>> Fixes: b53515fa177c ("firmware: arm_scmi: Make MBOX transport a standalone driver")
>> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
>> Change-Id: I8e348e3e0deabdc5c1d596929d7f9134793f346e
> 
> Spurious from internal gerrit repo ?

Indeed, will post v2 with the typo you highlighted and that remove, and 
any additional explanation Cristian deems necessary to add, thanks!
-- 
Florian
Re: [PATCH] firmware: arm_scmi: Give SMC transport precedence over mailbox
Posted by Cristian Marussi 1 month, 3 weeks ago
On Sat, Oct 05, 2024 at 09:33:17PM -0700, Florian Fainelli wrote:
> Broadcom STB platforms have for historical reasons included both
> "arm,scmi-smc" and "arm,scmi" in their SCMI Device Tree node compatible
> string.

Hi Florian,

did not know this..

> 
> After the commit cited in the Fixes tag and with a kernel
> configuration that enables both the SCMI and the Mailbox transports, we
> would probe the mailbox transport, but fail to complete since we would
> not have a mailbox driver available.
>
Not sure to have understood this...

...you mean you DO have the SMC/Mailbox SCMI transport drivers compiled
into the Kconfig AND you have BOTH the SMC AND Mailbox compatibles in
DT, BUT your platform does NOT physically have a mbox/shmem transport
and as a consequence, when MBOX probes (at first), you see an error from
the core like:

    "arm-scmi: unable to communicate with SCMI"

since it gets no reply from the SCMI server (being not connnected via
mbox) and it bails out .... am I right ?

If this is the case, without this patch, after this error and the mbox probe
failing, the SMC transport, instead, DO probe successfully at the end, right ?

IOW, what is the impact without this patch, an error and a delay in the
probe sequence till it gets to the SMC transport probe 9as second
attempt) or worse ? (trying to understand here...)

Thanks,
Cristian
Re: [PATCH] firmware: arm_scmi: Give SMC transport precedence over mailbox
Posted by Florian Fainelli 1 month, 3 weeks ago
Hi Cristian,

On October 7, 2024 4:52:33 AM PDT, Cristian Marussi 
<cristian.marussi@arm.com> wrote:
>On Sat, Oct 05, 2024 at 09:33:17PM -0700, Florian Fainelli wrote:
>> Broadcom STB platforms have for historical reasons included both
>> "arm,scmi-smc" and "arm,scmi" in their SCMI Device Tree node compatible
>> string.
>
>Hi Florian,
>
>did not know this..

It stems from us starting with a mailbox driver that did the SMC call, 
and later transitioning to the "smc" transport proper. Our boot loader 
provides the Device Tree blob to the kernel and we maintain 
backward/forward compatibility as much as possible.

>
>> 
>> After the commit cited in the Fixes tag and with a kernel
>> configuration that enables both the SCMI and the Mailbox transports, we
>> would probe the mailbox transport, but fail to complete since we would
>> not have a mailbox driver available.
>>
>Not sure to have understood this...
>
>...you mean you DO have the SMC/Mailbox SCMI transport drivers compiled
>into the Kconfig AND you have BOTH the SMC AND Mailbox compatibles in
>DT, BUT your platform does NOT physically have a mbox/shmem transport
>and as a consequence, when MBOX probes (at first), you see an error from
>the core like:
>
>    "arm-scmi: unable to communicate with SCMI"
>
>since it gets no reply from the SCMI server (being not connnected via
>mbox) and it bails out .... am I right ?

In an unmodified kernel where both the "mailbox" and "smc" transports 
are enabled, we get the "mailbox" driver to probe first since it matched 
the "arm,scmi" part of the compatible string and it is linked first into 
the kernel. Down the road though we will fail the initialization with:

[    1.135363] arm-scmi arm-scmi.1.auto: Using scmi_mailbox_transport
[    1.141901] arm-scmi arm-scmi.1.auto: SCMI max-rx-timeout: 30ms
[    1.148113] arm-scmi arm-scmi.1.auto: failed to setup channel for 
protocol:0x10
[    1.155828] arm-scmi arm-scmi.1.auto: error -EINVAL: failed to setup 
channels
[    1.163379] arm-scmi arm-scmi.1.auto: probe with driver arm-scmi 
failed with error -22

Because the platform device is now bound, and there is no mechanism to 
return -ENODEV, we won't try another transport driver that would attempt 
to match the other compatibility strings. That makes sense because in 
general you specify the Device Tree precisely, and you also have a 
tailored kernel configuration. Right now this is only an issue using 
arm's multi_v7_defconfig and arm64's defconfig both of which that we 
intend to keep on using for CI purposes.


>
>If this is the case, without this patch, after this error and the mbox probe
>failing, the SMC transport, instead, DO probe successfully at the end, right ?

With my patch we probe the "smc" transport first and foremost and we 
successfully initialize it, therefore we do not even try the "mailbox" 
transport at all, which is intended.

>
>IOW, what is the impact without this patch, an error and a delay in the
>probe sequence till it gets to the SMC transport probe 9as second
>attempt) or worse ? (trying to understand here...)

There is no recovery without the patch, we are not giving up the 
arm_scmi platform device because there is no mechanism to return -ENODEV 
and allow any of the subsequent transport drivers enabled to attempt to 
take over the platform device and probe it again.

Thanks!
--
Florian
Re: [PATCH] firmware: arm_scmi: Give SMC transport precedence over mailbox
Posted by Sudeep Holla 1 month, 2 weeks ago
Hi Florian,

Thanks for the detailed explanation.

On Mon, Oct 07, 2024 at 10:07:46AM -0700, Florian Fainelli wrote:
> Hi Cristian,
>
> On October 7, 2024 4:52:33 AM PDT, Cristian Marussi
> <cristian.marussi@arm.com> wrote:
> > On Sat, Oct 05, 2024 at 09:33:17PM -0700, Florian Fainelli wrote:
> > > Broadcom STB platforms have for historical reasons included both
> > > "arm,scmi-smc" and "arm,scmi" in their SCMI Device Tree node compatible
> > > string.
> >
> > Hi Florian,
> >
> > did not know this..
>
> It stems from us starting with a mailbox driver that did the SMC call, and
> later transitioning to the "smc" transport proper. Our boot loader provides
> the Device Tree blob to the kernel and we maintain backward/forward
> compatibility as much as possible.
>

IIUC, you need to support old kernel with SMC mailbox driver and new SMC
transport within the SCMI. Is that right understanding ?

> >
> > >
> > > After the commit cited in the Fixes tag and with a kernel
> > > configuration that enables both the SCMI and the Mailbox transports, we
> > > would probe the mailbox transport, but fail to complete since we would
> > > not have a mailbox driver available.
> > >
> > Not sure to have understood this...
> >
> > ...you mean you DO have the SMC/Mailbox SCMI transport drivers compiled
> > into the Kconfig AND you have BOTH the SMC AND Mailbox compatibles in
> > DT, BUT your platform does NOT physically have a mbox/shmem transport
> > and as a consequence, when MBOX probes (at first), you see an error from
> > the core like:
> >
> >    "arm-scmi: unable to communicate with SCMI"
> >
> > since it gets no reply from the SCMI server (being not connnected via
> > mbox) and it bails out .... am I right ?
>
> In an unmodified kernel where both the "mailbox" and "smc" transports are
> enabled, we get the "mailbox" driver to probe first since it matched the
> "arm,scmi" part of the compatible string and it is linked first into the
> kernel. Down the road though we will fail the initialization with:
>
> [    1.135363] arm-scmi arm-scmi.1.auto: Using scmi_mailbox_transport
> [    1.141901] arm-scmi arm-scmi.1.auto: SCMI max-rx-timeout: 30ms
> [    1.148113] arm-scmi arm-scmi.1.auto: failed to setup channel for
> protocol:0x10

IIUC, the DTB has mailbox nodes that are available but fail only in the setup
stage ? Or is it marked unavailable and we are missing some checks either
in SCMI or mailbox ?

IOW, have you already explored that this -EINVAL is correct return value
here and can't be changed to -ENODEV ? I might be not following the failure
path correctly here, but I assume it is
	scmi_chan_setup()
	info->desc->ops->chan_setup()
	mailbox_chan_setup()
	mbox_request_channel()

> [    1.155828] arm-scmi arm-scmi.1.auto: error -EINVAL: failed to setup
> channels
> [    1.163379] arm-scmi arm-scmi.1.auto: probe with driver arm-scmi failed
> with error -22
>
> Because the platform device is now bound, and there is no mechanism to
> return -ENODEV, we won't try another transport driver that would attempt to
> match the other compatibility strings. That makes sense because in general
> you specify the Device Tree precisely, and you also have a tailored kernel
> configuration. Right now this is only an issue using arm's
> multi_v7_defconfig and arm64's defconfig both of which that we intend to
> keep on using for CI purposes.
>
>
> >
> > If this is the case, without this patch, after this error and the mbox probe
> > failing, the SMC transport, instead, DO probe successfully at the end, right ?
>
> With my patch we probe the "smc" transport first and foremost and we
> successfully initialize it, therefore we do not even try the "mailbox"
> transport at all, which is intended.
>
> >
> > IOW, what is the impact without this patch, an error and a delay in the
> > probe sequence till it gets to the SMC transport probe 9as second
> > attempt) or worse ? (trying to understand here...)
>
> There is no recovery without the patch, we are not giving up the arm_scmi
> platform device because there is no mechanism to return -ENODEV and allow
> any of the subsequent transport drivers enabled to attempt to take over the
> platform device and probe it again.
>

OK this sounds like you have already explored returning -ENODEV is not
an option ? It is fair enough, but just want to understand correctly.
I still think I am missing something.

I understand the bootloader maintaining backward compatibility, but
just want to understand better. I also wonder if the old SMC mailbox driver
returns -EINVAL instead of -ENODEV ? Again it is based on my assumption
about your backward compatibility usecase.

--
Regards,
Sudeep
Re: [PATCH] firmware: arm_scmi: Give SMC transport precedence over mailbox
Posted by Florian Fainelli 1 month, 2 weeks ago
On 10/8/24 06:06, Sudeep Holla wrote:
> Hi Florian,
> 
> Thanks for the detailed explanation.
> 
> On Mon, Oct 07, 2024 at 10:07:46AM -0700, Florian Fainelli wrote:
>> Hi Cristian,
>>
>> On October 7, 2024 4:52:33 AM PDT, Cristian Marussi
>> <cristian.marussi@arm.com> wrote:
>>> On Sat, Oct 05, 2024 at 09:33:17PM -0700, Florian Fainelli wrote:
>>>> Broadcom STB platforms have for historical reasons included both
>>>> "arm,scmi-smc" and "arm,scmi" in their SCMI Device Tree node compatible
>>>> string.
>>>
>>> Hi Florian,
>>>
>>> did not know this..
>>
>> It stems from us starting with a mailbox driver that did the SMC call, and
>> later transitioning to the "smc" transport proper. Our boot loader provides
>> the Device Tree blob to the kernel and we maintain backward/forward
>> compatibility as much as possible.
>>
> 
> IIUC, you need to support old kernel with SMC mailbox driver and new SMC
> transport within the SCMI. Is that right understanding ?

That is correct.

> 
>>>
>>>>
>>>> After the commit cited in the Fixes tag and with a kernel
>>>> configuration that enables both the SCMI and the Mailbox transports, we
>>>> would probe the mailbox transport, but fail to complete since we would
>>>> not have a mailbox driver available.
>>>>
>>> Not sure to have understood this...
>>>
>>> ...you mean you DO have the SMC/Mailbox SCMI transport drivers compiled
>>> into the Kconfig AND you have BOTH the SMC AND Mailbox compatibles in
>>> DT, BUT your platform does NOT physically have a mbox/shmem transport
>>> and as a consequence, when MBOX probes (at first), you see an error from
>>> the core like:
>>>
>>>     "arm-scmi: unable to communicate with SCMI"
>>>
>>> since it gets no reply from the SCMI server (being not connnected via
>>> mbox) and it bails out .... am I right ?
>>
>> In an unmodified kernel where both the "mailbox" and "smc" transports are
>> enabled, we get the "mailbox" driver to probe first since it matched the
>> "arm,scmi" part of the compatible string and it is linked first into the
>> kernel. Down the road though we will fail the initialization with:
>>
>> [    1.135363] arm-scmi arm-scmi.1.auto: Using scmi_mailbox_transport
>> [    1.141901] arm-scmi arm-scmi.1.auto: SCMI max-rx-timeout: 30ms
>> [    1.148113] arm-scmi arm-scmi.1.auto: failed to setup channel for
>> protocol:0x10
> 
> IIUC, the DTB has mailbox nodes that are available but fail only in the setup
> stage ? Or is it marked unavailable and we are missing some checks either
> in SCMI or mailbox ?

We fail at scmi_chan_setup -> idr_find() returning -EINVAL. I did check 
that returning -ENODEV, which arguably might be a somewhat more accurate 
return code (-ENOENT being one, too) does not help us here. Cristian 
suggested device_release_driver() which sounded like a good idea, but 
will deadlock.

The reason why we fail there is because mailbox_chan_available() returns 
false. With fw_devlink=on Linux will parse the Device Tree, find the 
'mboxes' property pointing to the brcm_scmi_mailbox Device Tree node and 
puts it on a list of providers that it is waiting for.

Because we are using the ARM SMC transport however, the 
brcm_scmi_mailbox node is never backed by any driver in Linux and this 
causes the system to fail booting since we do not have any SCMI 
provider. At the time, because we were under pressure to get a GKI 
kernel we decided to "break" our older downstream kernels by doing this 
property rename and put in a patch in those kernel to treat 
"brcm,mboxes" the same as "mboxes" where relevant, which was mostly in SCMI.

Now, assuming that we revert that DT property rename, that still does 
not really solve anything anyway, the channel is not available 
regardless of how we shake it.

> 
> IOW, have you already explored that this -EINVAL is correct return value
> here and can't be changed to -ENODEV ? I might be not following the failure
> path correctly here, but I assume it is
> 	scmi_chan_setup()
> 	info->desc->ops->chan_setup()
> 	mailbox_chan_setup()
> 	mbox_request_channel()
> 
>> [    1.155828] arm-scmi arm-scmi.1.auto: error -EINVAL: failed to setup
>> channels
>> [    1.163379] arm-scmi arm-scmi.1.auto: probe with driver arm-scmi failed
>> with error -22
>>
>> Because the platform device is now bound, and there is no mechanism to
>> return -ENODEV, we won't try another transport driver that would attempt to
>> match the other compatibility strings. That makes sense because in general
>> you specify the Device Tree precisely, and you also have a tailored kernel
>> configuration. Right now this is only an issue using arm's
>> multi_v7_defconfig and arm64's defconfig both of which that we intend to
>> keep on using for CI purposes.
>>
>>
>>>
>>> If this is the case, without this patch, after this error and the mbox probe
>>> failing, the SMC transport, instead, DO probe successfully at the end, right ?
>>
>> With my patch we probe the "smc" transport first and foremost and we
>> successfully initialize it, therefore we do not even try the "mailbox"
>> transport at all, which is intended.
>>
>>>
>>> IOW, what is the impact without this patch, an error and a delay in the
>>> probe sequence till it gets to the SMC transport probe 9as second
>>> attempt) or worse ? (trying to understand here...)
>>
>> There is no recovery without the patch, we are not giving up the arm_scmi
>> platform device because there is no mechanism to return -ENODEV and allow
>> any of the subsequent transport drivers enabled to attempt to take over the
>> platform device and probe it again.
>>
> 
> OK this sounds like you have already explored returning -ENODEV is not
> an option ? It is fair enough, but just want to understand correctly.
> I still think I am missing something.

Yes, that was my first start.

> 
> I understand the bootloader maintaining backward compatibility, but
> just want to understand better. I also wonder if the old SMC mailbox driver
> returns -EINVAL instead of -ENODEV ? Again it is based on my assumption
> about your backward compatibility usecase.

The old SMC mailbox driver is not present in any upstream kernel, and on 
the downstream kernels where we need it, it would be used and not return 
an error.
-- 
Florian
Re: [PATCH] firmware: arm_scmi: Give SMC transport precedence over mailbox
Posted by Sudeep Holla 1 month, 2 weeks ago
On Tue, Oct 08, 2024 at 10:49:01AM -0700, Florian Fainelli wrote:
> On 10/8/24 06:06, Sudeep Holla wrote:

[...]

> > IIUC, you need to support old kernel with SMC mailbox driver and new SMC
> > transport within the SCMI. Is that right understanding ?
> 
> That is correct.
> 

Thanks.

> > IIUC, the DTB has mailbox nodes that are available but fail only in the setup
> > stage ? Or is it marked unavailable and we are missing some checks either
> > in SCMI or mailbox ?
> 
> We fail at scmi_chan_setup -> idr_find() returning -EINVAL.

IIRC, the original intention of code under if(!info->desc->ops->chan_available()
in scmi_chan_setup(), is to just handle Rx case and valid Tx missing case for
non BASE protocols. I wonder if we can add additional check at the start of
this if block:
	if (tx && prot_id == SCMI_PROTOCOL_BASE)
		return -ENODEV or -ENOENT;

just to better handle your scenario. But IIUC it may not fix your issue as
we still return success from the platform_device probing with the new
restructuring. It is only the probe of the device we create from this one
can be made to fail. I think I know understand the issue much better than
before.

> I did check that
> returning -ENODEV, which arguably might be a somewhat more accurate return
> code (-ENOENT being one, too) does not help us here. Cristian suggested
> device_release_driver() which sounded like a good idea, but will deadlock.
>

Cristian mentioned in private he will explore other options just in case
we need solid alternative to address your issue.

> The reason why we fail there is because mailbox_chan_available() returns
> false. With fw_devlink=on Linux will parse the Device Tree, find the
> 'mboxes' property pointing to the brcm_scmi_mailbox Device Tree node and
> puts it on a list of providers that it is waiting for.
> 
> Because we are using the ARM SMC transport however, the brcm_scmi_mailbox
> node is never backed by any driver in Linux and this causes the system to
> fail booting since we do not have any SCMI provider. At the time, because we
> were under pressure to get a GKI kernel we decided to "break" our older
> downstream kernels by doing this property rename and put in a patch in those
> kernel to treat "brcm,mboxes" the same as "mboxes" where relevant, which was
> mostly in SCMI.
> 
> Now, assuming that we revert that DT property rename, that still does not
> really solve anything anyway, the channel is not available regardless of how
> we shake it.
>

Understood. Thanks for detailed explanation and time.

> > 
> > IOW, have you already explored that this -EINVAL is correct return value
> > here and can't be changed to -ENODEV ? I might be not following the failure
> > path correctly here, but I assume it is
> > 	scmi_chan_setup()
> > 	info->desc->ops->chan_setup()
> > 	mailbox_chan_setup()
> > 	mbox_request_channel()

[...]

> > OK this sounds like you have already explored returning -ENODEV is not
> > an option ? It is fair enough, but just want to understand correctly.
> > I still think I am missing something.
> 
> Yes, that was my first start.
>

Now I know why that won't work or its not so trivial as we have some kind
of redirection to address various transport dependencies and the mechanism we
have implemented to avoid probe deferral with additional platform devices
and associated probing make it difficult to propagate the error of DD model.
We need that to handle the dependency better than relying on probe deferral
which comes with its problems(like initcall level adjustments when using
some transports like OPTEE/FFA/virtio/...). Your issue is not an issue
in normal case 😄. Anyways, I will queue this as it is not affecting anyone
else. Hopefully no one has any objections.

-- 
Regards,
Sudeep
Re: [PATCH] firmware: arm_scmi: Give SMC transport precedence over mailbox
Posted by Cristian Marussi 1 month, 2 weeks ago
On Tue, Oct 08, 2024 at 02:06:17PM +0100, Sudeep Holla wrote:
> Hi Florian,
> 
> Thanks for the detailed explanation.
> 
> On Mon, Oct 07, 2024 at 10:07:46AM -0700, Florian Fainelli wrote:
> > Hi Cristian,
> >
> > On October 7, 2024 4:52:33 AM PDT, Cristian Marussi
> > <cristian.marussi@arm.com> wrote:
> > > On Sat, Oct 05, 2024 at 09:33:17PM -0700, Florian Fainelli wrote:
> > > > Broadcom STB platforms have for historical reasons included both
> > > > "arm,scmi-smc" and "arm,scmi" in their SCMI Device Tree node compatible
> > > > string.
> > >
> > > Hi Florian,
> > >
> > > did not know this..
> >
> > It stems from us starting with a mailbox driver that did the SMC call, and
> > later transitioning to the "smc" transport proper. Our boot loader provides
> > the Device Tree blob to the kernel and we maintain backward/forward
> > compatibility as much as possible.
> >
> 
> IIUC, you need to support old kernel with SMC mailbox driver and new SMC
> transport within the SCMI. Is that right understanding ?
> 
> > >
> > > >
> > > > After the commit cited in the Fixes tag and with a kernel
> > > > configuration that enables both the SCMI and the Mailbox transports, we
> > > > would probe the mailbox transport, but fail to complete since we would
> > > > not have a mailbox driver available.
> > > >
> > > Not sure to have understood this...
> > >
> > > ...you mean you DO have the SMC/Mailbox SCMI transport drivers compiled
> > > into the Kconfig AND you have BOTH the SMC AND Mailbox compatibles in
> > > DT, BUT your platform does NOT physically have a mbox/shmem transport
> > > and as a consequence, when MBOX probes (at first), you see an error from
> > > the core like:
> > >
> > >    "arm-scmi: unable to communicate with SCMI"
> > >
> > > since it gets no reply from the SCMI server (being not connnected via
> > > mbox) and it bails out .... am I right ?
> >
> > In an unmodified kernel where both the "mailbox" and "smc" transports are
> > enabled, we get the "mailbox" driver to probe first since it matched the
> > "arm,scmi" part of the compatible string and it is linked first into the
> > kernel. Down the road though we will fail the initialization with:
> >
> > [    1.135363] arm-scmi arm-scmi.1.auto: Using scmi_mailbox_transport
> > [    1.141901] arm-scmi arm-scmi.1.auto: SCMI max-rx-timeout: 30ms
> > [    1.148113] arm-scmi arm-scmi.1.auto: failed to setup channel for
> > protocol:0x10
> 
> IIUC, the DTB has mailbox nodes that are available but fail only in the setup
> stage ? Or is it marked unavailable and we are missing some checks either
> in SCMI or mailbox ?
> 
> IOW, have you already explored that this -EINVAL is correct return value
> here and can't be changed to -ENODEV ? I might be not following the failure
> path correctly here, but I assume it is
> 	scmi_chan_setup()
> 	info->desc->ops->chan_setup()
> 	mailbox_chan_setup()
> 	mbox_request_channel()
> 
> > [    1.155828] arm-scmi arm-scmi.1.auto: error -EINVAL: failed to setup
> > channels
> > [    1.163379] arm-scmi arm-scmi.1.auto: probe with driver arm-scmi failed
> > with error -22
> >
> > Because the platform device is now bound, and there is no mechanism to
> > return -ENODEV, we won't try another transport driver that would attempt to
> > match the other compatibility strings. That makes sense because in general
> > you specify the Device Tree precisely, and you also have a tailored kernel
> > configuration. Right now this is only an issue using arm's
> > multi_v7_defconfig and arm64's defconfig both of which that we intend to
> > keep on using for CI purposes.
> >
> >
> > >
> > > If this is the case, without this patch, after this error and the mbox probe
> > > failing, the SMC transport, instead, DO probe successfully at the end, right ?
> >
> > With my patch we probe the "smc" transport first and foremost and we
> > successfully initialize it, therefore we do not even try the "mailbox"
> > transport at all, which is intended.
> >
> > >
> > > IOW, what is the impact without this patch, an error and a delay in the
> > > probe sequence till it gets to the SMC transport probe 9as second
> > > attempt) or worse ? (trying to understand here...)
> >
> > There is no recovery without the patch, we are not giving up the arm_scmi
> > platform device because there is no mechanism to return -ENODEV and allow
> > any of the subsequent transport drivers enabled to attempt to take over the
> > platform device and probe it again.
> >
> 
> OK this sounds like you have already explored returning -ENODEV is not
> an option ? It is fair enough, but just want to understand correctly.
> I still think I am missing something.

Having a quick look at dd.c it seems to me that the probe error from
the first matched driver->probe is propagated back to the callchain
(and the driver that fails the probe in any way is NOT bound at that
point) till driver_probe_device() 

THEN, on one side, in  __driver_attach() then the retval is ignored:

dd.c::__driver_attach()

 /*                                                                                                                                                     
  * Lock device and try to bind to it. We drop the error
  * here and always return 0, because we need to keep trying
  * to bind to devices and some drivers will return an error                                                                                            
  * simply if it didn't support the device.
  *
  * driver_probe_device() will spit a warning if there
  * is an error.


...while, on the other side, looking at __device_attach_driver() it DOES
report the error from driver_probe_device() BUT the __device_attach_driver()
routine is called by bus_for_eachdrv() inside __device_attach() and DOES
cause such loop (bus_for_each_drv() to bail out with an error...BUT, again,
no more driver match/probe is attempted and I suppose that if you restart
somehow such sequence you will endup again failing at the same point on the
same first-match driver...

So seems a sort of structural issue...also because indeed you have something
that is somehow a malformed DT so the device_match succeeds for good reasons...

I may have miss a lot more, though :D

Thanks,
Cristian
Re: [PATCH] firmware: arm_scmi: Give SMC transport precedence over mailbox
Posted by Cristian Marussi 1 month, 2 weeks ago
On Mon, Oct 07, 2024 at 10:07:46AM -0700, Florian Fainelli wrote:
> Hi Cristian,
> 
> On October 7, 2024 4:52:33 AM PDT, Cristian Marussi
> <cristian.marussi@arm.com> wrote:
> > On Sat, Oct 05, 2024 at 09:33:17PM -0700, Florian Fainelli wrote:
> > > Broadcom STB platforms have for historical reasons included both
> > > "arm,scmi-smc" and "arm,scmi" in their SCMI Device Tree node compatible
> > > string.
> > 
> > Hi Florian,
> > 
> > did not know this..
> 
> It stems from us starting with a mailbox driver that did the SMC call, and
> later transitioning to the "smc" transport proper. Our boot loader provides
> the Device Tree blob to the kernel and we maintain backward/forward
> compatibility as much as possible.
> 

OK.

> > 
> > > 
> > > After the commit cited in the Fixes tag and with a kernel
> > > configuration that enables both the SCMI and the Mailbox transports, we
> > > would probe the mailbox transport, but fail to complete since we would
> > > not have a mailbox driver available.
> > > 
> > Not sure to have understood this...
> > 
> > ...you mean you DO have the SMC/Mailbox SCMI transport drivers compiled
> > into the Kconfig AND you have BOTH the SMC AND Mailbox compatibles in
> > DT, BUT your platform does NOT physically have a mbox/shmem transport
> > and as a consequence, when MBOX probes (at first), you see an error from
> > the core like:
> > 
> >    "arm-scmi: unable to communicate with SCMI"
> > 
> > since it gets no reply from the SCMI server (being not connnected via
> > mbox) and it bails out .... am I right ?
> 
> In an unmodified kernel where both the "mailbox" and "smc" transports are
> enabled, we get the "mailbox" driver to probe first since it matched the
> "arm,scmi" part of the compatible string and it is linked first into the
> kernel. Down the road though we will fail the initialization with:
> 
> [    1.135363] arm-scmi arm-scmi.1.auto: Using scmi_mailbox_transport
> [    1.141901] arm-scmi arm-scmi.1.auto: SCMI max-rx-timeout: 30ms
> [    1.148113] arm-scmi arm-scmi.1.auto: failed to setup channel for
> protocol:0x10
> [    1.155828] arm-scmi arm-scmi.1.auto: error -EINVAL: failed to setup
> channels
> [    1.163379] arm-scmi arm-scmi.1.auto: probe with driver arm-scmi failed
> with error -22
> 
> Because the platform device is now bound, and there is no mechanism to
> return -ENODEV, we won't try another transport driver that would attempt to
> match the other compatibility strings. That makes sense because in general
> you specify the Device Tree precisely, and you also have a tailored kernel
> configuration. Right now this is only an issue using arm's
> multi_v7_defconfig and arm64's defconfig both of which that we intend to
> keep on using for CI purposes.
>

Ah ok so the issue derives from the fact that you have a single
compatible line with 2 not compatbles that are not really "compatible"
from the SCMI core point of view...

...also I suppose that if we "somehow" would trigger a
device_release_drievr(), what will happen is that it will match probably
again in the same order at the next attempt (beside being an ugly thing)

> 
> > 
> > If this is the case, without this patch, after this error and the mbox probe
> > failing, the SMC transport, instead, DO probe successfully at the end, right ?
> 
> With my patch we probe the "smc" transport first and foremost and we
> successfully initialize it, therefore we do not even try the "mailbox"
> transport at all, which is intended.
> 
> > 
> > IOW, what is the impact without this patch, an error and a delay in the
> > probe sequence till it gets to the SMC transport probe 9as second
> > attempt) or worse ? (trying to understand here...)
> 
> There is no recovery without the patch, we are not giving up the arm_scmi
> platform device because there is no mechanism to return -ENODEV and allow
> any of the subsequent transport drivers enabled to attempt to take over the
> platform device and probe it again.
> 

Ok...so it is a workaround hack indeed....but it seems NOT to have bad
side effects and there is definitely no cleaner way to make it bind
properly...beside fixing your DTs for the future...

Thanks,
Cristian
Re: [PATCH] firmware: arm_scmi: Give SMC transport precedence over mailbox
Posted by Sudeep Holla 1 month, 2 weeks ago
On Tue, Oct 08, 2024 at 01:26:53PM +0100, Cristian Marussi wrote:
> On Mon, Oct 07, 2024 at 10:07:46AM -0700, Florian Fainelli wrote:

[...]

> > There is no recovery without the patch, we are not giving up the arm_scmi
> > platform device because there is no mechanism to return -ENODEV and allow
> > any of the subsequent transport drivers enabled to attempt to take over the
> > platform device and probe it again.
> >
>
> Ok...so it is a workaround hack indeed....but it seems NOT to have bad
> side effects and there is definitely no cleaner way to make it bind
> properly...beside fixing your DTs for the future...

As I mentioned earlier, I am not against the change as it doesn't have
any other side-effects and just accidentally fixes the issue you have.
But it does sound like a hacky solution to your problem. What if some
other legit reason(theoretically) it needs to be reversed again in the
future. So I am still interested to see if we can fix it without this.

--
Regards,
Sudeep