[PATCH] ipmi:ssif: Improve detecting during probing

Corey Minyard posted 1 patch 1 year, 5 months ago
drivers/char/ipmi/ipmi_ssif.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
[PATCH] ipmi:ssif: Improve detecting during probing
Posted by Corey Minyard 1 year, 5 months ago
If an IPMI SSIF device is probed and there is something there, but
probably not an actual BMC, the code would just issue a lot of errors
before it failed.  We kind of need these errors to help with certain
issues, and some of the failure reports are non-fatal.

However, a get device id command should alway work.  If that fails,
nothing else is going to work and it's a pretty good indication that
there's no valid BMC there.  So issue and check that command and bail
if it fails.

Reported-by: Ivan T. Ivanov <iivanov@suse.de>
Signed-off-by: Corey Minyard <corey@minyard.net>
---
 drivers/char/ipmi/ipmi_ssif.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

Ivan, is it possible for you to test this patch on the broken system?
It should work based on what you reported, but it's nice to be sure.

Also, I discovered that the detect function is kind of bogus, it only
works on an address list that isn't present (any more).  However, I
re-used it for my purposes in the probe function.

Thanks.

diff --git a/drivers/char/ipmi/ipmi_ssif.c b/drivers/char/ipmi/ipmi_ssif.c
index e8e7b832c060..4c403e7a9fc8 100644
--- a/drivers/char/ipmi/ipmi_ssif.c
+++ b/drivers/char/ipmi/ipmi_ssif.c
@@ -1368,8 +1368,20 @@ static int ssif_detect(struct i2c_client *client, struct i2c_board_info *info)
 	rv = do_cmd(client, 2, msg, &len, resp);
 	if (rv)
 		rv = -ENODEV;
-	else
+	else {
+	    if (len < 3) {
+		rv = -ENODEV;
+	    } else {
+		struct ipmi_device_id id;
+
+		rv = ipmi_demangle_device_id(resp[0] >> 2, resp[1],
+					     resp + 2, len - 2, &id);
+		if (rv)
+		    rv = -ENODEV; /* Error means a BMC probably isn't there. */
+	    }
+	    if (!rv && info)
 		strscpy(info->type, DEVICE_NAME, I2C_NAME_SIZE);
+	}
 	kfree(resp);
 	return rv;
 }
@@ -1704,6 +1716,16 @@ static int ssif_probe(struct i2c_client *client)
 		ipmi_addr_src_to_str(ssif_info->addr_source),
 		client->addr, client->adapter->name, slave_addr);
 
+	/*
+	 * Send a get device id command and validate its response to
+	 * make sure a valid BMC is there.
+	 */
+	rv = ssif_detect(client, NULL);
+	if (rv) {
+		dev_err(&client->dev, "Not present\n");
+		goto out;
+	}
+
 	/* Now check for system interface capabilities */
 	msg[0] = IPMI_NETFN_APP_REQUEST << 2;
 	msg[1] = IPMI_GET_SYSTEM_INTERFACE_CAPABILITIES_CMD;
-- 
2.34.1
Re: [PATCH] ipmi:ssif: Improve detecting during probing
Posted by Ivan T. Ivanov 1 year, 5 months ago
Hi Corey,

On 08-20 20:05, Corey Minyard wrote:
> 
> If an IPMI SSIF device is probed and there is something there, but
> probably not an actual BMC, the code would just issue a lot of errors
> before it failed.  We kind of need these errors to help with certain
> issues, and some of the failure reports are non-fatal.
> 
> However, a get device id command should alway work.  If that fails,
> nothing else is going to work and it's a pretty good indication that
> there's no valid BMC there.  So issue and check that command and bail
> if it fails.
> 
> Reported-by: Ivan T. Ivanov <iivanov@suse.de>
> Signed-off-by: Corey Minyard <corey@minyard.net>
> ---
>  drivers/char/ipmi/ipmi_ssif.c | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)

Do you plan to merge this fix?

Regards,
Ivan
Re: [PATCH] ipmi:ssif: Improve detecting during probing
Posted by Corey Minyard 1 year, 5 months ago
On Tue, Sep 10, 2024 at 5:19 AM Ivan T. Ivanov <iivanov@suse.de> wrote:
>
> Hi Corey,
>
> On 08-20 20:05, Corey Minyard wrote:
> >
> > If an IPMI SSIF device is probed and there is something there, but
> > probably not an actual BMC, the code would just issue a lot of errors
> > before it failed.  We kind of need these errors to help with certain
> > issues, and some of the failure reports are non-fatal.
> >
> > However, a get device id command should alway work.  If that fails,
> > nothing else is going to work and it's a pretty good indication that
> > there's no valid BMC there.  So issue and check that command and bail
> > if it fails.
> >
> > Reported-by: Ivan T. Ivanov <iivanov@suse.de>
> > Signed-off-by: Corey Minyard <corey@minyard.net>
> > ---
> >  drivers/char/ipmi/ipmi_ssif.c | 24 +++++++++++++++++++++++-
> >  1 file changed, 23 insertions(+), 1 deletion(-)
>
> Do you plan to merge this fix?

Yes, it's queued for the next release window.  I should have responded
with that.

-corey

>
> Regards,
> Ivan
>
Re: [PATCH] ipmi:ssif: Improve detecting during probing
Posted by Ivan T. Ivanov 1 year, 5 months ago
On 09-10 06:30, Corey Minyard wrote:
> > >
> > > If an IPMI SSIF device is probed and there is something there, but
> > > probably not an actual BMC, the code would just issue a lot of errors
> > > before it failed.  We kind of need these errors to help with certain
> > > issues, and some of the failure reports are non-fatal.
> > >
> > > However, a get device id command should alway work.  If that fails,
> > > nothing else is going to work and it's a pretty good indication that
> > > there's no valid BMC there.  So issue and check that command and bail
> > > if it fails.
> > >
> > > Reported-by: Ivan T. Ivanov <iivanov@suse.de>
> > > Signed-off-by: Corey Minyard <corey@minyard.net>
> > > ---
> > >  drivers/char/ipmi/ipmi_ssif.c | 24 +++++++++++++++++++++++-
> > >  1 file changed, 23 insertions(+), 1 deletion(-)
> >
> > Do you plan to merge this fix?
> 
> Yes, it's queued for the next release window.  I should have responded
> with that.
> 

Thank you! No worries.

Regards,
Ivan
Re: [PATCH] ipmi:ssif: Improve detecting during probing
Posted by Ivan T. Ivanov 1 year, 5 months ago
Hi Corey,

On 08-20 20:05, Corey Minyard wrote:
> 
> If an IPMI SSIF device is probed and there is something there, but
> probably not an actual BMC, the code would just issue a lot of errors
> before it failed.  We kind of need these errors to help with certain
> issues, and some of the failure reports are non-fatal.
> 
> However, a get device id command should alway work.  If that fails,
> nothing else is going to work and it's a pretty good indication that
> there's no valid BMC there.  So issue and check that command and bail
> if it fails.
> 
> Reported-by: Ivan T. Ivanov <iivanov@suse.de>
> Signed-off-by: Corey Minyard <corey@minyard.net>
> ---
>  drivers/char/ipmi/ipmi_ssif.c | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
> 
> Ivan, is it possible for you to test this patch on the broken system?

This exact system is not available to me at the moment. I have few
other machines on which I could test this.

> It should work based on what you reported, but it's nice to be sure.
> 
> Also, I discovered that the detect function is kind of bogus, it only
> works on an address list that isn't present (any more).  However, I
> re-used it for my purposes in the probe function.
> 
> Thanks.
> 
> diff --git a/drivers/char/ipmi/ipmi_ssif.c b/drivers/char/ipmi/ipmi_ssif.c
> index e8e7b832c060..4c403e7a9fc8 100644
> --- a/drivers/char/ipmi/ipmi_ssif.c
> +++ b/drivers/char/ipmi/ipmi_ssif.c
> @@ -1368,8 +1368,20 @@ static int ssif_detect(struct i2c_client *client, struct i2c_board_info *info)
>  	rv = do_cmd(client, 2, msg, &len, resp);
>  	if (rv)
>  		rv = -ENODEV;

What is my worry is that in case of SMBus errors, device is there but
for some reason it got stuck/crashed or whatever, so will get out of
detect function from here and with ENODEV return code probe function
will be called for no reason.

> -	else
> +	else {
> +	    if (len < 3) {
> +		rv = -ENODEV;

No point to call probe(), right?

> +	    } else {
> +		struct ipmi_device_id id;
> +
> +		rv = ipmi_demangle_device_id(resp[0] >> 2, resp[1],
> +					     resp + 2, len - 2, &id);
> +		if (rv)
> +		    rv = -ENODEV; /* Error means a BMC probably isn't there. */

Same.

> +	    }
> +	    if (!rv && info)
>  		strscpy(info->type, DEVICE_NAME, I2C_NAME_SIZE);
> +	}
>  	kfree(resp);
>  	return rv;
>  }
> @@ -1704,6 +1716,16 @@ static int ssif_probe(struct i2c_client *client)
>  		ipmi_addr_src_to_str(ssif_info->addr_source),
>  		client->addr, client->adapter->name, slave_addr);
>  
> +	/*
> +	 * Send a get device id command and validate its response to
> +	 * make sure a valid BMC is there.
> +	 */
> +	rv = ssif_detect(client, NULL);
> +	if (rv) {
> +		dev_err(&client->dev, "Not present\n");
> +		goto out;
> +	}
> +

The point is that even after this point IPMI device can start failing
to properly communicate with the OS, real SMBus errors, like EREMOTEIO
in my case, but unfortunately code bellow do not handle this very well,
I think.


>  	/* Now check for system interface capabilities */
>  	msg[0] = IPMI_NETFN_APP_REQUEST << 2;
>  	msg[1] = IPMI_GET_SYSTEM_INTERFACE_CAPABILITIES_CMD;
> -- 
> 2.34.1
> 

Regards,
Ivan
Re: [PATCH] ipmi:ssif: Improve detecting during probing
Posted by Corey Minyard 1 year, 5 months ago
On Thu, Aug 22, 2024 at 10:22:55AM +0300, Ivan T. Ivanov wrote:
> Hi Corey,
> 
> On 08-20 20:05, Corey Minyard wrote:
> > 
> > If an IPMI SSIF device is probed and there is something there, but
> > probably not an actual BMC, the code would just issue a lot of errors
> > before it failed.  We kind of need these errors to help with certain
> > issues, and some of the failure reports are non-fatal.
> > 
> > However, a get device id command should alway work.  If that fails,
> > nothing else is going to work and it's a pretty good indication that
> > there's no valid BMC there.  So issue and check that command and bail
> > if it fails.
> > 
> > Reported-by: Ivan T. Ivanov <iivanov@suse.de>
> > Signed-off-by: Corey Minyard <corey@minyard.net>
> > ---
> >  drivers/char/ipmi/ipmi_ssif.c | 24 +++++++++++++++++++++++-
> >  1 file changed, 23 insertions(+), 1 deletion(-)
> > 
> > Ivan, is it possible for you to test this patch on the broken system?
> 
> This exact system is not available to me at the moment. I have few
> other machines on which I could test this.
> 
> > It should work based on what you reported, but it's nice to be sure.
> > 
> > Also, I discovered that the detect function is kind of bogus, it only
> > works on an address list that isn't present (any more).  However, I
> > re-used it for my purposes in the probe function.
> > 
> > Thanks.
> > 
> > diff --git a/drivers/char/ipmi/ipmi_ssif.c b/drivers/char/ipmi/ipmi_ssif.c
> > index e8e7b832c060..4c403e7a9fc8 100644
> > --- a/drivers/char/ipmi/ipmi_ssif.c
> > +++ b/drivers/char/ipmi/ipmi_ssif.c
> > @@ -1368,8 +1368,20 @@ static int ssif_detect(struct i2c_client *client, struct i2c_board_info *info)
> >  	rv = do_cmd(client, 2, msg, &len, resp);
> >  	if (rv)
> >  		rv = -ENODEV;
> 
> What is my worry is that in case of SMBus errors, device is there but
> for some reason it got stuck/crashed or whatever, so will get out of
> detect function from here and with ENODEV return code probe function
> will be called for no reason.

That's not how the i2c code works.  See my next comment.

> 
> > -	else
> > +	else {
> > +	    if (len < 3) {
> > +		rv = -ENODEV;
> 
> No point to call probe(), right?

Originally (before I add the call from ssif_probe()), this is not involved in
the probe() call.  Instead, the detect function is involved in calling a
table of addresses in driver->address_list.  So in this case this
function is never called at all from the i2c code, since there is no
address list.

> 
> > +	    } else {
> > +		struct ipmi_device_id id;
> > +
> > +		rv = ipmi_demangle_device_id(resp[0] >> 2, resp[1],
> > +					     resp + 2, len - 2, &id);
> > +		if (rv)
> > +		    rv = -ENODEV; /* Error means a BMC probably isn't there. */
> 
> Same.
> 
> > +	    }
> > +	    if (!rv && info)
> >  		strscpy(info->type, DEVICE_NAME, I2C_NAME_SIZE);
> > +	}
> >  	kfree(resp);
> >  	return rv;
> >  }
> > @@ -1704,6 +1716,16 @@ static int ssif_probe(struct i2c_client *client)
> >  		ipmi_addr_src_to_str(ssif_info->addr_source),
> >  		client->addr, client->adapter->name, slave_addr);
> >  
> > +	/*
> > +	 * Send a get device id command and validate its response to
> > +	 * make sure a valid BMC is there.
> > +	 */
> > +	rv = ssif_detect(client, NULL);
> > +	if (rv) {
> > +		dev_err(&client->dev, "Not present\n");
> > +		goto out;
> > +	}
> > +
> 
> The point is that even after this point IPMI device can start failing
> to properly communicate with the OS, real SMBus errors, like EREMOTEIO
> in my case, but unfortunately code bellow do not handle this very well,
> I think.

It is possible that the BMC gets rebooted or something between the call
to ssif_detect() and the code below, but the probability is really low.
If it answers a detect, the rest of the things should work.

-corey

> 
> 
> >  	/* Now check for system interface capabilities */
> >  	msg[0] = IPMI_NETFN_APP_REQUEST << 2;
> >  	msg[1] = IPMI_GET_SYSTEM_INTERFACE_CAPABILITIES_CMD;
> > -- 
> > 2.34.1
> > 
> 
> Regards,
> Ivan
>
Re: [PATCH] ipmi:ssif: Improve detecting during probing
Posted by Ivan T. Ivanov 1 year, 5 months ago
Hi,

On 08-22 09:40, Corey Minyard wrote:
> > > +++ b/drivers/char/ipmi/ipmi_ssif.c
> > > @@ -1368,8 +1368,20 @@ static int ssif_detect(struct i2c_client *client, struct i2c_board_info *info)
> > >  	rv = do_cmd(client, 2, msg, &len, resp);
> > >  	if (rv)
> > >  		rv = -ENODEV;
> > 
> > What is my worry is that in case of SMBus errors, device is there but
> > for some reason it got stuck/crashed or whatever, so will get out of
> > detect function from here and with ENODEV return code probe function
> > will be called for no reason.
> 
> That's not how the i2c code works.  See my next comment.
> 
> > 
> > > -	else
> > > +	else {
> > > +	    if (len < 3) {
> > > +		rv = -ENODEV;
> > 
> > No point to call probe(), right?
> 
> Originally (before I add the call from ssif_probe()), this is not involved in
> the probe() call.  Instead, the detect function is involved in calling a
> table of addresses in driver->address_list.  So in this case this
> function is never called at all from the i2c code, since there is no
> address list.

I see, thank you for explanation.

> > >  
> > > +	/*
> > > +	 * Send a get device id command and validate its response to
> > > +	 * make sure a valid BMC is there.
> > > +	 */
> > > +	rv = ssif_detect(client, NULL);
> > > +	if (rv) {
> > > +		dev_err(&client->dev, "Not present\n");
> > > +		goto out;
> > > +	}
> > > +
> > 
> > The point is that even after this point IPMI device can start failing
> > to properly communicate with the OS, real SMBus errors, like EREMOTEIO
> > in my case, but unfortunately code bellow do not handle this very well,
> > I think.
> 
> It is possible that the BMC gets rebooted or something between the call
> to ssif_detect() and the code below, but the probability is really low.
> If it answers a detect, the rest of the things should work.
> 

I have my daubs, but patch proposed will fix the issue that I see.

Thank you,
Ivan