drivers/char/ipmi/ipmi_ssif.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
It is pointless to continue module probing when communication
with device is failing. This just fill logs with misleading
messages like this:
[Fri Jul 26 18:32:34 2024] ipmi_ssif i2c-IPI0001:00: ipmi_ssif: Error fetching SSIF: -121 180453376 62, your system probably doesn't support this command so using defaults
[Fri Jul 26 18:32:54 2024] ipmi_ssif i2c-IPI0001:00: ipmi_ssif: Unable to clear message flags: -121 180453376 62
[Fri Jul 26 18:33:14 2024] ipmi_ssif i2c-IPI0001:00: ipmi_ssif: Error getting global enables: -121 180453376 62
[Fri Jul 26 18:33:49 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: device id demangle failed: -22
[Fri Jul 26 18:33:50 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: BMC returned 0xff, retry get bmc device id
[Fri Jul 26 18:34:07 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: device id demangle failed: -22
[Fri Jul 26 18:34:07 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: BMC returned 0xff, retry get bmc device id
[Fri Jul 26 18:34:25 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: device id demangle failed: -22
[Fri Jul 26 18:34:25 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: BMC returned 0xff, retry get bmc device id
[Fri Jul 26 18:34:43 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: device id demangle failed: -22
[Fri Jul 26 18:34:43 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: BMC returned 0xff, retry get bmc device id
[Fri Jul 26 18:35:01 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: device id demangle failed: -22
[Fri Jul 26 18:35:01 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: BMC returned 0xff, retry get bmc device id
[Fri Jul 26 18:35:19 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: device id demangle failed: -22
[Fri Jul 26 18:35:19 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: Unable to get the device id: -5
[Fri Jul 26 18:35:19 2024] ipmi_ssif i2c-IPI0001:00: ipmi_ssif: Unable to register device: error -5
[Fri Jul 26 18:35:19 2024] ipmi_ssif i2c-IPI0001:00: ipmi_ssif: Unable to start IPMI SSIF: -5
[Fri Jul 26 18:35:19 2024] ipmi_ssif: probe of i2c-IPI0001:00 failed with error -5
Also in some of these prints uninitialized variables are used.
So just exit early when communication with device is flawed.
Signed-off-by: Ivan T. Ivanov <iivanov@suse.de>
---
drivers/char/ipmi/ipmi_ssif.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/drivers/char/ipmi/ipmi_ssif.c b/drivers/char/ipmi/ipmi_ssif.c
index 96ad571d041a..37516733e5c8 100644
--- a/drivers/char/ipmi/ipmi_ssif.c
+++ b/drivers/char/ipmi/ipmi_ssif.c
@@ -1315,6 +1315,16 @@ static int read_response(struct i2c_client *client, unsigned char *resp)
return ret;
}
+/* Filter SMBus communication errors from incorrect response errors */
+static bool is_smbus_error(struct device *dev, int err)
+{
+ if (!err || err == -EINVAL || err == -E2BIG)
+ return false;
+
+ dev_err(dev, "SMbus error: %d\n", err);
+ return true;
+}
+
static int do_cmd(struct i2c_client *client, int len, unsigned char *msg,
int *resp_len, unsigned char *resp)
{
@@ -1709,6 +1719,8 @@ static int ssif_probe(struct i2c_client *client)
msg[1] = IPMI_GET_SYSTEM_INTERFACE_CAPABILITIES_CMD;
msg[2] = 0; /* SSIF */
rv = do_cmd(client, 3, msg, &len, resp);
+ if (is_smbus_error(&client->dev, rv))
+ goto out;
if (!rv && (len >= 3) && (resp[2] == 0)) {
if (len < 7) {
if (ssif_dbg_probe)
@@ -1767,6 +1779,8 @@ static int ssif_probe(struct i2c_client *client)
msg[1] = IPMI_CLEAR_MSG_FLAGS_CMD;
msg[2] = WDT_PRE_TIMEOUT_INT;
rv = do_cmd(client, 3, msg, &len, resp);
+ if (is_smbus_error(&client->dev, rv))
+ goto out;
if (rv || (len < 3) || (resp[2] != 0))
dev_warn(&ssif_info->client->dev,
"Unable to clear message flags: %d %d %2.2x\n",
@@ -1776,6 +1790,8 @@ static int ssif_probe(struct i2c_client *client)
msg[0] = IPMI_NETFN_APP_REQUEST << 2;
msg[1] = IPMI_GET_BMC_GLOBAL_ENABLES_CMD;
rv = do_cmd(client, 2, msg, &len, resp);
+ if (is_smbus_error(&client->dev, rv))
+ goto out;
if (rv || (len < 4) || (resp[2] != 0)) {
dev_warn(&ssif_info->client->dev,
"Error getting global enables: %d %d %2.2x\n",
@@ -1796,6 +1812,8 @@ static int ssif_probe(struct i2c_client *client)
msg[1] = IPMI_SET_BMC_GLOBAL_ENABLES_CMD;
msg[2] = ssif_info->global_enables | IPMI_BMC_EVT_MSG_BUFF;
rv = do_cmd(client, 3, msg, &len, resp);
+ if (is_smbus_error(&client->dev, rv))
+ goto out;
if (rv || (len < 2)) {
dev_warn(&ssif_info->client->dev,
"Error setting global enables: %d %d %2.2x\n",
@@ -1818,6 +1836,8 @@ static int ssif_probe(struct i2c_client *client)
msg[1] = IPMI_SET_BMC_GLOBAL_ENABLES_CMD;
msg[2] = ssif_info->global_enables | IPMI_BMC_RCV_MSG_INTR;
rv = do_cmd(client, 3, msg, &len, resp);
+ if (is_smbus_error(&client->dev, rv))
+ goto out;
if (rv || (len < 2)) {
dev_warn(&ssif_info->client->dev,
"Error setting global enables: %d %d %2.2x\n",
--
2.43.0
On Fri, Aug 16, 2024 at 09:54:58AM +0300, Ivan T. Ivanov wrote:
> It is pointless to continue module probing when communication
> with device is failing. This just fill logs with misleading
> messages like this:
So the BMC (or whatever is there) responds to a GET_DEVICE_ID command,
but then doesn't properly handle any other valid and mandatory commands?
And this device was added via ACPI or SMBIOS or device tree, almost
certainly.
My comments are:
1) This fix is wrong, because it may mask important things that need to
be reported.
2) Fix the source of the problem. You can't expect software to
compensate for all bad hardware and firmware. I'd guess the firmware
tables are pointing to something that's not a BMC.
3) It appears the response to the GET_DEVICE_ID command, though a
response is returned, is not valid. The right way to handle this would
be to do more validation in the ssif_detect() function. It doesn't do
any validation of the response data, and that's really what needs to be
done.
-corey
>
> [Fri Jul 26 18:32:34 2024] ipmi_ssif i2c-IPI0001:00: ipmi_ssif: Error fetching SSIF: -121 180453376 62, your system probably doesn't support this command so using defaults
> [Fri Jul 26 18:32:54 2024] ipmi_ssif i2c-IPI0001:00: ipmi_ssif: Unable to clear message flags: -121 180453376 62
> [Fri Jul 26 18:33:14 2024] ipmi_ssif i2c-IPI0001:00: ipmi_ssif: Error getting global enables: -121 180453376 62
> [Fri Jul 26 18:33:49 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: device id demangle failed: -22
> [Fri Jul 26 18:33:50 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: BMC returned 0xff, retry get bmc device id
> [Fri Jul 26 18:34:07 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: device id demangle failed: -22
> [Fri Jul 26 18:34:07 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: BMC returned 0xff, retry get bmc device id
> [Fri Jul 26 18:34:25 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: device id demangle failed: -22
> [Fri Jul 26 18:34:25 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: BMC returned 0xff, retry get bmc device id
> [Fri Jul 26 18:34:43 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: device id demangle failed: -22
> [Fri Jul 26 18:34:43 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: BMC returned 0xff, retry get bmc device id
> [Fri Jul 26 18:35:01 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: device id demangle failed: -22
> [Fri Jul 26 18:35:01 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: BMC returned 0xff, retry get bmc device id
> [Fri Jul 26 18:35:19 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: device id demangle failed: -22
> [Fri Jul 26 18:35:19 2024] ipmi_ssif i2c-IPI0001:00: IPMI message handler: Unable to get the device id: -5
> [Fri Jul 26 18:35:19 2024] ipmi_ssif i2c-IPI0001:00: ipmi_ssif: Unable to register device: error -5
> [Fri Jul 26 18:35:19 2024] ipmi_ssif i2c-IPI0001:00: ipmi_ssif: Unable to start IPMI SSIF: -5
> [Fri Jul 26 18:35:19 2024] ipmi_ssif: probe of i2c-IPI0001:00 failed with error -5
>
> Also in some of these prints uninitialized variables are used.
> So just exit early when communication with device is flawed.
>
> Signed-off-by: Ivan T. Ivanov <iivanov@suse.de>
> ---
> drivers/char/ipmi/ipmi_ssif.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/drivers/char/ipmi/ipmi_ssif.c b/drivers/char/ipmi/ipmi_ssif.c
> index 96ad571d041a..37516733e5c8 100644
> --- a/drivers/char/ipmi/ipmi_ssif.c
> +++ b/drivers/char/ipmi/ipmi_ssif.c
> @@ -1315,6 +1315,16 @@ static int read_response(struct i2c_client *client, unsigned char *resp)
> return ret;
> }
>
> +/* Filter SMBus communication errors from incorrect response errors */
> +static bool is_smbus_error(struct device *dev, int err)
> +{
> + if (!err || err == -EINVAL || err == -E2BIG)
> + return false;
> +
> + dev_err(dev, "SMbus error: %d\n", err);
> + return true;
> +}
> +
> static int do_cmd(struct i2c_client *client, int len, unsigned char *msg,
> int *resp_len, unsigned char *resp)
> {
> @@ -1709,6 +1719,8 @@ static int ssif_probe(struct i2c_client *client)
> msg[1] = IPMI_GET_SYSTEM_INTERFACE_CAPABILITIES_CMD;
> msg[2] = 0; /* SSIF */
> rv = do_cmd(client, 3, msg, &len, resp);
> + if (is_smbus_error(&client->dev, rv))
> + goto out;
> if (!rv && (len >= 3) && (resp[2] == 0)) {
> if (len < 7) {
> if (ssif_dbg_probe)
> @@ -1767,6 +1779,8 @@ static int ssif_probe(struct i2c_client *client)
> msg[1] = IPMI_CLEAR_MSG_FLAGS_CMD;
> msg[2] = WDT_PRE_TIMEOUT_INT;
> rv = do_cmd(client, 3, msg, &len, resp);
> + if (is_smbus_error(&client->dev, rv))
> + goto out;
> if (rv || (len < 3) || (resp[2] != 0))
> dev_warn(&ssif_info->client->dev,
> "Unable to clear message flags: %d %d %2.2x\n",
> @@ -1776,6 +1790,8 @@ static int ssif_probe(struct i2c_client *client)
> msg[0] = IPMI_NETFN_APP_REQUEST << 2;
> msg[1] = IPMI_GET_BMC_GLOBAL_ENABLES_CMD;
> rv = do_cmd(client, 2, msg, &len, resp);
> + if (is_smbus_error(&client->dev, rv))
> + goto out;
> if (rv || (len < 4) || (resp[2] != 0)) {
> dev_warn(&ssif_info->client->dev,
> "Error getting global enables: %d %d %2.2x\n",
> @@ -1796,6 +1812,8 @@ static int ssif_probe(struct i2c_client *client)
> msg[1] = IPMI_SET_BMC_GLOBAL_ENABLES_CMD;
> msg[2] = ssif_info->global_enables | IPMI_BMC_EVT_MSG_BUFF;
> rv = do_cmd(client, 3, msg, &len, resp);
> + if (is_smbus_error(&client->dev, rv))
> + goto out;
> if (rv || (len < 2)) {
> dev_warn(&ssif_info->client->dev,
> "Error setting global enables: %d %d %2.2x\n",
> @@ -1818,6 +1836,8 @@ static int ssif_probe(struct i2c_client *client)
> msg[1] = IPMI_SET_BMC_GLOBAL_ENABLES_CMD;
> msg[2] = ssif_info->global_enables | IPMI_BMC_RCV_MSG_INTR;
> rv = do_cmd(client, 3, msg, &len, resp);
> + if (is_smbus_error(&client->dev, rv))
> + goto out;
> if (rv || (len < 2)) {
> dev_warn(&ssif_info->client->dev,
> "Error setting global enables: %d %d %2.2x\n",
> --
> 2.43.0
>
Hi, On 2024-08-16 21:04, Corey Minyard wrote: > On Fri, Aug 16, 2024 at 09:54:58AM +0300, Ivan T. Ivanov wrote: >> It is pointless to continue module probing when communication >> with device is failing. This just fill logs with misleading >> messages like this: > > So the BMC (or whatever is there) responds to a GET_DEVICE_ID command, Well, not really. In cases where ssif_detect() returns -ENODEV, i2c core i2c_detect_address() threat it as "We catch it here as this isn't an error”. See i2c_detect_address(). > but then doesn't properly handle any other valid and mandatory > commands? > And this device was added via ACPI or SMBIOS or device tree, almost > certainly. > > My comments are: > > 1) This fix is wrong, because it may mask important things that need to > be reported. > > 2) Fix the source of the problem. You can't expect software to > compensate for all bad hardware and firmware. I'd guess the firmware > tables are pointing to something that's not a BMC. I am not saying that hardware or firmware do not have its issues in this case. But just as it is written now ssif_probe() is fragile. It continue asking device for features ignoring that there is no valid SMBus communication. > > 3) It appears the response to the GET_DEVICE_ID command, though a > response is returned, is not valid. The right way to handle this would > be to do more validation in the ssif_detect() function. It doesn't do > any validation of the response data, and that's really what needs to be > done. > do_cmd() in ssif_detect() already do validation. Perhaps, ssif_probe() should just not return ENODEV in case of error. Thank you for your comments! Regards, Ivan
On 08-18 12:27, Ivan T. Ivanov wrote: > > > > > 3) It appears the response to the GET_DEVICE_ID command, though a > > response is returned, is not valid. The right way to handle this would > > be to do more validation in the ssif_detect() function. It doesn't do > > any validation of the response data, and that's really what needs to be > > done. > > > > do_cmd() in ssif_detect() already do validation. Perhaps, > ssif_probe() should just not return ENODEV in case of error. > Oh, I wanted to say ssif_detect, not ssif_probe. Regards, Ivan
On Tue, Aug 20, 2024 at 5:15 AM Ivan T. Ivanov <iivanov@suse.de> wrote: > > On 08-18 12:27, Ivan T. Ivanov wrote: > > > > > > > > 3) It appears the response to the GET_DEVICE_ID command, though a > > > response is returned, is not valid. The right way to handle this would > > > be to do more validation in the ssif_detect() function. It doesn't do > > > any validation of the response data, and that's really what needs to be > > > done. > > > > > > > do_cmd() in ssif_detect() already do validation. Perhaps, > > ssif_probe() should just not return ENODEV in case of error. > > > > Oh, I wanted to say ssif_detect, not ssif_probe. Yeah, that's probably the right solution. I'll look at this a bit. But I see the problem. Do you want to do a patch, or do you want me to? -corey
Hi, On 08-20 10:31, Corey Minyard wrote: > > > > > > > > > > > 3) It appears the response to the GET_DEVICE_ID command, though a > > > > response is returned, is not valid. The right way to handle this would > > > > be to do more validation in the ssif_detect() function. It doesn't do > > > > any validation of the response data, and that's really what needs to be > > > > done. > > > > > > > > > > do_cmd() in ssif_detect() already do validation. Perhaps, > > > ssif_probe() should just not return ENODEV in case of error. > > > > > > > Oh, I wanted to say ssif_detect, not ssif_probe. > > Yeah, that's probably the right solution. I'll look at this a bit. > But I see the problem. Do you want to do a patch, or do you want me > to? If you want to do it, please go ahead. I don't mind. Thanks, Ivan
If an IPMI SSIF device is probed and there is something there, but
probably not an actual BMC, the code would just issue a lot of errors
before it failed. We kind of need these errors to help with certain
issues, and some of the failure reports are non-fatal.
However, a get device id command should alway work. If that fails,
nothing else is going to work and it's a pretty good indication that
there's no valid BMC there. So issue and check that command and bail
if it fails.
Reported-by: Ivan T. Ivanov <iivanov@suse.de>
Signed-off-by: Corey Minyard <corey@minyard.net>
---
drivers/char/ipmi/ipmi_ssif.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
Ivan, is it possible for you to test this patch on the broken system?
It should work based on what you reported, but it's nice to be sure.
Also, I discovered that the detect function is kind of bogus, it only
works on an address list that isn't present (any more). However, I
re-used it for my purposes in the probe function.
Thanks.
diff --git a/drivers/char/ipmi/ipmi_ssif.c b/drivers/char/ipmi/ipmi_ssif.c
index e8e7b832c060..4c403e7a9fc8 100644
--- a/drivers/char/ipmi/ipmi_ssif.c
+++ b/drivers/char/ipmi/ipmi_ssif.c
@@ -1368,8 +1368,20 @@ static int ssif_detect(struct i2c_client *client, struct i2c_board_info *info)
rv = do_cmd(client, 2, msg, &len, resp);
if (rv)
rv = -ENODEV;
- else
+ else {
+ if (len < 3) {
+ rv = -ENODEV;
+ } else {
+ struct ipmi_device_id id;
+
+ rv = ipmi_demangle_device_id(resp[0] >> 2, resp[1],
+ resp + 2, len - 2, &id);
+ if (rv)
+ rv = -ENODEV; /* Error means a BMC probably isn't there. */
+ }
+ if (!rv && info)
strscpy(info->type, DEVICE_NAME, I2C_NAME_SIZE);
+ }
kfree(resp);
return rv;
}
@@ -1704,6 +1716,16 @@ static int ssif_probe(struct i2c_client *client)
ipmi_addr_src_to_str(ssif_info->addr_source),
client->addr, client->adapter->name, slave_addr);
+ /*
+ * Send a get device id command and validate its response to
+ * make sure a valid BMC is there.
+ */
+ rv = ssif_detect(client, NULL);
+ if (rv) {
+ dev_err(&client->dev, "Not present\n");
+ goto out;
+ }
+
/* Now check for system interface capabilities */
msg[0] = IPMI_NETFN_APP_REQUEST << 2;
msg[1] = IPMI_GET_SYSTEM_INTERFACE_CAPABILITIES_CMD;
--
2.34.1
Hi Corey, On 08-20 20:05, Corey Minyard wrote: > > If an IPMI SSIF device is probed and there is something there, but > probably not an actual BMC, the code would just issue a lot of errors > before it failed. We kind of need these errors to help with certain > issues, and some of the failure reports are non-fatal. > > However, a get device id command should alway work. If that fails, > nothing else is going to work and it's a pretty good indication that > there's no valid BMC there. So issue and check that command and bail > if it fails. > > Reported-by: Ivan T. Ivanov <iivanov@suse.de> > Signed-off-by: Corey Minyard <corey@minyard.net> > --- > drivers/char/ipmi/ipmi_ssif.c | 24 +++++++++++++++++++++++- > 1 file changed, 23 insertions(+), 1 deletion(-) Do you plan to merge this fix? Regards, Ivan
On Tue, Sep 10, 2024 at 5:19 AM Ivan T. Ivanov <iivanov@suse.de> wrote: > > Hi Corey, > > On 08-20 20:05, Corey Minyard wrote: > > > > If an IPMI SSIF device is probed and there is something there, but > > probably not an actual BMC, the code would just issue a lot of errors > > before it failed. We kind of need these errors to help with certain > > issues, and some of the failure reports are non-fatal. > > > > However, a get device id command should alway work. If that fails, > > nothing else is going to work and it's a pretty good indication that > > there's no valid BMC there. So issue and check that command and bail > > if it fails. > > > > Reported-by: Ivan T. Ivanov <iivanov@suse.de> > > Signed-off-by: Corey Minyard <corey@minyard.net> > > --- > > drivers/char/ipmi/ipmi_ssif.c | 24 +++++++++++++++++++++++- > > 1 file changed, 23 insertions(+), 1 deletion(-) > > Do you plan to merge this fix? Yes, it's queued for the next release window. I should have responded with that. -corey > > Regards, > Ivan >
On 09-10 06:30, Corey Minyard wrote: > > > > > > If an IPMI SSIF device is probed and there is something there, but > > > probably not an actual BMC, the code would just issue a lot of errors > > > before it failed. We kind of need these errors to help with certain > > > issues, and some of the failure reports are non-fatal. > > > > > > However, a get device id command should alway work. If that fails, > > > nothing else is going to work and it's a pretty good indication that > > > there's no valid BMC there. So issue and check that command and bail > > > if it fails. > > > > > > Reported-by: Ivan T. Ivanov <iivanov@suse.de> > > > Signed-off-by: Corey Minyard <corey@minyard.net> > > > --- > > > drivers/char/ipmi/ipmi_ssif.c | 24 +++++++++++++++++++++++- > > > 1 file changed, 23 insertions(+), 1 deletion(-) > > > > Do you plan to merge this fix? > > Yes, it's queued for the next release window. I should have responded > with that. > Thank you! No worries. Regards, Ivan
Hi Corey,
On 08-20 20:05, Corey Minyard wrote:
>
> If an IPMI SSIF device is probed and there is something there, but
> probably not an actual BMC, the code would just issue a lot of errors
> before it failed. We kind of need these errors to help with certain
> issues, and some of the failure reports are non-fatal.
>
> However, a get device id command should alway work. If that fails,
> nothing else is going to work and it's a pretty good indication that
> there's no valid BMC there. So issue and check that command and bail
> if it fails.
>
> Reported-by: Ivan T. Ivanov <iivanov@suse.de>
> Signed-off-by: Corey Minyard <corey@minyard.net>
> ---
> drivers/char/ipmi/ipmi_ssif.c | 24 +++++++++++++++++++++++-
> 1 file changed, 23 insertions(+), 1 deletion(-)
>
> Ivan, is it possible for you to test this patch on the broken system?
This exact system is not available to me at the moment. I have few
other machines on which I could test this.
> It should work based on what you reported, but it's nice to be sure.
>
> Also, I discovered that the detect function is kind of bogus, it only
> works on an address list that isn't present (any more). However, I
> re-used it for my purposes in the probe function.
>
> Thanks.
>
> diff --git a/drivers/char/ipmi/ipmi_ssif.c b/drivers/char/ipmi/ipmi_ssif.c
> index e8e7b832c060..4c403e7a9fc8 100644
> --- a/drivers/char/ipmi/ipmi_ssif.c
> +++ b/drivers/char/ipmi/ipmi_ssif.c
> @@ -1368,8 +1368,20 @@ static int ssif_detect(struct i2c_client *client, struct i2c_board_info *info)
> rv = do_cmd(client, 2, msg, &len, resp);
> if (rv)
> rv = -ENODEV;
What is my worry is that in case of SMBus errors, device is there but
for some reason it got stuck/crashed or whatever, so will get out of
detect function from here and with ENODEV return code probe function
will be called for no reason.
> - else
> + else {
> + if (len < 3) {
> + rv = -ENODEV;
No point to call probe(), right?
> + } else {
> + struct ipmi_device_id id;
> +
> + rv = ipmi_demangle_device_id(resp[0] >> 2, resp[1],
> + resp + 2, len - 2, &id);
> + if (rv)
> + rv = -ENODEV; /* Error means a BMC probably isn't there. */
Same.
> + }
> + if (!rv && info)
> strscpy(info->type, DEVICE_NAME, I2C_NAME_SIZE);
> + }
> kfree(resp);
> return rv;
> }
> @@ -1704,6 +1716,16 @@ static int ssif_probe(struct i2c_client *client)
> ipmi_addr_src_to_str(ssif_info->addr_source),
> client->addr, client->adapter->name, slave_addr);
>
> + /*
> + * Send a get device id command and validate its response to
> + * make sure a valid BMC is there.
> + */
> + rv = ssif_detect(client, NULL);
> + if (rv) {
> + dev_err(&client->dev, "Not present\n");
> + goto out;
> + }
> +
The point is that even after this point IPMI device can start failing
to properly communicate with the OS, real SMBus errors, like EREMOTEIO
in my case, but unfortunately code bellow do not handle this very well,
I think.
> /* Now check for system interface capabilities */
> msg[0] = IPMI_NETFN_APP_REQUEST << 2;
> msg[1] = IPMI_GET_SYSTEM_INTERFACE_CAPABILITIES_CMD;
> --
> 2.34.1
>
Regards,
Ivan
On Thu, Aug 22, 2024 at 10:22:55AM +0300, Ivan T. Ivanov wrote:
> Hi Corey,
>
> On 08-20 20:05, Corey Minyard wrote:
> >
> > If an IPMI SSIF device is probed and there is something there, but
> > probably not an actual BMC, the code would just issue a lot of errors
> > before it failed. We kind of need these errors to help with certain
> > issues, and some of the failure reports are non-fatal.
> >
> > However, a get device id command should alway work. If that fails,
> > nothing else is going to work and it's a pretty good indication that
> > there's no valid BMC there. So issue and check that command and bail
> > if it fails.
> >
> > Reported-by: Ivan T. Ivanov <iivanov@suse.de>
> > Signed-off-by: Corey Minyard <corey@minyard.net>
> > ---
> > drivers/char/ipmi/ipmi_ssif.c | 24 +++++++++++++++++++++++-
> > 1 file changed, 23 insertions(+), 1 deletion(-)
> >
> > Ivan, is it possible for you to test this patch on the broken system?
>
> This exact system is not available to me at the moment. I have few
> other machines on which I could test this.
>
> > It should work based on what you reported, but it's nice to be sure.
> >
> > Also, I discovered that the detect function is kind of bogus, it only
> > works on an address list that isn't present (any more). However, I
> > re-used it for my purposes in the probe function.
> >
> > Thanks.
> >
> > diff --git a/drivers/char/ipmi/ipmi_ssif.c b/drivers/char/ipmi/ipmi_ssif.c
> > index e8e7b832c060..4c403e7a9fc8 100644
> > --- a/drivers/char/ipmi/ipmi_ssif.c
> > +++ b/drivers/char/ipmi/ipmi_ssif.c
> > @@ -1368,8 +1368,20 @@ static int ssif_detect(struct i2c_client *client, struct i2c_board_info *info)
> > rv = do_cmd(client, 2, msg, &len, resp);
> > if (rv)
> > rv = -ENODEV;
>
> What is my worry is that in case of SMBus errors, device is there but
> for some reason it got stuck/crashed or whatever, so will get out of
> detect function from here and with ENODEV return code probe function
> will be called for no reason.
That's not how the i2c code works. See my next comment.
>
> > - else
> > + else {
> > + if (len < 3) {
> > + rv = -ENODEV;
>
> No point to call probe(), right?
Originally (before I add the call from ssif_probe()), this is not involved in
the probe() call. Instead, the detect function is involved in calling a
table of addresses in driver->address_list. So in this case this
function is never called at all from the i2c code, since there is no
address list.
>
> > + } else {
> > + struct ipmi_device_id id;
> > +
> > + rv = ipmi_demangle_device_id(resp[0] >> 2, resp[1],
> > + resp + 2, len - 2, &id);
> > + if (rv)
> > + rv = -ENODEV; /* Error means a BMC probably isn't there. */
>
> Same.
>
> > + }
> > + if (!rv && info)
> > strscpy(info->type, DEVICE_NAME, I2C_NAME_SIZE);
> > + }
> > kfree(resp);
> > return rv;
> > }
> > @@ -1704,6 +1716,16 @@ static int ssif_probe(struct i2c_client *client)
> > ipmi_addr_src_to_str(ssif_info->addr_source),
> > client->addr, client->adapter->name, slave_addr);
> >
> > + /*
> > + * Send a get device id command and validate its response to
> > + * make sure a valid BMC is there.
> > + */
> > + rv = ssif_detect(client, NULL);
> > + if (rv) {
> > + dev_err(&client->dev, "Not present\n");
> > + goto out;
> > + }
> > +
>
> The point is that even after this point IPMI device can start failing
> to properly communicate with the OS, real SMBus errors, like EREMOTEIO
> in my case, but unfortunately code bellow do not handle this very well,
> I think.
It is possible that the BMC gets rebooted or something between the call
to ssif_detect() and the code below, but the probability is really low.
If it answers a detect, the rest of the things should work.
-corey
>
>
> > /* Now check for system interface capabilities */
> > msg[0] = IPMI_NETFN_APP_REQUEST << 2;
> > msg[1] = IPMI_GET_SYSTEM_INTERFACE_CAPABILITIES_CMD;
> > --
> > 2.34.1
> >
>
> Regards,
> Ivan
>
Hi,
On 08-22 09:40, Corey Minyard wrote:
> > > +++ b/drivers/char/ipmi/ipmi_ssif.c
> > > @@ -1368,8 +1368,20 @@ static int ssif_detect(struct i2c_client *client, struct i2c_board_info *info)
> > > rv = do_cmd(client, 2, msg, &len, resp);
> > > if (rv)
> > > rv = -ENODEV;
> >
> > What is my worry is that in case of SMBus errors, device is there but
> > for some reason it got stuck/crashed or whatever, so will get out of
> > detect function from here and with ENODEV return code probe function
> > will be called for no reason.
>
> That's not how the i2c code works. See my next comment.
>
> >
> > > - else
> > > + else {
> > > + if (len < 3) {
> > > + rv = -ENODEV;
> >
> > No point to call probe(), right?
>
> Originally (before I add the call from ssif_probe()), this is not involved in
> the probe() call. Instead, the detect function is involved in calling a
> table of addresses in driver->address_list. So in this case this
> function is never called at all from the i2c code, since there is no
> address list.
I see, thank you for explanation.
> > >
> > > + /*
> > > + * Send a get device id command and validate its response to
> > > + * make sure a valid BMC is there.
> > > + */
> > > + rv = ssif_detect(client, NULL);
> > > + if (rv) {
> > > + dev_err(&client->dev, "Not present\n");
> > > + goto out;
> > > + }
> > > +
> >
> > The point is that even after this point IPMI device can start failing
> > to properly communicate with the OS, real SMBus errors, like EREMOTEIO
> > in my case, but unfortunately code bellow do not handle this very well,
> > I think.
>
> It is possible that the BMC gets rebooted or something between the call
> to ssif_detect() and the code below, but the probability is really low.
> If it answers a detect, the rest of the things should work.
>
I have my daubs, but patch proposed will fix the issue that I see.
Thank you,
Ivan
© 2016 - 2026 Red Hat, Inc.