ipmi: Fix issues with BMCs that report event and message incorrectly

[PATCH 0/1] ipmi: Fix issues with BMCs that report event and message incorrectly

Posted by Corey Minyard 1 month, 3 weeks ago

Matt reported that there were issues with the IPMI driver getting wedged
in some cases.  It turns out that the BMC was not reporting an error as
it should have (per the spec) when the event queue was empty.  The IPMI
driver would then request the next event, and so on, wedging the driver.

The BMC sits on a fuzzy line between a trusted device and a remote and
possibly untrusted device.  If you compromise a BMC, you have all sorts
of tools you can use to attack the host: the reset line, interrupts,
and usually access to write the system firmware and possibly devices
like disk drives, serial ports and VGA consoles.  So attacking through
this interface would not be the first thing you would do.  But it is a
possible attack point.

I'm assuming that the BMC was delivering an empty message when this
happens, so the first patch checks the message length to make sure it's
a valid message.  It's a good check no matter what, so it's in
whether that's the issue or not.
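
A rough model of that length check, in Python rather than driver code.
It assumes the usual system interface response layout (netfn/LUN byte,
command byte, completion code), so anything shorter than 3 bytes cannot
be a valid response.  The names are made up for illustration and the
real patch may check something different.

```python
# Assumption: a response carries at least a netfn/LUN byte, a command
# byte and a completion code, so 3 bytes is the minimum usable length.
MIN_RESPONSE_LEN = 3


def is_valid_response(rsp: bytes) -> bool:
    """Return True if rsp is long enough to hold a completion code."""
    return len(rsp) >= MIN_RESPONSE_LEN
```

An empty message fails this check, while a 3-byte response with only a
completion code passes.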

The second patch limits the number of events or messages that can
be fetched at a time to 10.  This is a good thing to do, anyway.
If more messages or events are present, the next flag check should
get them.  So it's a more general fix.
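
The effect of the cap can be sketched with a small Python model.  This
is not the driver code: read_event is a hypothetical callable returning
a (completion code, data) pair, and treating any non-zero completion
code as "queue empty" is an assumption of the model.

```python
MAX_FETCHES_PER_FLAG_CHECK = 10  # the limit described above


def drain_events(read_event, max_fetches=MAX_FETCHES_PER_FLAG_CHECK):
    """Fetch events until the BMC reports an error or the cap is hit."""
    events = []
    for _ in range(max_fetches):
        completion_code, data = read_event()
        if completion_code != 0x00:
            break  # model assumption: non-zero means nothing left to read
        events.append(data)
    return events


def stuck_bmc():
    """Hypothetical BMC that never reports an empty queue."""
    return 0x00, b"\xff" * 16
```

With stuck_bmc the loop returns after 10 fetches instead of spinning
forever; anything still pending is left for the next flag check.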

I looked at adding the patch Matt suggested, doing a timeout on the
wait, but that introduces some race conditions if the response comes
back late.  That will require some more thought.

The timeouts with IPMI can be pretty long; the spec specifies fairly
long timeouts, 5 seconds waiting for the BMC to respond to anything.
So failing an operation can take some time, and reducing the timeouts
is probably a bad idea.  No rationale is given in the spec, but I'm
guessing it expects that a BMC in restart can recover within 5 seconds,
so it gives timeouts so the BMC is always available within that time.

The spec gives you the gist that the BMC should always be available
on a system that has one.  So the driver (at the beginning) followed
that.

Thus the driver tries 10 times for a message before it gives up, giving
50 seconds total failure time for a message.  That is not in the spec (I
don't think) so that could be made selectable on a per-message basis.
There are already mechanisms for this available in the APIs; I'll look
at that.
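
The arithmetic, and what a per-message attempt count would change, as
a small Python sketch.  The function and constant names are
illustrative only and do not correspond to anything in the driver.

```python
BMC_RESPONSE_TIMEOUT_S = 5  # per-attempt timeout mentioned above
DEFAULT_ATTEMPTS = 10       # attempts before the driver gives up


def worst_case_failure_time(attempts=DEFAULT_ATTEMPTS,
                            timeout_s=BMC_RESPONSE_TIMEOUT_S):
    """Seconds before a message fails if the BMC never answers."""
    return attempts * timeout_s
```

The defaults give 50 seconds; a message sent with 2 attempts would
fail after 10 seconds.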

-corey

Re: [PATCH 0/1] ipmi: Fix issues with BMCs that report event and message incorrectly

Posted by Matt Fleming 1 month, 3 weeks ago

On Tue, Apr 21, 2026 at 07:42:42AM -0500, Corey Minyard wrote:
> Matt reported that there were issues with the IPMI driver getting wedged
> in some cases.  It turns out that the BMC was not reporting an error as
> it should have (per the spec) when the event queue was empty.  The IPMI
> driver would then request the next event, and so on, wedging the driver.

Thanks for replying so quickly, Corey. I'll test these out.

One bit of info I pulled out of the stuck machine is that the response
looks properly formed.

I sampled the first 8 entries and they were all identical 19-byte
successful READ_EVENT_MSG_BUFFER responses:

  1c 35 00 55 55 c0 41 a7 00 00 00 00 00 3a ff 00 ff ff ff

So on this machine, the event replies do not look short or malformed;
they look like repeated successful event-buffer reads with the same
payload.
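
For reference, the sample splits into header fields as in the Python
sketch below.  It assumes the usual system interface response layout
(netfn/LUN byte, command, completion code, then data); the helper name
is made up for illustration.

```python
RAW = "1c 35 00 55 55 c0 41 a7 00 00 00 00 00 3a ff 00 ff ff ff"


def decode_response(hex_dump):
    """Split a raw response dump into its header fields and data."""
    rsp = bytes.fromhex(hex_dump)
    return {
        "length": len(rsp),
        "netfn": rsp[0] >> 2,        # upper six bits of the first byte
        "lun": rsp[0] & 0x03,        # lower two bits of the first byte
        "cmd": rsp[1],
        "completion_code": rsp[2],   # 0x00 means success
        "data": rsp[3:],
    }
```

On the sample this gives netfn 0x07, command 0x35, completion code
0x00 and 16 bytes of data.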

Thanks,
Matt

Re: [Openipmi-developer] [PATCH 0/1] ipmi: Fix issues with BMCs that report event and message incorrectly

Posted by Jian Zhang 1 month, 3 weeks ago

> On Apr 22, 2026, at 06:24, Matt Fleming <matt@readmodwrite.com> wrote:
> 
> On Tue, Apr 21, 2026 at 07:42:42AM -0500, Corey Minyard wrote:
>> Matt reported that there were issues with the IPMI driver getting wedged
>> in some cases.  It turns out that the BMC was not reporting an error as
>> it should have (per the spec) when the event queue was empty.  The IPMI
>> driver would then request the next event, and so on, wedging the driver.
> 
> Thanks for replying so quickly, Corey. I'll test these out.
> 
> One bit of info I pulled out of the stuck machine is that the response
> looks properly formed.
> 
> I sampled the first 8 entries and they were all identical 19-byte
> successful READ_EVENT_MSG_BUFFER responses:
> 
>  1c 35 00 55 55 c0 41 a7 00 00 00 00 00 3a ff 00 ff ff ff
> 

Perhaps I know where this data comes from.
During a previous debugging session (where ipmitool v1.8.19 failed on
sensor list due to an underflow in nr_numbers, which has since been
fixed), I noticed this behavior. However, I'm not sure why it is
implemented this way or what exactly this command is intended to do.
If you are running on OpenBMC, it is very likely related to this part,
where a fixed value is always returned (especially if the KCS channel
happens to be configured as 15):

See: https://github.com/openbmc/phosphor-host-ipmid/blob/master/systemintfcmds.cpp#L35

Jian.

> So on this machine, the event replies do not look short or malformed;
> they look like repeated successful event-buffer reads with the same
> payload.
> 
> Thanks,
> Matt