[PATCH] ioreq_broadcast(): accept partial broadcast success

Per Bilse posted 1 patch 1 year, 5 months ago
Test gitlab-ci failed
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/06ad4b3a67a15192fc986b35e3f2fcd35b2f4c2f.1669383767.git.per.bilse@citrix.com
There is a newer version of this series
xen/arch/x86/hvm/io.c   | 2 +-
xen/common/ioreq.c      | 9 +++++----
xen/include/xen/ioreq.h | 2 +-
3 files changed, 7 insertions(+), 6 deletions(-)
[PATCH] ioreq_broadcast(): accept partial broadcast success
Posted by Per Bilse 1 year, 5 months ago
A change to XAPI varstored causes an unnecessary error message
to be logged in hypervisor.log whenever an RTC timeoffset update
is broadcast.  In extreme cases this could flood the log file.
This patch modifies ioreq_broadcast() to allow partial success.

Signed-off-by: Per Bilse <per.bilse@citrix.com>
---
 xen/arch/x86/hvm/io.c   | 2 +-
 xen/common/ioreq.c      | 9 +++++----
 xen/include/xen/ioreq.h | 2 +-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 0309d05cfd..c4022bf7c2 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -60,7 +60,7 @@ void send_timeoffset_req(unsigned long timeoff)
     if ( timeoff == 0 )
         return;
 
-    if ( ioreq_broadcast(&p, true) != 0 )
+    if ( !ioreq_broadcast(&p, true, true) )
         gprintk(XENLOG_ERR, "Unsuccessful timeoffset update\n");
 }
 
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index 4617aef29b..1d6ca4d1ac 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -54,7 +54,7 @@ void ioreq_signal_mapcache_invalidate(void)
         .data = ~0UL, /* flush all */
     };
 
-    if ( ioreq_broadcast(&p, false) != 0 )
+    if ( !ioreq_broadcast(&p, false, false) )
         gprintk(XENLOG_ERR, "Unsuccessful map-cache invalidate\n");
 }
 
@@ -1309,11 +1309,11 @@ int ioreq_send(struct ioreq_server *s, ioreq_t *proto_p,
     return IOREQ_STATUS_UNHANDLED;
 }
 
-unsigned int ioreq_broadcast(ioreq_t *p, bool buffered)
+bool ioreq_broadcast(ioreq_t *p, bool buffered, bool partial)
 {
     struct domain *d = current->domain;
     struct ioreq_server *s;
-    unsigned int id, failed = 0;
+    unsigned int id, sent = 0, failed = 0;
 
     FOR_EACH_IOREQ_SERVER(d, id, s)
     {
@@ -1322,9 +1322,10 @@ unsigned int ioreq_broadcast(ioreq_t *p, bool buffered)
 
         if ( ioreq_send(s, p, buffered) == IOREQ_STATUS_UNHANDLED )
             failed++;
+        sent++;
     }
 
-    return failed;
+    return failed == 0 || (partial && failed < sent);
 }
 
 void ioreq_domain_init(struct domain *d)
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
index a26614d331..65457ca5ba 100644
--- a/xen/include/xen/ioreq.h
+++ b/xen/include/xen/ioreq.h
@@ -102,7 +102,7 @@ struct ioreq_server *ioreq_server_select(struct domain *d,
                                          ioreq_t *p);
 int ioreq_send(struct ioreq_server *s, ioreq_t *proto_p,
                bool buffered);
-unsigned int ioreq_broadcast(ioreq_t *p, bool buffered);
+bool ioreq_broadcast(ioreq_t *p, bool buffered, bool partial);
 void ioreq_request_mapcache_invalidate(const struct domain *d);
 void ioreq_signal_mapcache_invalidate(void);
 
-- 
2.31.1
Re: [PATCH] ioreq_broadcast(): accept partial broadcast success
Posted by Julien Grall 1 year, 5 months ago
Hi,

On 25/11/2022 14:15, Per Bilse wrote:
> A change to XAPI varstored causes

For those unfamiliar with XAPI (like me), can you explain what was the 
change made?

> an unnecessary error message
> to be logged in hypervisor.log whenever an RTC timeoffset update
> is broadcast.
>  In extreme cases this could flood the log file.

Which should be ratelimited as this is using guest error loglevel. But I 
think this is irrelevant here. It would be more relevant to explain why 
it is OK to allow a partial broadcast.

> This patch modifies ioreq_broadcast() to allow partial success.

The commit message is quite vague, so it is hard to know what you are 
trying to solve exactly. AFAIU, there are two reasons for 
ioreq_broadcast to fails:
  1) The IOREQ server didn't register the bufioreq
  2) The IOREQ buffer page is full

While I would agree that the error message is not necessary for 1) (the 
IOREQ server doesn't care about the event), I would disagree for 2) 
because it would indicate something went horribly wrong in the IOREQ 
server and there is a chance your domain may misbehave afterwards.

Cheers,

-- 
Julien Grall
Re: [PATCH] ioreq_broadcast(): accept partial broadcast success
Posted by Jan Beulich 1 year, 4 months ago
On 26.11.2022 23:19, Julien Grall wrote:
> On 25/11/2022 14:15, Per Bilse wrote:
>> A change to XAPI varstored causes
> 
> For those unfamiliar with XAPI (like me), can you explain what was the 
> change made?
> 
>> an unnecessary error message
>> to be logged in hypervisor.log whenever an RTC timeoffset update
>> is broadcast.
>>  In extreme cases this could flood the log file.
> 
> Which should be ratelimited as this is using guest error loglevel. But I 
> think this is irrelevant here. It would be more relevant to explain why 
> it is OK to allow a partial broadcast.
> 
>> This patch modifies ioreq_broadcast() to allow partial success.
> 
> The commit message is quite vague, so it is hard to know what you are 
> trying to solve exactly. AFAIU, there are two reasons for 
> ioreq_broadcast to fails:
>   1) The IOREQ server didn't register the bufioreq
>   2) The IOREQ buffer page is full
> 
> While I would agree that the error message is not necessary for 1) (the 
> IOREQ server doesn't care about the event), I would disagree for 2) 
> because it would indicate something went horribly wrong in the IOREQ 
> server and there is a chance your domain may misbehave afterwards.

In addition I think ignoring failure (and, as said by Julien, only because
of no bufioreq being registered) is (kind of implicitly) valid only for
buffered requests. Hence I'm unconvinced of the need of a new boolean
function parameter. Instead I think we need a new IOREQ_STATUS_... value
representing the "not registered" case. And that could then be used by
ioreq_broadcast() to skip incrementing of "failed".

Jan
Re: [PATCH] ioreq_broadcast(): accept partial broadcast success
Posted by Per Bilse (3P) 1 year, 4 months ago
On 28/11/2022 08:21, Jan Beulich wrote:
> In addition I think ignoring failure (and, as said by Julien, only because
> of no bufioreq being registered) is (kind of implicitly) valid only for
> buffered requests. Hence I'm unconvinced of the need of a new boolean
> function parameter. Instead I think we need a new IOREQ_STATUS_... value
> representing the "not registered" case. And that could then be used by
> ioreq_broadcast() to skip incrementing of "failed".

I think I have been thinking about this the wrong way.  My thinking has
been that dropping an update (buffered or not) would be correct only
in special cases such as timeoffset, and it would therefore generally
be an error if a buffered update was directed to a handler that hadn't
registered for buffered updates.  The thinking in this proposal suggests
that handlers are generally free to choose whether or not to accept
buffered updates.  I wouldn't have suspected this, but I assume then
that this is perfectly reasonable in the context of Xen IO handling, so
I'll go with that.

Best,

   -- Per

Re: [PATCH] ioreq_broadcast(): accept partial broadcast success
Posted by Roger Pau Monné 1 year, 4 months ago
On Mon, Nov 28, 2022 at 09:21:47AM +0100, Jan Beulich wrote:
> On 26.11.2022 23:19, Julien Grall wrote:
> > On 25/11/2022 14:15, Per Bilse wrote:
> >> A change to XAPI varstored causes
> > 
> > For those unfamiliar with XAPI (like me), can you explain what was the 
> > change made?

One ioreq client used by XenServer doesn't handle buffered ioreqs, so
the broadcasted TIMEOFFSET always triggers an error due to that ioreq
not handling it.  While not harmful, it's still annoying to get the
messages on the hypervisor console for something that's not really an
error.

The description can indeed be reworded to not mention XenServer
specific components, something like:

"Avoid printing an error message when a broadcast buffered ioreq is
not handled by all registered clients, as long as the failure is
strictly because the client doesn't handle buffered ioreqs."

> >> an unnecessary error message
> >> to be logged in hypervisor.log whenever an RTC timeoffset update
> >> is broadcast.
> >>  In extreme cases this could flood the log file.
> > 
> > Which should be ratelimited as this is using guest error loglevel. But I 
> > think this is irrelevant here. It would be more relevant to explain why 
> > it is OK to allow a partial broadcast.
> > 
> >> This patch modifies ioreq_broadcast() to allow partial success.
> > 
> > The commit message is quite vague, so it is hard to know what you are 
> > trying to solve exactly. AFAIU, there are two reasons for 
> > ioreq_broadcast to fails:
> >   1) The IOREQ server didn't register the bufioreq
> >   2) The IOREQ buffer page is full
> > 
> > While I would agree that the error message is not necessary for 1) (the 
> > IOREQ server doesn't care about the event), I would disagree for 2) 
> > because it would indicate something went horribly wrong in the IOREQ 
> > server and there is a chance your domain may misbehave afterwards.
> 
> In addition I think ignoring failure (and, as said by Julien, only because
> of no bufioreq being registered) is (kind of implicitly) valid only for
> buffered requests. Hence I'm unconvinced of the need of a new boolean
> function parameter. Instead I think we need a new IOREQ_STATUS_... value
> representing the "not registered" case. And that could then be used by
> ioreq_broadcast() to skip incrementing of "failed".

So introduce an IOREQ_STATUS_UNREGISTERED return code and don't
increase failed if buffered == true and UNREGISTERED is returned,
would that be acceptable?

Thanks, Roger.
Re: [PATCH] ioreq_broadcast(): accept partial broadcast success
Posted by Jan Beulich 1 year, 4 months ago
On 28.11.2022 12:06, Roger Pau Monné wrote:
> On Mon, Nov 28, 2022 at 09:21:47AM +0100, Jan Beulich wrote:
>> On 26.11.2022 23:19, Julien Grall wrote:
>>> On 25/11/2022 14:15, Per Bilse wrote:
>>>> This patch modifies ioreq_broadcast() to allow partial success.
>>>
>>> The commit message is quite vague, so it is hard to know what you are 
>>> trying to solve exactly. AFAIU, there are two reasons for 
>>> ioreq_broadcast to fails:
>>>   1) The IOREQ server didn't register the bufioreq
>>>   2) The IOREQ buffer page is full
>>>
>>> While I would agree that the error message is not necessary for 1) (the 
>>> IOREQ server doesn't care about the event), I would disagree for 2) 
>>> because it would indicate something went horribly wrong in the IOREQ 
>>> server and there is a chance your domain may misbehave afterwards.
>>
>> In addition I think ignoring failure (and, as said by Julien, only because
>> of no bufioreq being registered) is (kind of implicitly) valid only for
>> buffered requests. Hence I'm unconvinced of the need of a new boolean
>> function parameter. Instead I think we need a new IOREQ_STATUS_... value
>> representing the "not registered" case. And that could then be used by
>> ioreq_broadcast() to skip incrementing of "failed".
> 
> So introduce an IOREQ_STATUS_UNREGISTERED return code and don't
> increase failed if buffered == true and UNREGISTERED is returned,
> would that be acceptable?

Yes afaic, but Paul is the maintainer of this code. And of course the
new error indicator shouldn't surprise any existing callers.

Jan

Re: [PATCH] ioreq_broadcast(): accept partial broadcast success
Posted by Paul Durrant 1 year, 4 months ago
On 28/11/2022 12:26, Jan Beulich wrote:
> On 28.11.2022 12:06, Roger Pau Monné wrote:
>> On Mon, Nov 28, 2022 at 09:21:47AM +0100, Jan Beulich wrote:
>>> On 26.11.2022 23:19, Julien Grall wrote:
>>>> On 25/11/2022 14:15, Per Bilse wrote:
>>>>> This patch modifies ioreq_broadcast() to allow partial success.
>>>>
>>>> The commit message is quite vague, so it is hard to know what you are
>>>> trying to solve exactly. AFAIU, there are two reasons for
>>>> ioreq_broadcast to fails:
>>>>    1) The IOREQ server didn't register the bufioreq
>>>>    2) The IOREQ buffer page is full
>>>>
>>>> While I would agree that the error message is not necessary for 1) (the
>>>> IOREQ server doesn't care about the event), I would disagree for 2)
>>>> because it would indicate something went horribly wrong in the IOREQ
>>>> server and there is a chance your domain may misbehave afterwards.
>>>
>>> In addition I think ignoring failure (and, as said by Julien, only because
>>> of no bufioreq being registered) is (kind of implicitly) valid only for
>>> buffered requests. Hence I'm unconvinced of the need of a new boolean
>>> function parameter. Instead I think we need a new IOREQ_STATUS_... value
>>> representing the "not registered" case. And that could then be used by
>>> ioreq_broadcast() to skip incrementing of "failed".
>>
>> So introduce an IOREQ_STATUS_UNREGISTERED return code and don't
>> increase failed if buffered == true and UNREGISTERED is returned,
>> would that be acceptable?
> 
> Yes afaic, but Paul is the maintainer of this code. And of course the
> new error indicator shouldn't surprise any existing callers.
> 

A new status code does indeed seem like the cleanest way forward.

   Paul

Re: [PATCH] ioreq_broadcast(): accept partial broadcast success
Posted by Per Bilse (3P) 1 year, 4 months ago
On 28/11/2022 08:21, Jan Beulich wrote:
> On 26.11.2022 23:19, Julien Grall wrote:
>>
>> The commit message is quite vague, so it is hard to know what you are
>> trying to solve exactly. AFAIU, there are two reasons for
>> ioreq_broadcast to fails:
>>    1) The IOREQ server didn't register the bufioreq
>>    2) The IOREQ buffer page is full
>>
>> While I would agree that the error message is not necessary for 1) (the
>> IOREQ server doesn't care about the event), I would disagree for 2)
>> because it would indicate something went horribly wrong in the IOREQ
>> server and there is a chance your domain may misbehave afterwards.
> 
> In addition I think ignoring failure (and, as said by Julien, only because
> of no bufioreq being registered) is (kind of implicitly) valid only for
> buffered requests. Hence I'm unconvinced of the need of a new boolean
> function parameter. Instead I think we need a new IOREQ_STATUS_... value
> representing the "not registered" case. And that could then be used by
> ioreq_broadcast() to skip incrementing of "failed".

Hi guys, and thank you very much for the feedback.  As I'm sure you've 
guessed I'm a newbie in Xen terms, so apologies for not getting things 
quite right.

Varstored dropped support for buffered ioreqs, hence the persistent 
error message(s), and the proposed fix was derived from discussion in 
Citrix's hypervisor team.  The 'partial' parameter could arguably be 
considered a case of (undesirable) special case handling, but 
ioreq_broadcast() is called from only two places in the code, so this 
seemed to be the lightest and simplest solution.  I'll have to defer to 
more knowledgeable team members for further thoughts on this.

Best,

   -- Per