drivers/firmware/arm_scmi/mailbox.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
On txdone_irq mode, tx_tick is done from mbox_chan_txdone.
Calling to mbox_client_txdone could get error message
and return directly, add a check to avoid this.
Signed-off-by: Pin-Chuan Liu <flash.liu@mediatek.com>
---
V1 -> V2:
- Remove Change-Id
- Use single line comment
drivers/firmware/arm_scmi/mailbox.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/firmware/arm_scmi/mailbox.c b/drivers/firmware/arm_scmi/mailbox.c
index b8d470417e8f..d391463e3208 100644
--- a/drivers/firmware/arm_scmi/mailbox.c
+++ b/drivers/firmware/arm_scmi/mailbox.c
@@ -8,6 +8,7 @@
#include <linux/err.h>
#include <linux/device.h>
+#include <linux/mailbox_controller.h>
#include <linux/mailbox_client.h>
#include <linux/of.h>
#include <linux/of_address.h>
@@ -275,7 +276,10 @@ static void mailbox_mark_txdone(struct scmi_chan_info *cinfo, int ret,
* Unfortunately, we have to kick the mailbox framework after we have
* received our message.
*/
- mbox_client_txdone(smbox->chan, ret);
+
+ /* With txdone_irq mode, kick can be done by mbox_chan_txdone */
+ if (!(smbox->chan->mbox->txdone_irq))
+ mbox_client_txdone(smbox->chan, ret);
}
static void mailbox_fetch_response(struct scmi_chan_info *cinfo,
--
2.18.0
On Thu, Feb 01, 2024 at 05:57:52PM +0800, Pin-Chuan Liu wrote: > On txdone_irq mode, tx_tick is done from mbox_chan_txdone. > Calling to mbox_client_txdone could get error message > and return directly, add a check to avoid this. > > Signed-off-by: Pin-Chuan Liu <flash.liu@mediatek.com> Hi Pin-Chuan, thanks for this, it was indeed sort of on my todo-list too, to allow MHUs equipped with Tx-Ack IRQ to work with SCMI. Having said that, this looks good to me in general, my only pain points are: the fact that we have to peek into the controller structures to know how it is configured, BUT I wouldn't know how to do it in any other way in fact as pof now...; and the fact that there is a constant runtime conditional check for each message sent even though the tx_ack irq presence can be detected once for all at setup time, BUT even this is not easily solvable as of now in the SCMI stack. So, after all of this babbling of mine, I would say that your patch is fine as it is, also because it is easy to backport; next week when I am back I'll give it a go on a couple of platforms and get back to you with a proper review/test. Thanks again ! Cristian
Hi Cristian, Thanks for kindly review and explains. Yeah, I have ever tried another way to skip the call (i.e. let mark_txdone be null). However, it looks not easy and also to backport. Awaiting your test results and further suggestions, thanks. Regards, Pin-Chuan On Fri, 2024-02-02 at 10:56 +0000, Cristian Marussi wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > On Thu, Feb 01, 2024 at 05:57:52PM +0800, Pin-Chuan Liu wrote: > > On txdone_irq mode, tx_tick is done from mbox_chan_txdone. > > Calling to mbox_client_txdone could get error message > > and return directly, add a check to avoid this. > > > > Signed-off-by: Pin-Chuan Liu <flash.liu@mediatek.com> > > Hi Pin-Chuan, > > thanks for this, it was indeed sort of on my todo-list too, to allow > MHUs > equipped with Tx-Ack IRQ to work with SCMI. > > Having said that, this looks good to me in general, my only pain > points > are: the fact that we have to peek into the controller structures to > know > how it is configured, BUT I wouldn't know how to do it in any other > way in fact as pof now...; and the fact that there is a constant > runtime > conditional check for each message sent even though the tx_ack irq > presence > can be detected once for all at setup time, BUT even this is not > easily > solvable as of now in the SCMI stack. > > So, after all of this babbling of mine, I would say that your patch > is fine > as it is, also because it is easy to backport; next week when I am > back I'll > give it a go on a couple of platforms and get back to you with a > proper > review/test. > > Thanks again ! > Cristian
Hi Cristian, Kindly ping :) Thanks. Pin-Chuan On Fri, 2024-02-02 at 20:36 +0800, Pin-Chuan Liu wrote: > Hi Cristian, > > Thanks for kindly review and explains. Yeah, I have ever tried > another > way to skip the call (i.e. let mark_txdone be null). However, it > looks > not easy and also to backport. > > Awaiting your test results and further suggestions, thanks. > > Regards, > Pin-Chuan >
On Wed, Mar 06, 2024 at 06:13:48AM +0000, Flash Liu (劉炳傳) wrote: > Hi Cristian, > > Kindly ping :) > Hi Pin-Chuan, sorry for the delay...I have NOT forgot :D, indeed I was just testing yesterday with some mailbox IP of ours equipped with a TxAck IRQ and I would have some question for you because I've seen some anomalies while using this: does your solution work reliably in your setup ALSO when multiple SCMI transactions are attempted ? (like cpufreq issuing cmds while polling a sensor from some other thread) ...I'll dig deeper today in this matter, but my current suspect is that using the mailbox TXAck IRQ to let the controller tick the internal mailbox queues does not play well with SCMI, since the SCMI TX channel is really the SCMI "a2p" bidirectional channel and it is associated with just only one shmem area used to hold both the egressing CMD and to receive the incoming REPLY from the platform: so if you let the controller tick the queues as soon as you received the TXAck you are telling the mailbox subsystem to queue another message on the same area while you are not really done, since only the client know when it's done with the whole SCMI transaction and the reply has been fetched. Indeed, for these reasons we have the BUSY/FREE bit in the shmem to protect it from pending new requests until the previous one has completed, but when the waited-for reply comes in, the platform would have cleared the BUSY bit and let the new queued message overwrite the pending reply prematurely, and one message is lost... ...but as said I want to delve deeper into this, as of now just suppositions and maybe I am just missing something more that has to be configured properly... Thanks, Cristian > Thanks. > Pin-Chuan > > On Fri, 2024-02-02 at 20:36 +0800, Pin-Chuan Liu wrote: > > Hi Cristian, > > > > Thanks for kindly review and explains. Yeah, I have ever tried > > another > > way to skip the call (i.e. let mark_txdone be null). However, it > > looks > > not easy and also to backport. > > > > Awaiting your test results and further suggestions, thanks. > > > > Regards, > > Pin-Chuan > > > > > ************* MEDIATEK Confidentiality Notice ******************** > The information contained in this e-mail message (including any > attachments) may be confidential, proprietary, privileged, or otherwise > exempt from disclosure under applicable laws. It is intended to be > conveyed only to the designated recipient(s). Any use, dissemination, > distribution, printing, retaining or copying of this e-mail (including its > attachments) by unintended recipient(s) is strictly prohibited and may > be unlawful. If you are not an intended recipient of this e-mail, or believe > that you have received this e-mail in error, please notify the sender > immediately (by replying to this e-mail), delete any and all copies of > this e-mail (including any attachments) from your system, and do not > disclose the content of this e-mail to any other person. Thank you!
On Wed, Mar 06, 2024 at 09:49:16AM +0000, Cristian Marussi wrote: > On Wed, Mar 06, 2024 at 06:13:48AM +0000, Flash Liu (劉炳傳) wrote: > > Hi Cristian, > > > > Kindly ping :) > > > > Hi Pin-Chuan, > > sorry for the delay...I have NOT forgot :D, indeed I was just testing > yesterday with some mailbox IP of ours equipped with a TxAck IRQ and I > would have some question for you because I've seen some anomalies while > using this: does your solution work reliably in your setup ALSO when > multiple SCMI transactions are attempted ? (like cpufreq issuing cmds > while polling a sensor from some other thread) > > ...I'll dig deeper today in this matter, but my current suspect is that > using the mailbox TXAck IRQ to let the controller tick the internal mailbox > queues does not play well with SCMI, since the SCMI TX channel is really the > SCMI "a2p" bidirectional channel and it is associated with just only one shmem > area used to hold both the egressing CMD and to receive the incoming REPLY > from the platform: so if you let the controller tick the queues as soon as you > received the TXAck you are telling the mailbox subsystem to queue another message > on the same area while you are not really done, since only the client know > when it's done with the whole SCMI transaction and the reply has been fetched. > > Indeed, for these reasons we have the BUSY/FREE bit in the shmem to protect it > from pending new requests until the previous one has completed, but when the > waited-for reply comes in, the platform would have cleared the BUSY bit and > let the new queued message overwrite the pending reply prematurely, and one > message is lost... > > ...but as said I want to delve deeper into this, as of now just suppositions > and maybe I am just missing something more that has to be configured > properly... > > Thanks, > Cristian > Hi again :D, so articulating more on my supposition that TxAck-capable mailboxes and SCMI do not play well together (and would not be worth either really...) Consider the following scenario. 1. scmi: mbox_send_message(X) is called from SCMI stack to send mesg-X on the a2p channel (a command) 2. mbox: mesg-X is 2a. queued by mbox subsystem [add_to_rbuf(X)] 2b. submitted for transmission [msg_submit(X)] 2c. prepared by SCMI clk->tx_prepare 2d. finally sent via mhu driver .send_data 2e. mesg-X is now an active_req for mbox and in-flight for SCMI 3. scmi: ANOTHER mesg-Y tx is attempted via mbox_send_message(Y) 4. mbox: mesg-Y is 4a. queued by mbox_subsys [add_to_rbuf()] 4b. NOT submitted since there is already an active_req=mesg-X pending Any further SCMI mesg TX attempt will behave similarly: queued/not_submitted till at some in the future someone calls the txdone routines, which in turn calls tx_tick()...this SOMEONE can be the client, like it is now, or the controller if it is configured to use the TxAck IRQ (as per-your-patch)... ...so lets see what happen in your TxAck enabled case. 5. TxAck IRQ received, transmission of mesg-X has been successfull (NOTE that SCMI at this point is still waiting for a reply to mesg-X..) 5a. controller calls mbox_chan_txdone() 5b. mbox in turn calls tx_tick 5c. active_req is cleared and next queued mesg-Y is submitted 5d. mesg-Y transmission gets anyway stuck on cl->tx_prepare since we check the SCMI shmem BUSY bit and busy-loop there till it clears: this clearing can happen ONLY after the mesg-X reply has come through, since it is the platform SCMI server that clears it having delivered the reply in the shmem. 6. platform SCMI server replies to mesg-X finally: 6a. platform writes reply in shmem 6b. platform clears BUSY bit -- note SCMI stack is still waiting for a reply at this point... so waiting for an IRQ OR by simply spinning on that same BUSY bit if polling mode was requested for the transaction.... ...lets assume you are in IRQ mode: 7. mesg-Y sender which was spinning on BUSY bit (blocked on tx_prepare) is immediately cleared to send and so tx_prepare can proceed further and completely overwrite the just received mesg-X, which is now LOST ..in case you were polling I guess you will have anyway some corruption due to the race between the polling-mesg-X-receiver retrieving the reply and the same tx_prepare codeflow as above... Indeed, the spec says that you should protect the channel till the reply has been retrieved from the SCMI (even after the BUSY bit is cleared), and in other transport we DO have some form of locks, BUT here in mailboxes there is not since it is NOT needed IF you stick to the non-TxAck original behaviour, since the tx_tick, as it is now, will be run by the SCMI stack ONLY after it has waited for mesg-X and retrieved the mesg-X-reply payload ...not before. Instead, if you enable the TxAck mode you are basically letting the controller itself issue the tx_tick(), which means "previous TX is done, please proceed with the next", BUT the current TX is really NOT done at all as intended by the client (SCMI), since the reply is missing and the only entity which can have the whole picture about when a transaction is completed (or timed-out) is the SCMI client. As said, I think the fundamental clash is between what the mailbox subsystem considers a TXDONE event (and related actions) and what instead is considered a completed transaction on the SCMI a2p channel: i.e. CMD_sent + REPLY_retrieved. At the end, anyway, would it be worth in any way to leverage such TxAck capabilities (somehow) of a mailbox in the SCMI world ? I mean, even if we make this work, what is supposed to happen better and faster when using a TxAck instead of a TX_polling mode like it is now ? ...because the SCMI stack cannot really do anything with this information in this case, given that there is just one single a2p_shem area for sending command and receiving replies...it has just to wait anyway even after the TxAck... ..maybe it is a bit more power-favourable to sleep_wait for the TxAck IRQ instead of polling the MHU regs ?... other than this the TxAck means nothing really in the context of the SCMI world, since you cannot safely queue anything more till the previous exchange has fully completed... ...in other non-SCMI scenarios that I experimented with, it really makes a difference havig the TxAck since it avoids all the internal polling/requeueing dance in the mailbox subsystem, but in this case I think is all made useless by the way SCMI/SMT based transport works...i.e. using a single shmem area for the a2p channel.. ...not really a short and sweet email... :P ... any feedback or further ideas are welcome anyway...especially if you can prove that all of the above is somehow wrong, or that there is a good reason to leverage the TxAck capable mailboxes :D Thanks, Cristian
Hi Cristian, Thanks for reply. :) The uploaded patch is just to avoid getting error logs, since mbox_client_txdone() will check the mode. About your mentioned scenario of TxACK irq mode, not sure, but do you mean that mbox driver may not "copy out" the REPLY mesg-X at shmem, before it calls to mbox_chan_txdone()? Could it become safe if we copy the REPLY mesg to a buffer and then issue the tx_tick? In addition, considering the following scenario: mesg-X (thread-X: low priority) mesg-Y (thread-Y: high priority) mesg-Z (thread-Z: high priority) 1. scmi: thread-X calls mbox_send_message(X) to send mesg-X, and goes to wait_completion() 2. mbox: mesg-X is sent 3. scmi: ANOTHER mesg-Y tx is attempted via mbox_send_message(Y), thread-Y goest to wait_completion() 3'. scmi: ANOTHER mesg-Z tx is attempted via mbox_send_message(Z), thread-Z goest to wait_completion() 4. mbox: mesg-Y is queued 4'. mbox: mesg-Z is queued 5. mbox irq received, ISR notify complete to thread-X, however system is busy on other high priority threads, so, thread-X doesn't back soon 6. mesg-Y, mesg-Z get timed-out. Could the timeout situtaion of mesg-Y and Z be reduced, if tx_tick in ISR? ... explains something might for TxACK irq mode... maybe you have other observation or suggestions about this priority scenario. Thanks again, Pin-Chuan On Wed, 2024-03-06 at 16:08 +0000, Cristian Marussi wrote: > On Wed, Mar 06, 2024 at 09:49:16AM +0000, Cristian Marussi wrote: > > On Wed, Mar 06, 2024 at 06:13:48AM +0000, Flash Liu (劉炳傳) wrote: > > > Hi Cristian, > > > > > > Kindly ping :) > > > > > > > Hi Pin-Chuan, > > > > sorry for the delay...I have NOT forgot :D, indeed I was just > > testing > > yesterday with some mailbox IP of ours equipped with a TxAck IRQ > > and I > > would have some question for you because I've seen some anomalies > > while > > using this: does your solution work reliably in your setup ALSO > > when > > multiple SCMI transactions are attempted ? (like cpufreq issuing > > cmds > > while polling a sensor from some other thread) > > > > ...I'll dig deeper today in this matter, but my current suspect is > > that > > using the mailbox TXAck IRQ to let the controller tick the internal > > mailbox > > queues does not play well with SCMI, since the SCMI TX channel is > > really the > > SCMI "a2p" bidirectional channel and it is associated with just > > only one shmem > > area used to hold both the egressing CMD and to receive the > > incoming REPLY > > from the platform: so if you let the controller tick the queues as > > soon as you > > received the TXAck you are telling the mailbox subsystem to queue > > another message > > on the same area while you are not really done, since only the > > client know > > when it's done with the whole SCMI transaction and the reply has > > been fetched. > > > > Indeed, for these reasons we have the BUSY/FREE bit in the shmem to > > protect it > > from pending new requests until the previous one has completed, but > > when the > > waited-for reply comes in, the platform would have cleared the BUSY > > bit and > > let the new queued message overwrite the pending reply prematurely, > > and one > > message is lost... > > > > ...but as said I want to delve deeper into this, as of now just > > suppositions > > and maybe I am just missing something more that has to be > > configured > > properly... > > > > Thanks, > > Cristian > > > > Hi again :D, > > so articulating more on my supposition that TxAck-capable mailboxes > and > SCMI do not play well together (and would not be worth either > really...) > > Consider the following scenario. > > 1. scmi: mbox_send_message(X) is called from SCMI stack to send mesg- > X > on the a2p channel (a command) > > 2. mbox: mesg-X is > 2a. queued by mbox subsystem [add_to_rbuf(X)] > 2b. submitted for transmission [msg_submit(X)] > 2c. prepared by SCMI clk->tx_prepare > 2d. finally sent via mhu driver .send_data > 2e. mesg-X is now an active_req for mbox and in-flight for SCMI > > 3. scmi: ANOTHER mesg-Y tx is attempted via mbox_send_message(Y) > > 4. mbox: mesg-Y is > 4a. queued by mbox_subsys [add_to_rbuf()] > 4b. NOT submitted since there is already an active_req=mesg-X > pending > > Any further SCMI mesg TX attempt will behave similarly: > queued/not_submitted > till at some in the future someone calls the txdone routines, which > in turn > calls tx_tick()...this SOMEONE can be the client, like it is now, or > the > controller if it is configured to use the TxAck IRQ (as per-your- > patch)... > ...so lets see what happen in your TxAck enabled case. > > 5. TxAck IRQ received, transmission of mesg-X has been successfull > (NOTE that SCMI at this point is still waiting for a reply to mesg- > X..) > > 5a. controller calls mbox_chan_txdone() > 5b. mbox in turn calls tx_tick > 5c. active_req is cleared and next queued mesg-Y is submitted > 5d. mesg-Y transmission gets anyway stuck on cl->tx_prepare since > we check the SCMI shmem BUSY bit and busy-loop there till it > clears: this clearing can happen ONLY after the mesg-X reply > has come through, since it is the platform SCMI server that > clears it having delivered the reply in the shmem. > > 6. platform SCMI server replies to mesg-X finally: > 6a. platform writes reply in shmem > 6b. platform clears BUSY bit > > -- note SCMI stack is still waiting for a reply at this point... > so waiting for an IRQ OR by simply spinning on that same BUSY > bit > if polling mode was requested for the transaction.... > > ...lets assume you are in IRQ mode: > > 7. mesg-Y sender which was spinning on BUSY bit (blocked on > tx_prepare) > is immediately cleared to send and so tx_prepare can proceed > further > and completely overwrite the just received mesg-X, which is now > LOST > > ..in case you were polling I guess you will have anyway some > corruption > due to the race between the polling-mesg-X-receiver retrieving the > reply > and the same tx_prepare codeflow as above... > > Indeed, the spec says that you should protect the channel till the > reply > has been retrieved from the SCMI (even after the BUSY bit is > cleared), and > in other transport we DO have some form of locks, BUT here in > mailboxes > there is not since it is NOT needed IF you stick to the non-TxAck > original > behaviour, since the tx_tick, as it is now, will be run by the SCMI > stack > ONLY after it has waited for mesg-X and retrieved the mesg-X-reply > payload > ...not before. > > Instead, if you enable the TxAck mode you are basically letting the > controller > itself issue the tx_tick(), which means "previous TX is done, please > proceed > with the next", BUT the current TX is really NOT done at all as > intended > by the client (SCMI), since the reply is missing and the only entity > which > can have the whole picture about when a transaction is completed (or > timed-out) > is the SCMI client. > > As said, I think the fundamental clash is between what the mailbox > subsystem considers a TXDONE event (and related actions) and what > instead is considered a completed transaction on the SCMI a2p > channel: > i.e. CMD_sent + REPLY_retrieved. > > At the end, anyway, would it be worth in any way to leverage such > TxAck > capabilities (somehow) of a mailbox in the SCMI world ? > > I mean, even if we make this work, what is supposed to happen better > and faster > when using a TxAck instead of a TX_polling mode like it is now ? > > ...because the SCMI stack cannot really do anything with this > information in > this case, given that there is just one single a2p_shem area for > sending command > and receiving replies...it has just to wait anyway even after the > TxAck... > > ..maybe it is a bit more power-favourable to sleep_wait for the TxAck > IRQ > instead of polling the MHU regs ?... other than this the TxAck means > nothing > really in the context of the SCMI world, since you cannot safely > queue anything > more till the previous exchange has fully completed... > > ...in other non-SCMI scenarios that I experimented with, it really > makes a > difference havig the TxAck since it avoids all the internal > polling/requeueing > dance in the mailbox subsystem, but in this case I think is all made > useless by > the way SCMI/SMT based transport works...i.e. using a single shmem > area for > the a2p channel.. > > ...not really a short and sweet email... :P ... any feedback or > further > ideas are welcome anyway...especially if you can prove that all of > the > above is somehow wrong, or that there is a good reason to leverage > the > TxAck capable mailboxes :D > > Thanks, > Cristian
On Thu, Mar 07, 2024 at 12:31:02PM +0000, Flash Liu (劉炳傳) wrote: > Hi Cristian, > > Thanks for reply. :) > > The uploaded patch is just to avoid getting error logs, since > mbox_client_txdone() will check the mode. ... yes right, the controller will detect the mode indeed based on how it is configured on the DT and issue the tx_tick() when the TxAck is received....your patch avoids the error when the SCMI stack tries to issue a mbox_client_txdone() and it is not expected to do so... ...what I am saying is that the TxAck txdone_irq mode is not compatible with how SCMI is designed to work in terms of transmission on the a2p channel...even if you make it work properly without any errors with your patch... ...so you will avoid the errors and make it work BUT it does not work realiably in case of multiple concurrent TX attempts...in my setup I see anomalies and loss of packets if I enable TxAck mode... > About your mentioned scenario of TxACK irq mode, not sure, but do you > mean that mbox driver may not "copy out" the REPLY mesg-X at shmem, the copy out of the REPLY is done by the SCNI layer, but anyway once the TxAck has been received there is nothing ready to be copied-out...the TxAck just means the requested message has been transmitted and it has been successfully received by the receiver (at least in my MBOX IP .. :P) > before it calls to mbox_chan_txdone()? > Could it become safe if we copy the REPLY mesg to a buffer and then > issue the tx_tick? In my understanding, the packet flow is as follows 1. Linux sends a new cmd with mbox_send_message() mbox subsys - queueus msg - submit: - tx_prepare - wait for FREE_bit on channel - copies msg in shmem area - ring doorbell 2. FW see the doorbell IRQ, a message has arrived - FW reads the incoming cmd message from shmem - FW rings the TxAck doorbell 3. Linux see the TxAck: at this point it knows that the message just sent has been received by FW...nothing more, NO reply is available to be read back at this point...Linux wait_for_completion 4. FW prepares the reply - FW writes reply to the shmem - FW sets the FREE bit in the shmem, channel ownership goes back to agent - FW rings the Doorbell 5. Linux see the RX doorbell IRQ and wakes up - reply is copied out and delivered to caller - mbox_client_txdone is called from mark_tx_done ...when TxAck ack is received at 3, there is nothing we can do Linux side because there is ONE only shmem area per-chandle both for cmd and related replies....so we cannot really do anything useful by knowing that the command we just sent was read successfully from the FW server.. ...moreover if the MBOX controller is in TxAck mode since it was configured like that by the DT, it will kick itself the tx_tick when the TxAck is received... ... that means that if you had another previous mbox_send_message() issued, that message would have been queued by mbox_subsys but not sent since a TX was already inflight, BUT now after the TxAck the tx_tick/msg_submit is called and the next message transmission is attempeted AND such request will now spin waiting for the FREE bit in cl->tx_prepare: this happens because we are at 3. and FW has still to send any reply... ..so you have one message sent and waited-for AND a new cmd queued for TX and spinning on FREE_bit waiting for the channel...so far so good ...but now in this situation imagine what will happen when finally FW replies in 4: as soon as the FREE_bit is flipped the above new transmission spinning inside tx_prepare will acquire the channel and write its payload in it, so OVERWRITING the previous reply that the FW has just written in shmem... In a nutshell, since we only have one shmem for cmd and reply in the a2p channel, we have to wait for the full SCMI transaction to be completed, and the reply to be retrieved, before writing the next command into shmem, BUT this is a knowledge that only the SCMI stack has...the mbox controller just know that a TxAck has been received, it cannot know that it is NOT allowed anyway to send the next... ..so basically in mailbox transport instead of using a lock to protect the channel (like in other transports) till the reply has been read, we let the SCMI agent client to issue the tx_tick only after the transmission has completed...this way no lock is needed because the mbox_subsys wont tx_tick and submit the next transmission till we are done...since the SCMI agent is the only one that can possibly know when it is safe to proceeed with another transmission when the TxAck is received there is nothing that we can copy-out...the reply is not available and there is nothing we can do...and in case of multiple concurrent transmissions you will have lost packets (as I see in my setup)... I would say that the check in your patch for txdone_irq could somehow become a check instead inside the mailbox_chan_setup() to detect such an unsupported controller configuration and bail out. The alternative would be to put a lock on the channel so that we can neutralized the effect of the TxAck ... but what;s the point of having it at this point :P > In addition, considering the following scenario: > > mesg-X (thread-X: low priority) > mesg-Y (thread-Y: high priority) > mesg-Z (thread-Z: high priority) > > 1. scmi: thread-X calls mbox_send_message(X) to send mesg-X, and goes > to wait_completion() > > 2. mbox: mesg-X is sent > > 3. scmi: ANOTHER mesg-Y tx is attempted via mbox_send_message(Y), > thread-Y goest to wait_completion() ...this does not even reach wat_for_completion, it will spin on the FREE_bit in tx_prepare trying to acquire the channel...so it wont even return from mailbox_send_message() > 3'. scmi: ANOTHER mesg-Z tx is attempted via mbox_send_message(Z), > thread-Z goest to wait_completion() > same .. queud on tx_prepare > 4. mbox: mesg-Y is queued > 4'. mbox: mesg-Z is queued > > 5. mbox irq received, ISR notify complete to thread-X, however system > is busy on other high priority threads, > so, thread-X doesn't back soon > ...mmm... one thing to consider indeed is what happens if I have 2 pending high-prio threads spinning on tx_prepare while the mesg-X IRQ arrives...would we be able to serve mesg-X IRQ if the cores are busy-waiting on mesg-Y / mesg-Z ? > 6. mesg-Y, mesg-Z get timed-out. > > Could the timeout situtaion of mesg-Y and Z be reduced, if tx_tick in > ISR? > > ... explains something might for TxACK irq mode... maybe you have other > observation or suggestions about this priority scenario. > I want to think about the threads prio you raised.... Thanks, Cristian > Thanks again, > Pin-Chuan > > On Wed, 2024-03-06 at 16:08 +0000, Cristian Marussi wrote: > > On Wed, Mar 06, 2024 at 09:49:16AM +0000, Cristian Marussi wrote: > > > On Wed, Mar 06, 2024 at 06:13:48AM +0000, Flash Liu (劉炳傳) wrote: > > > > Hi Cristian, > > > > > > > > Kindly ping :) > > > > > > > > > > Hi Pin-Chuan, > > > > > > sorry for the delay...I have NOT forgot :D, indeed I was just > > > testing > > > yesterday with some mailbox IP of ours equipped with a TxAck IRQ > > > and I > > > would have some question for you because I've seen some anomalies > > > while > > > using this: does your solution work reliably in your setup ALSO > > > when > > > multiple SCMI transactions are attempted ? (like cpufreq issuing > > > cmds > > > while polling a sensor from some other thread) > > > > > > ...I'll dig deeper today in this matter, but my current suspect is > > > that > > > using the mailbox TXAck IRQ to let the controller tick the internal > > > mailbox > > > queues does not play well with SCMI, since the SCMI TX channel is > > > really the > > > SCMI "a2p" bidirectional channel and it is associated with just > > > only one shmem > > > area used to hold both the egressing CMD and to receive the > > > incoming REPLY > > > from the platform: so if you let the controller tick the queues as > > > soon as you > > > received the TXAck you are telling the mailbox subsystem to queue > > > another message > > > on the same area while you are not really done, since only the > > > client know > > > when it's done with the whole SCMI transaction and the reply has > > > been fetched. > > > > > > Indeed, for these reasons we have the BUSY/FREE bit in the shmem to > > > protect it > > > from pending new requests until the previous one has completed, but > > > when the > > > waited-for reply comes in, the platform would have cleared the BUSY > > > bit and > > > let the new queued message overwrite the pending reply prematurely, > > > and one > > > message is lost... > > > > > > ...but as said I want to delve deeper into this, as of now just > > > suppositions > > > and maybe I am just missing something more that has to be > > > configured > > > properly... > > > > > > Thanks, > > > Cristian > > > > > > > Hi again :D, > > > > so articulating more on my supposition that TxAck-capable mailboxes > > and > > SCMI do not play well together (and would not be worth either > > really...) > > > > Consider the following scenario. > > > > 1. scmi: mbox_send_message(X) is called from SCMI stack to send mesg- > > X > > on the a2p channel (a command) > > > > 2. mbox: mesg-X is > > 2a. queued by mbox subsystem [add_to_rbuf(X)] > > 2b. submitted for transmission [msg_submit(X)] > > 2c. prepared by SCMI clk->tx_prepare > > 2d. finally sent via mhu driver .send_data > > 2e. mesg-X is now an active_req for mbox and in-flight for SCMI > > > > 3. scmi: ANOTHER mesg-Y tx is attempted via mbox_send_message(Y) > > > > 4. mbox: mesg-Y is > > 4a. queued by mbox_subsys [add_to_rbuf()] > > 4b. NOT submitted since there is already an active_req=mesg-X > > pending > > > > Any further SCMI mesg TX attempt will behave similarly: > > queued/not_submitted > > till at some in the future someone calls the txdone routines, which > > in turn > > calls tx_tick()...this SOMEONE can be the client, like it is now, or > > the > > controller if it is configured to use the TxAck IRQ (as per-your- > > patch)... > > ...so lets see what happen in your TxAck enabled case. > > > > 5. TxAck IRQ received, transmission of mesg-X has been successfull > > (NOTE that SCMI at this point is still waiting for a reply to mesg- > > X..) > > > > 5a. controller calls mbox_chan_txdone() > > 5b. mbox in turn calls tx_tick > > 5c. active_req is cleared and next queued mesg-Y is submitted > > 5d. mesg-Y transmission gets anyway stuck on cl->tx_prepare since > > we check the SCMI shmem BUSY bit and busy-loop there till it > > clears: this clearing can happen ONLY after the mesg-X reply > > has come through, since it is the platform SCMI server that > > clears it having delivered the reply in the shmem. > > > > 6. platform SCMI server replies to mesg-X finally: > > 6a. platform writes reply in shmem > > 6b. platform clears BUSY bit > > > > -- note SCMI stack is still waiting for a reply at this point... > > so waiting for an IRQ OR by simply spinning on that same BUSY > > bit > > if polling mode was requested for the transaction.... > > > > ...lets assume you are in IRQ mode: > > > > 7. mesg-Y sender which was spinning on BUSY bit (blocked on > > tx_prepare) > > is immediately cleared to send and so tx_prepare can proceed > > further > > and completely overwrite the just received mesg-X, which is now > > LOST > > > > ..in case you were polling I guess you will have anyway some > > corruption > > due to the race between the polling-mesg-X-receiver retrieving the > > reply > > and the same tx_prepare codeflow as above... > > > > Indeed, the spec says that you should protect the channel till the > > reply > > has been retrieved from the SCMI (even after the BUSY bit is > > cleared), and > > in other transport we DO have some form of locks, BUT here in > > mailboxes > > there is not since it is NOT needed IF you stick to the non-TxAck > > original > > behaviour, since the tx_tick, as it is now, will be run by the SCMI > > stack > > ONLY after it has waited for mesg-X and retrieved the mesg-X-reply > > payload > > ...not before. > > > > Instead, if you enable the TxAck mode you are basically letting the > > controller > > itself issue the tx_tick(), which means "previous TX is done, please > > proceed > > with the next", BUT the current TX is really NOT done at all as > > intended > > by the client (SCMI), since the reply is missing and the only entity > > which > > can have the whole picture about when a transaction is completed (or > > timed-out) > > is the SCMI client. > > > > As said, I think the fundamental clash is between what the mailbox > > subsystem considers a TXDONE event (and related actions) and what > > instead is considered a completed transaction on the SCMI a2p > > channel: > > i.e. CMD_sent + REPLY_retrieved. > > > > At the end, anyway, would it be worth in any way to leverage such > > TxAck > > capabilities (somehow) of a mailbox in the SCMI world ? > > > > I mean, even if we make this work, what is supposed to happen better > > and faster > > when using a TxAck instead of a TX_polling mode like it is now ? > > > > ...because the SCMI stack cannot really do anything with this > > information in > > this case, given that there is just one single a2p_shem area for > > sending command > > and receiving replies...it has just to wait anyway even after the > > TxAck... > > > > ..maybe it is a bit more power-favourable to sleep_wait for the TxAck > > IRQ > > instead of polling the MHU regs ?... other than this the TxAck means > > nothing > > really in the context of the SCMI world, since you cannot safely > > queue anything > > more till the previous exchange has fully completed... > > > > ...in other non-SCMI scenarios that I experimented with, it really > > makes a > > difference havig the TxAck since it avoids all the internal > > polling/requeueing > > dance in the mailbox subsystem, but in this case I think is all made > > useless by > > the way SCMI/SMT based transport works...i.e. using a single shmem > > area for > > the a2p channel.. > > > > ...not really a short and sweet email... :P ... any feedback or > > further > > ideas are welcome anyway...especially if you can prove that all of > > the > > above is somehow wrong, or that there is a good reason to leverage > > the > > TxAck capable mailboxes :D > > > > Thanks, > > Cristian > > > ************* MEDIATEK Confidentiality Notice ******************** > The information contained in this e-mail message (including any > attachments) may be confidential, proprietary, privileged, or otherwise > exempt from disclosure under applicable laws. It is intended to be > conveyed only to the designated recipient(s). Any use, dissemination, > distribution, printing, retaining or copying of this e-mail (including its > attachments) by unintended recipient(s) is strictly prohibited and may > be unlawful. If you are not an intended recipient of this e-mail, or believe > that you have received this e-mail in error, please notify the sender > immediately (by replying to this e-mail), delete any and all copies of > this e-mail (including any attachments) from your system, and do not > disclose the content of this e-mail to any other person. Thank you!
© 2016 - 2024 Red Hat, Inc.