net: rps: avoid raising a softirq on the current cpu when scheduling napi

[PATCH v2 net] net: rps: avoid raising a softirq on the current cpu when scheduling napi

Posted by Jason Xing 2 years, 10 months ago

From: Jason Xing <kernelxing@tencent.com>

When we are scheduling napi and then RPS decides to put the skb into
a backlog queue of another cpu, we shouldn't raise the softirq for
the current cpu. When to raise a softirq is based on whether we have
more data left to process later. But apparently, as to the current
cpu, there is no indication of more data enqueued, so we do not need
this action. After enqueuing to another cpu, net_rx_action() or
process_backlog() will call ipi and then another cpu will raise the
softirq as expected.

Also, raising more softirqs which set the corresponding bit field
can make the IRQ mechanism think we probably need to start ksoftirqd
on the current cpu. Actually it shouldn't happen.

Here are some codes to clarify how it can trigger ksoftirqd:
__do_softirq()
  [1] net_rx_action() -> enqueue_to_backlog() -> raise an IRQ
  [2] check if pending is set again -> wakeup_softirqd

Comments on above:
[1] when RPS chooses another cpu to enqueue skb
[2] in __do_softirq() it will wait a little bit of time around 2 jiffies

In this patch, raising an IRQ can be avoided when RPS enqueues the skb
into another backlog queue not the current one.

I captured some data when starting one iperf3 process and found out
we can reduces around ~1500 times/sec at least calling
__raise_softirq_irqoff().

Fixes: 0a9627f2649a ("rps: Receive Packet Steering")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
v2:
1) change the title and add more details.
2) add one parameter to recognise whether it is napi or non-napi case
suggested by Eric.
Link: https://lore.kernel.org/lkml/20230325152417.5403-1-kerneljasonxing@gmail.com/
---
 net/core/dev.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 1518a366783b..504dc3fc09b1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4586,7 +4586,7 @@ static void trigger_rx_softirq(void *data)
  * If yes, queue it to our IPI list and return 1
  * If no, return 0
  */
-static int napi_schedule_rps(struct softnet_data *sd)
+static int napi_schedule_rps(struct softnet_data *sd, bool napi)
 {
 	struct softnet_data *mysd = this_cpu_ptr(&softnet_data);
 
@@ -4594,8 +4594,9 @@ static int napi_schedule_rps(struct softnet_data *sd)
 	if (sd != mysd) {
 		sd->rps_ipi_next = mysd->rps_ipi_list;
 		mysd->rps_ipi_list = sd;
+		if (!napi)
+			__raise_softirq_irqoff(NET_RX_SOFTIRQ);
 
-		__raise_softirq_irqoff(NET_RX_SOFTIRQ);
 		return 1;
 	}
 #endif /* CONFIG_RPS */
@@ -4648,7 +4649,7 @@ static bool skb_flow_limit(struct sk_buff *skb, unsigned int qlen)
  * queue (may be a remote CPU queue).
  */
 static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
-			      unsigned int *qtail)
+			      unsigned int *qtail, bool napi)
 {
 	enum skb_drop_reason reason;
 	struct softnet_data *sd;
@@ -4675,7 +4676,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
 		 * We can use non atomic operation since we own the queue lock
 		 */
 		if (!__test_and_set_bit(NAPI_STATE_SCHED, &sd->backlog.state))
-			napi_schedule_rps(sd);
+			napi_schedule_rps(sd, napi);
 		goto enqueue;
 	}
 	reason = SKB_DROP_REASON_CPU_BACKLOG;
@@ -4933,7 +4934,7 @@ static int netif_rx_internal(struct sk_buff *skb)
 		if (cpu < 0)
 			cpu = smp_processor_id();
 
-		ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
+		ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail, false);
 
 		rcu_read_unlock();
 	} else
@@ -4941,7 +4942,7 @@ static int netif_rx_internal(struct sk_buff *skb)
 	{
 		unsigned int qtail;
 
-		ret = enqueue_to_backlog(skb, smp_processor_id(), &qtail);
+		ret = enqueue_to_backlog(skb, smp_processor_id(), &qtail, false);
 	}
 	return ret;
 }
@@ -5670,7 +5671,7 @@ static int netif_receive_skb_internal(struct sk_buff *skb)
 		int cpu = get_rps_cpu(skb->dev, skb, &rflow);
 
 		if (cpu >= 0) {
-			ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
+			ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail, false);
 			rcu_read_unlock();
 			return ret;
 		}
@@ -5705,7 +5706,7 @@ void netif_receive_skb_list_internal(struct list_head *head)
 			if (cpu >= 0) {
 				/* Will be handled, remove from list */
 				skb_list_del_init(skb);
-				enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
+				enqueue_to_backlog(skb, cpu, &rflow->last_qtail, true);
 			}
 		}
 	}
-- 
2.37.3

Re: [PATCH v2 net] net: rps: avoid raising a softirq on the current cpu when scheduling napi

Posted by Eric Dumazet 2 years, 10 months ago

On Tue, Mar 28, 2023 at 4:21 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> From: Jason Xing <kernelxing@tencent.com>
>
> When we are scheduling napi and then RPS decides to put the skb into
> a backlog queue of another cpu, we shouldn't raise the softirq for
> the current cpu. When to raise a softirq is based on whether we have
> more data left to process later. But apparently, as to the current
> cpu, there is no indication of more data enqueued, so we do not need
> this action. After enqueuing to another cpu, net_rx_action() or
> process_backlog() will call ipi and then another cpu will raise the
> softirq as expected.
>
> Also, raising more softirqs which set the corresponding bit field
> can make the IRQ mechanism think we probably need to start ksoftirqd
> on the current cpu. Actually it shouldn't happen.
>
> Here are some codes to clarify how it can trigger ksoftirqd:
> __do_softirq()
>   [1] net_rx_action() -> enqueue_to_backlog() -> raise an IRQ
>   [2] check if pending is set again -> wakeup_softirqd
>
> Comments on above:
> [1] when RPS chooses another cpu to enqueue skb
> [2] in __do_softirq() it will wait a little bit of time around 2 jiffies
>
> In this patch, raising an IRQ can be avoided when RPS enqueues the skb
> into another backlog queue not the current one.
>
> I captured some data when starting one iperf3 process and found out
> we can reduces around ~1500 times/sec at least calling
> __raise_softirq_irqoff().
>
> Fixes: 0a9627f2649a ("rps: Receive Packet Steering")

No Fixes: tag, when you are trying to optimize things, and so far fail at this.

> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> v2:
> 1) change the title and add more details.
> 2) add one parameter to recognise whether it is napi or non-napi case
> suggested by Eric.
> Link: https://lore.kernel.org/lkml/20230325152417.5403-1-kerneljasonxing@gmail.com/
> ---

Wrong again.

I think I will send a series, instead of you trying so hard to break the stack.

You have not considered busy polling, and that netif_receive_skb() contract
does not enforce it to be called from net_rx_action().