From nobody Thu Sep 18 21:47:02 2025
From: Jens Axboe
To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: soheil@google.com, willemdebruijn.kernel@gmail.com, stefanha@redhat.com, Jens Axboe
Subject: [PATCH 1/7] eventpoll: cleanup branches around sleeping for events
Date: Thu, 1 Dec 2022 11:11:50 -0700
Message-Id: <20221201181156.848373-2-axboe@kernel.dk>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20221201181156.848373-1-axboe@kernel.dk>
References: <20221201181156.848373-1-axboe@kernel.dk>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Rather than have two separate branches here, collapse them into a single
one instead. No functional changes here, just a cleanup in preparation
for changes in this area.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 52954d4637b5..3061bdde6cba 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1869,14 +1869,15 @@ static int ep_poll(struct eventpoll *ep, struct epo=
ll_event __user *events,
 		 * important.
 		 */
 		eavail =3D ep_events_available(ep);
-		if (!eavail)
+		if (!eavail) {
 			__add_wait_queue_exclusive(&ep->wq, &wait);
-
-		write_unlock_irq(&ep->lock);
-
-		if (!eavail)
+			write_unlock_irq(&ep->lock);
 			timed_out =3D !schedule_hrtimeout_range(to, slack,
 							      HRTIMER_MODE_ABS);
+		} else {
+			write_unlock_irq(&ep->lock);
+		}
+
 		__set_current_state(TASK_RUNNING);
=20
 		/*
--=20
2.35.1

From nobody Thu Sep 18 21:47:02 2025
From: Jens Axboe
To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: soheil@google.com, willemdebruijn.kernel@gmail.com, stefanha@redhat.com, Jens Axboe
Subject: [PATCH 2/7] eventpoll: don't pass in 'timed_out' to ep_busy_loop()
Date: Thu, 1 Dec 2022 11:11:51 -0700
Message-Id: <20221201181156.848373-3-axboe@kernel.dk>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20221201181156.848373-1-axboe@kernel.dk>
References: <20221201181156.848373-1-axboe@kernel.dk>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

It's known to be 'false' from the one call site we have, as we break
out of the loop if it's not.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 3061bdde6cba..64d7331353dd 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -396,12 +396,12 @@ static bool ep_busy_loop_end(void *p, unsigned long s=
tart_time)
  *
  * we must do our busy polling with irqs enabled
  */
-static bool ep_busy_loop(struct eventpoll *ep, int nonblock)
+static bool ep_busy_loop(struct eventpoll *ep)
 {
 	unsigned int napi_id =3D READ_ONCE(ep->napi_id);
=20
 	if ((napi_id >=3D MIN_NAPI_ID) && net_busy_loop_on()) {
-		napi_busy_loop(napi_id, nonblock ? NULL : ep_busy_loop_end, ep, false,
+		napi_busy_loop(napi_id, ep_busy_loop_end, ep, false,
 			       BUSY_POLL_BUDGET);
 		if (ep_events_available(ep))
 			return true;
@@ -453,7 +453,7 @@ static inline void ep_set_busy_poll_napi_id(struct epit=
em *epi)
=20
 #else
=20
-static inline bool ep_busy_loop(struct eventpoll *ep, int nonblock)
+static inline bool ep_busy_loop(struct eventpoll *ep)
 {
 	return false;
 }
@@ -1826,7 +1826,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll=
_event __user *events,
 		if (timed_out)
 			return 0;
=20
-		eavail =3D ep_busy_loop(ep, timed_out);
+		eavail =3D ep_busy_loop(ep);
 		if (eavail)
 			continue;
=20
--=20
2.35.1

From nobody Thu Sep 18 21:47:02 2025
From: Jens Axboe
To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: soheil@google.com, willemdebruijn.kernel@gmail.com, stefanha@redhat.com, Jens Axboe
Subject: [PATCH 3/7] eventpoll: split out wait handling
Date: Thu, 1 Dec 2022 11:11:52 -0700
Message-Id: <20221201181156.848373-4-axboe@kernel.dk>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20221201181156.848373-1-axboe@kernel.dk>
References: <20221201181156.848373-1-axboe@kernel.dk>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

In preparation for making changes to how wakeups and sleeps are done,
move the timeout scheduling into a helper and manage it rather than
rely on schedule_hrtimeout_range().

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c | 68 ++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 55 insertions(+), 13 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 64d7331353dd..888f565d0c5f 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1762,6 +1762,47 @@ static int ep_autoremove_wake_function(struct wait_q=
ueue_entry *wq_entry,
 	return ret;
 }
=20
+struct epoll_wq {
+	wait_queue_entry_t wait;
+	struct hrtimer timer;
+	bool timed_out;
+};
+
+static enum hrtimer_restart ep_timer(struct hrtimer *timer)
+{
+	struct epoll_wq *ewq =3D container_of(timer, struct epoll_wq, timer);
+	struct task_struct *task =3D ewq->wait.private;
+
+	ewq->timed_out =3D true;
+	wake_up_process(task);
+	return HRTIMER_NORESTART;
+}
+
+static void ep_schedule(struct eventpoll *ep, struct epoll_wq *ewq, ktime_=
t *to,
+			u64 slack)
+{
+	if (ewq->timed_out)
+		return;
+	if (to && *to =3D=3D 0) {
+		ewq->timed_out =3D true;
+		return;
+	}
+	if (!to) {
+		schedule();
+		return;
+	}
+
+	hrtimer_init_on_stack(&ewq->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+	ewq->timer.function =3D ep_timer;
+	hrtimer_set_expires_range_ns(&ewq->timer, *to, slack);
+	hrtimer_start_expires(&ewq->timer, HRTIMER_MODE_ABS);
+
+	schedule();
+
+	hrtimer_cancel(&ewq->timer);
+	destroy_hrtimer_on_stack(&ewq->timer);
+}
+
 /**
  * ep_poll - Retrieves ready events, and delivers them to the caller-suppl=
ied
  *           event buffer.
@@ -1782,13 +1823,15 @@ static int ep_autoremove_wake_function(struct wait_=
queue_entry *wq_entry,
 static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		   int maxevents, struct timespec64 *timeout)
 {
-	int res, eavail, timed_out =3D 0;
+	int res, eavail;
 	u64 slack =3D 0;
-	wait_queue_entry_t wait;
 	ktime_t expires, *to =3D NULL;
+	struct epoll_wq ewq;
=20
 	lockdep_assert_irqs_enabled();
=20
+	ewq.timed_out =3D false;
+
 	if (timeout && (timeout->tv_sec | timeout->tv_nsec)) {
 		slack =3D select_estimate_accuracy(timeout);
 		to =3D &expires;
@@ -1798,7 +1841,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll=
_event __user *events,
 		 * Avoid the unnecessary trip to the wait queue loop, if the
 		 * caller specified a non blocking operation.
 		 */
-		timed_out =3D 1;
+		ewq.timed_out =3D true;
 	}
=20
 	/*
@@ -1823,7 +1866,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll=
_event __user *events,
 				return res;
 		}
=20
-		if (timed_out)
+		if (ewq.timed_out)
 			return 0;
=20
 		eavail =3D ep_busy_loop(ep);
@@ -1850,8 +1893,8 @@ static int ep_poll(struct eventpoll *ep, struct epoll=
_event __user *events,
 		 * performance issue if a process is killed, causing all of its
 		 * threads to wake up without being removed normally.
 		 */
-		init_wait(&wait);
-		wait.func =3D ep_autoremove_wake_function;
+		init_wait(&ewq.wait);
+		ewq.wait.func =3D ep_autoremove_wake_function;
=20
 		write_lock_irq(&ep->lock);
 		/*
@@ -1870,10 +1913,9 @@ static int ep_poll(struct eventpoll *ep, struct epol=
l_event __user *events,
 		 */
 		eavail =3D ep_events_available(ep);
 		if (!eavail) {
-			__add_wait_queue_exclusive(&ep->wq, &wait);
+			__add_wait_queue_exclusive(&ep->wq, &ewq.wait);
 			write_unlock_irq(&ep->lock);
-			timed_out =3D !schedule_hrtimeout_range(to, slack,
-							      HRTIMER_MODE_ABS);
+			ep_schedule(ep, &ewq, to, slack);
 		} else {
 			write_unlock_irq(&ep->lock);
 		}
@@ -1887,7 +1929,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll=
_event __user *events,
 		 */
 		eavail =3D 1;
=20
-		if (!list_empty_careful(&wait.entry)) {
+		if (!list_empty_careful(&ewq.wait.entry)) {
 			write_lock_irq(&ep->lock);
 			/*
 			 * If the thread timed out and is not on the wait queue,
@@ -1896,9 +1938,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll=
_event __user *events,
 			 * Thus, when wait.entry is empty, it needs to harvest
 			 * events.
 			 */
-			if (timed_out)
-				eavail =3D list_empty(&wait.entry);
-			__remove_wait_queue(&ep->wq, &wait);
+			if (ewq.timed_out)
+				eavail =3D list_empty(&ewq.wait.entry);
+			__remove_wait_queue(&ep->wq, &ewq.wait);
 			write_unlock_irq(&ep->lock);
 		}
 	}
--=20
2.35.1

From nobody Thu Sep 18 21:47:02 2025
From: Jens Axboe
To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: soheil@google.com, willemdebruijn.kernel@gmail.com, stefanha@redhat.com, Jens Axboe
Subject: [PATCH 4/7] eventpoll: move expires to epoll_wq
Date: Thu, 1 Dec 2022 11:11:53 -0700
Message-Id: <20221201181156.848373-5-axboe@kernel.dk>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20221201181156.848373-1-axboe@kernel.dk>
References: <20221201181156.848373-1-axboe@kernel.dk>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

This makes the expiration available to the wakeup handler.
No functional changes expected in this patch, purely in preparation for
being able to use the timeout on the wakeup side.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 888f565d0c5f..0994f2eb6adc 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1765,6 +1765,7 @@ static int ep_autoremove_wake_function(struct wait_qu=
eue_entry *wq_entry,
 struct epoll_wq {
 	wait_queue_entry_t wait;
 	struct hrtimer timer;
+	ktime_t timeout_ts;
 	bool timed_out;
 };
=20
@@ -1825,7 +1826,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll=
_event __user *events,
 {
 	int res, eavail;
 	u64 slack =3D 0;
-	ktime_t expires, *to =3D NULL;
+	ktime_t *to =3D NULL;
 	struct epoll_wq ewq;
=20
 	lockdep_assert_irqs_enabled();
@@ -1834,7 +1835,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll=
_event __user *events,
=20
 	if (timeout && (timeout->tv_sec | timeout->tv_nsec)) {
 		slack =3D select_estimate_accuracy(timeout);
-		to =3D &expires;
+		to =3D &ewq.timeout_ts;
 		*to =3D timespec64_to_ktime(*timeout);
 	} else if (timeout) {
 		/*
--=20
2.35.1

From nobody Thu Sep 18 21:47:02 2025
From: Jens Axboe
To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: soheil@google.com, willemdebruijn.kernel@gmail.com, stefanha@redhat.com, Jens Axboe
Subject: [PATCH 5/7] eventpoll: move file checking earlier for epoll_ctl()
Date: Thu, 1 Dec 2022 11:11:54 -0700
Message-Id: <20221201181156.848373-6-axboe@kernel.dk>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20221201181156.848373-1-axboe@kernel.dk>
References: <20221201181156.848373-1-axboe@kernel.dk>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

This just cleans up the checking a bit, in preparation for a change
that will need access to 'ep' earlier.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 0994f2eb6adc..962d897bbfc6 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2111,6 +2111,20 @@ int do_epoll_ctl(int epfd, int op, int fd, struct ep=
oll_event *epds,
 	if (!f.file)
 		goto error_return;
=20
+	/*
+	 * We have to check that the file structure underneath the file
+	 * descriptor the user passed to us _is_ an eventpoll file.
+	 */
+	error =3D -EINVAL;
+	if (!is_file_epoll(f.file))
+		goto error_fput;
+
+	/*
+	 * At this point it is safe to assume that the "private_data" contains
+	 * our own data structure.
+	 */
+	ep =3D f.file->private_data;
+
 	/* Get the "struct file *" for the target file */
 	tf =3D fdget(fd);
 	if (!tf.file)
@@ -2126,12 +2140,10 @@ int do_epoll_ctl(int epfd, int op, int fd, struct e=
poll_event *epds,
 		ep_take_care_of_epollwakeup(epds);
=20
 	/*
-	 * We have to check that the file structure underneath the file descriptor
-	 * the user passed to us _is_ an eventpoll file. And also we do not permit
-	 * adding an epoll file descriptor inside itself.
+	 * We do not permit adding an epoll file descriptor inside itself.
 	 */
 	error =3D -EINVAL;
-	if (f.file =3D=3D tf.file || !is_file_epoll(f.file))
+	if (f.file =3D=3D tf.file)
 		goto error_tgt_fput;
=20
 	/*
@@ -2147,12 +2159,6 @@ int do_epoll_ctl(int epfd, int op, int fd, struct ep=
oll_event *epds,
 			goto error_tgt_fput;
 	}
=20
-	/*
-	 * At this point it is safe to assume that the "private_data" contains
-	 * our own data structure.
-	 */
-	ep =3D f.file->private_data;
-
 	/*
 	 * When we insert an epoll file descriptor inside another epoll file
 	 * descriptor, there is the chance of creating closed loops, which are
--=20
2.35.1

From nobody Thu Sep 18 21:47:02 2025
From: Jens Axboe
To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: soheil@google.com, willemdebruijn.kernel@gmail.com, stefanha@redhat.com, Jens Axboe
Subject: [PATCH 6/7] eventpoll: add support for min-wait
Date: Thu, 1 Dec 2022 11:11:55 -0700
Message-Id: <20221201181156.848373-7-axboe@kernel.dk>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20221201181156.848373-1-axboe@kernel.dk>
References: <20221201181156.848373-1-axboe@kernel.dk>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

This adds the necessary infrastructure to support a minimum wait for
reaping events. The API for setting or applying a minimum wait will come
in the following patches.

For medium workload efficiencies, some production workloads inject
artificial timers or sleeps before calling epoll_wait() to get better
batching and higher efficiencies. While this does help, it's not as
efficient as it could be. By adding support for this directly to
epoll_wait(), we can avoid extra context switches and scheduler and
timer overhead.

As an example, running an AB test on an identical workload at about
370K reqs/second, without this change and with the sleep hack
mentioned above (using 200 usec as the timeout), we're doing 310K-340K
non-voluntary context switches per second. Idle CPU on the host is
27-34%. With the sleep hack removed and epoll set to the same 200 usec
value, we're handling the exact same load but at 292K-315K non-voluntary
context switches and idle CPU of 33-41%, a substantial win.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c | 84 ++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 71 insertions(+), 13 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 962d897bbfc6..daa9885d9c2b 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -117,6 +117,9 @@ struct eppoll_entry {
 	/* The "base" pointer is set to the container "struct epitem" */
 	struct epitem *base;
=20
+	/* min wait time if (min_wait_ts) & 1 !=3D 0 */
+	ktime_t min_wait_ts;
+
 	/*
 	 * Wait queue item that will be linked to the target file wait
 	 * queue head.
@@ -217,6 +220,9 @@ struct eventpoll { u64 gen; struct hlist_head refs; =20 + /* min wait for epoll_wait() */ + unsigned int min_wait_ts; + #ifdef CONFIG_NET_RX_BUSY_POLL /* used to track busy poll napi_id */ unsigned int napi_id; @@ -1747,6 +1753,32 @@ static struct timespec64 *ep_timeout_to_timespec(str= uct timespec64 *to, long ms) return to; } =20 +struct epoll_wq { + wait_queue_entry_t wait; + struct hrtimer timer; + ktime_t timeout_ts; + ktime_t min_wait_ts; + struct eventpoll *ep; + bool timed_out; + int maxevents; + int wakeups; +}; + +static bool ep_should_min_wait(struct epoll_wq *ewq) +{ + if (ewq->min_wait_ts & 1) { + /* just an approximation */ + if (++ewq->wakeups >=3D ewq->maxevents) + goto stop_wait; + if (ktime_before(ktime_get_ns(), ewq->min_wait_ts)) + return true; + } + +stop_wait: + ewq->min_wait_ts &=3D ~(u64) 1; + return false; +} + /* * autoremove_wake_function, but remove even on failure to wake up, becaus= e we * know that default_wake_function/ttwu will only fail if the thread is al= ready @@ -1756,27 +1788,37 @@ static struct timespec64 *ep_timeout_to_timespec(st= ruct timespec64 *to, long ms) static int ep_autoremove_wake_function(struct wait_queue_entry *wq_entry, unsigned int mode, int sync, void *key) { - int ret =3D default_wake_function(wq_entry, mode, sync, key); + struct epoll_wq *ewq =3D container_of(wq_entry, struct epoll_wq, wait); + int ret; =20 + /* + * If min wait time hasn't been satisfied yet, keep waiting + */ + if (ep_should_min_wait(ewq)) + return 0; + + ret =3D default_wake_function(wq_entry, mode, sync, key); list_del_init(&wq_entry->entry); return ret; } =20 -struct epoll_wq { - wait_queue_entry_t wait; - struct hrtimer timer; - ktime_t timeout_ts; - bool timed_out; -}; - static enum hrtimer_restart ep_timer(struct hrtimer *timer) { struct epoll_wq *ewq =3D container_of(timer, struct epoll_wq, timer); struct task_struct *task =3D ewq->wait.private; + const bool is_min_wait =3D ewq->min_wait_ts & 1; + + if 
(!is_min_wait || ep_events_available(ewq->ep)) { + if (!is_min_wait) + ewq->timed_out =3D true; + ewq->min_wait_ts &=3D ~(u64) 1; + wake_up_process(task); + return HRTIMER_NORESTART; + } =20 - ewq->timed_out =3D true; - wake_up_process(task); - return HRTIMER_NORESTART; + ewq->min_wait_ts &=3D ~(u64) 1; + hrtimer_set_expires_range_ns(&ewq->timer, ewq->timeout_ts, 0); + return HRTIMER_RESTART; } =20 static void ep_schedule(struct eventpoll *ep, struct epoll_wq *ewq, ktime_= t *to, @@ -1831,12 +1873,16 @@ static int ep_poll(struct eventpoll *ep, struct epo= ll_event __user *events, =20 lockdep_assert_irqs_enabled(); =20 + ewq.min_wait_ts =3D 0; + ewq.ep =3D ep; + ewq.maxevents =3D maxevents; ewq.timed_out =3D false; + ewq.wakeups =3D 0; =20 if (timeout && (timeout->tv_sec | timeout->tv_nsec)) { slack =3D select_estimate_accuracy(timeout); + ewq.timeout_ts =3D timespec64_to_ktime(*timeout); to =3D &ewq.timeout_ts; - *to =3D timespec64_to_ktime(*timeout); } else if (timeout) { /* * Avoid the unnecessary trip to the wait queue loop, if the @@ -1845,6 +1891,18 @@ static int ep_poll(struct eventpoll *ep, struct epol= l_event __user *events, ewq.timed_out =3D true; } =20 + /* + * If min_wait is set for this epoll instance, note the min_wait + * time. Ensure the lowest bit is set in ewq.min_wait_ts, that's + * the state bit for whether or not min_wait is enabled. + */ + if (!ewq.timed_out && ep->min_wait_ts) { + ewq.min_wait_ts =3D ktime_add_us(ktime_get_ns(), + ep->min_wait_ts); + ewq.min_wait_ts |=3D (u64) 1; + to =3D &ewq.min_wait_ts; + } + /* * This call is racy: We may or may not see events that are being added * to the ready list under the lock (e.g., in IRQ callbacks). For cases @@ -1913,7 +1971,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll= _event __user *events, * important. 
*/ eavail =3D ep_events_available(ep); - if (!eavail) { + if (!eavail || ewq.min_wait_ts & 1) { __add_wait_queue_exclusive(&ep->wq, &ewq.wait); write_unlock_irq(&ep->lock); ep_schedule(ep, &ewq, to, slack); --=20 2.35.1 From nobody Thu Sep 18 21:47:02 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 915FEC43217 for ; Thu, 1 Dec 2022 18:12:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231182AbiLASMj (ORCPT ); Thu, 1 Dec 2022 13:12:39 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36922 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230456AbiLASMX (ORCPT ); Thu, 1 Dec 2022 13:12:23 -0500 Received: from mail-io1-xd2b.google.com (mail-io1-xd2b.google.com [IPv6:2607:f8b0:4864:20::d2b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C7364B8474 for ; Thu, 1 Dec 2022 10:12:11 -0800 (PST) Received: by mail-io1-xd2b.google.com with SMTP id h184so1440208iof.10 for ; Thu, 01 Dec 2022 10:12:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=hAGk4MFjlfW7Mob6ujVIOlyczZbXb2RyPAqfUdaDlEo=; b=Gnqziq8xRmWSBCtI5LbkgYdL2sMBi4ijnnNDCjyb1I6NyYRb4wWQRfi1KeWDcV6woX 2hBG2JA6tPkR84DL4X+rhLHSfSacd65MbGAg32TAPk9YlGxw9kcZgSYXFGNGCC0L5YgY +AuvkWHv/JFbbonRZzYiDMVHxoX9MAkrOh3UCZuIcqxQAktZruc/xq+PU1feO+g8bBS7 DfS7IYpIU+ZQSkmZeoRLSQwzB/++gpslbtWACrtpE7SHmjRbQrhtK/+tRdg7UXMzqpQw Uh8ymHGPUCn+JbALqvUi8JNEliEEGmRAUlCbzIKJhVyPNtyMey7JusEhVCtcfubPIMbD /pfg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=hAGk4MFjlfW7Mob6ujVIOlyczZbXb2RyPAqfUdaDlEo=; b=zgj1wCh4aHoa5oIhpeom0aGcnIx9NEyemYRp/oxLBKwCcBqozrywT4XZai4zGkqA+j 3GH1XpmLX8lG0p6poZd19Say2qvNYe3MYJbjn08c+YeWd7oJKW4NLOWufbHPcHbXbvrv uQU2WulNXdV/qpqeB33bf2eFMQ1bxBhF2qOPrWStBkcq9daXZYifT0VWvrOYF0zdZS15 +dZztPjsG0oZujtEQUW8PrMcYqSdxTEi9nK1BWSDe+pl2EeTfm3tBF/Rier8UlhFN7ps w8koqCvoE6QXdO/Cn7o1WSDNOP9UOZnSveR+MU/roMmlP2TRJgwPKRjFJivWOknHkLMK eJ0w== X-Gm-Message-State: ANoB5pmllp+IFcMq2c+M9oIijzJrqUb9rlaOvyLIKZoQh8vBIBfngn9A oTIjxAz6cdQ3MsAgf0py+ozSn9jjbfjAQ6QP X-Google-Smtp-Source: AA0mqf43cOjadjWzOUmOk1OU4HLtoIJvOE0jX9fTTnSB5KjGk2DM0Qq0yRWwkjs/cVASjL2efMOJVA== X-Received: by 2002:a05:6638:4709:b0:389:e195:e8fb with SMTP id cs9-20020a056638470900b00389e195e8fbmr10218197jab.254.1669918330191; Thu, 01 Dec 2022 10:12:10 -0800 (PST) Received: from m1max.localdomain ([207.135.234.126]) by smtp.gmail.com with ESMTPSA id y21-20020a027315000000b00374fe4f0bc3sm1842028jab.158.2022.12.01.10.12.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 01 Dec 2022 10:12:09 -0800 (PST) From: Jens Axboe To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: soheil@google.com, willemdebruijn.kernel@gmail.com, stefanha@redhat.com, Jens Axboe Subject: [PATCH 7/7] eventpoll: add method for configuring minimum wait on epoll context Date: Thu, 1 Dec 2022 11:11:56 -0700 Message-Id: <20221201181156.848373-8-axboe@kernel.dk> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20221201181156.848373-1-axboe@kernel.dk> References: <20221201181156.848373-1-axboe@kernel.dk> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Add support for EPOLL_CTL_MIN_WAIT, which can be used to define a minimum reap time for an epoll context. 
Basic test case: struct d { int p1, p2; }; static void *fn(void *data) { struct d *d =3D data; char b =3D 0x89; /* Generate 2 events 10 msec apart */ usleep(10000); write(d->p1, &b, sizeof(b)); usleep(10000); write(d->p2, &b, sizeof(b)); return NULL; } int main(int argc, char *argv[]) { struct epoll_event ev, events[2]; pthread_t thread; int p1[2], p2[2]; struct d d; int efd, ret; efd =3D epoll_create1(0); if (efd < 0) { perror("epoll_create"); return 1; } if (pipe(p1) < 0) { perror("pipe"); return 1; } if (pipe(p2) < 0) { perror("pipe"); return 1; } ev.events =3D EPOLLIN; ev.data.fd =3D p1[0]; if (epoll_ctl(efd, EPOLL_CTL_ADD, p1[0], &ev) < 0) { perror("epoll add"); return 1; } ev.events =3D EPOLLIN; ev.data.fd =3D p2[0]; if (epoll_ctl(efd, EPOLL_CTL_ADD, p2[0], &ev) < 0) { perror("epoll add"); return 1; } /* always wait 200 msec for events */ ev.data.u64 =3D 200000; if (epoll_ctl(efd, EPOLL_CTL_MIN_WAIT, -1, &ev) < 0) { perror("epoll add set timeout"); return 1; } d.p1 =3D p1[1]; d.p2 =3D p2[1]; pthread_create(&thread, NULL, fn, &d); /* expect to get 2 events here rather than just 1 */ ret =3D epoll_wait(efd, events, 2, -1); printf("epoll_wait=3D%d\n", ret); return 0; } If EPOLL_CTL_MIN_WAIT is used with a timeout of 0, it is a no-op, and acts the same as if it wasn't called to begin with. Only a non-zero usec delay value will result in a wait time being applied for reaping events. Signed-off-by: Jens Axboe --- fs/eventpoll.c | 13 ++++++++++++- include/linux/eventpoll.h | 2 +- include/uapi/linux/eventpoll.h | 1 + 3 files changed, 14 insertions(+), 2 deletions(-) diff --git a/fs/eventpoll.c b/fs/eventpoll.c index daa9885d9c2b..ec7ffce8265a 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -2183,6 +2183,17 @@ int do_epoll_ctl(int epfd, int op, int fd, struct ep= oll_event *epds, */ ep =3D f.file->private_data; =20 + /* + * Handle EPOLL_CTL_MIN_WAIT upfront as we don't need to care about + * the fd being passed in.
+ */ + if (op =3D=3D EPOLL_CTL_MIN_WAIT) { + /* return old value */ + error =3D ep->min_wait_ts; + ep->min_wait_ts =3D epds->data; + goto error_fput; + } + /* Get the "struct file *" for the target file */ tf =3D fdget(fd); if (!tf.file) @@ -2315,7 +2326,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, f= d, { struct epoll_event epds; =20 - if (ep_op_has_event(op) && + if ((ep_op_has_event(op) || op =3D=3D EPOLL_CTL_MIN_WAIT) && copy_from_user(&epds, event, sizeof(struct epoll_event))) return -EFAULT; =20 diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h index 3337745d81bd..cbef635cb7e4 100644 --- a/include/linux/eventpoll.h +++ b/include/linux/eventpoll.h @@ -59,7 +59,7 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_e= vent *epds, /* Tells if the epoll_ctl(2) operation needs an event copy from userspace = */ static inline int ep_op_has_event(int op) { - return op !=3D EPOLL_CTL_DEL; + return op !=3D EPOLL_CTL_DEL && op !=3D EPOLL_CTL_MIN_WAIT; } =20 #else diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h index 8a3432d0f0dc..81ecb1ca36e0 100644 --- a/include/uapi/linux/eventpoll.h +++ b/include/uapi/linux/eventpoll.h @@ -26,6 +26,7 @@ #define EPOLL_CTL_ADD 1 #define EPOLL_CTL_DEL 2 #define EPOLL_CTL_MOD 3 +#define EPOLL_CTL_MIN_WAIT 4 =20 /* Epoll event masks */ #define EPOLLIN (__force __poll_t)0x00000001 --=20 2.35.1
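
Editorial note on the series above: patch 6/7 stores the "min wait is armed" state in the lowest bit of min_wait_ts, and ep_should_min_wait() swallows wakeups until either maxevents wakeups have been seen or the deadline has passed. The following is a minimal userspace sketch of that logic only, not kernel code. The names struct min_wait_state, min_wait_arm() and min_wait_should_wait() are hypothetical and do not appear in the patches, and plain nanosecond integers stand in for ktime_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the fields the patch adds to struct epoll_wq. */
struct min_wait_state {
	uint64_t min_wait_ts;	/* deadline in ns; bit 0 set means min wait is armed */
	int maxevents;		/* caller's maxevents, used as the wakeup budget */
	int wakeups;		/* wakeups seen so far (an approximation of events) */
};

/* Arm the minimum wait: deadline is now + min_wait_usec, with the state bit set. */
static void min_wait_arm(struct min_wait_state *s, uint64_t now_ns,
			 uint64_t min_wait_usec, int maxevents)
{
	assert(maxevents > 0);
	s->min_wait_ts = (now_ns + min_wait_usec * 1000) | 1ULL;
	s->maxevents = maxevents;
	s->wakeups = 0;
}

/*
 * Same shape as ep_should_min_wait() in patch 6/7: return true if the
 * wakeup should be ignored because the minimum wait is still in effect.
 * Once the wakeup budget is used up or the deadline has passed, clear
 * the state bit so later wakeups go through immediately.
 */
static bool min_wait_should_wait(struct min_wait_state *s, uint64_t now_ns)
{
	if (s->min_wait_ts & 1) {
		if (++s->wakeups < s->maxevents && now_ns < s->min_wait_ts)
			return true;
	}
	s->min_wait_ts &= ~1ULL;
	return false;
}
```

Because the state bit shares storage with the deadline, the armed deadline is always odd, so it can be off by one nanosecond. That is harmless at the microsecond granularity the series uses.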