From nobody Sat Apr 11 00:43:20 2026
Date: Wed, 17 Aug 2022 17:21:39 +0000
Message-Id: <20220817172139.3141101-1-shakeelb@google.com>
X-Mailer: git-send-email 2.37.1.595.g718a3a8f04-goog
Subject: [PATCH] Revert "memcg: cleanup racy sum avoidance code"
From: Shakeel Butt
To: Michal Koutný, Johannes Weiner, Michal Hocko, Roman Gushchin,
    Muchun Song, David Hildenbrand, Yosry Ahmed, Greg Thelen
Cc: Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Shakeel Butt, stable@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

This reverts commit 96e51ccf1af33e82f429a0d6baebba29c6448d0f.

Recently we started running the kernel with the rstat infrastructure on
production traffic and began to see negative memcg stat values. In
particular, the 'sock' stat is the one we observed having a negative
value:

$ grep "sock " /mnt/memory/job/memory.stat
sock 253952
total_sock 18446744073708724224

Re-running after a couple of seconds:

$ grep "sock " /mnt/memory/job/memory.stat
sock 253952
total_sock 53248

For now we are only seeing this issue on large machines (256 CPUs) and
only with the 'sock' stat. I suspect the networking stack increases the
stat on one CPU and decreases it on another CPU much more often than
other stats.
So, this negative sock value is due to the rstat flusher flushing the
stats on the CPU that has seen the decrement of sock while missing the
CPU that has seen the increments. A typical race condition.

For an easy stable backport, a revert is the simplest solution. For a
long-term solution, I am thinking of two directions. The first is to
reduce the race window by optimizing the rstat flusher. The second is,
if the reader sees a negative stat value, to force a flush and restart
the stat collection; basically a retry, but a limited one.

Signed-off-by: Shakeel Butt
Cc: stable@vger.kernel.org # 5.15
---
 include/linux/memcontrol.h | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 4d31ce55b1c0..6257867fbf95 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -987,19 +987,30 @@ static inline void mod_memcg_page_state(struct page *page,
 
 static inline unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
 {
-	return READ_ONCE(memcg->vmstats.state[idx]);
+	long x = READ_ONCE(memcg->vmstats.state[idx]);
+#ifdef CONFIG_SMP
+	if (x < 0)
+		x = 0;
+#endif
+	return x;
 }
 
 static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
 					      enum node_stat_item idx)
 {
 	struct mem_cgroup_per_node *pn;
+	long x;
 
 	if (mem_cgroup_disabled())
 		return node_page_state(lruvec_pgdat(lruvec), idx);
 
 	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
-	return READ_ONCE(pn->lruvec_stats.state[idx]);
+	x = READ_ONCE(pn->lruvec_stats.state[idx]);
+#ifdef CONFIG_SMP
+	if (x < 0)
+		x = 0;
+#endif
+	return x;
 }
 
 static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
-- 
2.37.1.595.g718a3a8f04-goog